Offline Reinforcement Learning Improves Time in Range via Hybrid Closed-Loop Systems


This form of reinforcement learning was also shown to correct for challenging control scenarios such as irregular meal timing and compression errors.

Offline reinforcement learning (RL) in hybrid closed-loop systems can significantly increase time in the healthy blood glucose range for patients living with type 1 diabetes (T1D), new research shows. Offline RL can also correct for common and challenging control scenarios such as incorrect bolus dosing, irregular meal timing, and compression errors, the authors wrote.

Findings of the proof-of-concept study were published in the Journal of Biomedical Informatics.

Hybrid closed-loop systems allow patients with T1D to automatically regulate their basal insulin dosing.

“The majority of commercially available hybrid closed loop systems utilize proportional integral derivative [PID] controllers or model predictive controllers [MPCs],” the researchers explained.

Although these algorithms are easily interpretable, they can limit devices’ efficacy. PID algorithms can overestimate insulin doses after meals, while MPCs typically rely on linear or simplified models of glucose dynamics that cannot capture the full complexity of the task, they added.
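
For illustration of the first point, here is a minimal sketch of how a PID controller might map continuous glucose monitor (CGM) readings to a basal insulin rate. The gains and the 120 mg/dL setpoint are illustrative placeholders, not values from the study.

```python
# Minimal PID controller sketch for basal insulin dosing.
# The gains (kp, ki, kd) and the 120 mg/dL setpoint are illustrative,
# not values from the study.

class PIDController:
    def __init__(self, kp, ki, kd, setpoint=120.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint      # target glucose (mg/dL)
        self.integral = 0.0
        self.prev_error = 0.0

    def dose(self, glucose, dt=5.0):
        """Return a basal insulin rate from the latest CGM reading.

        dt is the sampling interval in minutes (CGMs typically report
        every 5 minutes).
        """
        error = glucose - self.setpoint
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        rate = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(rate, 0.0)         # insulin delivery cannot be negative
```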

RL offers one solution to this problem, as “a decision-making agent learns the optimal sequence of actions to take in order to maximize some concept of reward.”
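
In schematic terms, that definition corresponds to the standard online agent-environment loop sketched below; env and agent here are hypothetical stand-ins, not code from the study.

```python
# Standard online RL loop: the agent acts, observes a reward, and updates
# its policy to maximize cumulative reward. `env` and `agent` are
# hypothetical stand-ins, not objects from the study.

def run_episode(env, agent):
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)                    # e.g., an insulin dose
        next_state, reward, done = env.step(action)  # assumed 3-tuple interface
        agent.update(state, action, reward, next_state)
        total_reward += reward
        state = next_state
    return total_reward
```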

Current approaches typically use online RL, which requires interaction with a patient or simulator during training. However, this approach has limitations, underscoring the need for methods capable of learning accurate dosing policies from obtainable amounts of glucose data without the associated risks.

In the current proof-of-concept study, researchers applied offline RL to glucose control: the RL agent did not interact with the environment during training but instead learned from a static dataset collected by another agent.
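
The contrast with the online loop above can be sketched as follows; the dataset layout and agent interface are assumptions made for illustration.

```python
# In offline RL the agent never calls env.step(): it only samples logged
# (state, action, reward, next_state) transitions from a fixed dataset
# collected by another agent (the behavior policy).

import random

def train_offline(agent, dataset, steps, batch_size=256):
    for _ in range(steps):
        batch = random.sample(dataset, batch_size)  # logged transitions only
        agent.update(batch)                         # no environment interaction
    return agent
```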

“This entails a rigorous analysis of the offline RL algorithms—batch-constrained deep Q-learning, conservative Q-learning and twin-delayed deep deterministic policy gradient with behavioral cloning (TD3-BC)—in their ability to develop safe and high-performing insulin-dosing strategies in hybrid closed-loop systems,” the authors wrote.
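
Of the three, TD3-BC is the simplest to summarize: it adds a behavioral-cloning penalty to the standard TD3 actor update so that the learned dosing policy stays close to the actions observed in the dataset. A minimal sketch of that actor loss, with hypothetical actor and critic networks, might look like the following.

```python
# Sketch of the TD3-BC actor loss (Fujimoto & Gu, 2021): a TD3
# policy-gradient term plus a behavioral-cloning penalty. The `actor`,
# `critic`, and batch tensors are hypothetical placeholders.

import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, state, action, alpha=2.5):
    pi = actor(state)                      # proposed insulin dose
    q = critic(state, pi)                  # critic's value of that dose
    lam = alpha / q.abs().mean().detach()  # adaptive weight from the TD3-BC paper
    # Maximize Q while staying close to the logged (behavior) actions.
    return -lam * q.mean() + F.mse_loss(pi, action)
```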

Investigators trained and tested the algorithms on 30 virtual patients (10 children, 10 adolescents, and 10 adults) using the UVA/Padova glucose dynamics simulator. They then assessed the algorithms’ performance and sample efficiency against the strongest current online RL and control baselines.

Data showed that when trained on less than a tenth of the training samples required by online RL to achieve stable performance, offline RL significantly increased time in the healthy blood glucose range, from a mean (SD) of 61.6% (0.3%) with the strongest state-of-the-art baseline to 65.3% (0.5%) (P < .001), the authors wrote.

This was achieved without any associated increase in low blood glucose events, they added.
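
For context, time-in-range metrics like those reported above are typically computed directly from CGM traces. The sketch below assumes the standard consensus thresholds (70-180 mg/dL in range, below 70 mg/dL low), which are not taken from the study itself.

```python
# Compute time-in-range metrics from a list of CGM readings (mg/dL).
# The 70-180 mg/dL "in range" and <70 mg/dL "below range" cutoffs are
# the standard consensus definitions, assumed here for illustration.

def glycemic_metrics(readings):
    n = len(readings)
    return {
        "time_in_range_%": 100 * sum(70 <= g <= 180 for g in readings) / n,
        "time_below_range_%": 100 * sum(g < 70 for g in readings) / n,
        "time_above_range_%": 100 * sum(g > 180 for g in readings) / n,
    }

# Example: a short trace sampled every 5 minutes
print(glycemic_metrics([95, 110, 160, 185, 200, 140, 65, 90]))
```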

Analyses also revealed:

  • The TD3-BC approach outperformed the widely used and clinically validated PID algorithm across all patient age groups with respect to time in range, time below range, and glycemic risk
  • The improvement was even more pronounced when TD3-BC was evaluated in potentially unsafe glucose control scenarios
  • Further experiments on TD3-BC highlighted the ability of the approach to learn accurate and stable dosing policies from significantly smaller samples of patient data than those utilized in current online RL alternatives

The use of a T1D simulator is a main limitation of the study, as these environments cannot capture factors like stress, activity, and illness, the authors emphasized.

“Future work could include validating the method on simulated populations with type 2 diabetes, building on offline RL methods to incorporate online learning for continuous adaptation of control policies, or incorporating features such as interpretability or integration of prior medical knowledge, which may ease the transition from simulation to clinical use,” the authors concluded.

Reference

Emerson H, Guy M, McConville R. Offline reinforcement learning for safer blood glucose control in people with type 1 diabetes. J Biomed Inform. Published online May 4, 2023. doi:10.1016/j.jbi.2023.104376
