Offline Reinforcement Learning Improves Time in Range via Hybrid Closed-Loop Systems


This form of reinforcement learning was also shown to correct for control scenarios like irregular meal timing and compression errors.

Offline reinforcement learning (RL) in hybrid closed-loop systems can significantly increase time in the healthy blood glucose range for patients living with type 1 diabetes (T1D), new research shows. Offline RL can also correct for common and challenging control scenarios like incorrect bolus dosing, irregular meal timing, and compression errors, the authors wrote.

Findings of the proof-of-concept study were published in the Journal of Biomedical Informatics.

Hybrid closed-loop systems allow patients with T1D to automatically regulate their basal insulin dosing.

“The majority of commercially available hybrid closed loop systems utilize proportional integral derivative [PID] controllers or model predictive controllers [MPCs],” the researchers explained.

Although these algorithms are easily interpretable, they can limit devices’ efficacy. PID algorithms can overestimate insulin doses after meals while MPCs typically utilize linear or simplified models of glucose dynamics, which are unable to capture the full complexity of the task, they added.
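
For readers unfamiliar with PID control, the following is a minimal Python sketch of the general technique, not the controller used in any commercial device or in the study. The class name, the gains, and the 110 mg/dL target are all hypothetical values chosen for illustration. The dose responds to the current glucose error, its running sum, and its rate of change, which suggests how a sharp post-meal rise could produce a large dose.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDController:
    """Illustrative PID controller; gains and target are assumptions."""
    kp: float                      # proportional gain (hypothetical)
    ki: float                      # integral gain (hypothetical)
    kd: float                      # derivative gain (hypothetical)
    target: float = 110.0          # glucose target in mg/dL (assumed)
    integral: float = 0.0          # running sum of past errors
    prev_error: Optional[float] = None

    def step(self, glucose: float, dt: float = 1.0) -> float:
        """Return a non-negative dose for one glucose reading."""
        error = glucose - self.target
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0       # no history on the first reading
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        dose = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(dose, 0.0)      # insulin cannot be negative
```

A caller would create one controller per patient and call `step` once per glucose reading; real systems add safety limits and insulin-on-board tracking that this sketch omits.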

RL poses one solution to this problem, as “a decision-making agent learns the optimal sequence of actions to take in order to maximize some concept of reward.”

Current approaches typically use online RL, which requires interaction with a patient or simulator during training. However, this method has limitations, underscoring the need for methods capable of learning accurate dosing policies from obtainable levels of glucose data without associated risks.

In the current proof-of-concept study, researchers utilized offline RL for glucose control. Specifically, an RL agent learned without environmental interaction during training, instead learning from a static dataset collected under another agent.
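
The idea of learning from a static dataset can be sketched with a toy tabular example. This is an illustration of the offline setting only, under simplifying assumptions: the study used deep RL algorithms, whereas this sketch uses plain tabular Q-learning, and the function name, state labels, and hyperparameters are hypothetical. The key point is that the loop only reads logged transitions and never queries an environment.

```python
from collections import defaultdict


def offline_q_learning(dataset, actions, gamma=0.99, lr=0.1, epochs=50):
    """Fit Q-values from logged transitions only (no environment calls).

    dataset: iterable of (state, action, reward, next_state, done) tuples
             recorded under some other (behavior) policy.
    actions: list of all actions, used for the max over next actions.
    """
    q = defaultdict(float)
    for _ in range(epochs):
        for state, action, reward, next_state, done in dataset:
            # Bootstrapped target; terminal transitions have no future value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            target = reward + gamma * best_next
            q[(state, action)] += lr * (target - q[(state, action)])
    return q
```

One known weakness of this naive approach is that the max over next actions can rely on actions the dataset never contains, which tends to inflate value estimates. Offline RL algorithms are generally designed to counter that problem.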

“This entails a rigorous analysis of the offline RL algorithms—batch-constrained deep Q-learning, conservative Q-learning and twin-delayed deep deterministic policy gradient with behavioral cloning (TD3-BC)—in their ability to develop safe and high-performing insulin-dosing strategies in hybrid closed-loop systems,” the authors wrote.
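
As a rough illustration of how TD3-BC combines value maximization with behavioral cloning, the sketch below computes the actor objective as described in the original TD3-BC method: a Q-value term scaled by a normalizing factor, plus a squared-error penalty that keeps the policy's actions close to those in the dataset. It is written in plain Python over scalar actions for readability. The function name is hypothetical, `alpha=2.5` is the default from the original TD3-BC publication rather than a value reported in this study, and a real implementation would use an automatic differentiation framework.

```python
def td3_bc_actor_loss(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Illustrative TD3-BC actor loss for a batch of scalar actions.

    q_values:        critic estimates Q(s, pi(s)) for each batch state
    policy_actions:  actions pi(s) proposed by the current policy
    dataset_actions: actions actually recorded in the offline dataset
    """
    n = len(q_values)
    # Normalize the Q term so its scale is comparable to the cloning term.
    mean_abs_q = sum(abs(q) for q in q_values) / n
    lam = alpha / (mean_abs_q + 1e-8)
    mean_q = sum(q_values) / n
    # Behavioral cloning penalty: mean squared distance to logged actions.
    bc_penalty = sum(
        (p - a) ** 2 for p, a in zip(policy_actions, dataset_actions)
    ) / n
    # Minimizing this loss maximizes Q while staying near the data.
    return -lam * mean_q + bc_penalty
```

The cloning penalty is what discourages the learned policy from proposing doses far from anything seen in the logged data.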

Investigators trained and tested the algorithms on 30 virtual patients—10 children, 10 adolescents, 10 adults—using the UVA/Padova glucose dynamics simulator. They then assessed their performance and sample efficiency compared with the current strongest online RL and control baselines.

Data showed that when trained on less than a tenth of the total training samples required by online RL to achieve stable performance, offline RL can significantly increase time in the healthy blood glucose range from a mean (SD) 61.6% (0.3%) to 65.3% (0.5%) compared with the strongest state-of-the-art baseline (P < .001), the authors wrote.

This was achieved without any associated increase in low blood glucose events, they added.
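
Time in range and time below range are simple proportions of glucose readings. The Python sketch below shows one way to compute them. The 70 to 180 mg/dL thresholds are an assumption based on common clinical convention; the article does not state the exact range used in the study, and the function names are hypothetical.

```python
def time_in_range(readings_mg_dl, low=70.0, high=180.0):
    """Percentage of readings within [low, high] mg/dL (thresholds assumed)."""
    if not readings_mg_dl:
        raise ValueError("at least one glucose reading is required")
    in_range = sum(1 for g in readings_mg_dl if low <= g <= high)
    return 100.0 * in_range / len(readings_mg_dl)


def time_below_range(readings_mg_dl, low=70.0):
    """Percentage of readings below the low threshold (threshold assumed)."""
    if not readings_mg_dl:
        raise ValueError("at least one glucose reading is required")
    below = sum(1 for g in readings_mg_dl if g < low)
    return 100.0 * below / len(readings_mg_dl)
```

Both functions assume evenly spaced readings, so that the share of readings approximates the share of time.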

Analyses also revealed:

  • The TD3-BC approach outperformed the widely used and clinically validated PID algorithm across all patient age groups with respect to time in range, time below range, and glycemic risk
  • The improvement was more significant when TD3-BC was evaluated in potentially unsafe glucose control scenarios
  • Further experiments on TD3-BC highlighted the ability of the approach to learn accurate and stable dosing policies from significantly smaller samples of patient data than those utilized in current online RL alternatives

The use of a T1D simulator is a main limitation of the study, as these environments cannot capture factors like stress, activity, and illness, the authors emphasized.

“Future work could include validating the method on simulated populations with type 2 diabetes, building on offline RL methods to incorporate online learning for continuous adaption of control policies, or incorporating features such as interpretability or integration of prior medical knowledge, which may ease the transition from simulation to clinical use,” the authors concluded.


Emerson H, Guy M, McConville R. Offline reinforcement learning for safer blood glucose control in people with type 1 diabetes. J Biomed Inform. Published online May 4, 2023. doi:10.1016/j.jbi.2023.104376
