Validating the Framingham Offspring Study Equations for Predicting Incident Diabetes Mellitus

The American Journal of Managed Care, September 2008, Volume 14, Issue 9

Diabetes mellitus (DM) prediction equations developed by the Framingham Offspring Study were validated, but the point score underestimated DM risk in an HMO population.

Background: Investigators from the Framingham Offspring Study (FOS) recently proposed a new simple point score for estimating 8-year diabetes mellitus (DM) risk.

Objectives: To validate prediction models and to compare DM risk estimated by the point score with observed DM incidence.

Study Design: Longitudinal observational cohort study.

Methods: We identified 20,644 members of Kaiser Permanente Northwest (KPNW) who had no prior evidence of DM and who had all data elements necessary to estimate the models recorded between July 1, 1999, and June 30, 2000. Patients were followed up through June 30, 2007. We reestimated the FOS logistic regression models in the total KPNW sample (cumulative DM incidence, 16.5%) and in a randomly selected subsample with incidence identical to that in the FOS (5.1%). We also compared DM risk predicted by the FOS point score with actual 8-year DM incidence observed in the KPNW samples.

Results: The prediction models performed similarly in the FOS and KPNW samples, with almost identical odds ratios for independent variables and areas under the receiver operating characteristic curves for the models. The FOS point score substantially underestimated actual DM incidence in the total sample. In the subsample with DM incidence identical to that in the FOS sample, the point score was extremely accurate.

Conclusions: The accuracy of the point score requires that the underlying incidence of the individuals to whom it is applied be similar to the population from which it was derived. Accurate adaptation requires recalculating DM incidence at each point level.

(Am J Manag Care. 2008;14(9):574-580)

Diabetes mellitus (DM) prediction equations developed by the Framingham Offspring Study were validated, but the point score underestimated DM risk in a health maintenance organization population.

  • Diabetes mellitus incidence can be accurately predicted from a small set of known demographic and clinical risk factors that are easily measured in a clinical setting.
  • Using the point score to estimate DM risk of individual patients must be done cautiously.

Although the prevalence and incidence of type 2 diabetes mellitus (DM) continue to increase at an alarming rate,1 many at-risk patients do not develop DM.2 Accurate prediction of DM onset may help target interventions to prevent DM and facilitate the efficient allocation of healthcare resources. A few DM prediction models have been developed,3-5 but none are widely used in clinical practice. Investigators from the Framingham Offspring Study (FOS)6 recently proposed a new simpler algorithm for estimating 8-year DM risk. The algorithm uses a point score derived from a small set of variables easily obtainable at a clinic visit. The authors’ methods followed those of the original Framingham Heart Study, which created a point score for predicting coronary heart disease (CHD).7 The now widespread acceptance and use of the FOS CHD point score coupled with the simplicity of the FOS DM risk score may lead to rapid adoption of the FOS DM risk score in clinical practice. However, as the FOS investigators recommend, the DM prediction models and the corresponding point score must first be validated in other population samples. We sought to replicate the FOS models using electronic medical records from a large integrated health plan and to compare the DM risk estimated by the point score with the observed DM incidence.


The baseline characteristics of 20,644 KPNW subjects are given in Table 1, along with the characteristics of 3140 persons in the FOS sample. The KPNW sample seemed to be somewhat older (mean age, 57.4 vs 54.0 years) and more obese (mean BMI [calculated as weight in kilograms divided by height in meters squared], 30.3 vs 27.1). Greater proportions of the KPNW sample were hypertensive (67.8% vs 44.2%) and had elevated triglyceride levels (41.8% vs 31.8%).

Reestimated Multivariate Models

The FOS and KPNW personal models for DM prediction using only categories of age, sex, family history of DM, and BMI are given in Table 2. Age younger than 50 years and BMI less than 25.0 were the referent categories. Although ages 50 to 64 years and 65 years or older significantly increased the risk of DM in both samples, the odds ratios were much greater for these variables in the FOS sample. Male sex produced a similar odds ratio in both samples but was not statistically significant in the FOS sample. In both samples, BMI of 25.0 to 29.9 approximately doubled the risk of DM, while BMI of 30.0 or higher increased the risk by about 6 times. The FOS model seemed to yield somewhat better discrimination than the KPNW model (area under the ROC curve, 0.724 vs 0.676).
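For readers less familiar with the discrimination measure quoted above, the area under the ROC curve can be read as the probability that a randomly chosen person who developed DM received a higher predicted risk than a randomly chosen person who did not. The following is a minimal sketch of that pairwise definition; the function name and the toy inputs in the test are illustrative assumptions and are not taken from the FOS or KPNW analyses.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risks; labels: truthy if the person developed DM.
    Ties between a case and a non-case count as half a concordant pair.
    """
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

The double loop is quadratic in sample size, which is acceptable for illustration; a rank-based calculation would be preferable for a cohort of 20,644.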

Calibration Analysis

The calibration analysis of the simple clinical model using categorical variables is given in Table 5. Applying the FOS function directly to the full KPNW sample produced a high χ² statistic, indicating poor calibration. Recalibrating the model substantially lowered the χ² statistic, but the value was still highly significant. Although still lower, the χ² statistic for the reestimated equation was also statistically significant. In all 3 applications, calibration was better among the KPNW subsample.
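The exact form of the calibration statistic is not stated in this excerpt. As a hedged illustration only, the sketch below assumes a Hosmer-Lemeshow-type χ² statistic, which groups subjects by predicted risk and compares observed with expected events in each group. The function name and the default of 10 groups are assumptions made for illustration.

```python
def hosmer_lemeshow_chi2(predicted, observed, groups=10):
    """Hosmer-Lemeshow-type chi-square (an assumed variant, see lead-in).

    predicted: predicted probabilities of DM; observed: 0/1 outcomes.
    Subjects are sorted by predicted risk and split into roughly equal groups.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected_events = sum(p for p, _ in chunk)
        observed_events = sum(y for _, y in chunk)
        mean_risk = expected_events / n_g
        variance = n_g * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            chi2 += (observed_events - expected_events) ** 2 / variance
    return chi2
```

A larger value indicates a larger gap between predicted and observed events, which is the sense in which a high χ² statistic signals poor calibration.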

FOS Point Score and DM Incidence

The FOS algorithm to estimate DM risk assigns 10 points for a fasting glucose level of 100 to 125 mg/dL, 5 points for an HDL-C level of less than 40 mg/dL in men or less than 50 mg/dL in women, 2 points for a BMI of 25.0 to 29.9, 5 points for a BMI of 30.0 or higher, 3 points for parental history of DM, 3 points for a triglyceride level of 150 mg/dL or higher, and 2 points for blood pressure of 130/85 mm Hg or higher or receiving treatment. The 8-year risk of type 2 DM as projected by the sum of the points as developed by the FOS is plotted in the Figure, ranging from less than 3% among patients who score 10 or fewer points to greater than 35% among patients who score 25 or more points. Actual DM incidence for the total KPNW sample at each point level is plotted according to the points assigned from baseline values, ranging from 5% (<10 points) to almost 60% (>25 points). Actual DM incidence at each point level among the subset of the KPNW sample randomly selected to match the FOS proportion developing DM is also plotted, closely following the estimated risk of the FOS point score.
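The point assignments listed above can be written as a short function. This is a sketch of the scoring rules as described in this article, not code from the FOS investigators. The function and parameter names are illustrative, and the blood pressure criterion is interpreted here as systolic of 130 mm Hg or higher, diastolic of 85 mm Hg or higher, or receiving treatment, which is an assumption about how "130/85 mm Hg or higher" is applied.

```python
def fos_dm_points(fasting_glucose_mg_dl, hdl_c_mg_dl, is_male, bmi,
                  parental_history_dm, triglycerides_mg_dl,
                  systolic_mm_hg, diastolic_mm_hg, on_bp_treatment):
    """Sum the FOS diabetes points using the thresholds given in the text."""
    points = 0
    if 100 <= fasting_glucose_mg_dl <= 125:
        points += 10
    # Low HDL-C uses a sex-specific cutoff: <40 mg/dL in men, <50 mg/dL in women.
    if hdl_c_mg_dl < (40 if is_male else 50):
        points += 5
    if bmi >= 30.0:
        points += 5
    elif bmi >= 25.0:
        points += 2
    if parental_history_dm:
        points += 3
    if triglycerides_mg_dl >= 150:
        points += 3
    # Assumed reading of the blood pressure rule (see lead-in).
    if systolic_mm_hg >= 130 or diastolic_mm_hg >= 85 or on_bp_treatment:
        points += 2
    return points


# A man with every risk factor present scores the maximum of 28 points.
max_points = fos_dm_points(110, 35, True, 31.0, True, 160, 135, 80, False)
```

The function returns only the point total. Translating points into an 8-year risk requires the incidence observed at each point level in the population of interest, which is the central issue examined in this study.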

DISCUSSION

In this observational study of the electronic medical records of 20,644 health maintenance organization members, we confirmed the DM prediction equations developed by the FOS. In the simple clinical models, the odds ratios of the predictor variables were remarkably similar to those reported by the FOS, whether using categorical or continuous measures. The overall discrimination as measured by the area under the ROC curve was also similar. The validation of the FOS equations supports the conclusion that complex models are not needed to determine risk of type 2 DM. A set of clinical variables that is easy to collect (namely, BMI, the presence of hypertension, family history of DM, and HDL-C, triglyceride, and fasting glucose levels) produced an area under the ROC curve of 0.84.

However, we did not confirm the FOS point score predicting 8-year DM risk. Although we found that more points equated to greater risk, the accuracy of the point score risk estimate seemed to be contingent on the overall incidence of DM in the population being similar to the 5.1% found in the FOS data. In our study sample, which had a much higher DM incidence of 16.5%, the point score dramatically underestimated individual risk. When we randomly selected a subset of patients who developed DM at the same 5.1% rate found in the FOS data, the expected risk predicted by the point score and the actual proportion developing DM at each point total were almost identical. This finding has important ramifications for the applicability of the simple point scores to higher risk populations. Applying the FOS point score to an individual patient could lead to severe overestimation or underestimation of the patient’s risk if the underlying mean incident rate is substantially different from that in the FOS sample. However, because actual DM incidence increased with each point at a rate almost identical to that suggested by the FOS score, the number of points accurately represents increasing DM risk. Therefore, if managed care plans with data resources similar to those of KPNW plotted their actual DM incidence at each point level, the point score could be accurately adapted to their specific populations.
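The adaptation suggested above amounts to tabulating observed DM incidence at each point level in a plan's own data. A minimal sketch follows, assuming each member is represented as a pair of baseline point total and a flag for whether DM developed during follow-up. The function name and record layout are hypothetical and do not describe the KPNW data systems.

```python
from collections import defaultdict


def incidence_by_point_level(records):
    """Observed proportion developing DM at each point total.

    records: iterable of (points, developed_dm) pairs, one per member.
    Returns a dict mapping point total to observed incidence, in
    ascending order of points.
    """
    members = defaultdict(int)
    cases = defaultdict(int)
    for points, developed_dm in records:
        members[points] += 1
        if developed_dm:
            cases[points] += 1
    return {p: cases[p] / members[p] for p in sorted(members)}
```

In practice, sparsely populated point levels would yield unstable proportions, so a plan might pool adjacent levels before using the table as a risk lookup.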

Race/ethnicity mix may explain some of the difference in the proportions of the FOS and KPNW samples that developed DM. The FOS sample was 99% white compared with approximately 90% of the KPNW sample. Selection bias likely explains much more of the difference. Although the FOS sample was population based, it was composed of volunteers, who may have been healthier than nonvolunteers. Because the KPNW sample was observational, we were limited to analyzing approximately 20% of KPNW patients for whom all the necessary clinical data were available. Among the total KPNW population aged 26 to 82 years (those with and without all the clinical measures), the 7-year incidence of DM was 6.5%. That 16.5% of our study sample experienced DM during the same 7 years suggests that these patients had the necessary measures because their clinicians may have been concerned about their DM risk. However, these are precisely the type of patients to whom a clinician would likely apply the FOS point score. Although the FOS point score might accurately assess DM risk on a populationwide basis, it seems to substantially underestimate that risk among clinical attendees at all point levels.

In addition to the KPNW sample being composed of healthcare seekers, there were some specific differences between the KPNW and FOS samples that likely affected the underlying incidence of DM. In particular, the mean BMI of the KPNW sample was 30.3 compared with 27.1 in the FOS sample. Because obesity is one of the strongest risk factors for DM, it is not surprising that DM incidence was much greater in the KPNW sample. Nonetheless, if the predictive power of BMI is continuous, the FOS risk score will underestimate the effect of very high BMI because patients with a BMI of 30.0 receive the same 5 points as patients with a BMI of 35.0. This is an important consideration given the continuing trend of rising obesity prevalence.10-12 In addition, the KPNW sample had a much higher proportion of subjects with hypertension and high triglyceride levels, both well-known risk factors for DM incidence.13

To our knowledge, this is the first study to examine the applicability of the FOS prediction equations of incident DM. However, it is not the first to find Framingham equations to be inaccurate. Previous investigations of the Framingham CHD risk equations have shown that underestimation of risk occurs in patients with DM.14,15 This is due to an underlying risk of CHD in DM that is greater than that in the general population on which the Framingham equations were based. When such systematic underestimation occurs, a process known as recalibration can be used, which replaces the Framingham mean predictor values and mean incident rate with the mean values and mean incident rate from the new cohort. Recalibration has been shown to correct overestimation or underestimation in ethnic groups with different underlying CHD risk but does not produce a better model fit than reestimating the equations with each site’s data.9 We similarly found that recalibration greatly improved the fit of the FOS function but not more so than reestimating risk in our data using identical variables.
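The recalibration idea described above can be sketched for a logistic model. The sketch keeps the original regression coefficients, centers each predictor at the new cohort's mean, and anchors the intercept at the logit of the new cohort's incidence, so that a person with average predictor values is assigned the cohort's average risk. This is one simple logistic analogue offered under stated assumptions, not necessarily the procedure used in this study or in the CHD validation work. The function name is hypothetical, and the coefficient used in the test is illustrative rather than an FOS estimate.

```python
import math


def recalibrated_risk(x, betas, cohort_means, cohort_incidence):
    """Predicted risk after swapping in a new cohort's means and incidence.

    x: predictor values for one person; betas: coefficients from the
    original model; cohort_means: mean of each predictor in the new cohort;
    cohort_incidence: cumulative incidence in the new cohort (0 < value < 1).
    """
    # Intercept chosen so that risk at the cohort means equals cohort incidence.
    intercept = math.log(cohort_incidence / (1.0 - cohort_incidence))
    linear = intercept + sum(
        b * (xi - m) for b, xi, m in zip(betas, x, cohort_means)
    )
    return 1.0 / (1.0 + math.exp(-linear))
```

With the same coefficients, a person with the KPNW mean BMI of 30.3 would be assigned a risk of 16.5% under KPNW anchoring, whereas a person with the FOS mean BMI of 27.1 would be assigned 5.1% under FOS anchoring. The example shows how strongly the anchoring incidence drives the absolute risk estimate.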

The FOS is not the first study to attempt DM risk prediction. Most of those efforts have focused on predicting prevalent but undiagnosed DM.3,16-18 A simple tool that relied on questionnaire-based data was developed in Finland, but that risk score predicted DM medication initiation, not incident DM per se.19 Another longitudinal risk score using 4 variables (age, sex, and triglyceride and fasting glucose levels) has been proposed, but the area under the ROC curve for that score was 0.71, considerably lower than that in the FOS model. The San Antonio Heart Study investigators developed an incident DM prediction model similar to the FOS model in the variables used and in performance but did not propose a risk score.4 Given the simplicity of the FOS risk score and the widespread use of a similar score for predicting CHD risk, the FOS tool might be rapidly adopted into clinical practice. Although the FOS equations seem valid for predicting DM incidence, the point score did not accurately predict incident DM in a population at much greater risk. Therefore, the accuracy of a point score requires that the underlying incidence of the individuals to whom it is applied be similar to the population from which it was derived. Because the clinician is unlikely to know whether this is indeed true, using the point score to estimate DM risk of individual patients must be done cautiously. Health plans that can recalculate the risk associated with each point level can accurately adapt the point score to predict DM risk in their populations.

Author Affiliations: Kaiser Permanente Center for Health Research, Portland, OR (GAN, JBB).

Funding Source: None reported.

Author Disclosure: The authors (GAN, JBB) report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.

Authorship Information: Concept and design (GAN, JBB); acquisition of data (GAN); analysis and interpretation of data (GAN, JBB); drafting of the manuscript (GAN, JBB); critical revision of the manuscript for important intellectual content (JBB); statistical analysis.

Address correspondence to: Gregory A. Nichols, PhD, Kaiser Permanente Center for Health Research, 3800 N Interstate Ave, Portland, OR 97227-1098. E-mail:

1. Mokdad AH, Ford ES, Bowman BA, et al. Diabetes trends in the U.S.: 1990-1998. Diabetes Care. 2000;23(9):1278-1283.

3. Herman WH, Smith PJ, Thompson TJ, Engelgau MM, Aubert RE. A new and simple questionnaire to identify people at increased risk for undiagnosed diabetes. Diabetes Care. 1995;18(3):382-387.

5. Eddy DM, Schlessinger L. Archimedes: a trial-validated model of diabetes. Diabetes Care. 2003;26(11):3093-3101.

7. Wilson PW, D’Agostino RB, Levy D, Belanger AM, Silbershatz H, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97(18):1837-1847.

9. D’Agostino RB Sr, Grundy S, Sullivan LM, Wilson P; CHD Risk Prediction Group. Validation of the Framingham coronary heart disease prediction scores: results of a multiple ethnic groups investigation. JAMA. 2001;286(2):180-187.

11. Mokdad AH, Ford ES, Bowman BA, et al. Prevalence of obesity, diabetes, and obesity-related health risk factors, 2001. JAMA. 2003;289(1):76-79.

13. Nichols GA, Hillier TA, Brown JB. Progression from newly acquired impaired fasting glucose to type 2 diabetes. Diabetes Care. 2007;30(2):228-233.

15. Coleman RL, Stevens RJ, Retnakaran R, Holman RR. Framingham, SCORE, and DECODE risk equations do not provide reliable cardiovascular risk estimates in type 2 diabetes. Diabetes Care. 2007;30(5):1292-1293.

17. Baan CA, Ruige JB, Stolk RP, et al. Performance of a predictive model to identify undiagnosed diabetes in a health care setting. Diabetes Care. 1999;22(2):213-219.

19. Lindstrom J, Tuomilehto J. The Diabetes Risk Score: a practical tool to predict type 2 diabetes risk. Diabetes Care. 2003;26(3):725-731.