Standardizing Primary Care Physician Panels: Is Age and Sex Good Enough?

July 12, 2012
Sukyung Chung, PhD

Laura J. Eaton, MD, MPH

Harold S. Luft, PhD

Volume 18, Issue 7

Primary care physician panels vary markedly in work effort needed. Age/sex adjustment is sufficient to account for this for children, but not for adults.


Objectives:

To determine whether patient clinical conditions need to be considered when assessing primary care physician (PCP) workload in the context of standardizing panel sizes.

Study Design:

Work relative value units (wRVUs) were used to standardize PCP panel workload. Standardized panels were created using (1) age/sex-based and (2) clinical condition-based risk indicators. Billing data were used for all patients, regardless of insurance, for PCPs in a group practice (n = 190). Weighting methods were assessed for subgroups based on PCP specialty (family medicine, internal medicine, and pediatrics) and patient age (adults vs children) and for different levels of aggregation (patient vs PCP).


Methods:

Groupwide weights based on wRVUs of all primary care services delivered during the year were applied to individual patients and then aggregated to PCP panels. For age/sex weighting, only patient age and sex were taken into account. For condition-based weighting, 1275 disease categories, based on a combination of episode treatment groups (ETGs) and age and/or sex, were used.


Results:

As expected, at the patient level, condition-based weights were far more discriminative than age/sex-based weights. At the PCP level, this discrimination was less important; panel weights varied 1.9-fold (age/sex-based) to 2.6-fold (condition-based) across PCPs. Correlations between the 2 weighting methods were high (r = 0.93) for child panels and moderate (r = 0.71) for adult panels (all P <.001).


Conclusions:

The heterogeneity of PCP panels should be considered when assessing PCP workload for panel management. Panel variability in workload is well captured by age/sex-based weights for children, but for adults condition-based adjustment may be necessary.

(Am J Manag Care. 2012;18(7):e262-e268)

There is wide variation in patient demographic and clinical characteristics across primary care physician (PCP) panels, even within a single group practice organization. For panel-based productivity or performance assessment, standardization of PCP panels requires adjustment for such differences.

  • Child and adult panels should be assessed separately.

  • For child panels, age/sex-based adjustment is probably sufficient.

  • For adult panels, further adjustment reflecting patient clinical conditions is warranted.

  • In the setting studied, the adult and child panels of family practitioners differed from the panels of general internists and pediatricians, respectively.

  • Panel size standardization should account for unusual, but predictable, coding patterns.

Efforts to slow healthcare expenditure growth include a potential shift from the current standard fee-for-service (FFS) or productivity-based model to compensating primary care physicians (PCPs) for the appropriate management of a patient panel.1-3 Many group practices and managed care organizations are exploring such panel-based payment models.4 The transition to panel-based compensation must address each physician’s concern that his or her panel is “sicker than average” and thus requires more physician time. In spite of many flaws with respect to its incentives, FFS-based payment roughly reflects physician work effort and thus is perceived by physicians as fair in this regard. An important first step in panel-based payment is determining the method, or even the need, to account for variations in case mix among physician panels. To facilitate acceptance of change, the new methods should be simple and transparent.

Historically, correcting for patient case mix has been performed using various risk-adjustment techniques. The vast risk-adjustment literature has typically focused on predicting individuals’ costs, including hospitalization and other very expensive services. Age and sex alone perform poorly in predicting individual healthcare costs,5,6 and sophisticated risk-adjustment measures, using indicators of clinical conditions and/or prior healthcare costs, are far superior.7,8 Physician-level risk adjustment focuses on profiling physicians based on all costs incurred by their patients. These studies found that after including clinical condition indicators, adjustment for condition severity did not change provider ranking9 and methods of dealing with cost outliers and attributing costs to a provider did not substantially influence provider ranking.10 This literature, however, does not address what is necessary when standardizing panels for the PCP’s anticipated work effort. Panel standardization for PCP payment may be a simpler challenge because we need only focus on (1) the primary care work effort and (2) work effort aggregated to the provider panel level. This led us to ask whether age/sex-based weights are good enough for such panel size standardization.

In our study, we seek a simple and robust approach to standardize PCP panel size that is easy to implement and understandable by clinicians. Age and sex categorization meets that requirement, but may be insufficiently accurate. We therefore consider standardized workloads for a panel using 2 weighting methods: (1) finely defined condition-based weights, and (2) simple age/sex-based weights. Poorly constructed condition-based weights might be no better than age/sex-based weights, so we compared the 2 for both individual patients and PCP panels to see if the expected higher discrimination of condition-based weights for individuals matters as much for panels. To determine whether panel weights would need to be recalculated each year as panel composition changes, we then examined the stability of adjusted panel sizes over time to see if values change much from year to year.

METHODS

Study Setting

We used data from 2007 to 2008 on services provided by PCPs in the Palo Alto Division of Palo Alto Medical Foundation (PAMF/PAD), California. We focused on 2008 data except for the cross-year correlations. The primary care departments were located in 7 clinics of varying sizes (number of PCPs ranged from 4 to 72). Most PAMF patients were insured and represented a multitude of plans (preferred provider organization [PPO] 57%, health maintenance organization [HMO] 26%, Medicare 10%, other or no insurance 7%). The average age of all patients with a PCP was 35.4 years, and 57.5% were female. Among adults (≥18 years) (n = 281,842, 70.7%), the average age was 47.6 years and 60.1% were female; among children (<18 years), the average age was 6.1 years and 49.9% were female.

Definition of PCP Panels

Panels were created based on each patient’s identified PCP, as determined at registration and updated whenever a patient formally changed PCPs. On average, 87% of primary care visits were made to the patient’s own PCP. Patients who used any primary care service during the year were included in the panel estimates. We used 2 panel definitions: (a) overall panel, and (b) separate adult (aged ≥18 years) and child (aged <18 years) panels. After excluding PCP panels of fewer than 100 patients (n = 7), 183 panels (76 family medicine, 62 general internal medicine, and 45 pediatric) were included in the analyses for overall panels. For adult/child-specific panel analyses, 141 adult panels (77 family medicine, 62 internal medicine, and 2 pediatrics) and 110 child panels (65 family medicine and 45 pediatrics) were used, after excluding panels with fewer than 50 adults or children. Panel size varied widely across PCPs. “Trimmed” overall panels ranged from 203 to 2220 (average 959; standard deviation [SD] = 344) and adult/child-specific panels ranged from 50 to 2126 (average 693; SD = 456).

Scope of Service Use

Our focus was the standardized primary care service workload for each panel, assuming each patient received the average work relative value units (wRVUs) for “similar patients” across all PAMF/PAD PCPs. (Our key question is whether “similar patients” should be characterized by age/sex or by clinical conditions.) This essentially matched the current compensation model at PAMF in which salaries are proportional to wRVUs. Primary care wRVUs were predominantly for evaluation and management services (ie, consultation during office visits). Excluded are material or capital (ie, non-wRVU) costs associated with procedures, imaging, tests, immunizations, and injections delivered in the primary care offices, or other services ordered by a PCP but delivered by non-PCP clinicians. A uniform Medicare resource-based relative value scale (RBRVS) schedule was used to eliminate differences in payer source across patients.

Age/Sex-Based Weights

For each age (0-90 years, in 1-year increments, top-coded at 90) and sex, the average annual PCP wRVUs per patient in the category was calculated. The resulting age/sex-based weights were then applied to individual patients in each category and further aggregated to the panel level. The distribution of wRVUs differed noticeably by sex after adolescence, with higher PCP-generated wRVUs for females (Figure 1), although care delivered by obstetrics and gynecology specialists was excluded.
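The two steps described above (averaging wRVUs within each age/sex cell, then averaging the resulting weights over each PCP's panel) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (`pcp`, `age`, `sex`, `wrvu`), the function names, and the toy values are all assumptions made for the example.

```python
from collections import defaultdict

def age_sex_cell(age, sex):
    """Return the (age, sex) cell, top-coding age at 90 as described in the text."""
    return (min(age, 90), sex)

def age_sex_weights(patients):
    """Average annual PCP wRVUs per patient within each age/sex cell."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in patients:
        cell = age_sex_cell(p["age"], p["sex"])
        totals[cell] += p["wrvu"]
        counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}

def panel_weights(patients, weights):
    """Average expected wRVUs per patient for each PCP panel."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for p in patients:
        totals[p["pcp"]] += weights[age_sex_cell(p["age"], p["sex"])]
        counts[p["pcp"]] += 1
    return {pcp: totals[pcp] / counts[pcp] for pcp in totals}

# Hypothetical toy records (not study data): one row per patient-year.
patients = [
    {"pcp": "A", "age": 0, "sex": "F", "wrvu": 9.0},
    {"pcp": "B", "age": 0, "sex": "F", "wrvu": 10.0},
    {"pcp": "A", "age": 40, "sex": "M", "wrvu": 2.0},
    {"pcp": "B", "age": 40, "sex": "M", "wrvu": 3.0},
    {"pcp": "B", "age": 40, "sex": "M", "wrvu": 1.0},
]
weights = age_sex_weights(patients)       # groupwide weight per age/sex cell
panels = panel_weights(patients, weights) # standardized workload per patient, by PCP
```

Because each patient is assigned the groupwide average for the cell rather than his or her own wRVUs, the panel value reflects patient mix only, not how an individual PCP practices.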

Condition-Based Weights

The procedure to compute condition-based weights was similar, but with many more categories. Adjustment based solely on diagnosis codes may over-reward clinicians who code more conditions. To reduce this effect, we used Symmetry’s Episode Treatment Groups (ETG version 7.5) to aggregate codes to clinical conditions and attribute wRVUs. The Symmetry ETG system creates episodes of illness, ranging from simple, short-term acute problems to long-term chronic illnesses. The ETG grouping methodology uses data on each service provided and the associated diagnosis. Based on the diagnoses, it assigns each service to 1 of a set of mutually exclusive and collectively exhaustive episodes of care regardless of provider, treatment location, or duration. The Symmetry disease classification methodology is a widely used and validated system for building clinically homogeneous episodes. This is important because: (1) physicians can relate to the illness groupings, allowing for meaningful communication regarding treatment; and (2) clinical homogeneity within an ETG provides the basis for substantive comparison and detailed drill-down analysis.11 Several other systems in the market produce comparable results.12,13

The following example illustrates the approach for a patient with diabetes who visits a PCP for a fall and laceration. During that visit an x-ray is ordered, the laceration sutured, a glycated hemoglobin (A1C) test ordered to check on the patient’s diabetes, and a flu shot given. As the primary diagnosis recorded for the visit is trauma, the evaluation and management component of the visit would be grouped into a trauma ETG. The suturing procedure would also be attributed to that episode. The x-ray cost and its interpretation by a radiologist would also be attributed to the trauma ETG, but in this analysis non-PCP costs are ignored. Likewise, although the A1C test will be assigned to a diabetes ETG, its costs are ignored because the test is provided by the laboratory rather than the PCP. The patient will probably have other visits that capture PCP office visit services associated with diabetes. To examine potential misclassification errors (ie, the episode cost for diabetes in this example does not capture the PCP’s effort in managing the patient’s diabetes), we repeated the entire analysis using episodes based just on the primary diagnosis in the billing data when defining condition-based weights, and found no noticeable difference in patient- and panel-level weights.

Using the 2008 data for PCP-provided services, we identified about 450 outpatient-focused ETGs. We further divided these by age and sex group when analysis of variance (ANOVA) showed significant differences in wRVUs by age (and/or sex) within a specific ETG, creating 1275 categories. For this step we used 14 age/sex groups: 7 age groups (0, 1, 2 to 5, 6 to 49, 50 to 69, 70 to 80, and 81 years or older) for each sex. The age breaks reflected the pattern of age-specific wRVU use observed in the data. For example, average wRVUs for infants (6.4) fell after the first year, but were similar for ages 2 to 5 (range 3.0-3.4) (Figure 1). As an example of interaction effects, the “asthma” ETG was split into 14 cells, 1 for each age/sex group defined above. ETG weights were computed for each “ETG cell” that a patient would be in; because a patient could have multiple ETGs, the weights were additive.
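The cell construction and additive weighting described above can be sketched as follows. This is an illustrative simplification under stated assumptions: the set of ETGs to split (`split_etgs`) is taken as given rather than derived from an ANOVA, each record is assumed to be one patient's annual PCP wRVUs for one ETG, and the field names, function names, and toy values are hypothetical.

```python
from collections import defaultdict

# Upper age bound and label for six of the seven age groups named in the text;
# any age above 80 falls in the seventh group, "81+".
AGE_BOUNDS = [(0, "0"), (1, "1"), (5, "2-5"), (49, "6-49"), (69, "50-69"), (80, "70-80")]

def age_group(age):
    """Map an age in years to one of the 7 age groups."""
    for upper, label in AGE_BOUNDS:
        if age <= upper:
            return label
    return "81+"

def etg_cell(episode, split_etgs):
    """ETGs listed in split_etgs are subdivided by age group and sex; others stay whole."""
    if episode["etg"] in split_etgs:
        return (episode["etg"], age_group(episode["age"]), episode["sex"])
    return (episode["etg"],)

def cell_weights(episodes, split_etgs):
    """Average PCP wRVUs per patient-episode within each ETG cell."""
    totals, counts = defaultdict(float), defaultdict(int)
    for e in episodes:
        cell = etg_cell(e, split_etgs)
        totals[cell] += e["wrvu"]
        counts[cell] += 1
    return {c: totals[c] / counts[c] for c in totals}

def patient_weights(episodes, weights, split_etgs):
    """Sum the cell weights over every ETG a patient has (the weights are additive)."""
    out = defaultdict(float)
    for e in episodes:
        out[e["patient"]] += weights[etg_cell(e, split_etgs)]
    return dict(out)

# Hypothetical toy records (not study data).
episodes = [
    {"patient": "p1", "etg": "asthma", "age": 4, "sex": "F", "wrvu": 2.0},
    {"patient": "p2", "etg": "asthma", "age": 3, "sex": "F", "wrvu": 1.0},
    {"patient": "p3", "etg": "asthma", "age": 60, "sex": "M", "wrvu": 3.0},
    {"patient": "p1", "etg": "trauma", "age": 4, "sex": "F", "wrvu": 1.5},
    {"patient": "p3", "etg": "trauma", "age": 60, "sex": "M", "wrvu": 0.5},
]
split = {"asthma"}  # assumed here; the study chose splits using ANOVA
w = cell_weights(episodes, split)
pw = patient_weights(episodes, w, split)
```

In this toy data, patient p1 carries both an asthma and a trauma episode, so her weight is the sum of the two cell averages.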


Assuming wRVUs are a reasonable measure of PCP work effort, the expected panel management “burden” of a patient can be approximated by the average wRVU for that patient. (We are not arguing for the validity of the wRVU as a measure of PCP effort, but it is the current basis for most physician compensation.) We assigned each person a weight, reflecting either age and sex or the sum of the weights associated with each ETG for that patient. To examine whether condition-based weights added calibration power to age/sex categories, we plotted average condition-based weights across 20 categories (every fifth percentile of the condition-based weight distribution) within each of the 7 age cells, for each sex (Figure 2).
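The 20-category comparison described above can be approximated by ranking patients on their condition-based weight, cutting the ranking into 20 equal-sized groups, and comparing group means. The sketch below is one plausible reading of that step, not the authors' procedure; the function names and the toy input are assumptions, and in practice it would be run separately within each age cell and sex.

```python
def ventile_means(weights, n_groups=20):
    """Mean weight within each of n_groups equal-sized rank groups
    (20 groups corresponds to every fifth percentile)."""
    ordered = sorted(weights)
    n = len(ordered)
    means = []
    for g in range(n_groups):
        start = g * n // n_groups
        stop = (g + 1) * n // n_groups
        chunk = ordered[start:stop]
        if chunk:  # skip empty groups when there are fewer patients than groups
            means.append(sum(chunk) / len(chunk))
    return means

def fold_difference(means):
    """Ratio of the highest group mean to the lowest group mean."""
    return means[-1] / means[0]

# Hypothetical condition-based weights for 40 patients in one age/sex cell.
toy_weights = [float(i) for i in range(1, 41)]
means = ventile_means(toy_weights)
spread = fold_difference(means)
```

A large ratio between the top and bottom groups within a single age/sex cell indicates that clinical conditions separate patients that age and sex alone treat as identical.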

We aggregated weights for each patient in each PCP’s panel and compared average weights in a given year using the 2 methods and also assessed shifts in panel weights over time. Pearson’s correlation coefficient (r) was used to index the strength of linear association: strong if greater than 0.8, moderate if between 0.5 and 0.79, and weak if less than 0.5, and results were reported when significant at P <.001.
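As a sketch of the comparison described above, the following computes Pearson's r between two sets of panel weights and labels its strength using the thresholds given in the text. The function names and toy panel values are hypothetical; the cutoffs are applied as r ≥ 0.8 and r ≥ 0.5, which is an assumption about how values between 0.79 and 0.8 are handled, and the significance test (P <.001) is omitted.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Label the strength of association using the cutoffs stated in the text."""
    if r >= 0.8:
        return "strong"
    if r >= 0.5:
        return "moderate"
    return "weak"

# Hypothetical panel weights for 5 PCPs under the two methods (not study data).
age_sex_based = [2.6, 3.0, 3.3, 3.9, 4.9]
condition_based = [2.2, 3.1, 3.3, 4.2, 5.5]
r = pearson_r(age_sex_based, condition_based)
label = strength(r)
```

The same two functions apply to the cross-year comparison: pass each PCP's 2007 and 2008 panel weights instead of the two weighting methods.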


RESULTS

Patient Level Variation in Age/Sex-Based and Condition-Based Weights

Primary care wRVUs (per patient-year) vary substantially by age and sex (Figure 1). With the average patient accounting for 3.2 primary care wRVUs per year, the weights for males varied roughly 5-fold across the age spectrum from 1.9 (aged 20 years) to 9.6 (aged 0 years). For females, the variation was 4.3-fold, from 2.1 (aged 10 years) to 9.2 (aged 0 years).

Condition-based weights across all patients varied 12-fold, from 0.84 (patients in the bottom 5% of risk based on ETGs) to 10.2 (top 5% of patients) (Figure 2). Even within an age/sex category, using ETGs allows one to discern wide differences in risk, and these differences increase with age. For females 81 years or older, a 14-fold difference is observed across the 5% categories; for patients aged 0 to 1 years, there is only a 3.5-fold difference (Figure 2). The distributions for males (not shown) were similar. Conditions add substantial discriminatory value at the patient level.

PCP Level Variation in Panel Weights—Differences by PCP Specialty

At the panel level, the relative workload based on condition-based weights varied 2.6-fold across the 183 PCPs (from 2.2 to 5.5 wRVUs per patient), with an average of 3.3. (This is the variability based just on patient mix assuming each patient in a “risk cell” receives the average PCP wRVUs for all patients in that cell. Actual wRVUs generated by PCPs for a panel vary more, but that may reflect efficiency, referral patterns, or quality factors that should not be incorporated in panel size standardization, but should rather be assessed separately.) A smaller, 1.9-fold, range was observed using just the age/sex weights (average 3.3, minimum 2.6, and maximum 4.9). The range, using either age/sex-based or condition-based weights, was wider across pediatricians’ panels (minimum 2.7; maximum 5.5) than for family practitioners (minimum 2.2; maximum 3.7) or general internists (minimum 2.6; maximum 3.9) (Figure 3A).

Using separate child and adult panels, the variation based on condition-based weights across panels was wider (2.1 to 5.5) for children than adults (2.0 to 3.9) (Figure 3B). The ranges were slightly narrower using age/sex weights. Even with the adult versus child split, panel weights differed by specialty. For child panels, pediatricians had higher weights (average 4.0) than family practitioners (average 3.2). For adult panels, general internists had higher weights (average 3.2) than family practitioners (average 2.9).

Concordance Between Age-Based and Condition-Based Weighting Methods

There was a strong correlation (r = 0.90) in overall panel weights using the 2 weighting methods (Figure 3A). The concordance was stronger for child panels (r = 0.93), but only moderate for adult panels (r = 0.71). Among the child panels, 2 were outliers with much higher weights based on condition than age/sex. Adult panels generally deviated further from the 45° line than child panels, with 3 notable outliers. Examination of the 5 outliers (marked in circles in Figure 3) revealed that patients in these panels had an above-average number of clinical conditions recorded in their 2008 visits. The 5 physicians were all recently trained and had joined PAMF between 2005 and 2007. None had a particular subspecialty. Because they were new physicians, it is likely that a disproportionate share of their patients had “initial patient visits” in which more conditions would be noted than for patients with ongoing care.

Stability of Patient and Panel Weights Over Time

At the patient level, year-to-year correlations (between 2007 and 2008) in weights were, as expected, very strong based on age/sex for both adults (r = 0.91) and children (r = 0.99) (data not shown). In contrast, for individuals, condition-based weights had cross-year correlations of only 0.43 for adults and 0.61 for children.

Within-PCP correlations in panel weights between year 2007 and year 2008 were very strong for both adult (r = 0.99) and child panels (r = 0.93) using age/sex weights (Figure 4A). Cross-year within-PCP correlations were still strong using clinical conditions for both adult (r = 0.83) and child panels (r = 0.88) (Figure 4B).


DISCUSSION

Clinical condition measures discriminate far better than age/sex alone in capturing patient-level variation in service needs, even when focusing only on primary care service use. Our goal, however, was a simple and transparent approach reasonably accounting for differences in primary care panel workload across PCPs. This led us to ask if age/sex-based weights are good enough for such panel size standardization. For panels of children, they probably are sufficient. PCP work effort for children is very high for newborns, and a panel’s age composition captures this well. For adult panels, however, age/sex cells do not sufficiently capture the heterogeneity across panels that may reflect differences in physician expertise and patient selection.

Across PCPs, standardized workloads measured by wRVUs per patient vary widely regardless of the weights used. While PCP panel weights, whether age/sex-based or condition-based, were fairly stable over time, this may reflect the stability of our study population. Further, even with separate child and adult measures, panel demographic and clinical characteristics differed across pediatrics, family practice, and general internal medicine, suggesting that unadjusted productivity comparisons across PCP departments may be misleading.

These findings reflect only those services measured by primary care wRVUs. In results not reported here, we conducted the same analyses using primary care-provided total RVUs, which include relatively expensive injections/drugs. Although the magnitude of variation in panel weights was proportionally larger, the results were very similar to those reported here. On the other hand, the pattern for costs based on services across all specialties is likely to be quite different, and even more so if including facility costs. Assessing specialist physician productivity or the costs of episodes of care by teams of providers consisting of PCPs and specialists will require much more sensitive tools to capture selective referrals to different specialists.

In applying these findings, the representativeness of this study needs to be considered. Our patients are predominantly insured by private insurance or Medicare and receive care in a large multispecialty medical group. Within-physician and across-patient variations in service use may differ from those of other settings. We used Medicare-based wRVUs, however, so differences in patient insurance across panels should not matter; although specific dollar values may differ across settings, the underlying lessons are likely to be applicable.

This study informs directions for future studies. First, in the few outlier panels with much higher weights based on conditions, the average number of comorbid conditions per patient was notably high. Whether these patients indeed had more problems, or whether the observed difference was attributable to different coding patterns (ie, given visit content, some physicians may document more conditions in the billing records than other physicians), merits further investigation. Most PCPs code more problems during the initial visits with a patient (which are significantly longer). These outlier PCPs probably had a higher proportion of such visits because their patients were “new” to them. Panels may also reflect selective referral, or attraction, to PCPs such that patients who are in better (or worse) health, or are less (or more) likely to describe problems, controlling for their age and sex, may gravitate to certain PCPs.

Second, we did not design our measures to be predictive in the sense of using conditions seen in one year to set payments for the next. Such prospective payment shifts the risk of occurrence to the physician, which is an even greater change than panel management-based payment in which PCPs would be paid average amounts for the patient problems they actually manage. Further studies are needed to understand the implications of prospective or predictive weighting methods in standardizing PCP panels.

Third, the performance of condition-based weighting for this type of standardization warrants further work. We used ETGs to blunt the effect of over-testing, but this too may cause a bias. A PCP who actively monitors a patient’s condition over a series of routine visits without ordering tests for a specific diagnosis would not accrue the service entries used to generate additional risk factors. The clinical conditions coded in the billing data, moreover, may not be as accurate as those drawn from actual medical records. Searching “problem lists” for additional conditions not recorded on claims would allow a test of the validity of billing data for panel workload assessment.


CONCLUSION

Even within a single large physician group, patient needs vary widely across PCPs, and this must be taken into account when developing standardized measures of panels. Workload, quality, or productivity measures of panel management require standardization to appropriately reflect patient heterogeneity. Adult and child panels should be separately assessed. For child panels, the variability in primary care service needs is well captured by a simple age-based adjustment. For adult panels, however, patients’ clinical conditions may need to be taken into account.

Author Affiliations: From Palo Alto Medical Foundation Research Institute (SC, LJE, HSL), Palo Alto, CA; Department of Family and Community Medicine (LJE), University of California San Francisco, San Francisco, CA.

Funding Source: None.

Author Disclosures: The authors (SC, LJE, HSL) report no relationship or financial interest with any entity that would pose a conflict of interest with the subject matter of this article.

Authorship Information: Concept and design (SC, LJE, HSL); acquisition of data (LJE, HSL); analysis and interpretation of data (SC, LJE, HSL); drafting of the manuscript (SC, LJE); critical revision of the manuscript for important intellectual content (SC, LJE, HSL); statistical analysis (SC, LJE); obtaining funding (HSL); and supervision (HSL).

Address correspondence to: Sukyung Chung, PhD, 2350 W El Camino Real, Mountain View, CA 94040.

1. Institute of Medicine. Rewarding Provider Performance: Aligning Incentives in Medicare. Washington, DC: National Academies Press; 2006.

2. Davis K. Paying for care episodes and care coordination. N Engl J Med. 2007;356(11):1166-1168.

3. Goroll AH, Berenson RA, Schoenbaum SC, Gardner LB. Fundamental reform of payment for adult primary care: comprehensive payment for comprehensive care. J Gen Intern Med. 2007;22(3):410-415.

4. Minott J, Helms D, Luft H, Guterman S, Weil H. The group employed model as a foundation for health care delivery reform. Issue Brief (Commonw Fund). 2010;83:1-24.

5. van de Ven WPMM, Ellis RP. Risk adjustment in competitive health plan markets. In: Culyer AJ, Newhouse JP, eds. Handbook of Health Economics. Vol 1, pt 1. Amsterdam, The Netherlands: Elsevier Science BV; 2000:755-845.

6. Rice N, Smith PC. Capitation and risk adjustment in health care financing: an international progress report. Milbank Q. 2001;79(1):81-113, IV.

7. Pope GC, Kautter J, Ellis RP, et al. Risk adjustment of Medicare capitation payments using the CMS-HCC model. Health Care Financ Rev. 2004;25(4):119-142.

8. Luft HS, Dudley RA. Assessing risk-adjustment approaches under non-random selection. Inquiry. 2004;41(2):203-217.

9. Thomas JW. Should episode-based economic profiles be risk adjusted to account for differences in patients’ health risks? Health Serv Res. 2006;41(2):581-598.

10. Thomas JW, Ward K. Economic profiling of physician specialists: use of outlier treatment and episode attribution rules. Inquiry. 2006;43(3):271-282.

11. Ingenix. What are ETGs? Resources/Articles/What_are_ETG.pdf. Published 2008. Accessed June 25, 2012.

12. Thomas JW, Grazier KL, Ward K. Economic profiling of primary care physicians: consistency among risk-adjusted measures. Health Serv Res. 2004;39(4, pt 1):985-1003.

13. Thomas JW, Grazier KL, Ward K. Comparing accuracy of risk-adjustment methodologies used in economic profiling of physicians. Inquiry. 2004;41(2):218-231.