Understanding the Relationship Between Data Breaches and Hospital Advertising Expenditures

Sung J. Choi, PhD; and M. Eric Johnson, PhD

A data breach that exposes protected health information is a public relations crisis for a hospital. Healthcare data breaches include theft, loss, unauthorized access/disclosure, improper disposal, and hacking of protected health information. The Health Information Technology for Economic and Clinical Health (HITECH) Act requires hospitals covered by the Health Insurance Portability and Accountability Act to report data breaches affecting more than 500 individuals to the affected individuals, to HHS, and sometimes to the media, typically within 60 days of discovering the breach. The Office for Civil Rights investigates reported data breaches and enforces corrective action.1 HHS has published the reported breaches in a public database since October 2009,2 and the Privacy Rights Clearinghouse (PRC) also provides information on reported health data breaches to the public.3

Managed care and market-based reforms have driven hospitals to compete for patients. In a competitive market, hospitals use advertising to market services and communicate information directly to patients. Previous research found that hospital advertising increased with market concentration.4 In recent years, spending on hospital advertising has skyrocketed. The hospital industry spent $2.3 billion on advertising in 2014, a 38% increase from 2011, according to the Kantar Media survey.5

Cancer center advertising spending increased 3-fold from 2005 to 2014.6 Increased advertising reflects the efforts by cancer centers to attract patients in a competitive market, especially as the demand for cancer care increases with the aging population.7 For example, cancer center advertisements promote the benefits of cancer therapy with emotional appeals8; samples of award-winning hospital advertisements highlight positive experiences and emotions associated with care and improved quality of life.9,10

Breached hospitals incur significant costs associated with fixing the breach and protecting the affected individuals from further harm. Investigation of a reported breach by HHS usually takes about a year to complete. The investigation concludes with a settlement, including a penalty of hundreds of thousands of dollars and/or remedial action, which typically must be implemented within 2 to 3 years. Separate from HHS investigations, some breaches result in class-action lawsuits.11,12 The advertising expenditures that we investigated occurred subsequent to the breach disclosure and added to the remediation costs outlined above. Based on a survey of US firms from 2005 to 2014, Romanosky13 estimated that the median cost of a data breach was $200,000, including costs from investigating the breach, notifying the affected individuals, public relations, credit monitoring, litigation, and fines.

The Ponemon Institute estimated that in 2016, the healthcare industry spent an average of $402 per stolen record for direct and indirect costs associated with a breach.14 Costs vary depending on the size and type of the breach; accordingly, it has been found that large breaches place a larger financial burden on the organization.14 Kwon and Johnson15 found that data breaches in hospitals were associated with decreased outpatient visits and admissions in the long run. Their findings suggest that hospitals are vulnerable to patient loss after a breach.

A hospital can launch carefully crafted marketing campaigns to rebuild its image and minimize patient loss after a breach. Several hospitals and health systems (serving Florida, Michigan, North Carolina, or Texas) that reported breaches between 2016 and 2018 made award-winning advertisements within a year of the breach. These observations motivated our investigation of how data breaches affect hospital expenditures. Together with expenditures related to disclosing and repairing the damage from a breach, such advertising is a potentially preventable burden to the healthcare system. The aim of our paper was to investigate the relationship between data breaches and hospital advertising expenditures by analyzing a national sample of nonfederal acute care inpatient hospitals from 2011 to 2014.

Source of Data

After a breach occurs, depending on the type, it may take weeks or months until it is discovered and reported. We used HHS and PRC data on breaches, which included the name and location of the breached entity, time of breach reporting, type of breach, and number of records exposed (data from both entities are available for public download).2,3 It should be noted that the HHS database does not include breaches that affect fewer than 500 individuals; thus, it is not an exhaustive record of all health data breaches.

Voicetrak provided data on hospital advertising expenditures based on surveys of media vehicles. Voicetrak conducts a quarterly survey of 9300 local media vehicles in 210 media markets across the United States. Media vehicles include television, cable systems and interconnects, radio stations, newspapers and business journals, out-of-home companies, and local magazines in a city/metropolitan area. Voicetrak data are available to the public for purchase.16 Voicetrak data did not capture online advertising expenditures; thus, their estimates capture only part of the total advertising cost. Quarterly advertising expenditures were aggregated to yearly expenditures. The advertising data do not distinguish between individual hospital expenditures and system-level expenditures.

The Healthcare Cost Report Information System (HCRIS) provided data on hospital revenues, expenses, discharges, beds, ownership status, teaching status, rural status, and meaningful use status (meaningful use of electronic health records as defined in HITECH). Medicare-certified hospitals are required to submit an annual cost report to HCRIS (data are available for public download).17 HCRIS data were the primary data set into which other data sets were merged. The data sets were joined by hospital name and year. HHS and Voicetrak data provided the business name of the hospital but no standardized identifier; therefore, hospital names in these 2 data sets were manually matched to the hospital names in HCRIS.

Market competition has been linked to hospital advertising expenditures.4 To control for market competition, we added proxies for the county-level supply and demand for health services by merging the 2014 Area Health Resources Files.18,19 The number of short-term general hospitals in a county was used to measure the supply of hospital care in a county, which represents a metric of hospital competition. The number of Medicare enrollees in a county was used to measure the demand for health services.

To maintain consistency in the financial data, we restricted data to include only nonfederal acute care inpatient hospitals using the CMS definition of facility type.17 Hospitals in the US territories and Maryland (which has a prospective payment system waiver) were excluded for consistency. The data were further restricted to hospitals whose HCRIS cost reports covered between 360 and 370 reporting days. When a hospital submitted multiple financial reports in a given year, the most recent report was used. Finally, observations with missing values in the dependent or independent variables were dropped from analysis. The study sample consisted of 3496 hospital-year observations before propensity score matching.
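The restriction steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout and the field names (`hospital`, `year`, `days`, `filed`, `ad_spend`) are hypothetical and do not correspond to actual HCRIS column names, and the facility-type and geographic exclusions are omitted.

```python
# Hypothetical cost-report records; field names are illustrative, not HCRIS columns.
reports = [
    {"hospital": "A", "year": 2012, "days": 365, "filed": "2013-03-01", "ad_spend": 120000.0},
    {"hospital": "A", "year": 2012, "days": 365, "filed": "2013-09-15", "ad_spend": 125000.0},
    {"hospital": "B", "year": 2012, "days": 180, "filed": "2013-02-10", "ad_spend": 40000.0},
    {"hospital": "C", "year": 2013, "days": 366, "filed": "2014-04-02", "ad_spend": None},
    {"hospital": "D", "year": 2013, "days": 365, "filed": "2014-01-20", "ad_spend": 300000.0},
]


def restrict_sample(rows):
    """Apply the three restrictions in the order given in the text."""
    # 1. Keep reports that cover roughly one full year (360 to 370 days).
    full_year = [r for r in rows if 360 <= r["days"] <= 370]

    # 2. Keep the most recent report per hospital-year.
    #    ISO-formatted dates compare correctly as plain strings.
    latest = {}
    for r in full_year:
        key = (r["hospital"], r["year"])
        if key not in latest or r["filed"] > latest[key]["filed"]:
            latest[key] = r

    # 3. Drop observations with any missing value.
    return [r for r in latest.values() if all(v is not None for v in r.values())]


sample = restrict_sample(reports)
```

In this toy input, hospital B is dropped for a partial-year report, hospital C for a missing value, and hospital A contributes only its later filing.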


The breached hospitals and control hospitals had different observable characteristics. The breached hospitals were more likely to be large, teaching, and urban hospitals. Propensity score matching was used to adjust for potential sample selection bias due to observable differences between the breached and control hospitals.20-23

The propensity score for assignment into the breached group was predicted using a logit model. In the logit model, we first included all the control variables on the right-hand side, then narrowed down the predictors by inspecting the balance of the matched sample with standardized mean differences (SMDs). An SMD of less than 0.1 for the covariates between the 2 groups indicates a negligible difference in the mean.24 We generated the balanced sample using the following controls: operating revenue, hospital discharges, number of beds, occupancy rate, length of stay, number of general hospitals in a county, Medicare enrollment, ownership, teaching status, and year.
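As an illustration of the balance check, the sketch below computes an SMD for one continuous covariate using the common pooled-SD definition (difference in means divided by the square root of the average of the 2 group variances). The text does not spell out the exact formula used, so this definition is an assumption; the bed counts are made up, and the sketch ignores the matching weights that a weighted comparison would need.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute SMD for a continuous covariate, using the pooled SD
    sqrt((var_treated + var_control) / 2) as the denominator."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


# Made-up bed counts for a handful of breached and control hospitals.
beds_breached = [400, 550, 600, 700]
beds_control = [380, 560, 610, 690]

smd = standardized_mean_difference(beds_breached, beds_control)
balanced = smd < 0.1  # threshold for a negligible difference, as cited in the text
```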

Hospitals were matched using the nearest neighbor matching approach allowing for ties, with replacement, with a caliper distance of 0.2 SD.25 If 1 breached hospital matched multiple control hospitals (n), resulting in a tie, the multiple matched control hospitals were weighted by 1/n. Matching was performed using the Matching package 4.9-3 in R.25 Of the 75 observations in the full sample of breached hospitals, 3 observations failed to match. Thus, the matching yielded 72 observations in the breached group and 915 unweighted observations in the control group. The matched sample was used for empirical modeling.
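The tie-weighting rule can be made concrete with a small sketch. This is not the algorithm of the R Matching package, only a minimal nearest-neighbor matcher written for illustration. It assumes matching directly on the propensity score, a caliper measured in SDs of the pooled scores, and a small numerical tolerance for detecting ties; all identifiers and scores are hypothetical.

```python
from statistics import stdev


def match_with_ties(treated, controls, caliper_sd=0.2, tol=1e-9):
    """Nearest-neighbor matching with replacement on a propensity score.

    treated, controls: dicts mapping a unit id to its propensity score.
    Returns {treated_id: {control_id: weight}}; tied controls share weight 1/n.
    Treated units with no control inside the caliper are left unmatched.
    """
    # Assumption: the caliper is 0.2 SD of the pooled propensity scores.
    caliper = caliper_sd * stdev(list(treated.values()) + list(controls.values()))
    matches = {}
    for t_id, t_score in treated.items():
        distances = {c_id: abs(t_score - c) for c_id, c in controls.items()}
        best = min(distances.values())
        if best > caliper:
            continue  # fails to match
        tied = [c_id for c_id, d in distances.items() if d - best <= tol]
        matches[t_id] = {c_id: 1 / len(tied) for c_id in tied}
    return matches


treated = {"T1": 0.30, "T2": 0.90}
controls = {"C1": 0.28, "C2": 0.32, "C3": 0.10}
matches = match_with_ties(treated, controls)
```

Here T1 is equally close to C1 and C2, so each receives a weight of 1/2, while T2 has no control within the caliper and drops out, analogous to the 3 breached observations that failed to match.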

Hospital advertising expenditure was heavily right-skewed. Ordinary least squares (OLS) regression fails to consistently model a skewed dependent variable. A generalized linear model (GLM) addresses the weaknesses of OLS and is a popular method for modeling healthcare costs.26 The dependent variable was hospital advertising expenditures, which were measured in 2 ways. First was the annual hospital advertising expenditure adjusted to 2014 dollars. The advertising expenditures captured by Voicetrak were conditional on a hospital having nonzero expenditures. Alternatively, to capture the increase in advertising expenditures subsequent to a breach, we also measured the 2-year hospital advertising expenditures by summing the current year’s and next year’s expenditures. The dependent variable was modeled with a gamma distribution and a log link function.
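The 2-year measure can be sketched as follows. The text does not say how a hospital-year without data for the following year was handled; the sketch assumes such observations are simply left out, and the hospital names and dollar amounts are made up.

```python
def two_year_spend(annual):
    """annual: dict mapping (hospital, year) to advertising spend in 2014 dollars.

    Returns a dict mapping (hospital, year) to the sum of that year's and the
    next year's spend. Assumption: hospital-years with no next-year value are
    omitted rather than imputed.
    """
    totals = {}
    for (hospital, year), spend in annual.items():
        next_spend = annual.get((hospital, year + 1))
        if next_spend is not None:
            totals[(hospital, year)] = spend + next_spend
    return totals


annual = {
    ("A", 2012): 500000.0,
    ("A", 2013): 650000.0,
    ("A", 2014): 700000.0,
    ("B", 2013): 200000.0,  # no 2014 value for hospital B
}
totals = two_year_spend(annual)
```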

A dummy variable was set to 1 for a breached hospital; it was set to 0 for a nonbreached (control) hospital. The coefficient on the breach dummy estimated the difference in advertising expenditures between breached and control hospitals. A vector of hospital characteristics adjusted for confounders, including total revenue, total margin, operating revenue, operating margin, number of beds, length of stay, occupancy rate, total discharges, ownership, teaching status, rural status, meaningful use status, and year fixed effects. Standard errors are heteroscedasticity robust and account for within-hospital correlation. GLM was performed using Stata version 14 (StataCorp; College Station, Texas).
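One way to write down the specification described above is shown below. The notation is ours and is only a sketch: $Y_{it}$ is the advertising expenditure of hospital $i$ in year $t$, $\mathrm{Breach}_{it}$ is the breach dummy, $\mathbf{x}_{it}$ is the vector of hospital characteristics, $\tau_t$ is the year fixed effect, and $\nu$ is the gamma shape parameter.

```latex
\begin{aligned}
Y_{it} &\sim \mathrm{Gamma}(\mu_{it}, \nu), \\
\log \mu_{it} &= \beta_0 + \beta_1\,\mathrm{Breach}_{it}
  + \mathbf{x}_{it}^{\top}\boldsymbol{\gamma} + \tau_t .
\end{aligned}
```

Under this form, $\beta_1$ is the coefficient on the breach dummy and acts on the log of expected expenditures.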

A descriptive scatterplot of advertising expenditures and bed size fitted with a second-degree polynomial curve (Figure) showed a positive correlation up to about 1000 beds. A few hospitals with more than 2000 beds spent less on advertising, and those extreme cases pulled the fitted line down as bed size increased above 1000. When stratified by breach status, breached and control hospitals followed a similar concave trend. Breached hospitals spent more than nonbreached hospitals; however, the fitted lines of the 2 groups were not significantly different for most of the range of the hospital beds.

The descriptive characteristics of the full sample of hospitals are summarized by breach status in Table 1. Note that each observation in the data set was a hospital-year, as a single hospital may be repeatedly observed over time. The breached group contained 75 hospital-year observations, and the control group contained 3421. The breached hospitals spent nearly 3 times more on advertising than the control hospitals (approximately $688,000 vs $238,000 for annual spending; $1,713,000 vs $551,000 for 2-year spending). The breached hospitals were more likely to be larger in bed size (565.60 vs 291.49), more likely to be a teaching hospital (77.4% vs 41.7%), and higher in occupancy rate (69.11% vs 57.62%). Breached hospitals were located in counties with significantly more hospitals and Medicare enrollees, suggesting that they were in more competitive areas.

As shown in Table 2, the number of propensity score–matched hospitals in the breached group was 72; the unweighted number of matched hospitals in the control group was 915. For continuous variables, means (SDs) are shown. The breached hospitals spent $817,205.11 ($1,379,037.92) on advertising expenditures in the year of the breach, which was higher than the $568,078.12 ($1,485,531.25) spent by the matched control hospitals (SMD >0.1). The breached hospitals spent $1,753,358.75 ($2,791,376.50) on advertising over 2 years, whereas the matched control hospitals spent $1,126,682.72 ($2,813,634.41) over 2 years (SMD >0.1).

Focusing on the breached hospitals, the total revenue was $1058.77 million ($853.84 million), with a total margin of 6.75% (9.03%). The operating revenue was $843.92 million ($670.12 million), with an operating margin of –14.22% (55.71%). Total discharges were 27,876.02 (17,333.74) patients. The number of beds was 592.93 (367.47), the length of stay was 4.92 (0.86) days, and the occupancy rate was 69.28% (15.79%). For categorical variables, percentages are shown followed by counts. Ownership was mostly nonprofit (69.4% [n = 50]; investor owned: 12.5% [n = 9]; public: 18.1% [n = 13]). Teaching status was dominantly teaching (76.4%; n = 55). Most hospitals were urban (91.7%; n = 66) and meaningful users of health information technology (IT) (59.7%; n = 43). The characteristics of the matched control hospitals generally had negligible differences from the breached group, with an SMD of less than 0.1 for most of the regressors.

The GLM estimates using the matched sample are shown in Table 3. A data breach was associated with a 64% (95% CI, 7.2%-252%; P = .023) increase in annual advertising expenditures compared with the matched control group, holding observable variables constant. Similarly, the 2-year advertising expenditures were 79% (95% CI, 16.4%-274%; P = .008) higher for the breached hospitals. Nonprofit hospitals were associated with 3.7 (95% CI, 1.8-7.6; P <.001) times higher advertising expenditures than public hospitals. Two-year advertising expenditures of nonprofit hospitals were 4.9 (95% CI, 2.3-10.8; P <.001) times higher than public hospitals. Two-year advertising expenditures of investor-owned hospitals were 2.5 (95% CI, 1.2-5.3; P = .019) times higher than public hospitals. Relative to nonteaching hospitals, spending was not significantly higher in either major or minor teaching hospitals. Urban hospitals were associated with 4.0 (95% CI, 2.3-6.8; P <.001) times higher advertising expenditures than rural hospitals. The count of Medicare enrollment in a county was positively correlated (P <.001) with 2-year advertising expenditures. The number of short-term general hospitals in a county was positively correlated with both 1-year (8.1% increase; P = .047) and 2-year (17.3% increase; P <.001) advertising expenditures.

Hospital data breaches were associated with a 64% increase in annual hospital advertising expenditures relative to control hospitals, independent of observed hospital and area characteristics, such as bed size, revenue, and number of hospitals in the county. Hospital advertising expenditures were proportional to bed size and also skewed to the right due to relatively few high spenders. The relationship between advertising expenditure and bed size was positive; as seen in the Figure, the slope was positive for bed size up to 1000, then it flattened for bed size above 1500. Larger hospitals may have more market power and, therefore, may not need to spend as much on advertising compared with hospitals in competitive markets.

The descriptive characteristics of the full sample of hospitals in Table 1 showed that the breached hospitals were more likely to be larger teaching hospitals. This is consistent with previous studies that have described breached hospitals.27,28 The risk of a data breach increases with the size of the organization, as larger organizations tend to have more points of entry that are vulnerable to attackers (ie, more health IT infrastructure and devices that could be hacked, lost, or stolen).29 Additionally, teaching hospitals serve as an environment for education and, therefore, may have more interactions among clinicians that involve patient data in that capacity.

Propensity score matching adjusted for the potential sample selection bias due to observable differences between the breached and control hospitals.20-23 The SMDs between the breached and control groups were mostly below 0.1, indicating a reasonable balance between the groups, yet the difference in mean advertising expenditures between the breached and control hospitals remained in the matched sample.

Using the matched sample, the GLM estimated that a breached hospital spent 64% more on annual advertising expenditures than a control hospital. Similarly, a breached hospital spent 79% more on 2-year advertising expenditures than a control hospital. The estimated relationship is multiplicative, which means that the annual advertising spending of breached hospitals was 1.64 times larger (2-year spending was 1.79 times larger) relative to control hospitals, independent of hospital characteristics such as bed size. Given the negative operating margins of the hospitals in this study (Tables 1 and 2), increased advertising spending associated with a data breach may divert resources and attention away from patient care.
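The multiplicative reading follows from the log link: a coefficient beta on the breach dummy corresponds to a spending ratio of exp(beta), or a percentage difference of (exp(beta) - 1) x 100. The short sketch below shows the conversion. The coefficients themselves are not quoted in the text, so the values here are backed out from the reported ratios of 1.64 and 1.79 purely for illustration.

```python
from math import exp, log


def percent_difference(beta):
    """Convert a log-link GLM coefficient into a percentage difference."""
    return (exp(beta) - 1) * 100


# Illustrative coefficients implied by the reported ratios, not values from Table 3.
beta_annual = log(1.64)
beta_two_year = log(1.79)

annual_pct = percent_difference(beta_annual)
two_year_pct = percent_difference(beta_two_year)
```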

Market competition is likely to confound the relationship between data breaches and advertising expenditure.30 Each additional short-term general hospital in a county was associated with an 8.1% increase in annual advertising expenditures, or a 17.3% increase in 2-year advertising expenditures (Table 3).

The data breaches studied in this paper were reported from 2011 to 2014, when ransomware attacks were rare. These types of attacks on hospitals emerged in 2016 and have become a serious threat to care delivery systems.31 They are considered to be more disruptive to hospitals than the breaches considered in this study, and, thus, ransomware may be associated with even larger advertising spending.

It should be noted that the findings of this study are limited to reported data breaches that affected more than 500 individuals. Smaller breaches involving fewer than 500 individuals are not published in the HHS database; however, there is a nontrivial number of such breaches that are reported to HHS.32 Smaller breaches are not subject to the same public reporting and remediation actions and, therefore, are less likely to draw patient attention or motivate increased advertising.

To our knowledge, this paper is the first step in studying the relationship between data breaches and hospital expenditures with empirical data. The costs associated with breaches are not readily captured in hospital financial disclosures. Subsequent to a data breach, remediation efforts and corrective actions usually take 2 to 3 years to implement.33,34 The long time span over which remediation efforts are implemented adds to the challenge of attributing the costs of a breach to quarterly or annual financial data. An effective public relations response to a data breach is likely to begin soon after the breach is disclosed to the public. The timeliness of advertising expenditure data allowed us to overcome measurement challenges.


We found that breached hospitals were associated with significantly higher advertising expenditures. Repairing the affected hospital’s image and minimizing patient loss to competitors are potential drivers of the increased spending. Regardless of the motivation, breach response adds financial burden to hospitals and the healthcare system. Advertising and the efforts to fix the damages from a data breach increase healthcare costs and may divert resources and attention away from initiatives to improve care quality. Advertising costs subsequent to a breach are another cost to the healthcare system that could be avoided with better data security.