In this post, I describe risk adjustment, an important tool in health analytics. I’ll cover what it is, how it is used, how it is done and how well it performs.
What it is
In the sport of boxing, matches are held between two boxers of the same weight class and gender. It would not be a fair match to pit a featherweight (<126 lbs) against a heavyweight (>200 lbs).
Simply put, you need to compare apples with apples.
- it does not make sense to compare the clinical performance of a cardiologist to that of a primary care physician (PCP). These two doctors treat different medical conditions. Moreover, the cardiologist will most likely see patients who are older and sicker than those the PCP sees.
- to compare two hospitals on clinical outcomes and cost efficiency, you would need to account for differences in the underlying patient and disease profiles.
In healthcare, risk adjustment lets you calculate the expected outcome or costs based on the disease profiles of a patient cohort, i.e. the case mix of that cohort. Equipped with a measure of case mix across different patient cohorts, you can perform apples with apples comparisons.
There are different types of risk adjustment tools, including
- Inpatient case mix adjustment – e.g. MS DRG (CMS model for hospital reimbursement)
- Patient population risk adjustment – e.g. HCC (CMS model for payer reimbursement)
- Clinical condition specific risk adjustment – e.g. a heart failure admission rate model based on logistic regressions
Let’s look at Diagnosis Related Groups (DRGs) a little more closely.
Each hospital admission has a primary diagnosis code and a primary procedure code. ICD10 diagnosis codes number in the tens of thousands, as do procedure codes, so combining them yields an enormous number of possible permutations. As such, using these codes as they are, it is difficult to derive expected clinical outcomes and costs.
Also, using the primary diagnosis and procedure alone does not account for the complexity of treating patients with comorbidities. E.g. a baby delivery for a healthy mother is less complex and less costly than for a mother with diabetes (a comorbidity) who developed a major hemorrhage (a complication). If we were to compare birth delivery costs at two hospitals, we would need to adjust for differences in the complexity of the births that take place at each.
DRGs are used to group hospital admissions into similar groups, taking into account the primary diagnosis, primary procedure, and relevant comorbidities and complications. Instead of viewing hospitalizations as hundreds of thousands of distinct types, DRGs group hospitalizations into approximately one thousand groups.
The table below shows the childbirth-related MS DRGs:

| Surgical/Medical | MDC | DRG | Description | Expected Cost | CMI |
|---|---|---|---|---|---|
| Surgical | 14 | 765 | CESAREAN SECTION WITH CC/MCC | $11,358 | 1.1358 |
| Surgical | 14 | 766 | CESAREAN SECTION WITHOUT CC/MCC | $8,100 | 0.81 |
| Medical | 14 | 774 | VAGINAL DELIVERY WITH CC | $7,962 | 0.7962 |
| Medical | 14 | 775 | VAGINAL DELIVERY WITHOUT CC | $6,094 | 0.6094 |
- MDC – Major Diagnostic Category: 14 = OBGYN related
- DRG – diagnosis related group: there are just under 1,000 MS DRGs. The first two digits tell us what the admission was for, and the last digit tells the level of complexity in terms of CCs (comorbidities and complications). DRG 765, for example, would consist of all cases that had one of ~100 cesarean related ICD10s as the primary diagnosis.
- CMI – case mix index: the relative resource intensity of each DRG. Higher means more resources/costs.
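The CMI column follows directly from the expected costs: each DRG's expected cost divided by a base rate. A minimal sketch, assuming a $10,000 base rate (inferred from the table's numbers, not an official CMS figure):

```python
# Derive each DRG's CMI from its expected cost.
# BASE_RATE of $10,000 is inferred from the table above, not an official CMS value.
BASE_RATE = 10_000

drg_expected_cost = {
    765: 11_358,  # cesarean section with CC/MCC
    766: 8_100,   # cesarean section without CC/MCC
    774: 7_962,   # vaginal delivery with CC
    775: 6_094,   # vaginal delivery without CC
}

drg_cmi = {drg: cost / BASE_RATE for drg, cost in drg_expected_cost.items()}
print(drg_cmi[765])  # 1.1358
```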
How it is used
Each DRG has an expected level of case intensity, or case mix index. A cesarean section with CCs or major CCs (DRG 765) has a case mix of 1.1358, nearly twice the 0.6094 case mix of a vaginal delivery without CCs (DRG 775). This means the former is expected to require nearly twice the resources during the hospitalization and to cost about twice as much.
Let’s say I have two hospitals’ costs for birth related admissions. Each birth related admission at either hospital would be assigned one of the birth DRGs. The hospital’s CMI is then the average CMI across all of its admissions.
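Averaging per-admission CMIs is all it takes to get a hospital-level CMI. A minimal sketch, where the list of admissions (by DRG) is made up for illustration:

```python
# Each admission carries its DRG's CMI; the hospital's CMI is the average.
# The admission mix below is made up for illustration.
drg_cmi = {765: 1.1358, 766: 0.81, 774: 0.7962, 775: 0.6094}

admissions = [765, 765, 766, 774, 775, 765]  # DRGs of one hospital's births
hospital_cmi = sum(drg_cmi[d] for d in admissions) / len(admissions)
print(f"Hospital CMI: {hospital_cmi:.3f}")  # Hospital CMI: 0.937
```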
| Hospital | Admissions | Unadjusted average cost per birth | CMI | Risk adjusted average cost per birth |
|---|---|---|---|---|
| A | — | $9,840 | 1.20 | $8,200 |
| B | — | $8,840 | 0.85 | $10,400 |
Hospital A costs $1000 more than Hospital B on average. Without risk adjustment you might conclude Hospital B is more cost efficient than Hospital A.
BUT you would be wrong…
Hospital A does mostly complex cesareans and has a CMI of 1.20, while Hospital B does mostly simple vaginal deliveries and has a CMI of 0.85. After dividing the average costs by the CMIs, Hospital A now costs $2,200 less than Hospital B.
After risk adjustment, your conclusion has reversed! This is why using risk adjustment to enable apples with apples comparisons is so important.
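The reversal is just a division by CMI. A sketch using the CMIs from the text; the unadjusted cost figures are illustrative assumptions chosen to match the stated $1,000 and $2,200 differences:

```python
# Risk adjustment: divide each hospital's average cost by its CMI.
# Unadjusted costs are assumed figures consistent with the text, not source data.
hospitals = {
    "A": {"avg_cost": 9_840, "cmi": 1.20},  # mostly complex cesareans
    "B": {"avg_cost": 8_840, "cmi": 0.85},  # mostly simple vaginal deliveries
}

for name, h in hospitals.items():
    h["adjusted"] = h["avg_cost"] / h["cmi"]
    print(f"Hospital {name}: unadjusted ${h['avg_cost']:,}, "
          f"adjusted ${h['adjusted']:,.0f}")

# Unadjusted, A looks $1,000 more expensive; adjusted, A is $2,200 cheaper.
```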
How they are created
The goal of risk adjustment is to group large numbers of individual encounters or patients into clinically and statistically similar groups. Typically, a clinical hierarchical structure guides the groupings, and statistical techniques are then used to fine-tune the composition of individual groups. Based on this logic, a complex algorithm is created. When claims are processed through the algorithm, it identifies which specific group each encounter belongs to.
- The MS DRG algo looks for significant procedures first, allocating a hospital admission to surgical or medical.
- The algo then looks at the primary diagnosis to allocate the case to one MDC, say 14 (OBGYN related).
- Next, the algo identifies which MDC 14 DRG family the case belongs to, say 76x, a cesarean section.
- The algo then identifies whether the case had CCs, and puts the case into, say, 765, a cesarean section with CCs.
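The steps above can be sketched as a simple decision hierarchy. This is a hypothetical illustration, not the real CMS grouper: the code sets are tiny placeholders (the ICD10 codes shown are real obstetric codes, but the real grouper uses far larger lists and many more rules):

```python
# Hypothetical sketch of the MS DRG grouping hierarchy described above.
# Code sets are simplified placeholders, not the actual grouper logic.
CESAREAN_PROCEDURES = {"10D00Z0", "10D00Z1"}  # illustrative ICD-10-PCS codes
OBGYN_DIAGNOSES = {"O80", "O82"}              # illustrative ICD-10-CM codes
CC_DIAGNOSES = {"O72.1"}                      # e.g. postpartum hemorrhage

def assign_drg(primary_dx, procedures, secondary_dxs):
    """Walk the hierarchy: surgical/medical -> MDC -> DRG family -> CC split."""
    is_surgical = bool(procedures & CESAREAN_PROCEDURES)  # step 1
    if primary_dx in OBGYN_DIAGNOSES or is_surgical:      # step 2: MDC 14
        has_cc = bool(secondary_dxs & CC_DIAGNOSES)       # step 4: CC check
        if is_surgical:                                   # step 3: cesarean
            return 765 if has_cc else 766
        return 774 if has_cc else 775
    return None  # other MDCs are not modeled in this sketch
```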
The exact process of creating these tools is beyond the scope here. Let me know if you’re interested in learning more on this.
How well do they perform
These algos try to do a lot, simplifying complex medical treatments and conditions into a finite number of groups. However, there is a trade-off between ease of interpretation and statistical fit: the more aggregation is done, the poorer the statistical performance these algos typically achieve.
There are many goodness-of-fit measures. Let’s look at R-squared, which measures the percentage of variation in the data that an algorithm can explain.
| Algo type | Use case | R-squared |
|---|---|---|
| MS DRGs | Adjusts hospital admissions | 50-65% |
| HCCs | Adjusts at the patient level; used to adjust health plan premiums | 15-30% |
| Clinical condition specific predictive model | Usually predicting clinical outcomes | Varies |
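R-squared is straightforward to compute from actual and model-expected values. A minimal sketch with made-up cost data:

```python
# R-squared = 1 - SS_res / SS_tot: the share of variation the model explains.
# The cost figures below are made up for illustration.
actual    = [8200, 10400, 9500, 7800, 12000]  # observed costs
predicted = [8500, 10000, 9000, 8000, 11500]  # model-expected costs

mean_actual = sum(actual) / len(actual)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)          # total variation
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
r_squared = 1 - ss_res / ss_tot

print(f"R-squared: {r_squared:.2f}")  # R-squared: 0.93
```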
If you think about the margin of error these R-squareds imply, you might find these algos hardly impressive. Yes, they are far from perfect.
They are being continuously refined. Moreover, comparisons done on relatively large patient cohorts are less affected by model errors and are usually considered acceptable in practice (whether out of convenience or for lack of better risk adjustment tools). It is more important to be aware of the systematic biases of a specific risk adjustment model, e.g. a particular weakness when applied to data for children or rare conditions.
Subscribe to see future posts similar to this one.