Some of you may remember the 1998 movie The Horse Whisperer. In it, Robert Redford’s character, who has a remarkable talent for understanding horses, helps a horse and its owner overcome their fears and recover from a tragic accident. It’s a lovely, slow-paced movie, well worth enjoying over some popcorn 🙂
In some ways, getting health data to cooperate and reveal its secrets and hidden gems requires a similar ability: understanding its idiosyncrasies and features. Beyond the sources, types and issues of health data, knowing the underlying systems that shaped how the data was created gives you new insights and analysis methods.
Most health care data contains a hierarchical structure, where each successive level of a code records more detail and specificity.
e.g. F11.10: Opioid abuse, uncomplicated
F11: Opioid related disorders
F10-F19: Mental and behavioral disorders due to psychoactive substance use
F: Mental, Behavioral and Neurodevelopmental disorders
This kind of structure makes the data easier to understand, e.g. any code starting with an F indicates a mental, behavioral or neurodevelopmental disorder.
Sometimes the lowest level of coding is either unavailable or inaccurate, so the ability to “roll up” to a higher level, aggregating over more data points, lets you reduce the impact of individual errors and increase the predictive power of your input data.
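To make the roll-up idea concrete, here is a minimal Python sketch. It assumes ICD-10-CM style codes, where the first three characters form the category (F11) and the first letter is a rough proxy for the chapter. The `roll_up` helper and the sample `claims` list are made up for illustration, not taken from any real dataset or library.

```python
from collections import Counter


def roll_up(code: str, level: str) -> str:
    """Return the ancestor of an ICD-10-CM style code at the requested level.

    Assumption: the first 3 characters are the category (e.g. "F11") and
    the first letter approximates the chapter (e.g. "F"). For some letters
    a chapter does not map to exactly one letter, so treat "letter" as a
    convenience grouping only.
    """
    code = code.strip().upper()
    if level == "full":
        return code
    if level == "category":
        return code[:3]
    if level == "letter":
        return code[:1]
    raise ValueError(f"unknown level: {level}")


# Hypothetical diagnosis codes pulled from claim lines.
claims = ["F11.10", "F11.20", "F10.10", "E11.9", "F11.10"]

# Counting at a higher level pools more data points per group.
by_category = Counter(roll_up(c, "category") for c in claims)
by_letter = Counter(roll_up(c, "letter") for c in claims)

print(by_category)  # F11 appears 3 times, F10 and E11 once each
print(by_letter)    # F appears 4 times, E once
```

If a single F11.10 was really a miscoded F11.20, the category-level count for F11 is unaffected, which is the error-dampening effect described above.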
Often you will need to build your own coding hierarchy if you are modeling a specific concept. E.g. if you want to know the prevalence of uncomplicated opioid and alcohol abuse, you combine F10.10 (alcohol abuse, uncomplicated) with F11.10.
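A custom grouping like this can be sketched as a simple set of codes. The group name, the `prevalence` helper and the four patients below are all hypothetical, purely to show the mechanics.

```python
# Hypothetical custom group: combines two codes from different categories.
CUSTOM_GROUPS = {
    "uncomplicated_opioid_or_alcohol_abuse": {"F10.10", "F11.10"},
}


def prevalence(patient_codes: dict[str, set[str]], group: set[str]) -> float:
    """Share of patients with at least one code in the group."""
    if not patient_codes:
        return 0.0
    hits = sum(1 for codes in patient_codes.values() if codes & group)
    return hits / len(patient_codes)


# Made-up patients, each with the set of diagnosis codes seen for them.
patients = {
    "p1": {"F10.10", "E11.9"},
    "p2": {"F11.10"},
    "p3": {"I10"},
    "p4": {"F11.20"},  # opioid dependence, deliberately not in the group
}

rate = prevalence(patients, CUSTOM_GROUPS["uncomplicated_opioid_or_alcohol_abuse"])
print(rate)  # 0.5 (p1 and p2 out of 4 patients)
```

Keeping the group as data rather than hard-coding it in the logic makes it easy to review the definition with a clinician and change it later.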
Medical conditions follow a path of development, usually starting from relatively simple conditions that are left untreated, which then worsen over time as various organs strain to respond and ultimately start to fail, e.g. from obesity, to diabetes, to kidney failure.
Analyses done along these paths can surface insights that anticipate adverse developments. Such insights are easier for doctors to understand AND give them a way to intervene.
A reminder: the data you typically receive will look nothing like the clean path I described above. It usually arrives as messy line-level transactions, so you, the analyst, have to create markers/features of such etiologies as part of the analysis.
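As a rough sketch of what creating such markers can look like, the snippet below turns line-level records into a "first seen" date per stage and then checks whether a patient followed the obesity, diabetes, kidney failure path. The mapping of stages to code prefixes (E66, E11, N17 to N19) is my illustrative assumption, and the records are invented; a real definition should be agreed with clinicians.

```python
from datetime import date

# Assumed, illustrative mapping of path stages to ICD-10-CM category prefixes.
STAGE_PREFIXES = {
    "obesity": ("E66",),
    "diabetes": ("E11",),
    "kidney_failure": ("N17", "N18", "N19"),
}


def stage_markers(lines):
    """Map each patient to the earliest date each stage appears.

    `lines` is an iterable of (patient_id, service_date, code) tuples,
    in any order, as you might get from raw claim lines.
    """
    markers = {}
    for patient_id, service_date, code in lines:
        for stage, prefixes in STAGE_PREFIXES.items():
            if code.startswith(prefixes):
                first_seen = markers.setdefault(patient_id, {})
                if stage not in first_seen or service_date < first_seen[stage]:
                    first_seen[stage] = service_date
    return markers


def follows_path(first_seen, path=("obesity", "diabetes", "kidney_failure")):
    """True if every stage was observed and first dates are in path order.

    Same-day first occurrences are allowed (non-strict ordering).
    """
    dates = [first_seen.get(stage) for stage in path]
    return all(d is not None for d in dates) and dates == sorted(dates)


# Made-up, deliberately unordered claim lines.
lines = [
    ("p1", date(2019, 3, 1), "E11.9"),
    ("p1", date(2017, 6, 15), "E66.9"),
    ("p1", date(2022, 1, 10), "N18.4"),
    ("p2", date(2020, 5, 5), "E11.9"),
    ("p2", date(2021, 2, 2), "I10"),
]

markers = stage_markers(lines)
print(follows_path(markers["p1"]))  # True
print(follows_path(markers["p2"]))  # False (no obesity or kidney failure seen)
```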
Medical events occur over time, so their sequence can be informative. If the sequencing and the patient pool are set up well, you can often infer the direction of causality, as is typically done in clinical trials.
If you have sufficiently large, complete and well-structured data, you can create time series models/analyses that enable identification of optimal intervention points. E.g. how long does a patient need to take opioids to develop dependence?
Temporal dimensions are, however, harder to handle, as health data can be sporadic and sparse over time, especially if you focus on specific diseases.
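For the opioid question above, a first, very naive pass could measure the days between a patient's first opioid exposure and their first dependence diagnosis. This is only a sketch on invented dates: the `days_to_event` helper is hypothetical, and patients with no diagnosis yet are simply dropped here, which biases the result. A proper analysis would use survival methods that handle such censored patients.

```python
from datetime import date
from statistics import median


def days_to_event(first_exposure, first_event):
    """Days from first exposure to first event, or None if not observed.

    Events dated before the exposure are ignored, since they cannot have
    followed it.
    """
    if first_event is None or first_event < first_exposure:
        return None
    return (first_event - first_exposure).days


# Made-up patients: (first opioid fill, first dependence diagnosis or None).
patients = {
    "p1": (date(2020, 1, 1), date(2020, 4, 10)),
    "p2": (date(2020, 2, 1), None),  # no dependence observed so far
    "p3": (date(2020, 3, 1), date(2020, 3, 31)),
}

durations = {pid: days_to_event(*dates) for pid, dates in patients.items()}
observed = [d for d in durations.values() if d is not None]

print(durations)         # {'p1': 100, 'p2': None, 'p3': 30}
print(median(observed))  # 65.0
```

Note how quickly the sparsity bites: one of three patients contributes nothing, and with real data the share is often much higher.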
Demand vs Supply
This economics concept drives a lot of the incentives in healthcare. Simply put, demand can be thought of as patients needing care, while supply is the doctors, hospitals, drug companies etc. providing care.
This bifurcation allows you to think about what drove which interventions and spending. E.g. if you compare 2 doctors, knowing that both treated only diabetes patients means you can compare their costs and outcomes on a relatively like-for-like basis.
However, in practice, these two forces are not as clear cut. Supply-induced demand is where doctors/hospitals generate the need for their services where such demand was absent beforehand. E.g. you have all seen billboards for plastic surgery on the highway. In these instances, the analysis would have to account for the proportion of demand that is likely supply induced before comparing doctors.
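One simple way to picture this adjustment: compare doctors within a single condition, and leave out episodes you believe were supply induced. In the sketch below, the `likely_induced` flag is a hypothetical field that you would have to estimate yourself by some other method, and all the numbers are invented.

```python
from collections import defaultdict
from statistics import mean


def mean_cost_by_doctor(episodes, condition, exclude_induced=True):
    """Average episode cost per doctor for one condition.

    `likely_induced` is an assumed, analyst-estimated flag; this function
    only applies it, it does not estimate it.
    """
    costs = defaultdict(list)
    for ep in episodes:
        if ep["condition"] != condition:
            continue  # keep the comparison like-for-like
        if exclude_induced and ep["likely_induced"]:
            continue  # drop demand that supply likely generated
        costs[ep["doctor"]].append(ep["cost"])
    return {doctor: mean(values) for doctor, values in costs.items()}


# Made-up episodes of care.
episodes = [
    {"doctor": "A", "condition": "diabetes", "cost": 1000.0, "likely_induced": False},
    {"doctor": "A", "condition": "diabetes", "cost": 3000.0, "likely_induced": True},
    {"doctor": "B", "condition": "diabetes", "cost": 1500.0, "likely_induced": False},
    {"doctor": "B", "condition": "cosmetic", "cost": 9000.0, "likely_induced": True},
]

raw = mean_cost_by_doctor(episodes, "diabetes", exclude_induced=False)
adjusted = mean_cost_by_doctor(episodes, "diabetes")

print(raw)       # {'A': 2000.0, 'B': 1500.0}
print(adjusted)  # {'A': 1000.0, 'B': 1500.0}
```

In this toy data, doctor A looks more expensive than B before the adjustment and cheaper after it, which is exactly why the induced share matters before you rank anyone.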
That gives you a flavour of what being a health data whisperer is like. Best of luck taming that healthcare beast you’re working with… 🙂
Thanks for reading. Please Subscribe.
Let me know if you have specific topics you’d like to learn more about.