The common sources of healthcare data are EHRs, insurance claims, research, public health, and user-generated data.
Electronic Health Records: hospitals and doctors keep records of each patient visit in their electronic health record (EHR) systems. Epic and Cerner are by far the largest vendors in the US. EHRs document both clinical information, such as diagnoses and procedures, and facility workflow information, such as appointment times. EHRs can also contain clinical notes, radiology images, and laboratory results. As such, EHRs are often the richest sources of clinical information.
Common data standards such as HL7 have made interoperability across different EHRs easier in recent years. However, much EHR data remains siloed within individual organizations.
Insurance Claims: healthcare providers submit claims to and receive payment from insurance companies. Claims data are transaction based: each claim exists to facilitate payment and usually carries a member ID, claim ID, service date, high-level clinical information, and payment amounts.
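To make the shape of a claim record concrete, here is a minimal Python sketch. The class name `Claim`, its field names, the helper `total_paid_by_member`, and the sample rows are all hypothetical, made up for illustration; real claims files follow payer-specific layouts with many more fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Claim:
    """One simplified claim record (hypothetical layout)."""
    member_id: str        # identifies the insured person
    claim_id: str         # identifies the transaction
    service_date: date    # when the care was delivered
    diagnosis_code: str   # high-level clinical information
    paid_amount: Decimal  # what the insurer paid


def total_paid_by_member(claims):
    """Sum paid amounts per member ID."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for claim in claims:
        totals[claim.member_id] += claim.paid_amount
    return dict(totals)


# Made-up sample records, not real patient data.
claims = [
    Claim("M001", "C1001", date(2023, 1, 5), "E11.9", Decimal("120.00")),
    Claim("M001", "C1002", date(2023, 2, 9), "I10", Decimal("80.50")),
    Claim("M002", "C1003", date(2023, 1, 20), "J45.909", Decimal("45.25")),
]
totals = total_paid_by_member(claims)
```

Because every record carries a payment amount keyed to a member, cost questions such as spend per member reduce to a simple aggregation, which is harder to do from EHR data alone.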
Claims data are widely used in policy decision making, largely because they link so readily to costs. EHRs usually contain far less financial information. Both are protected in the US by HIPAA, so you should not expect to find patient-level information publicly.
Research: medical universities and pharmaceutical companies spend considerable resources conducting clinical research. The data generated tend to be scientifically rich and specific to each study, which makes aggregation across different studies and institutions difficult. Nevertheless, published research results are highly valuable and are often the only source for some data points.
Public health: government agencies such as the CDC and the Centers for Medicare & Medicaid Services (CMS), along with surveys such as the Medical Expenditure Panel Survey (MEPS), gather public health information that spans entire cities or regions. This type of data is usually obtained through regular data submissions by healthcare facilities and through population surveys. These are very useful sources of epidemiological data, such as disease prevalence and mortality rates.
User generated: the plethora of wearables and the rise of “the empowered patient” mean that a wealth of new personal health metrics is being captured. Your heart rate, blood pressure, pulse rate, and temperature are all valuable pieces of information that could inform healthcare interventions.