Types of healthcare data – Diagnoses/ICDs

In my previous post, I described the common sources of healthcare data.

Here, I describe the common types of healthcare data you will come across: diagnoses, procedures, demographics, drugs, laboratory results, clinical notes and financial data. This post covers diagnoses.


Diagnoses record the medical problem that a patient has, e.g. diabetes, headache, cancer. They explain why the patient is ill. The most widely used diagnosis codes come from the International Classification of Diseases, 10th Revision (ICD10). Its predecessor, ICD9, has been phased out in most countries. The latest version, ICD11, is available but not yet widely adopted. The ICD is created and maintained by the World Health Organization. The codes have a hierarchical structure that captures a large amount of clinical information.

For example, E11 is the code for Type 2 Diabetes Mellitus. The leading E indicates the chapter on endocrine, nutritional and metabolic diseases, while the 11 specifies type 2 diabetes. E11.62 is the code for type 2 diabetes with skin complications, and E11.621 is the code for type 2 diabetes with foot ulcer (these longer codes come from ICD-10-CM, the US clinical modification of ICD10). Each additional character adds a level of specificity, so the code itself carries the hierarchy.
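To make the hierarchy concrete, here is a minimal Python sketch that expands a code into each of its levels. The function name `icd10_levels` is my own, and the sketch assumes codes are written with a dot after the three-character category (some datasets drop the dot, in which case you would need to re-insert it first).

```python
def icd10_levels(code: str) -> list[str]:
    """Return the code at every level of specificity, broadest first.

    Assumes the dotted format, e.g. "E11.621" (category, dot, extension).
    """
    code = code.strip().upper()
    # Split "E11.621" into the category "E11" and the extension "621".
    category, _, extension = code.partition(".")
    levels = [category]
    # Each extra character of the extension is one more level of detail.
    for i in range(1, len(extension) + 1):
        levels.append(f"{category}.{extension[:i]}")
    return levels


print(icd10_levels("E11.621"))  # ['E11', 'E11.6', 'E11.62', 'E11.621']
```

This kind of roll-up is handy when you want to count patients by broad category (all of E11) rather than by the most detailed code.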

ICD9 is the predecessor of ICD10 and has largely been phased out worldwide. However, some countries switched late (the US moved to ICD10 for billing in October 2015), so you will likely see ICD9 in historical data. In ICD9, 250 indicates diabetes mellitus, and 250.72 indicates diabetes with peripheral circulatory disorders, type 2 or unspecified type, uncontrolled. As you can see, ICD10 is more structured and more detailed than ICD9. Crosswalks between the two are available, but beware that many codes do not map one-to-one from ICD9 to ICD10.
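Here is a rough Python sketch of how you might load a crosswalk and flag the codes that do not map cleanly. Everything in it is an assumption for illustration: the row layout, the function names, and especially the codes, which are placeholders and not real crosswalk entries.

```python
from collections import defaultdict

# Toy crosswalk rows: (icd9_code, icd10_code).
# These are PLACEHOLDER codes, not real ICD9-to-ICD10 mappings.
CROSSWALK_ROWS = [
    ("OLD.1", "NEW.1"),   # maps to exactly one target
    ("OLD.2", "NEW.2A"),  # maps to several targets
    ("OLD.2", "NEW.2B"),
]


def build_crosswalk(rows):
    """Group target codes by source code."""
    mapping = defaultdict(list)
    for icd9, icd10 in rows:
        mapping[icd9].append(icd10)
    return dict(mapping)


def mapping_type(mapping, icd9):
    """Describe how a source code maps: unmapped, one-to-one, or one-to-many."""
    targets = mapping.get(icd9, [])
    if not targets:
        return "unmapped"
    return "one-to-one" if len(targets) == 1 else "one-to-many"


crosswalk = build_crosswalk(CROSSWALK_ROWS)
for code in ["OLD.1", "OLD.2", "OLD.3"]:
    print(code, mapping_type(crosswalk, code), crosswalk.get(code, []))
```

Codes flagged as one-to-many or unmapped are the ones worth reviewing by hand before you combine ICD9 and ICD10 data in the same analysis.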

In EHR as well as claims data, you will likely see a primary diagnosis (main medical problem) as well as secondary diagnoses (other medical issues, such as complications and co-morbid conditions). More on the structure of healthcare data in a future post…
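As a small preview, here is one way a single visit with a primary and secondary diagnoses could be represented in Python. The `Encounter` class and its field names are hypothetical, not taken from any particular EHR or claims format, and the example record is made up (E11.621 is from above; I10 is the ICD10 code for essential hypertension).

```python
from dataclasses import dataclass, field


@dataclass
class Encounter:
    """One visit with a primary diagnosis and optional secondary diagnoses."""
    encounter_id: str
    primary_dx: str
    secondary_dx: list[str] = field(default_factory=list)

    def all_dx(self) -> list[str]:
        # Primary diagnosis first, then secondary diagnoses in recorded order.
        return [self.primary_dx, *self.secondary_dx]

    def has_dx_prefix(self, prefix: str, include_secondary: bool = True) -> bool:
        # True if any considered code starts with the prefix, e.g. "E11".
        codes = self.all_dx() if include_secondary else [self.primary_dx]
        return any(code.startswith(prefix) for code in codes)


# Made-up example: diabetic foot ulcer as the main problem, hypertension as a co-morbidity.
visit = Encounter("enc-001", primary_dx="E11.621", secondary_dx=["I10"])
print(visit.has_dx_prefix("E11"))                           # True
print(visit.has_dx_prefix("I10", include_secondary=False))  # False
```

Whether you search only the primary diagnosis or all diagnoses can change your patient counts a lot, so it is worth making that choice explicit.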

I’ve compiled ICD10 and ICD9 codes in this file that you can reference while learning how to analyze healthcare data.

I’ll describe procedure codes in the next post.

