Types of healthcare data – Drug codes

This is the third post in a series in which I describe the common types of healthcare data you will come across, namely, diagnoses, procedures, demographic, drug, laboratory result data, clinical notes and financial data. In previous posts, I described diagnosis and procedure codes. Here, I describe drug codes.

Drug codes primarily record clinical/dosage information about the drug taken by the patient and facilitate billing related activities. For brevity, I’ll focus only on outpatient drug coding in the US (inpatient drugs have additional variations/complexities).

I’ll be posting specific use cases of these codes in future posts. Subscribe so you don’t miss out.

NDC (National Drug Code)

National Drug Codes are 10 or 11 digit codes that record exactly what a drug is, how it is administered, what the dosage form is and what the pack size is.

These codes are critical to enabling care delivery and payment, from the moment the doctor prescribes, to when the pharmacy fills the script, to when the insurance company adjudicates the claim against its drug formulary and pays for the pills, to when the pharmaceutical company later settles rebates. Without NDCs, the drug industry could not function.

Let’s take for example Lipitor (atorvastatin), Pfizer’s blockbuster drug used to treat hyperlipidemia (too much bad fat/lipids in the blood), which is E78.5 in ICD10 (see my previous post on ICDs).

The first 4 digits, the labeler code, identify the manufacturer, in this case Pfizer. The next 4 digits are the product code, here atorvastatin oral tablets of strength 10 mg each. The last 2 digits are the package code, here a pack of 90. You can see that a lot of information is contained in this one code. NDCs can be found on all prescriptions/bottle labels.
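As a minimal sketch of the three segments described above, the snippet below splits a hyphenated NDC into labeler, product and package parts. The function and class names are my own for illustration, and I am assuming the 10 digit Lipitor code is written 0071-0155-23, which is the 11 digit example used later in this post with its leading zero dropped.

```python
from typing import NamedTuple


class NdcParts(NamedTuple):
    labeler: str   # who makes or distributes the drug
    product: str   # drug, strength and dosage form
    package: str   # pack size and type


def split_ndc(ndc: str) -> NdcParts:
    """Split a hyphenated NDC such as '0071-0155-23' into its three segments."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three hyphen-separated digit groups, got {ndc!r}")
    return NdcParts(*parts)


# Assumed example code (see the note above); segments are kept as strings
# so that leading zeros are not lost.
parts = split_ndc("0071-0155-23")
print(parts.labeler, parts.product, parts.package)
```

Keeping the segments as strings matters: storing NDCs as integers silently drops leading zeros, which is a common source of failed joins.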

You can search specific drugs here. The FDA is the custodian of NDCs. New drugs get approved and are issued NDCs. Some old/expired codes find their way back into circulation, so you need to be careful with effective dates when analyzing drug data using NDCs.
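One way to respect effective dates is to look up each NDC as of the fill date instead of by code alone. The sketch below assumes a reference table with start and end dates per code. The table layout, the code and the product descriptions are all made up for illustration and do not come from any real reference file.

```python
from datetime import date

# Hypothetical reference rows: (ndc11, description, start date, end date).
# An end date of None means the code is still active.
NDC_REFERENCE = [
    ("11111222233", "old product (made-up)", date(2005, 1, 1), date(2010, 12, 31)),
    ("11111222233", "new product (made-up)", date(2015, 6, 1), None),
]


def lookup_ndc(ndc11, fill_date, reference=NDC_REFERENCE):
    """Return the description valid for this NDC on the fill date, or None."""
    for code, description, start, end in reference:
        if code == ndc11 and start <= fill_date and (end is None or fill_date <= end):
            return description
    return None


# The same code resolves to different products depending on the date.
print(lookup_ndc("11111222233", date(2008, 3, 1)))
print(lookup_ndc("11111222233", date(2016, 3, 1)))
```

A fill date that falls in the gap between the two rows returns None, which is worth flagging for review instead of silently mapping to either product.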

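The above 10 digit 4-4-2 format has other variations (yes, healthcare data is messy…). 10 digit codes also come as 5-3-2 or 5-4-1, and you may see 11 digit versions like 00071-0155-23, or the same digits without hyphens, 00071015523. The 11 digit form is 5-4-2, made by adding a leading zero to whichever segment of the 10 digit code is short. Different-looking strings can therefore all mean the same item, so you have to be extra careful when trying to map these NDCs to the reference tables.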
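Before joining to a reference table, it helps to bring every code into one layout. The sketch below pads a hyphenated NDC to the 11 digit 5-4-2 form by zero-filling each segment on the left. The function name is hypothetical, and the approach assumes the hyphens are still present, since without them you cannot tell which segment of a 10 digit code is the short one.

```python
def to_ndc11(ndc: str) -> str:
    """Pad a hyphenated NDC to the 5-4-2 layout and return 11 digits, no hyphens."""
    segments = ndc.strip().split("-")
    if len(segments) != 3 or not all(s.isdigit() for s in segments):
        raise ValueError(f"expected three hyphen-separated digit groups, got {ndc!r}")
    labeler, product, package = segments
    # Anything longer than 5-4-2 cannot be padded into shape, so refuse to guess.
    if len(labeler) > 5 or len(product) > 4 or len(package) > 2:
        raise ValueError(f"segments do not fit a 5-4-2 layout: {ndc!r}")
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# A 4-4-2 code and its 5-4-2 counterpart normalize to the same string.
print(to_ndc11("0071-0155-23"))
print(to_ndc11("00071-0155-23"))
```

Input that does not fit the layout (a 5 digit product segment, for instance) raises an error so it can be reviewed by hand.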

There are tens of thousands of these codes. In their purest form, NDCs with the 4-4-2 or 5-4-2 structure only let you analyze consistently at the labeler level, i.e. using the initial 4 or 5 digits. The clinical content is not obvious from the remaining 4-2 digits. Other classification systems facilitate clinically meaningful analysis, for example Anatomical Therapeutic Chemical Classification (ATC) and Generic Product Identifier (GPI).

ATC (Anatomical Therapeutic Chemical)

ATCs are maintained by the WHO; you can search here. These are hierarchically structured category codes, for example C10AA05 for atorvastatin.

  • C indicates cardiovascular system (N – nervous system, R – respiratory)
  • C10 lipid modifying agent
  • C10A lipid modifying agent, plain
  • C10AA HMG CoA reductase inhibitors
  • C10AA05 atorvastatin (the code identifies the substance, not a particular strength)
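Because each level in the list above is a prefix of the full code, rolling a drug up to a broader category is just string slicing. A minimal sketch, with a function name of my own choosing:

```python
# Prefix lengths of the five ATC levels, matching the bullets above:
# C / C10 / C10A / C10AA / C10AA05
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list:
    """Return the five hierarchical prefixes of a full 7-character ATC code."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS]


print(atc_levels("C10AA05"))
```

Grouping claims by the fourth prefix (C10AA), for example, would put all statins in one bucket.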

As you can see these contain an enormous amount of structured clinical content. You can go to town with such rich data in analytics! Subscribe to receive future posts in which I will illustrate ways to use such data.

GPI (Generic Product Identifier)

GPIs are 14 digit hierarchical category identifiers.

The GPI for atorvastatin 10mg is 39-40-00-10-10-03-10. See the illustration from Medi-Span on the right.
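Like ATC, a GPI can be rolled up by taking prefixes, here two digits at a time. The sketch below does that for the example code. The level names are my assumption based on how the GPI hierarchy is commonly described, so check them against your vendor's documentation before relying on them.

```python
# Assumed names for the seven 2-digit levels (verify against vendor documentation).
GPI_LEVEL_NAMES = (
    "drug group",
    "drug class",
    "drug subclass",
    "drug name",
    "drug name extension",
    "dosage form",
    "strength",
)


def gpi_levels(gpi: str) -> dict:
    """Map each level name to the cumulative GPI prefix for that level."""
    digits = gpi.replace("-", "").strip()
    if len(digits) != 14 or not digits.isdigit():
        raise ValueError(f"expected 14 digits, got {gpi!r}")
    return {name: digits[: 2 * (i + 1)] for i, name in enumerate(GPI_LEVEL_NAMES)}


levels = gpi_levels("39-40-00-10-10-03-10")
for name, prefix in levels.items():
    print(f"{name}: {prefix}")
```

Truncating to the first 4 digits groups drugs at the class level, while the full 14 digits pin down a specific form and strength.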

These also provide a wealth of information on drugs. The preference for ATC or GPI is more a matter of familiarity. In practice, you will likely see one or the other depending on who your drug data vendor is. In the US, you might see GPI more than ATC.


Between pharmaceutical companies, pharmacy benefit management companies and insurance companies, the flow of funds related to drugs can be a convoluted, confusing, contrived pool of murkiness.

Aside from the retail price, you have the usual insurance copays and coinsurance, which, depending on your insurance company’s drug formulary, can run from a few dollars to hundreds if not thousands per year. There are also many different kinds of pricing, like list price, average wholesale price, rebates, etc. This stuff is complex.

For those of you who are not familiar with the drug industry and its pricing practices, I hope this sheds some light on the matter.

Word of Caution

I have always worked with pharmacists who know this stuff in depth. As always, quantitative folks tend to prefer the black and white world of numbers while in medicine and practice, these seemingly hierarchical, clear cut categories can still have a lot of overlaps and muddiness. So proceed with due caution, and consult a pharmacist who knows drug coding well.

Let me know if you have questions here.

I’ve compiled ICD10, ICD9, CPT/HCPCS codes in this file that you can reference while learning how to analyze healthcare data.

I’ll describe laboratory/LOINC codes in the next post.
