This is the second post in a series I’m writing on pharmaceutical analytics. Future pieces will cover cost structures, analytic metrics, drug plan benefit management and predictive analytics. Please subscribe to receive those.
Drug data are used for prescribing, dispensing, billing, utilization management and more. Almost all post-launch activities related to pharmaceuticals rely heavily on drug data (at least in the more developed markets around the world). Thus, it’s useful to learn what drug data are out there, what they look like and how to use them.
National Drug Codes (NDC)
NDCs are the most common form of drug data. Enormous amounts of these data are generated daily in the US (and beyond) to support the prescribing, dispensing and billing of prescription drugs. I previously wrote about this.
NDCs are managed by the FDA in the US. These are numerical drug IDs made of three segments. The FDA's 10-digit form comes in 4-4-2, 5-3-2 and 5-4-1 layouts, and billing systems commonly pad it to an 11-digit 5-4-2 form. The first segment identifies the labeler (the manufacturer, repackager or distributor), the second the product (drug, strength and dosage form), and the last the package size and type.
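One common cleaning step is to bring hyphenated 10-digit NDCs (4-4-2, 5-3-2 or 5-4-1) into a single 11-digit 5-4-2 layout by zero-padding whichever segment is short. Here is a minimal Python sketch of that idea. The function name and the sample codes are made up for illustration, and the hyphens are assumed to be present, since a bare 10-digit string does not reveal which segment needs the padding.

```python
# Segment lengths (labeler, product, package) this sketch accepts.
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}


def ndc_to_11_digits(ndc: str) -> str:
    """Convert a hyphenated NDC to an 11-digit 5-4-2 string without hyphens."""
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three numeric segments separated by hyphens: {ndc!r}")
    if tuple(len(p) for p in parts) not in VALID_LAYOUTS:
        raise ValueError(f"unrecognised NDC layout: {ndc!r}")
    labeler, product, package = parts
    # Left-pad each segment with zeros up to its 5-4-2 width.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# Made-up codes, one per 10-digit layout.
print(ndc_to_11_digits("1234-5678-90"))  # 4-4-2 -> 01234567890
print(ndc_to_11_digits("12345-678-90"))  # 5-3-2 -> 12345067890
print(ndc_to_11_digits("12345-6789-0"))  # 5-4-1 -> 12345678900
```

Normalizing first means that joins between claims, formularies and reference files compare like with like.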

There are hundreds of thousands of these. As you can imagine, it’s difficult to analyse this volume of highly fragmented data. To make things more complicated, new codes are being added all the time, and the NDCs of discontinued drugs CAN be reused later for new and unrelated drugs…
Anatomical Therapeutic Chemical (ATC)
To add more structure to drug data, WHO created the Anatomical Therapeutic Chemical (ATC) classification. ATC codes are much more hierarchical, which makes analysis easier.
For example, C10AA05 is atorvastatin. Each level of the code narrows the classification:
- C indicates cardiovascular system (N – nervous system, R – respiratory system)
- C10 lipid modifying agents
- C10A lipid modifying agents, plain
- C10AA HMG CoA reductase inhibitors
- C10AA05 atorvastatin (the chemical substance; ATC codes do not encode strength)
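The breakdown above follows fixed prefix lengths (1, 3, 4, 5 and 7 characters), so rolling a code up to any level is a simple string slice. A minimal Python sketch, with the function name chosen purely for illustration:

```python
# Prefix length of each ATC level, from the anatomical group down to the substance.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list[str]:
    """Return the prefix of an ATC code at every level it contains."""
    code = code.strip().upper()
    if len(code) not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"unexpected ATC code length: {code!r}")
    return [code[:n] for n in ATC_LEVEL_LENGTHS if n <= len(code)]


print(atc_levels("C10AA05"))
# ['C', 'C10', 'C10A', 'C10AA', 'C10AA05']
```

Grouping drug records by, say, the 5-character prefix then gives class-level totals without maintaining a separate lookup table.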
Generic Product Identifier (GPI)
GPIs are also highly structured, hierarchical codes, designed to aggregate individual NDCs into clinically similar groups. A GPI has up to 7 levels. Medi-Span owns this schema.
GPI for atorvastatin = 3940001010
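Because the GPI is hierarchical, truncating it is a quick way to aggregate drug records at a coarser level. The sketch below assumes that each of the 7 levels adds two characters to the code, which you should verify against your own Medi-Span documentation. The helper names are hypothetical, and every sample value except the atorvastatin GPI above is made up.

```python
from collections import Counter

GPI_LEVEL_WIDTH = 2  # assumption: each level adds two characters
GPI_MAX_LEVELS = 7


def gpi_prefix(gpi: str, level: int) -> str:
    """Truncate a GPI to the given level (1 = broadest grouping)."""
    digits = gpi.replace("-", "").strip()
    if not 1 <= level <= GPI_MAX_LEVELS:
        raise ValueError(f"level must be between 1 and {GPI_MAX_LEVELS}")
    width = level * GPI_LEVEL_WIDTH
    if len(digits) < width:
        raise ValueError(f"GPI {gpi!r} is too short for level {level}")
    return digits[:width]


def count_by_level(gpis, level):
    """Count records per GPI prefix at the chosen level."""
    return Counter(gpi_prefix(g, level) for g in gpis)


# The first value is the atorvastatin GPI from this post; the others are made up.
sample = ["3940001010", "3940009900", "1111111111"]
print(count_by_level(sample, 2))
# Counter({'3940': 2, '1111': 1})
```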
RxNorm
RxNorm is a standardized drug nomenclature produced by the National Library of Medicine. The intention is to have a common vocabulary to which different users can map their own sets of names. For example, a hospital may create its own internal drug codes and names. Mapping each homegrown code to an RxNorm concept links it to external sources, such as NDC or ATC codes, and makes comparison possible.
The UMLS CUI (concept unique identifier) for atorvastatin is C0286651. Within RxNorm, each concept also carries its own numeric identifier, the RxCUI.
There are other such classification structures, including GC3 and GCN. More on those on another day…
Homegrown codes/names
Hospitals and provider networks can and do create their own internal drug codes and names. To enable comparison to external benchmarks and sources, these entities usually map their own codes to some common taxonomy, such as RxNorm or ATC. The large EHR systems tend to stick to widely available coding schemas. But this really is a feature that any provider group can customize as they wish.
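When you receive such a mapping, it is worth checking how much of the data it actually covers before running any analysis. A minimal sketch, assuming the crosswalk arrives as a simple lookup from local code to a standard code. The local codes here are invented, and the only real code is the atorvastatin ATC code used earlier in this post.

```python
def apply_crosswalk(local_codes, crosswalk):
    """Split local codes into those the crosswalk maps and those it misses."""
    mapped = {}
    unmapped = []
    for code in local_codes:
        key = code.strip().upper()  # tolerate stray whitespace and case differences
        if key in crosswalk:
            mapped[code] = crosswalk[key]
        else:
            unmapped.append(code)
    return mapped, unmapped


# Hypothetical homegrown codes mapped to ATC.
crosswalk = {"RX-0001": "C10AA05"}
mapped, unmapped = apply_crosswalk(["rx-0001 ", "RX-0002"], crosswalk)
print(mapped)    # {'rx-0001 ': 'C10AA05'}
print(unmapped)  # ['RX-0002']
```

The unmapped list is the part to review with whoever maintains the codes.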
This stuff is complex!
Take care when you do analysis for a new client or on a new set of data. Factor in time to build crosswalks and clean up the data. Always consult a pharmacist who is familiar with drug coding when you analyze drug data for the first time.