Automating Healthcare Fraud Detection

Fraud costs the US health system dearly, as do wasteful spending and abuse of health services. In many organizations, the process for detecting these problems is ad hoc and opportunistic, relying on tip-offs, hunches, and the curiosity and perseverance of investigators. While these methods can be effective in isolated cases, they are manual and time consuming and, most importantly, likely to miss many opportunities.

An automated fraud detection framework would enable more comprehensive and earlier detection, leading to proactive fraud prevention.

Such a framework would contain 6 components:

1 Data ingest engine

Collate multiple datasets, perform necessary normalization, transforms, and other ETL processes. Automate as many of these steps as possible.

Build in checks for data completeness to reduce as much noise in the data as possible.
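A completeness check can be as simple as measuring how often each required field is empty. The sketch below assumes claim rows arrive as dictionaries; the field names in REQUIRED_FIELDS are hypothetical and should be swapped for your own schema.

```python
from collections import Counter

# Hypothetical required fields; substitute your own claims schema.
REQUIRED_FIELDS = ("claim_id", "provider_id", "patient_id",
                   "procedure_code", "service_date", "paid_amount")

def completeness_report(rows, required=REQUIRED_FIELDS):
    """Return, per required field, the share of rows where it is None or blank."""
    rows = list(rows)
    missing = Counter()
    for row in rows:
        for field in required:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                missing[field] += 1
    total = len(rows)
    return {f: (missing[f] / total if total else 0.0) for f in required}

# Made-up sample rows for illustration only.
sample = [
    {"claim_id": "C1", "provider_id": "P1", "patient_id": "M1",
     "procedure_code": "X1", "service_date": "2023-01-05", "paid_amount": 80.0},
    {"claim_id": "C2", "provider_id": "", "patient_id": "M2",
     "procedure_code": "X1", "service_date": "2023-01-06", "paid_amount": 80.0},
]
report = completeness_report(sample)
print(report)
```

A field with a high missing share is a candidate for exclusion or repair before any pattern is built on it.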

2 Pattern structuring

This is the key step in the fraud detection pipeline. Construct patterns from data to enable outlier detection. These patterns should be built along different layers of the data structure, to enable detection of trends that lead to outliers.

The layers of data may include:

Line level – you can detect, for example, duplicate lines and patterns indicating that claims are being submitted deliberately to fool your claims adjudication engine.
Procedure, procedure cluster, drug level – identifying fast-changing trends in claims for individual procedures or clusters of procedures is probably where you will find the most fraud prevention opportunities.
Provider level – a major uptick in claims by an individual provider or group of providers should raise alarm bells and justify a deeper analysis.
Patient level – while patient-level claims will almost always be sporadic, you can detect fraud behavior like pill shopping only if your data is aggregated at the patient level.
Temporal – in all the above patterns, you will need temporal segments, e.g. quarterly or half-yearly, to derive the trends. Beware of seasonality when comparing periods within a year.
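As an illustration of the line-level layer, here is a minimal sketch of duplicate line detection. It assumes each claim line is a dictionary and that two lines count as duplicates when they share provider, patient, procedure code, and service date; both the field names and that definition of "duplicate" are assumptions you would tune to your data.

```python
from collections import defaultdict

# Assumed duplicate key; adjust to what "the same line" means in your data.
KEY_FIELDS = ("provider_id", "patient_id", "procedure_code", "service_date")

def find_duplicate_lines(lines, key_fields=KEY_FIELDS):
    """Group line ids by key and keep only keys that occur more than once."""
    groups = defaultdict(list)
    for line in lines:
        key = tuple(line[f] for f in key_fields)
        groups[key].append(line["line_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Made-up claim lines for illustration only.
lines = [
    {"line_id": 1, "provider_id": "P1", "patient_id": "M1",
     "procedure_code": "X1", "service_date": "2023-01-05"},
    {"line_id": 2, "provider_id": "P1", "patient_id": "M1",
     "procedure_code": "X1", "service_date": "2023-01-05"},
    {"line_id": 3, "provider_id": "P2", "patient_id": "M1",
     "procedure_code": "X1", "service_date": "2023-01-05"},
]
duplicates = find_duplicate_lines(lines)
print(duplicates)
```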

Metrics: within the layers above, create metrics that help you identify the sources of an uptick. The following formula helps you see the picture.

Total cost = providers × patients/provider × claims/patient × $/claim

If total cost increased 80%, knowing how much each component contributed to that 80% helps you identify the true driver of the uptick, e.g. more patients per provider. This focuses your investigation on the key driver, leading to faster fraud discovery.
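One way to size each component's contribution is a log decomposition: because total cost is a product, the log of the total growth ratio equals the sum of the logs of each factor's growth ratio. The sketch below is one possible approach, not the only one, and the numbers are made up to produce an 80% increase.

```python
import math

FACTORS = ("providers", "patients_per_provider",
           "claims_per_patient", "dollars_per_claim")

def total_cost(period):
    """Total cost = providers x patients/provider x claims/patient x $/claim."""
    return math.prod(period[f] for f in FACTORS)

def driver_shares(before, after):
    """Share of the total log growth attributable to each factor."""
    total_log = math.log(total_cost(after) / total_cost(before))
    if abs(total_log) < 1e-12:  # no overall change, nothing to attribute
        return {f: 0.0 for f in FACTORS}
    return {f: math.log(after[f] / before[f]) / total_log for f in FACTORS}

# Illustrative figures only.
before = {"providers": 100, "patients_per_provider": 50,
          "claims_per_patient": 2.0, "dollars_per_claim": 100.0}
after = {"providers": 100, "patients_per_provider": 80,
         "claims_per_patient": 2.25, "dollars_per_claim": 100.0}

growth = total_cost(after) / total_cost(before) - 1
shares = driver_shares(before, after)
print(f"total growth: {growth:.0%}")
for factor, share in shares.items():
    print(f"{factor}: {share:.1%}")
```

In this made-up case, patients per provider accounts for about 80% of the growth and claims per patient for about 20%, so the investigation would start with why providers are suddenly seeing more patients.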

Rule based – build out the fraud patterns already known to you. These could include opioid abuse, unbundling of procedure codes, etc. Ideally your platform should let non-analysts easily create these rule-based patterns so they can test their intuition independently and quickly.
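A rule-based layer can be kept simple enough for non-analysts to extend by treating each rule as a name plus a yes/no test on a summary record. The sketch below assumes patient-level summaries with hypothetical field names; the thresholds are placeholders for illustration, not clinical or regulatory guidance.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    name: str
    predicate: Callable[[dict], bool]  # returns True when the record matches

# Hypothetical rules and thresholds; replace with patterns you already know.
RULES = [
    Rule("many_opioid_prescribers", lambda p: p["opioid_prescribers_90d"] >= 5),
    Rule("high_opioid_fill_count", lambda p: p["opioid_fills_90d"] >= 12),
]

def apply_rules(records, rules=RULES):
    """Return {patient_id: [names of matched rules]} for records with any hit."""
    hits = {}
    for rec in records:
        matched = [r.name for r in rules if r.predicate(rec)]
        if matched:
            hits[rec["patient_id"]] = matched
    return hits

# Made-up patient summaries for illustration only.
patients = [
    {"patient_id": "M1", "opioid_prescribers_90d": 1, "opioid_fills_90d": 2},
    {"patient_id": "M2", "opioid_prescribers_90d": 6, "opioid_fills_90d": 14},
    {"patient_id": "M3", "opioid_prescribers_90d": 5, "opioid_fills_90d": 3},
]
hits = apply_rules(patients)
print(hits)
```

Adding a new pattern is then one more Rule entry, which keeps the rule list readable for colleagues who want to test a hunch.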

Runtime considerations – doing the above can be computationally intensive. Yes, you can throw more money at AWS/Azure to speed things up, but exercising discipline in your pipeline design is good practice. To speed up the ETL process, automate as much as possible and aggregate as early as possible, so you’re not shuffling line-level data around too many times. You can partition your data along known separations, e.g. geography or product line, which will speed up your number crunching. You can also run very time-consuming analyses less frequently.
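To make "aggregate early" and "partition along known separations" concrete, here is a minimal single-pass sketch. It assumes each line carries a region, a provider id, a service date, and a paid amount (all hypothetical field names), and it rolls line-level data up to provider-quarter totals within each region so later steps never touch individual lines.

```python
from collections import defaultdict
from datetime import date

def quarter_of(d):
    """Label a date with its calendar quarter, e.g. 2023Q1."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"

def aggregate_early(lines):
    """One pass over lines: region -> (provider, quarter) -> claim count and paid."""
    totals = defaultdict(lambda: defaultdict(lambda: {"claims": 0, "paid": 0.0}))
    for line in lines:
        key = (line["provider_id"], quarter_of(line["service_date"]))
        cell = totals[line["region"]][key]
        cell["claims"] += 1
        cell["paid"] += line["paid_amount"]
    return totals

# Made-up lines for illustration only.
raw_lines = [
    {"region": "NE", "provider_id": "P1", "service_date": date(2023, 1, 5), "paid_amount": 100.0},
    {"region": "NE", "provider_id": "P1", "service_date": date(2023, 3, 20), "paid_amount": 50.0},
    {"region": "NE", "provider_id": "P1", "service_date": date(2023, 4, 2), "paid_amount": 75.0},
    {"region": "SW", "provider_id": "P9", "service_date": date(2023, 1, 9), "paid_amount": 20.0},
]
by_region = aggregate_early(raw_lines)
print(dict(by_region["NE"]))
```

Because each region is its own bucket, the downstream analyses can be run per region independently.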

3 Fraud case finder

With the above patterns structured, you can look for outliers in two ways: cross-sectionally or longitudinally.

Cross sectional refers to finding outliers across providers, procedures, patients within one time frame.
Longitudinal refers to inspecting the trend over time and finding anomalies, points that uptick well out of the “normal” boundaries of variability.

What constitutes an outlier will depend on your tolerance for false positives, the bandwidth of your investigators, and your general confidence in the quality of the data. Keeping it simple is a good idea: deciding, for example, that a 50% increase or the top 10 percent is abnormal often does as well as complex statistical techniques, especially considering the time and resources saved. There are times when more sophisticated techniques, e.g. clustering, are needed, such as when more than two factors are involved in identifying the anomalies.
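The two simple thresholds mentioned above can be sketched in a few lines. This is an illustration under assumptions: the cross-sectional check takes the top 10 percent of entities by paid amount within one period, and the longitudinal check flags any period that is at least 50% above the one before it. All figures are made up.

```python
import math

def top_share(values_by_entity, share=0.10):
    """Cross-sectional: the top `share` of entities by value in one time frame."""
    if not values_by_entity:
        return []
    k = max(1, math.ceil(share * len(values_by_entity)))
    ranked = sorted(values_by_entity, key=values_by_entity.get, reverse=True)
    return ranked[:k]

def jumps(series, threshold=0.50):
    """Longitudinal: labels of periods at least `threshold` above the prior period.

    Note: comparing adjacent periods ignores seasonality; comparing against
    the same period a year earlier is one alternative.
    """
    flagged = []
    for (_, prev), (label, cur) in zip(series, series[1:]):
        if prev > 0 and (cur - prev) / prev >= threshold:
            flagged.append(label)
    return flagged

# Made-up paid amounts for ten providers in one quarter.
paid = {"P01": 90, "P02": 110, "P03": 95, "P04": 105, "P05": 100,
        "P06": 98, "P07": 400, "P08": 102, "P09": 97, "P10": 108}
# Made-up quarterly paid amounts for one provider.
trend = [("2022Q1", 100), ("2022Q2", 110), ("2022Q3", 180), ("2022Q4", 175)]

cross_sectional = top_share(paid)
longitudinal = jumps(trend)
print(cross_sectional, longitudinal)
```

Both thresholds are parameters, so tightening or loosening them to match investigator bandwidth is a one-argument change.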

The resulting collection of cases that exceed the norm should be prioritized by financial impact and extracted for the next stage of deeper analysis.
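One simple way to prioritize financially is to rank flagged entities by how far their paid amount sits above a peer baseline. The sketch below uses the peer median as that baseline, which is an assumption for illustration; expected recoverable dollars or another measure may suit your organization better.

```python
from statistics import median

def prioritize(flagged, paid_by_entity):
    """Rank flagged entities by paid amount in excess of the peer median."""
    baseline = median(paid_by_entity.values())
    excess = {entity: paid_by_entity[entity] - baseline for entity in flagged}
    return sorted(excess.items(), key=lambda item: item[1], reverse=True)

# Made-up paid amounts and flags for illustration only.
paid_by_provider = {"P1": 100, "P2": 120, "P3": 110, "P4": 400, "P5": 250}
queue = prioritize(["P5", "P4"], paid_by_provider)
print(queue)
```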

4 Opportunity packaging

Once these possible fraud/abuse cases are collected, e.g. a medical doctor tripling his income, additional information should be packaged so that subsequent investigations can proceed more efficiently. For example, you should include payment summaries over the preceding 3 years, his credentials, and a summary of the medical utilization he generates but does not get paid for directly, such as referrals.
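A case package can be modeled as a small record that bundles the items listed above. The class and field names below are hypothetical, and the sketch assumes payments are already summarized per calendar year.

```python
from dataclasses import dataclass, field

@dataclass
class CasePacket:
    provider_id: str
    reason: str                      # why the case was flagged
    payments_by_year: dict           # payment summary for the preceding 3 years
    credentials: list = field(default_factory=list)
    referred_utilization: float = 0.0  # utilization generated but not paid directly

def build_packet(provider_id, reason, payments, flag_year,
                 credentials, referred_utilization):
    """Keep only the 3 years before flag_year and bundle the rest as given."""
    window = {year: amount for year, amount in payments.items()
              if flag_year - 3 <= year < flag_year}
    return CasePacket(provider_id, reason, window,
                      list(credentials), referred_utilization)

# Made-up figures for illustration only.
packet = build_packet(
    provider_id="P4",
    reason="paid amount tripled year over year",
    payments={2019: 90_000, 2020: 100_000, 2021: 105_000, 2022: 110_000, 2023: 330_000},
    flag_year=2023,
    credentials=["MD"],
    referred_utilization=45_000.0,
)
print(packet)
```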

Looking beyond the one specific outlier in this way paints a 360-degree view of the situation. You often find that fraudsters are conducting more than one fraud activity.

The packaged information should then be passed, confidentially, to investigators.

5 Manual investigation

These investigators would analyze the situation, identify concrete examples of fraud on paper/in data, and then proceed with subsequent steps of investigation, including calling patients, requesting additional documentation and even working with law enforcement to catch the fraudsters with sting operations.

There are often iterations through steps 3-5. The key thing is to retain clear records and methodologies and to speak with non-analytical colleagues in the language they prefer. While data ingest should structure the data as well as possible, simple anomaly detection approaches are often preferred here as well. You need to iterate fast and explain your method to non-statisticians. In healthcare there are many, many fraud savings opportunities, and simple most often does it.

6 Operationalizing

Once you have confirmed the fraud, you have a number of options to operationalize fraud prevention, including changing your claim adjudication engine rules, denying claims from fraudsters, initiating remedial dialogue or action with the provider, and pursuing legal action against fraudsters.

Keep an open mind on the solution. Sometimes things don’t happen as you plan, and that’s OK. Generally speaking, fraud prevention takes relatively little time to realize savings, and there are lots of opportunities. So if one does not work out, move on to the next.

Thanks for reading. SUBSCRIBE to my newsletter so you don’t miss out on future posts.

Published by Ed
