The Truven Health Blog

The latest healthcare topics from a trusted, proven, and unbiased source.

Truven Health Risk Model Ranks High in Actuarial Evaluation

By John Azzolini/Monday, November 28, 2016


Healthcare payers today are facing the complexities of reform, increased competition, and budget constraints — all while dealing with pressures to reduce costs and improve member health. Managing health risk has become a necessity. But to manage risk, payers must first understand their population. To do this well, they need reliable, robust risk and cost of care models.


Last month, the Society of Actuaries (SOA) released a study showing that the Truven Health Analytics cost of care model outperformed other risk models in 18 of 22 measures. The SOA's Accuracy of Claims-Based Risk Scoring Models compared health risk-scoring models, building on earlier SOA studies with similar objectives (the most recent in 2007). In the medical claims category (predictions based only on medical claims data), the current study ranked the Truven Health model first or second in 21 of the 22 measures. No other model came close to matching this performance. (See Table 1 for a summary of how the Truven Health model ranked relative to the competition.)


How the SOA Evaluates Risk Models

The SOA evaluated Truven Health Analytics’ cost of care model against six others:


  • ACG® System
  • Chronic Illness & Disability Payment System and MedicaidRx
  • DxCG Intelligence
  • Milliman Advanced Risk Adjusters
  • Wakely Risk Assessment Model


The SOA assessed all models on their ability to predict costs using the Truven Health MarketScan® commercial claims dataset of 1 million members, and used three methodologies to evaluate their precision: R-squared, mean absolute error, and predictive ratios. All three measure the statistical difference between predicted and actual results. All models produced both a concurrent and a prospective cost prediction, and each was evaluated on both a capped data set (where patient costs were capped at $250,000) and an uncapped data set.
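As a rough illustration, the three accuracy statistics can be sketched in a few lines of Python. The cost figures below are invented for demonstration, and the SOA's actual methodology is considerably more involved:

```python
def r_squared(actual, predicted):
    """Proportion of variance in actual costs explained by the prediction."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def mean_absolute_error(actual, predicted):
    """Average absolute dollar difference between predicted and actual cost."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def predictive_ratio(actual, predicted):
    """Predicted-to-actual cost ratio for a group; 1.0 is a perfect prediction."""
    return sum(predicted) / sum(actual)

# Invented per-member annual costs, capped at $250,000 as in the capped data set
cap = 250_000
actual = [min(c, cap) for c in [1200.0, 350.0, 9800.0, 640.0]]
predicted = [1000.0, 400.0, 9000.0, 700.0]
```

All three statistics point the same direction: lower MAE, higher R-squared, and a predictive ratio near 1.0 each indicate predictions closer to actual costs.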


The SOA evaluated the models’ predictive ability using a number of scenarios (total medical costs, simulated random groups, condition-specific predictions, patient cost). In the simulated random group scenario, the SOA created groups of 1,000 and 10,000 patients to simulate the application of the model to subgroups of the population.
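The simulated random group scenario can be sketched as follows. This is a hypothetical illustration with invented member costs, scaled far down from the study's groups of 1,000 and 10,000:

```python
import random

def simulated_group_ratios(actual, predicted, group_size, n_groups, seed=42):
    """Draw random member subgroups and compute each group's predictive ratio."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_groups):
        members = rng.sample(range(len(actual)), group_size)
        group_actual = sum(actual[i] for i in members)
        group_predicted = sum(predicted[i] for i in members)
        ratios.append(group_predicted / group_actual)
    return ratios

# Invented member-level costs; a well-calibrated model keeps group ratios near 1.0
actual = [1200.0, 350.0, 9800.0, 640.0, 2100.0, 75.0, 15000.0, 430.0]
predicted = [1000.0, 400.0, 9000.0, 700.0, 2500.0, 60.0, 14000.0, 500.0]
ratios = simulated_group_ratios(actual, predicted, group_size=4, n_groups=3)
```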


Table 1: How the Truven Health Cost of Care Model Performed

The Truven Health model ranked first or second for its ability to predict costs in 21 of the 22 measures studied.



Truven Health Model Ranking*

Mean Absolute Error

  • Total Medical Costs, Concurrent
  • Total Medical Costs, Prospective
  • Simulated Random Groups, Concurrent
  • Simulated Random Groups, Prospective

Predictive Ratios

  • Overall Condition Specific Prediction, Concurrent
  • Overall Condition Specific Prediction, Prospective
  • Very Low Cost Patients, Concurrent
  • Very Low Cost Patients, Prospective
  • Very High Cost Patients, Concurrent
  • Very High Cost Patients, Prospective

* Compared with six other models.

** Capped at $250,000


Why Risk Models Are Important to Payers

Risk modeling is a powerful tool for health plans and employers. It provides valuable insight into member utilization patterns and risk, which is vital for benefit planning, disease management and wellness program management, and member communications. It can also provide deep insight into provider performance and help determine appropriate reimbursement and premium rates. Such models are an integral part of a number of Truven Health databases and analytical tools, and the SOA evaluation speaks to the quality and reliability of the Truven Health solutions.

John Azzolini
Senior Consulting Scientist

How a Data Scientist Thinks about Risk Stratification

By Anne Fischer/Tuesday, October 25, 2016

“Risk.” It’s a word we hear every day in the healthcare industry. We want to avoid risk, we want to predict risk, and we want to find patients who are at high risk. We want to risk-stratify populations (organize people into a set number of mutually exclusive tiers of increasing risk).
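As a deliberately simplified sketch, stratification might assign members to tiers by comparing a risk score against fixed cut points. The scores and thresholds below are invented for illustration:

```python
def stratify(members, thresholds=(0.5, 1.0, 2.0)):
    """Place each member in one of four mutually exclusive tiers of rising risk.

    A member's tier is 1 plus the number of thresholds their score meets,
    so the tiers partition the population with no overlap.
    """
    tiers = {tier: [] for tier in range(1, len(thresholds) + 2)}
    for member_id, score in members:
        tier = 1 + sum(score >= t for t in thresholds)
        tiers[tier].append(member_id)
    return tiers

# Hypothetical members with relative risk scores (1.0 = population average)
population = [("m1", 0.2), ("m2", 0.7), ("m3", 1.4), ("m4", 3.1)]
```

In practice, of course, the hard problem is producing the risk score itself, not cutting it into tiers.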

My recent blog posts have centered on the concept of Population Health. The idea of risk is particularly important in this world, where the goals are to keep well individuals healthy, avoid poor outcomes for those who are already sick, and minimize costs. Understanding, assessing, and predicting risk are all essential to this effort.

But what is “risk”? If you asked a physician, an insurer, and an average Joe on the street to describe “high risk” from a healthcare perspective, you would likely get very different answers. A physician might describe someone with high risk of developing a disease, high risk of a serious disease complication, or high risk of mortality. An insurer might describe someone at risk for a high amount of spending in the immediate future. The average Joe might describe someone at high risk for impairment/inability to function in daily life. Understanding the context-appropriate definition of risk is the first step toward building analytics to support risk analysis. And the appropriate definition is always dependent on the real world application.

Even when the application is understood, considerable work remains to identify the appropriate data and the characteristics that lead to poor outcomes. Consider a discharge nurse who sees hundreds of patients a month as they prepare to leave the hospital. As most knowledgeable hospital staff will attest, the most experienced discharge nurses can tell you, with a high degree of accuracy, who is likely to be back in the hospital in the near future. How do they know? Multiple studies have tried to quantify the drivers of this kind of “nurse’s intuition.”

In 1964, United States Supreme Court Justice Potter Stewart used the now-famous phrase “I know it when I see it” to describe his threshold test for obscenity in Jacobellis v. Ohio. A discharge nurse might say much the same thing when asked to describe a patient at high risk for readmission: I know it when I see it. Characteristics such as illness burden, past behavior, social situation, self-care ability, and home support are often cited, but the reality is that it’s the entire picture, often with a bit of an ambiguous “gut feeling” thrown in for good measure.

So how does Data Science fit into this picture? Our challenge as Data Scientists is to turn “I know it when I see it” into a measurable mathematical formula, so that everyone “knows it” even without seeing it in person. That involves extensive experimentation with different data sources, variables, and modeling techniques, as well as building in the capability for models to evolve and learn over time. At Truven Health Analytics, my team is exclusively focused on developing and testing new models using the various kinds of data readily available to us. In future blogs, we’ll describe some of these models, including risk of developing diabetes and risk of admission. Truven Health, an IBM Company, is now positioned to move deeply into this space and develop these types of risk models by bringing together traditionally disparate data sources, clinical knowledge, and cutting-edge modeling techniques.
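To make that concrete: a readmission-risk model ultimately reduces “I know it when I see it” to a scoring function. The sketch below uses a logistic form with hand-set, invented weights purely for illustration; a real model would learn its weights from historical discharge and readmission data:

```python
import math

# Hypothetical feature weights -- in practice these are learned, not hand-set
WEIGHTS = {
    "prior_admissions": 0.6,    # admissions in the past year
    "chronic_conditions": 0.4,  # count of chronic diagnoses
    "lives_alone": 0.9,         # 1 if no home support, else 0
}
BIAS = -3.0

def readmission_probability(patient):
    """Logistic score mapping patient characteristics to a 0-1 readmission risk."""
    z = BIAS + sum(w * patient.get(name, 0) for name, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-z))

low = readmission_probability({"prior_admissions": 0, "chronic_conditions": 1, "lives_alone": 0})
high = readmission_probability({"prior_admissions": 3, "chronic_conditions": 4, "lives_alone": 1})
```

The hard part isn't the formula; it's choosing the features and fitting the weights so the score reproduces what the experienced nurse already knows.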

Anne Fischer
Senior Director, Advanced Analytics

Why Implement Data Mining in the Medicaid Fraud Control Unit?

By David Hart/Monday, October 24, 2016


In 2013, the U.S. Department of Health & Human Services (HHS) Office of Inspector General (OIG) promulgated a rule enabling Medicaid Fraud Control Units (MFCUs) to receive federal funding for Medicaid fraud data mining efforts, provided certain requirements are met. The rule adds to the toolset MFCUs have at their disposal for fighting Medicaid fraud and abuse. But most MFCUs have been reluctant to begin data mining due to concerns about increased caseloads and expanded federal reporting obligations.

Many MFCUs are now evaluating whether to implement data mining programs. There are several potential benefits to consider for MFCUs contemplating data mining.

Potential benefits:

  • Identify potentially fraudulent providers not previously suspected
  • Identify types of fraud that cannot be systematically detected without data mining
  • Identify additional schemes for a provider already under investigation
  • Prioritize the best cases and reduce wasted time
  • Improve case presentation
  • Ensure a balanced case mix and review of all providers
  • Increase return on investment (ROI)

MFCUs can increase their identification of fraudulent providers, better prioritize cases, and effectively support prosecution efforts by applying best practices, critical success factors, and innovations to data mining efforts. 
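One simple illustration of the kind of analysis involved is peer-group outlier detection: flagging providers whose billing sits far above that of their peers. The provider names, dollar figures, and threshold below are all invented for this sketch:

```python
def flag_high_billers(billing, z_threshold=1.5):
    """Flag providers whose billed amount is a high outlier among peers."""
    values = list(billing.values())
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [p for p, v in billing.items() if std and (v - mean) / std > z_threshold]

# Hypothetical annual billed amounts for five peer providers
billed = {"Provider A": 100_000, "Provider B": 110_000,
          "Provider C": 95_000, "Provider D": 105_000, "Provider E": 400_000}
```

An outlier flag is only a lead, not evidence; real data mining programs layer many such indicators and route the results to investigators.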

Questions to think about when considering data mining:

  • Doesn’t the state Medicaid agency already do this work for the MFCU?
  • Will data mining increase MFCU workload?
  • Will there be evidentiary challenges?
  • Is the cost of data mining prohibitive?
  • Will the application and reporting process be burdensome?

For more than 35 years, Truven Health has used data mining to help clients uncover possible fraud, waste, and abuse, and some of our experience is summarized in a new white paper. To learn more about strategies for implementing successful data mining in an MFCU, along with guidance on the benefits and questions reviewed in this blog, click here.

David Hart, JD
Vice President, Client Services, State Government

MarketScan Trends in Spending for Mental Health and Substance Abuse Care

By Truven Staff/Friday, September 30, 2016


The most recent MarketScan® infographic leverages data from our MarketScan Research Databases to give an overview of mental health and substance abuse (MHSA) trends for 2014 to 2015. We found that between 2014 and 2015:

  • Mental health and substance abuse costs increased 10%
  • Total substance abuse spending increased more quickly, rising 24.8% on a per-member-per-month allowed-amount basis
  • Services for substance abuse outpatient treatment (non-office) increased 22%
  • Inpatient substance abuse days per 1,000 admissions increased 13%

Find out more about MHSA trends here.

What Data Will Be Available for Population Health Analytics?

Key Questions that Data Scientists Will Ask

By Anne Fischer/Thursday, September 15, 2016

This is the third in a series of three blogs presenting key questions that must be answered before developing an analytic to support the business needs of Population Health Management (PHM) stakeholders, or “players”: health systems, practitioners, insurance companies, employers, and government agencies.

The players agree they need cutting-edge analytics to make sense of their populations, and the simplest definition of PHM that all the players seem to accept is this: meeting the healthcare needs of a defined population of individuals, from the healthiest to the highest risk, with the right programs at the right time to ensure the best outcomes possible. On Tuesday I described the first question: who is the population to be managed? On Wednesday we turned to the “so what” question: what services can be offered to facilitate managing the population?

The third important question is what data will be available on which to build the analytics?

Commonly utilized data sources for healthcare analytics include:

  • Information created for administrative purposes (administrative data)
  • Administrative data specifically created for reimbursement (claims data)
  • Information recorded to facilitate the process of delivering care (clinical data)
  • Self-reported information, such as survey data
  • Socio-economic data, either public or privately gathered
  • Device-generated data

Two aspects of this topic are important: what data will be available to build the analytic, and what data will the player have ongoing access to when applying it? In the ideal world of analytics development, each method is built using a comprehensive and representative data sample. In other words, the data should offer a longitudinal view of a population’s healthcare experience across various inputs, including administrative and EHR-sourced content in addition to socioeconomic details, and it should include all types of individuals so it is not biased toward certain demographics.

Answering questions about a population becomes more difficult when you don’t have all of the population’s information and need to infer certain aspects. Typically, health systems and practitioners don’t have a comprehensive view of their patient population, and often “they don’t know what they don’t know.” Insurers and employers, on the other hand, typically lack access to the clinical richness that lives within medical records. And while many parties are optimistic about the value of socio-economic data, the work of obtaining that data and merging it into other data sources is far from trivial.
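Even the merge itself is an engineering task. The sketch below shows the basic shape, keyed on a shared member identifier; all names and fields are invented, and real-world linkage across systems rarely comes with such a clean shared key:

```python
# Each source is keyed by a member identifier shared across systems
claims = {"m1": {"claims_cost": 1200}, "m2": {"claims_cost": 300}}
clinical = {"m1": {"a1c": 7.9}}  # EHR-sourced; often missing for many members
socio = {"m1": {"median_income": 52000}, "m2": {"median_income": 61000}}

def merge_member_views(*sources):
    """Combine per-member records from several sources into one view per member."""
    merged = {}
    for source in sources:
        for member_id, fields in source.items():
            merged.setdefault(member_id, {}).update(fields)
    return merged

members = merge_member_views(claims, clinical, socio)
```

Note that the merged view is sparse wherever a source lacks a member, which is exactly the incompleteness the paragraph above describes.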

In summary, although on the surface it may appear that the same analytic solutions are desired by all the players, it’s highly unlikely that everyone can use precisely the same analytics due to different answers to three key questions: who is the population, what services can realistically be offered, and what data will be available. The job of Truven Health therefore becomes one of designing analytics that are specific to particular use cases, but with as much flexibility as possible to allow for applicability in various business and data situations. In later posts, I’ll discuss the various types of analytics that can be created once these three key questions are answered, along with some of the specific new analytics Truven Health is developing. 

Here are links to the two prior blogs on this topic: 

Anne Fischer
Senior Director, Advanced Analytics