2015: The State of Big Data for Patient Recruitment, Part 2

by Dan McDonald

Focus on: Study Feasibility, Site Identification, and Patient Recruitment
A 3-Part Series

The clinical research industry’s shift toward a more data-driven approach to planning and executing patient enrollment strategies has begun to show signs of broader acceptance and, dare I say, maturation. Companies like Optum (UnitedHealth Group), IMS Health, HealthCore (Anthem), Truven Health Analytics and newcomers like DarkMatter2bd all offer some variation of a solution. Their offerings range from, on the low end, some segment of health insurance claims data queried using ICD-9 codes to, on the high end, a broad, cross-spectrum databank capable of layering health insurance claims, electronic medical records, physician data, clinical investigator data and more.

I have been using and selling data resources like this to accelerate clinical trials for more than a decade. Typically, once sponsors see beyond the price tag for such services, they quickly realize the tremendous value that adoption can deliver, both in direct out-of-pocket savings and in opportunity cost savings. Faster trials mean fewer days of operating costs and, hopefully, more days with the product on the market generating sales.

In my last blog post on big data, I shared a few of the different ways the industry is leveraging health insurance claims and electronic medical records to be more scientific and targeted. Specifically, we talked about the use of this data for protocol feasibility and for geo-targeting. Now, we will highlight another way that this data is proving useful.

Site Identification:

What if the site feasibility process could go beyond faxed, emailed or web survey templates? What if sponsors and CROs weren’t completely reliant on a site’s honesty and accuracy about its relevant experience and the volume of target patients seen at that site?

This is the thought process behind the use of big data for site identification. Essentially, it allows for the validation of the patient counts provided directly by the sites. Instead of relying only on the counts provided in site feasibility responses, we can now add another column of data – No. of Claims / Patients. If the two match up, you’re in pretty good shape. If the claims number is far below what the site has stated, then it’s time for a call to the site and a frank talk about their true volume of applicable patients. More often than not, sites are not intentionally misleading sponsors. Instead, they are too busy to do an actual count from their database, or to conduct the chart reviews necessary to develop a true count.
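As a rough illustration of that extra column, here is a minimal Python sketch that compares each site’s self-reported count against its claims-based count and flags the sites that deserve a follow-up call. The site names, the numbers and the 50% cutoff (`min_ratio`) are all made up for the example; where to draw the line is a judgment call for each study.

```python
from dataclasses import dataclass


@dataclass
class SiteCount:
    site: str
    reported_patients: int  # count from the site's feasibility response
    claims_patients: int    # matching patients found in claims data


def flag_for_follow_up(sites, min_ratio=0.5):
    """Return the names of sites whose claims count falls far below
    the count they reported. min_ratio is an illustrative cutoff."""
    flagged = []
    for s in sites:
        if s.reported_patients > 0:
            ratio = s.claims_patients / s.reported_patients
            if ratio < min_ratio:
                flagged.append(s.site)
    return flagged


# Hypothetical feasibility responses alongside claims counts.
sites = [
    SiteCount("Site A", reported_patients=200, claims_patients=190),
    SiteCount("Site B", reported_patients=300, claims_patients=60),
    SiteCount("Site C", reported_patients=80, claims_patients=75),
]

print(flag_for_follow_up(sites))  # ['Site B']
```

In this made-up data, Site B reported 300 patients but claims show only 60, so it is the one that gets the call.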

[pullquote]“The benefit only becomes apparent when you begin to layer various types of data.”[/pullquote]

Ultimately, the devil is in the details when it comes to extracting value from big data for site identification. The benefit only becomes apparent when you begin to layer various types of data. For example, insurance claims data will help you understand which physicians are actually seeing, and filing claims for, the patient targeted in your study protocol. It will also provide you with contact information for that physician, as well as details about his area of specialty, hospital affiliation and more. It’s not going to tell you whether he’s a clinical investigator or anything about his experience with, or ability to conduct, clinical trials. This is where data layering becomes important.

When considering potential big-data vendors for site identification support, you want to make sure they have not only physician data and claims data, but also investigator data. A good data aggregator will be able to provide you with all of the above. Here is a partial list of the key data points to aggregate and sort in your search for an investigator with the highest probability of enrolling the patient outlined in your protocol.

  • Full contact information
  • Hospital affiliation
  • Languages spoken
  • Group practice name
  • Birth date
  • Degree(s)
  • Medical school
  • Residency
  • Number of matching claims/patients
  • Date of last trial conducted
  • Trial count, last 5 years
  • FDA reports/FDA Adverse actions
  • ID Numbers: TIN (Taxpayer Identification Number), MLN (Medical License Number), DEA (Drug Enforcement Administration) number, and, from Medicare, NPI (National Provider Identifier) and UPIN (Unique Physician Identification Number)

When dropped into a spreadsheet, it becomes easy to filter and sort these data points to begin ranking investigators by the attributes you value most.  For example:

1. Area of Specialty (Does he have the expertise we need?)

2. Hospital Affiliation (Is this a site we trust?)

3. Volume of Matching Patients (Is he seeing the right patient?)

4. Date of Last Trial Conducted (How recent is his experience?)

5. Trial Count / Last Five Years (Does he have strong experience in clinical research?)

6. FDA Reports (Is he in good standing with the FDA?)
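
For readers who would rather script this than work in a spreadsheet, the filter-and-sort step can be sketched in a few lines of Python. Everything here is hypothetical: the investigator names, the field names and the ordering of the sort keys are assumptions for illustration, not a vendor’s data format or scoring method. Hospital affiliation is left out because “a site we trust” is a judgment the spreadsheet can’t make for you.

```python
from datetime import date

# Hypothetical rows, as they might look after layering physician,
# claims and investigator data into a single table.
investigators = [
    {"name": "Dr. A", "specialty": "Endocrinology", "matching_patients": 420,
     "last_trial": date(2014, 6, 1), "trials_5yr": 7, "fda_actions": 0},
    {"name": "Dr. B", "specialty": "Endocrinology", "matching_patients": 610,
     "last_trial": date(2011, 3, 15), "trials_5yr": 1, "fda_actions": 0},
    {"name": "Dr. C", "specialty": "Endocrinology", "matching_patients": 610,
     "last_trial": date(2014, 11, 20), "trials_5yr": 4, "fda_actions": 0},
    {"name": "Dr. D", "specialty": "Cardiology", "matching_patients": 900,
     "last_trial": date(2014, 1, 10), "trials_5yr": 9, "fda_actions": 0},
    {"name": "Dr. E", "specialty": "Endocrinology", "matching_patients": 800,
     "last_trial": date(2013, 9, 5), "trials_5yr": 5, "fda_actions": 2},
]


def rank_investigators(rows, specialty):
    """Keep investigators in the target specialty with no FDA adverse
    actions, then sort by patient volume, trial count and recency."""
    eligible = [
        r for r in rows
        if r["specialty"] == specialty and r["fda_actions"] == 0
    ]
    return sorted(
        eligible,
        key=lambda r: (r["matching_patients"], r["trials_5yr"], r["last_trial"]),
        reverse=True,  # highest volume, most trials, most recent first
    )


ranked = [r["name"] for r in rank_investigators(investigators, "Endocrinology")]
print(ranked)  # ['Dr. C', 'Dr. B', 'Dr. A']
```

Dr. D drops out on specialty and Dr. E on FDA standing. Dr. C and Dr. B tie on patient volume, so trial count breaks the tie. Reorder the sort keys to match whatever attributes you value most.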


According to Optum Clinformatics, the clinical investigators with the best probability of successfully enrolling patients in your study are those who:

1. See the highest volume of the target patient, and

2. Have the most relevant experience.


Next time we’ll cover how big data is being used to support various patient recruitment strategies. If you missed part 1 of this series, you can find it here.
