EMA Policy 70 requires sponsors to publish anonymized versions of clinical study reports (CSRs) and submission documents submitted in support of a marketing authorization application (MAA) after January 1, 2015. Sponsors must prevent re-identification of people named or referred to in those documents.
The EMA published its detailed guidance document in March 2016. This post covers the challenges of anonymizing sensitive information and shares tips for facilitating anonymization while maintaining the clinical utility of the documents.
Anonymization of CSRs and submission documents
Complying with Policy 70 entails multiple challenges, starting with the fact that sponsors must now anonymize the documents they submit. The guidance document lays out the pros and cons of several anonymization techniques, but most of them alter the underlying documents and data sets. Creating a second, anonymized report increases costs, and manipulating documents in Microsoft Word, rather than managing all anonymization in the PDF, complicates QC.
While redaction (masking) is not the EMA’s long-term preference for anonymization, it is the most likely short-term solution. The guidance states, “the EMA understands that in an initial phase redaction techniques are likely to be used.” Of all the proposed anonymization techniques, redaction has the biggest negative impact on data utility. Yet it is the only technique that maintains the integrity of the underlying document.
What are the implications of anonymizing CSRs and submission documents? Should we write our documents differently, knowing that post-opinion they will have to be anonymized? The primary purpose of submission documents is to support the agency’s review, and anonymization cannot impede that review. Yet the versions submitted for redaction must be “true and complete” copies of the final version. These two principles seem to contradict each other. How can you apply the preferred anonymization methods, which alter the document, and still attest that the modified document is a true and complete copy that supports the same analytical accuracy?
The risk arising from regulatory requirements
Sensitive information in regulatory documents must be anonymized. Protected personal data (PPD) refers to the personal information of patients and study administrators. The guidance document also covers commercially confidential information (CCI), which is beyond the scope of this post.
Many sponsors believe that their biggest challenge will be identifying CCI. Our experience, however, suggests that defining and identifying PPD is the trickier exercise. Which is easier: figuring out what your own secrets are, or figuring out what someone else considers secret?
Ensuring privacy is a “three-legged stool”: many people can be identified from three core pieces of information, namely location, date, and demographics. An analysis of 1990 U.S. Census data showed that 87% of the U.S. population can be uniquely identified by zip code, date of birth, and gender.
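To make the three-legged stool concrete, the sketch below measures how many records are unique on a chosen combination of quasi-identifiers, which is a simple k-anonymity check (k = 1 means a record stands alone). The records, field names, and helper functions are hypothetical toy examples, not real census or trial data.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"zip": "02139", "dob": "1961-07-31", "sex": "F"},
    {"zip": "02139", "dob": "1961-07-31", "sex": "F"},
    {"zip": "02139", "dob": "1975-02-14", "sex": "M"},
    {"zip": "02140", "dob": "1980-11-02", "sex": "F"},
]


def uniqueness(rows, keys):
    """Fraction of rows whose quasi-identifier combination is unique (k = 1)."""
    combos = [tuple(row[k] for k in keys) for row in rows]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(rows)


def min_group_size(rows, keys):
    """k: the size of the smallest group sharing the same quasi-identifiers."""
    return min(Counter(tuple(row[k] for k in keys) for row in rows).values())


print(uniqueness(records, ["zip"]))                     # 0.25
print(uniqueness(records, ["zip", "dob", "sex"]))       # 0.5
print(min_group_size(records, ["zip", "dob", "sex"]))   # 1
```

Each quasi-identifier added to the combination splits the population into smaller groups, which is why location, date, and demographics together are so identifying.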
Re-identification risks for study administrators
Sharing clinical trial data risks re-identifying study administrators and patients. The former group includes sponsor employees, investigators, committee members, and vendors.
The guidance states that “Personal data relating to all other clinical study personnel should also be redacted.” However, the guidance also requires retaining the “names of all CROs and vendors involved in trial related duties and function.”
Combining these two requirements, what should be redacted? The company name should remain public, but what about its address? Know the company name, and you can find its address. So should addresses remain? What about external consultants? Expert consultants often work from home or provide personal details about their consulting business. In this scenario, how do you distinguish between company and personal information?
Phone numbers are similarly complicated. Rejection code 1 in the guidance bars redacting information that is already public, but the guidance also says to redact contact details for study administrators. Do your documents contain telephone numbers, and if so, how are they labeled? To facilitate anonymization, label telephone numbers so that individual and company contact numbers can be told apart. Alternatively, list named individuals other than the principal investigator and sponsor signatory in section 16.1.4, which does not have to be disclosed. This practice limits the amount of contact information in the CSR.
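Labeling pays off because it lets a tool apply the rule mechanically. The sketch below assumes a hypothetical convention in which "Direct tel:" marks an individual's number and "Main tel:" marks a company number; neither label comes from the guidance, and the regular expression only covers simple number formats.

```python
import re

# Hypothetical labeling convention: "Direct tel:" marks an individual's number,
# "Main tel:" marks a company number that may remain public.
PHONE = re.compile(r"(?P<label>Direct tel|Main tel):\s*(?P<number>\+?[\d(][\d ().-]{5,}\d)")


def redact_individual_phones(text):
    """Redact numbers labeled as individual contacts; keep company numbers."""
    def replace(match):
        if match.group("label") == "Direct tel":
            return "Direct tel: [REDACTED]"
        return match.group(0)  # company number: leave untouched

    return PHONE.sub(replace, text)


print(redact_individual_phones("Main tel: +44 20 7946 0000; Direct tel: +44 20 7946 0123"))
# Main tel: +44 20 7946 0000; Direct tel: [REDACTED]
```

Unlabeled numbers are left alone by this sketch, which is exactly the gap a consistent labeling practice is meant to close.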
Dates are another candidate for anonymization. Most sponsors think about patient dates, but study administrators also have dates of concern: dates of hire, termination or departure dates, and dates degrees were earned. Handwritten dates beside signatures may also allow linkage through handwriting.
SAS headers and footers also risk identifying study administrators. Some sponsors include the statistician’s personal ID, initials, or first initial and last name in the SAS footer. Does your organization have a contractual relationship that allows it to share these individuals’ identities? If re-identification harms a study administrator, could the sponsor be liable?
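If footers do carry programmer or statistician identifiers, a consistent footer layout makes them easy to find and scrub. The sketch assumes a hypothetical footer with fields such as `Run by:`; the field names are placeholders for whatever your organization's SAS macros actually print.

```python
import re

# Hypothetical footer field names; adjust to your organization's footer layout.
USER_FIELD = re.compile(r"\b(Run by|User|Programmer):\s*\S+")


def scrub_footer(line):
    """Replace the value of any user-identifying footer field with a placeholder."""
    return USER_FIELD.sub(lambda m: f"{m.group(1)}: [REDACTED]", line)


footer = "Program: t_ae.sas  Run by: jsmith01  Date: 02MAR2016"
print(scrub_footer(footer))
# Program: t_ae.sas  Run by: [REDACTED]  Date: 02MAR2016
```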
Sponsor signatories and investigators are two of the trickier categories of study administrators. In general, study administrators’ personal information should not be published, but the guidance makes exceptions for certain roles, including sponsor signatories.
To make the different categories of study administrators easy to identify, regulatory documents should include clear titles for signatories and study administrators. Signatures also often run over into the study role and title printed below them, so once a signature is redacted, that information is partially blacked out as well. Consider reserving a dedicated signature block to limit over-redaction of information that should be retained.
Investigators are among the trickiest categories of study administrators. The EMA has stated that investigators do not have a right to privacy, and organizations interpret this concept in different ways. The guidance calls for redacting personal data except for “The sponsor and coordinating investigator signatories of the clinical study report and the identities of the investigator(s) who conducted the trial and their sites.”
Some sponsors retain only the coordinating investigator information, while others retain all investigator information. Rejection code 3 states “names and addresses of investigator sites and the names of the principal investigators at each study site” will not be accepted for redaction.
You may rebel against the EMA, retaining only the coordinating investigator information and redacting all other investigator information. Even so, the challenge of information in the public domain remains. What have you posted on ClinicalTrials.gov? FDAAA 801 requires publishing each trial’s locations and contact information. Some sponsors remove trial site information from ClinicalTrials.gov once results have been posted, yet the record’s change history still makes the removed information accessible. This information is therefore still in the public domain and should not be redacted.
Protecting patient privacy
Investigators’ privacy is not the only concern. Releasing investigator site locations also risks revealing patients’ locations: linking a specific patient with a specific investigator provides one location point for that patient. What happens if the document then places that patient in a lab or hospital for care unrelated to the study?
Research cited in the Article 29 Working Party’s Opinion 05/2014 on anonymisation techniques found that 95% of people can be identified from four location points. Location and date information, when combined with additional demographic information, risks re-identifying patients.
Investigator information should be retained, yet this creates a conflict between the desire to be transparent and the need to protect patient privacy.
The CSR contains patient information in narratives, listings, and tables. The EMA requires sponsors to redact personal data within narratives. When considering patient information, what should be removed?
Challenges with Level I quasi-identifiers
Obviously, remove direct patient identifiers such as names, subject IDs, and telephone numbers. Quasi-identifiers should also be removed, because combining them can lead to re-identification. Level I quasi-identifiers are personal identifiers that do not change, e.g., demographic information. Level II quasi-identifiers tend to change over time, e.g., lab values and other medical measurements.
Gender, a Level I quasi-identifier, must be redacted, which means redacting a set of gender-related words: he, she, him, her, herself, himself, male, female, and so on. Yet the length of the redacted text can itself reveal gender (a box covering “he” is shorter than one covering “she”). To simplify anonymization, include gender only where it is clinically required.
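One way to avoid leaking word length is to replace every gendered term with the same fixed-width placeholder. The sketch below is an illustration under assumptions: the term list is deliberately short and non-exhaustive, and the `[XXXX]` placeholder is an arbitrary choice, not an EMA convention.

```python
import re

# Illustrative, non-exhaustive term list; a production list would be broader.
GENDERED_TERMS = ["he", "she", "him", "her", "hers", "his",
                  "himself", "herself", "male", "female", "man", "woman"]
# Word boundaries keep words such as "headache" or "the" from matching.
PATTERN = re.compile(r"\b(?:" + "|".join(GENDERED_TERMS) + r")\b", re.IGNORECASE)

# One placeholder for every term, so its length reveals nothing.
PLACEHOLDER = "[XXXX]"


def mask_gender_terms(text):
    return PATTERN.sub(PLACEHOLDER, text)


print(mask_gender_terms("She reported that her headache resolved."))
# [XXXX] reported that [XXXX] headache resolved.
```

Because "He said" and "She said" mask to the same string, the redacted output no longer hints at which word was removed.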
Dates for patients are tricky as well. The guidance states, “Date of birth of trial participants should be redacted (month and day) with the exception of year. Other dates such as event or assessment dates can be offset.” But what about age? Most sponsors remove ages over 89. If birth year remains for those individuals, their age can still be derived. And because patients keep aging after the documents are published, the set of patients over 89 keeps growing, so this group is a moving target. Dates of death and other patient dates are obvious candidates for anonymization. The EMA suggests offsetting: expressing each event as a count of days from a reference point (day zero) rather than as a calendar date.
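The age problem is easy to see in code: suppressing an exact age does little if the birth year is still published. The sketch uses the 89-year cutoff mentioned above; the function names are hypothetical.

```python
from datetime import date
from typing import Tuple

AGE_CAP = 89  # ages above this are suppressed, per the practice described above


def masked_age(age: int, cap: int = AGE_CAP) -> str:
    """Report ages above the cap as a single category instead of the exact value."""
    return str(age) if age <= cap else f">{cap}"


def age_bounds_from_birth_year(birth_year: int, on: date) -> Tuple[int, int]:
    """Possible ages on a given date when only the birth year is known."""
    oldest = on.year - birth_year  # birthday already passed this year
    return oldest - 1, oldest


print(masked_age(92))                                       # >89
print(age_bounds_from_birth_year(1925, date(2016, 6, 1)))   # (90, 91)
```

The age column says only ">89", yet the retained birth year narrows the same patient's age to 90 or 91.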
Offsetting presents its own challenges. Even with month and day redacted, offset values still correspond to real dates. If the reference date (day zero) appears anywhere in the CSR, for example left unredacted because it is not shown alongside a specific patient, then every offset date can be derived.
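The sketch below shows why a disclosed reference date defeats offsetting: converting a date to an offset and back is a single subtraction or addition. It follows the day-zero framing used above (some data standards count the reference day as day 1 instead), and the reference date is a hypothetical value.

```python
from datetime import date, timedelta


def to_offset(event: date, reference: date) -> int:
    """Express an event as days relative to the reference date (day zero)."""
    return (event - reference).days


def from_offset(offset: int, reference: date) -> date:
    """Invert the offset: anyone who knows the reference date can do this."""
    return reference + timedelta(days=offset)


reference = date(2015, 3, 1)  # hypothetical day zero, e.g. first dose
offset = to_offset(date(2015, 3, 10), reference)
print(offset)                          # 9
print(from_offset(offset, reference))  # 2015-03-10
```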
Challenges posed by Level II quasi-identifiers
Level II quasi-identifiers include medical information such as individual outcomes: adverse events, medical history, and current diagnoses. Which Level II identifiers do you redact? You should also consider information on medications, lab values, and symptoms. Does leaving the symptoms or medications reveal the diagnosis? Do test names reveal it?
What about verbatim text? TransCelerate recommends removing verbatim text but retaining medically coded terms. How do you differentiate between the two in your documents? Is your organization’s verbatim text labeled? To ease anonymization of reports, label verbatim text within documents.
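When verbatim text is labeled, separating it from coded terms becomes trivial. The records below are hypothetical: `verbatim` stands for the free text as reported and `coded_term` for a dictionary-coded term (for example, a MedDRA preferred term); both field names are assumptions.

```python
# Hypothetical adverse event records with labeled fields.
events = [
    {"verbatim": "fainted in the car park after clinic visit", "coded_term": "Syncope"},
    {"verbatim": "bad headache day after 2nd dose", "coded_term": "Headache"},
]


def drop_verbatim(rows, verbatim_key="verbatim"):
    """Return copies of the rows without the labeled verbatim field."""
    return [{k: v for k, v in row.items() if k != verbatim_key} for row in rows]


print(drop_verbatim(events))
# [{'coded_term': 'Syncope'}, {'coded_term': 'Headache'}]
```

Without the label, the same separation would require guessing which strings are free text, which is where anonymization errors creep in.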
Anonymization of tables, plots, and listings
Tables, plots, and listings often contain Level I quasi-identifiers such as height and weight. Most sponsors would remove this data from the text, tables, and listings. But what happens to the same information in plots? Do you remove the entire plot? Many sponsors believe that obscuring entire plots raises the suspicion that they are hiding something.
The guidance states that these listings must be retained:
- 3.1 Display of Adverse Events
- 3.2 Listings of Deaths, Other Serious and Significant Adverse Events
- 3.3 Narratives of Deaths, Other Serious and Certain Other Significant Adverse Events
What about a table that contains individual patient data but is not called a listing in its title, subtitle, or bookmark? Per the EMA, only “listings” can be removed.
Our last consideration is IDs. In some organizations, different ID types are used interchangeably and link to individual patients. How are IDs marked within your documents? Do you have a comprehensive list of the IDs that can be linked to patients?
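A comprehensive ID list is most useful when it is machine-readable, so every ID type can be searched for in every document. The formats below are hypothetical placeholders; a real inventory would come from your organization's own data standards.

```python
import re

# Hypothetical ID formats; replace with your organization's actual ID inventory.
ID_PATTERNS = {
    "subject": re.compile(r"\b\d{3}-\d{4}\b"),
    "randomization": re.compile(r"\bR\d{5}\b"),
    "sample": re.compile(r"\bS-[A-Z]{2}\d{6}\b"),
}


def find_ids(text):
    """List every ID of each known type that appears in the text."""
    return {name: sorted(set(p.findall(text))) for name, p in ID_PATTERNS.items()}


text = "Subject 101-0007 (randomization no. R20431) had sample S-AB123456 re-tested."
print(find_ids(text))
# {'subject': ['101-0007'], 'randomization': ['R20431'], 'sample': ['S-AB123456']}
```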
The responsibility for managing the risk of accidental disclosures
Disclosing clinical trial data under EMA Policy 70 poses numerous risks. Who owns the risk of breaching the privacy of patients or study administrators?
The EMA stated that, “This guidance is not intended to provide an exhaustive list of the techniques available or to mandate a specific methodology.” In addition, they state, “This guidance document is without prejudice to the obligations of the pharmaceutical companies as controllers of personal data under applicable national legislation on the protection of personal data.”
The EMA has expectations regarding information sharing, but it does not accept liability for any privacy breaches or misuse of information. The EMA also states that it does not “adopt” the anonymization report: it will tell you what it does not like about a sponsor’s methodology, but it does not endorse that methodology.
EMA Policy 70 contains many nuances and open issues. You should understand the risk of sharing clinical trial information, mitigate it through an informed process supported by management, and then own the deliverables. Hopefully, you now have a starting list of issues to consider as you begin preparing submissions under EMA Policy 70.