When it comes to disclosing data from clinical trials of investigational drugs to the public, the vernacular may seem just as confusing as the process! The terms “anonymization” and “redaction” are often used interchangeably in discussions of clinical trial data transparency and disclosure. But what are the differences in technique or method behind each? We may think of “anonymization” as assigning a patient a new subject ID, and “redaction” as black-boxing information we don’t want to disclose. But when do we use one approach over the other, and do Health Canada (HC) or the European Medicines Agency (EMA) have a preference when it comes to meeting compliance requirements?
What is clinical trial data anonymization?
- Anonymization is the process of removing values for variables which allow direct or indirect identification of a clinical trial volunteer from the data. “Anonymization” is the umbrella term for transforming or masking data whereas “redaction” is just one method of anonymization.
- Clinical trial documents require anonymization. This can be performed retrospectively on a document that has already been published or submitted to a health authority, in which case it applies only to the data contained in that document. It can also be performed proactively on a document that is being submitted, which is required for European Union Clinical Trial Regulation (EU-CTR) submissions (read more about the challenges posed by EU-CTR in this blog).
- Clinical datasets, as either SDTM or ADaM data, require anonymization. These datasets are used to write the clinical documents.
- Anonymization requires a risk assessment that measures the probability of re-identifying a clinical trial volunteer against a predetermined threshold (often 0.09). The number of data fields in the dataset requiring anonymization depends on the dataset’s risk score: higher risk scores mean more fields must be anonymized. Often, a statistician will assist with this calculation.
- Anonymization (excluding redaction) provides high data utility for scientists in the research community! Common anonymization methods include:
- Generalization: Replaces actual data values with a substitution value or numeric range. There are two types: character and numeric. Examples include banding of ages (e.g., 65 is replaced with 60-70) and generalization of countries (e.g., United States of America is replaced with North America)
- Offset: Sets all First Date Collected values to the Anchor Date, and then shifts all other date variables to maintain respective offsets in relation to the First Date Collected. The following parameters must be identified:
- Anchor Date (typically, a study milestone such as start date)
- First Date Collected Domain (the domain in which First Date Collected exists)
- First Date Collected Variable (the column representing First Date Collected)
- Recoding: Overwrites actual values with a randomly generated value
- Shuffling: Randomly moves values from one row to another. Examples include shuffling of patient IDs, with priority given to preserving the relationship between original and anonymized values across datasets for the same subject. This is typically applied in the DM (Demographics) domain
- Redaction: Masks a data field in a document with a black box, such that it is irreversibly blocked out. Individual data fields may be redacted using a Word tool, or entire pages or sections of documents using a Box tool (e.g., scanned pages, figures, listings). This method offers little to no data utility.
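To make the dataset-level methods above more concrete, here is a minimal Python sketch of generalization, date offsetting, and ID shuffling. It is an illustration under stated assumptions, not how any particular anonymization platform implements them: the function names, the country lookup table, and the use of Python `date` objects (rather than the date strings found in real datasets) are all hypothetical.

```python
import random
from datetime import date

def band_age(age: int, width: int = 10) -> str:
    """Numeric generalization: replace an exact age with a band (65 -> '60-70')."""
    low = (age // width) * width
    return f"{low}-{low + width}"

# Hypothetical lookup table for character generalization.
COUNTRY_TO_REGION = {
    "United States of America": "North America",
    "Canada": "North America",
}

def generalize_country(country: str) -> str:
    """Character generalization: replace a country with a broader region."""
    return COUNTRY_TO_REGION.get(country, "Other")

def offset_dates(subject_dates: dict, first_date_var: str, anchor: date) -> dict:
    """Offset: move the subject's First Date Collected to the anchor date and
    shift every other date by the same amount, so intervals are preserved."""
    shift = anchor - subject_dates[first_date_var]
    return {var: d + shift for var, d in subject_dates.items()}

def shuffle_subject_ids(subject_ids, seed=None) -> dict:
    """Shuffling: build ONE original -> anonymized mapping so the same subject
    receives the same new ID in every dataset. Note that a plain shuffle can
    leave an ID in place by chance; a real process would need to handle that."""
    rng = random.Random(seed)
    originals = sorted(set(subject_ids))
    shuffled = originals[:]
    rng.shuffle(shuffled)
    return dict(zip(originals, shuffled))

# Illustrative values only; the variable names are placeholders.
dates = {"FIRST_DATE": date(2021, 3, 14), "EVENT_DATE": date(2021, 4, 2)}
shifted = offset_dates(dates, "FIRST_DATE", anchor=date(2000, 1, 1))
print(band_age(65), generalize_country("United States of America"), shifted)
```

The mapping returned by `shuffle_subject_ids` would be built once and then applied to every dataset, which is what keeps a subject's records linked after anonymization.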
The draft EU-CTR guidance on the protection of protected personal data (PPD) and commercially confidential information (CCI) in documents uploaded and published in the Clinical Trial Information System (CTIS) states that the main anonymization techniques are randomization and generalization. Often, these two techniques can be quickly and consistently applied to almost all patient-specific data fields in a document that are contained within the SDTM DM domain: SUBJID, AGE, SEX, RACE, ETHNICITY, COUNTRY.
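The risk assessment described earlier can be illustrated over demographic fields like these. The sketch below assumes one common approach, in which a record's re-identification risk is 1 divided by the number of records sharing the same combination of identifying values. This is an assumption for illustration, not a method confirmed by the guidance, and a real assessment performed with a statistician will usually be more involved. The function name and sample records are hypothetical.

```python
from collections import Counter

RISK_THRESHOLD = 0.09  # the "often 0.09" threshold mentioned in the text

def reidentification_risk(records, quasi_identifiers):
    """Return (maximum risk, average risk), where each record's risk is
    1 / (number of records sharing its quasi-identifier values)."""
    if not records:
        raise ValueError("no records to assess")
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    class_sizes = Counter(keys)
    risks = [1 / class_sizes[k] for k in keys]
    return max(risks), sum(risks) / len(risks)

# Hypothetical, already-generalized demographic records.
records = [
    {"AGE": "60-70", "SEX": "F", "COUNTRY": "North America"},
    {"AGE": "60-70", "SEX": "F", "COUNTRY": "North America"},
    {"AGE": "30-40", "SEX": "M", "COUNTRY": "Europe"},
]
max_risk, avg_risk = reidentification_risk(records, ["AGE", "SEX", "COUNTRY"])
print(max_risk > RISK_THRESHOLD)  # True: the third record is unique
```

Under this assumption, a unique record has a risk of 1.0, so further generalization or suppression would be needed before the dataset falls below the threshold.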
What is redaction?
- Redaction is an anonymization technique that masks data entirely with an overlay or black box. Think of redaction as whiting out a word on a piece of paper.
- Redaction can be used on PPD, CCI, and Sponsor data. Each type of redaction uses its own overlay text (e.g., PPD versus CCI).
- The overlay box must use the text and color specified by the regulatory authority. For example, Health Canada requires that redaction boxes over patient data be light blue and contain the text “PPD.”
- Redaction is a common method for masking CCI in a document. CCI is information or data confidential to the Sponsor whose disclosure may undermine the Sponsor’s legitimate economic interest or competitive position. Examples include novel developments on products or intellectual property and drug chemical identity or exact composition (read more about how to identify CCI in this blog).
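As a plain-text analogue of the overlay idea above, the following Python sketch replaces identified character spans with a category label such as PPD or CCI. It is only an illustration: real redaction is applied to formatted documents with authority-specific overlay boxes, and the `redact` function and sample sentence here are hypothetical.

```python
def redact(text: str, spans: list) -> str:
    """Replace each (start, end, label) span with '[label]'. The original
    characters are dropped from the output, so the masking is not reversible."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            raise ValueError("overlapping spans are not supported")
        out.append(text[cursor:start])
        out.append(f"[{label}]")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)

sentence = "Subject 1042 enrolled"
start = sentence.index("1042")
print(redact(sentence, [(start, start + 4, "PPD")]))  # Subject [PPD] enrolled
```

Because the label replaces the underlying value rather than transforming it, the result shows why redaction offers little to no data utility compared with the other methods.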
Do regulatory health authorities prefer one type of anonymization over the other?
A range of anonymization techniques can be used, keeping the preservation of data utility in mind! Under EMA’s Policies 0070 and 0043 and HC’s Public Release of Clinical Information (PRCI), there is no absolute direction as to which technique is preferred. At times, it is impossible to avoid redaction (e.g., for CCI), and at other times, it may be more beneficial to anonymize by shuffling (e.g., patient IDs). In certain therapeutic areas or in sensitive patient populations, the preferred method is redaction, to lessen the risk of re-identification. EMA and HC tend to push for the redaction of gender-specific concomitant medications or medical history that, if retained within a document (e.g., a patient safety narrative), could reveal the patient’s gender. Nevertheless, the submission landscape continues to change; EMA and HC are more flexible in approving anonymization techniques that maximize data utility.
Do I need technology to help me with anonymization?
The short answer is yes! Technology can greatly reduce the manual effort required to anonymize regulatory documents. Retrospectively anonymizing entire dossiers, for example, may involve working with tens of thousands of pages of documents, spanning Clinical Trial Protocols, Clinical Study Reports, Case Report Forms, and Statistical Analysis Plans, to name a few. The effort required to review these documents, page by page, is tremendous. Utilizing a technology backed by artificial intelligence, machine learning, and natural language processing will streamline this process by identifying data requiring anonymization for you. There are various technology platforms available that can perform advanced anonymization techniques as well as apply simpler redaction techniques to your documents with the click of a button. Technology enables an accuracy and consistency across a high volume of pages that is difficult to match through a fully manual effort alone. It may also bring innovation and efficiency to your current processes by offering proactive data and document anonymization during initial drafting (learn how our Synchrogenix Writer technology can help). While the human review effort may not be eliminated entirely, technology can make the difference between meeting a regulatory authority submission or request on time and missing the mark.
As a leading technology and services provider in the transparency and disclosure space, Certara can be your next partner for all your anonymization and redaction needs. Our years of experience in successfully supporting programs across regions, as well as our exclusive team of experts, are primed to help you meet and exceed your compliance requirements.