New European and U.S. clinical trial data transparency initiatives are creating additional disclosure compliance requirements for pharma and biotech companies. One of them, EMA Policy 0070, goes into effect this month. In this blog post, I’ll discuss the implications of these data transparency initiatives and explain how Synchrogenix, a Certara company powered by ClinGenuity, is addressing this emerging business challenge.
What do clinical trial transparency initiatives require?
At their core, these transparency initiatives require clinical trial data to be made available for public consumption. Clinical trial data are typically contained in regulatory documents such as Clinical Study Reports (CSRs) and Marketing Application Submission Documents (NDAs, MAAs, BLAs, etc.), among others. To comply with these mandates, pharma and biotech companies will need to redact and de-identify the data in their clinical study reports and submission documents, produce research summaries suitable for a lay audience, and publish their clinical study information.
What type of sensitive information requires redaction?
Sensitive information typically falls into one of three categories:
- Personally Identifiable Information (PII) [such as names, phone numbers, email addresses]
- Patient Protected Data (PPD) [subject or patient ID numbers, event profiles, etc.]
- Company Confidential Information (CCI) [trade secrets and protected IP not intended for public consumption]
These three critical categories of sensitive information must be identified and then either redacted or removed PRIOR to releasing the documents to the public.
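Whatever method is used to find the sensitive text, the final redaction step itself is mechanical. The minimal Python sketch below assumes a hypothetical `Span` record holding character offsets and a category label (PII, PPD or CCI), and replaces each span with a placeholder naming its category. The sample sentence and subject ID are invented for illustration, and the sketch assumes spans do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A detected stretch of sensitive text (hypothetical structure)."""
    start: int      # character offset where the sensitive text begins
    end: int        # character offset just past the sensitive text
    category: str   # "PII", "PPD" or "CCI"


def redact(text: str, spans: list[Span]) -> str:
    """Replace each (non-overlapping) span with a category placeholder."""
    # Work from the end of the text backwards so that replacing one span
    # never shifts the offsets of spans that have not been processed yet.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"[{span.category} REDACTED]" + text[span.end:]
    return text


report = "Subject 1001-023 (J. Smith) reported nausea on Day 3."
spans = [Span(8, 16, "PPD"), Span(18, 26, "PII")]
print(redact(report, spans))
# → Subject [PPD REDACTED] ([PII REDACTED]) reported nausea on Day 3.
```

Keeping the category in the placeholder lets a reviewer audit why each passage was withheld; removing the text outright would simply mean substituting an empty string.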
Manual redaction: a slow, error-prone process
Currently, there are three options for transparency compliance:
- Fully manual process
- Manual process aided by minimal technology
- Artificial Intelligence-based automated redaction
The manual process involves humans reading through regulatory documents and redacting sensitive information by hand, either with a black marker or with the redaction tools in Adobe® Acrobat Pro. This is the least effective option because of its high potential for accidental disclosure. CSRs and other submission documents can run up to 50,000 pages. At an average of 600 words per page, a document of that length contains approximately 30 million words. Even a small transparency program would typically cover roughly 25 documents per year.
To do this effectively, a reviewer must read every word of each document (up to 30 million) and decide, case by case, whether it is sensitive information as defined above. Across 25 documents of that size, that is up to 750 million words a year, all of which must be reviewed accurately by humans. This is simply not a viable option, and in practice it has proven to be fraught with errors.
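The review volume follows directly from the figures above. The short Python calculation below reproduces it, treating every document as the maximum 50,000 pages, so both results are upper bounds rather than typical values.

```python
# Figures taken from the text above; these are upper-bound estimates, not measurements.
PAGES_PER_DOCUMENT = 50_000   # upper end of CSR / submission document length
WORDS_PER_PAGE = 600          # average words per page
DOCUMENTS_PER_YEAR = 25       # a small transparency program

words_per_document = PAGES_PER_DOCUMENT * WORDS_PER_PAGE
words_per_year = words_per_document * DOCUMENTS_PER_YEAR

print(f"{words_per_document:,} words per document")  # 30,000,000 words per document
print(f"{words_per_year:,} words per year")          # 750,000,000 words per year
```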
Manual redaction aided by minimal technology is also plagued with problems
Several companies have tried to address this problem by supplementing the manual process with rudimentary technology “tools.” These tools are almost always Adobe Acrobat Pro redaction tools with minor modifications. At best, they add regular-expression pattern matching, and they typically advertise a search-and-redact feature.
Even with these “tools,” the approach is still the same as the fully manual process. Pattern matching and search functionality WITHOUT artificial intelligence essentially reduces the tool to a manual process. For these functions to work, the person driving the process must anticipate, in advance, every name, every phone number and email address format, every subject ID scheme, every event term, and so on. With a search-and-redact mechanism, for example, the user enters a term and the tool finds and redacts every occurrence of it throughout the document. That is useful only if the user knows every name that could possibly appear, which in practice means every name in the world, and likewise every phone number or email address. Obviously, this is not possible. Manual redaction assisted by rudimentary tools therefore remains essentially a fully manual process, prone to the same pitfalls.
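The limitation is easy to demonstrate. The Python sketch below is a hypothetical search-and-redact helper of the kind described above, not any vendor's tool: it combines two illustrative regular expressions with a user-supplied list of names. It catches the email address, the U.S.-format phone number and the one name the user thought to list, but it misses the unlisted investigator and a phone number written in a different national format.

```python
import re

# Illustrative patterns only; real-world formats vary far more widely.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
US_PHONE = re.compile(r"\(?\d{3}\)?[-. ]?\d{3}[-. ]\d{4}")


def search_and_redact(text: str, known_names: list[str]) -> str:
    """Redact pattern-shaped data plus any names the user listed in advance."""
    text = EMAIL.sub("[REDACTED]", text)
    text = US_PHONE.sub("[REDACTED]", text)
    # Names have no reliable "shape", so only exact, pre-listed strings are caught.
    for name in known_names:
        text = text.replace(name, "[REDACTED]")
    return text


note = ("Contact Dr. Jane Roe at jane.roe@example.com or 555-867-5309. "
        "Investigator Raj Patel reviewed the case; call +44 20 7946 0958.")
result = search_and_redact(note, known_names=["Jane Roe"])
print(result)
# → Contact Dr. [REDACTED] at [REDACTED] or [REDACTED]. Investigator Raj Patel reviewed the case; call +44 20 7946 0958.
```

Adding more patterns and names only moves the problem: each new format or name still has to be anticipated by a person.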
How Artificial Intelligence technology can help meet transparency and disclosure demands
Artificial Intelligence (AI) redaction engines are built on natural language processing (NLP). They can therefore automatically analyze individual words, parts of speech, word combinations, and phrasing to determine context. At ClinGenuity, we’ve configured our AI engines to identify PII, PPD, and CCI automatically, using the same definitions a human reviewer would be trained on. Identifying and redacting sensitive information thus becomes an automated process that is significantly more accurate than manual review. AI-powered redaction technology can also handle far more volume than human reviewers, and it can be deployed as a durable, long-term solution that does not require significant oversight by the client.
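To see why context matters, consider the two numbers in the sample sentence below: "15" is a study day and "1015" is a subject ID, and a pattern that looks only at the digits cannot tell them apart. The Python sketch below is a deliberately simple, hand-written context heuristic, not an AI model and not ClinGenuity's implementation; the cue-word lists and the `flag_by_context` function are hypothetical. It looks only at the preceding word, whereas a trained NLP engine weighs many such signals learned from annotated examples.

```python
# Hypothetical cue words. A trained NLP model learns cues like these (and many
# subtler ones) from annotated examples; here they are simply hard-coded.
PERSON_CUES = {"dr.", "investigator", "nurse"}
SUBJECT_CUES = {"subject", "patient"}


def flag_by_context(text: str) -> list[tuple[str, str]]:
    """Label words as PII or PPD based on the word immediately before them."""
    words = text.split()
    flagged = []
    i = 1
    while i < len(words):
        cue = words[i - 1].lower()
        word = words[i].strip(".,;:()")
        if cue in SUBJECT_CUES and any(ch.isdigit() for ch in word):
            # An identifier-like token right after "subject"/"patient" is a subject ID.
            flagged.append((word, "PPD"))
        elif cue in PERSON_CUES and word[:1].isupper():
            # Collect consecutive capitalised words after a role cue as one name.
            j = i
            while j < len(words) and words[j][:1].isupper():
                j += 1
            flagged.append((" ".join(words[i:j]).strip(".,;:()"), "PII"))
            i = j
            continue
        i += 1
    return flagged


sentence = "On Day 15, subject 1015 reported headache; Investigator Raj Patel assessed it."
print(flag_by_context(sentence))
# → [('1015', 'PPD'), ('Raj Patel', 'PII')]
```

Note that "15" is left alone because it follows "Day", while "1015" is flagged because it follows "subject".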
Moreover, the automated redaction management capabilities that power Synchrogenix complement our existing transparency services, which include clinical trial registration, disclosure documentation, and clinical lay summary development. The combination of cutting-edge AI technology and an expert team of medical writers results in a true end-to-end solution. Our unique approach to regulatory writing helps our clients meet global transparency requirements efficiently. Read this case study to learn how our team helped a client meet