The Do’s and Don’ts of Define.xml

June 27, 2025

Define.xml is “arguably the most important part of the electronic dataset submission for regulatory review,” according to the FDA’s Technical Conformance Guide. It helps reviewers gain familiarity with the study data, its origins and derivations, and the sponsor-specific implementation of CDISC standards.

As the importance of define.xml during the review process increases, take care to ensure that all the information within it is clear and concise. In many cases, CDISC standards are open to interpretation, and sponsors often have their own internal standards that build upon them. The review team analyzing the data package is unfamiliar not only with the study but also with the sponsor’s internal standards, so it needs a define.xml that correctly and clearly describes all origins and derivations.

It can often be difficult to determine what content to include in a define.xml file. A file with missing or incomplete information will increase the time it takes reviewers to familiarize themselves with the data package.

In common practice, the define.xml is created just prior to a submission, generally by someone who is very close to the data and derivations. In other cases, the define.xml is created while the study is still ongoing, where data and derivations often change as protocol amendments are made. Both cases require careful quality control and review processes to ensure common mistakes are avoided.

To help get you started, we have compiled a list of Define.xml “Do’s and Don’ts” as well as answers to Define.xml frequently asked questions.

Do’s and Don’ts of Define.xml

General

DO

Keep it concise. Take time to ensure only relevant information is provided in the define.xml.

DON’T

Make your define.xml too complicated. Remember that the review team, who are not familiar with your data or mappings, will need to navigate your define.xml.

DON’T

Assume the end users of your define.xml have CDISC knowledge. It is important to take the time to ensure your define.xml is clear, concise, and consumable for the review team.

DON’T

Assume everyone reading your define.xml has programming knowledge. The define.xml is intended to be both machine- and human-readable, and it contains information that different members of the review team might need to reference.

Derivations and comments

DO

Concisely define all the derivations used. The define.xml should provide derivations that are clear and complete enough for a reviewer to replicate the same results.

DO

Describe all internal standards. To avoid confusion, ensure that the review team is given the information it needs to make use of sponsor standards during the review process.

DON’T

Cut corners and list a derived variable as Origin = Assigned. It is always best to provide the derivation.
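As a rough illustration of how this could be checked, the sketch below lists variables whose origin is Derived but that have no method attached, along with every variable whose origin is Assigned so it can be reviewed by hand. It assumes Define-XML 2.0 namespaces and a heavily simplified sample fragment; the function name `origin_review` and the sample OIDs are hypothetical, and a Define-XML 2.1 file would need the 2.1 namespace instead.

```python
import xml.etree.ElementTree as ET

# Assumption: Define-XML 2.0 namespaces. Adjust "def" for Define-XML 2.1.
NS = {
    "odm": "http://www.cdisc.org/ns/odm/v1.3",
    "def": "http://www.cdisc.org/ns/def/v2.0",
}

# Simplified, hypothetical fragment (not a complete, valid define.xml).
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:def="http://www.cdisc.org/ns/def/v2.0">
  <Study OID="S1"><MetaDataVersion OID="MDV1" Name="Example">
    <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No">
      <ItemRef ItemOID="IT.DM.AGE" Mandatory="No" MethodOID="MT.AGE"/>
      <ItemRef ItemOID="IT.DM.RFSTDTC" Mandatory="No"/>
      <ItemRef ItemOID="IT.DM.DOMAIN" Mandatory="Yes"/>
    </ItemGroupDef>
    <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer">
      <def:Origin Type="Derived"/>
    </ItemDef>
    <ItemDef OID="IT.DM.RFSTDTC" Name="RFSTDTC" DataType="date">
      <def:Origin Type="Derived"/>
    </ItemDef>
    <ItemDef OID="IT.DM.DOMAIN" Name="DOMAIN" DataType="text">
      <def:Origin Type="Assigned"/>
    </ItemDef>
  </MetaDataVersion></Study>
</ODM>"""


def origin_review(xml_text):
    """Return (derived variables with no method, assigned variables to review)."""
    root = ET.fromstring(xml_text)
    # OIDs of items that have a method (derivation) attached via ItemRef.
    with_method = {
        ref.get("ItemOID")
        for ref in root.iterfind(".//odm:ItemRef", NS)
        if ref.get("MethodOID")
    }
    derived_without_method, assigned = [], []
    for item in root.iterfind(".//odm:ItemDef", NS):
        origin = item.find("def:Origin", NS)
        if origin is None:
            continue
        if origin.get("Type") == "Derived":
            if item.get("OID") not in with_method:
                derived_without_method.append(item.get("Name"))
        elif origin.get("Type") == "Assigned":
            assigned.append(item.get("Name"))
    return derived_without_method, assigned


missing, to_review = origin_review(SAMPLE)
print("Derived without a method:", missing)
print("Assigned (confirm nothing is really derived):", to_review)
```

A script like this cannot tell whether an Assigned variable is really derived; it only produces the list a person would then need to review.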

DON’T

Have any raw data references in your derivations or comments. The review team does not have access to your raw data.

DON’T

Blindly copy derivations and comments from the mapping spec into the define.xml. Mapping specs often contain coding language and raw data references that clutter your file. Instead, replace the technical jargon with human-readable language.
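One way to catch leftover spec language before it reaches the define.xml is a simple text check on each derivation and comment. The sketch below is a heuristic only: the `RAW.` dataset prefix, the list of function names, and the helper `lint_derivation` are all assumptions made for illustration, and you would replace them with your own naming conventions.

```python
import re

# Hypothetical conventions: raw datasets are referenced as RAW.<name>, and
# spec text copied from code tends to contain function calls or if/then logic.
RAW_DATASET_PATTERN = re.compile(r"\bRAW\.[A-Z0-9_]+", re.IGNORECASE)
CODE_JARGON_PATTERN = re.compile(
    r"\b(?:substr|input|put|strip|compress)\s*\(|\bif\b.*\bthen\b",
    re.IGNORECASE,
)


def lint_derivation(text):
    """Return a list of warnings for one derivation or comment."""
    warnings = []
    if RAW_DATASET_PATTERN.search(text):
        warnings.append("references a raw dataset")
    if CODE_JARGON_PATTERN.search(text):
        warnings.append("contains programming syntax")
    return warnings


spec_text = (
    "if RAW.DM.BRTHDT ne . then AGE = "
    "floor((RFSTDTC - input(RAW.DM.BRTHDT, yymmdd10.)) / 365.25)"
)
define_text = (
    "Age in years at the reference start date, calculated from the "
    "date of birth and rounded down."
)

print(lint_derivation(spec_text))
print(lint_derivation(define_text))
```

Plain-language sentences that happen to use "if ... then" would also be flagged, so treat the output as a prompt for manual review rather than a verdict.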

Codelists

DO

Create codelists for each variable populated by a list of pre-defined terms. Variables collected via drop-down lists, or which have a limited set of pre-defined terms, should have an associated codelist within the define.xml.

DO

Create codelists that describe the data collection process and include all planned terms for the variable. All CRF options should be included in the codelists, not just those present in the data.
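The comparison described here can be sketched with plain set arithmetic. The example below assumes you can supply three lists for a variable: the codelist terms, the options planned on the CRF, and the values found in the data. The function `review_codelist` and the sample units are hypothetical.

```python
def review_codelist(codelist_terms, crf_terms, data_values):
    """Compare a codelist with the planned CRF terms and the collected data."""
    codelist = set(codelist_terms)
    crf = set(crf_terms)
    data = set(data_values)
    return {
        # Planned CRF options that the codelist leaves out.
        "missing_crf_terms": sorted(crf - codelist),
        # Collected values that the codelist does not cover.
        "values_not_in_codelist": sorted(data - codelist),
        # Codelist terms that were never an option on the CRF.
        "terms_not_on_crf": sorted(codelist - crf),
    }


report = review_codelist(
    codelist_terms=["mg", "mL", "g", "kg"],
    crf_terms=["mg", "mL", "ug"],
    data_values=["mg", "mg", "mL"],
)
for check, terms in report.items():
    print(check, terms)
```

In this made-up case the codelist omits "ug" even though it was a CRF option, and it carries "g" and "kg", which were never planned for the variable.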

DON’T

Have one UNIT codelist for all unit variables and values across the data package. When a reviewer clicks on the EXDOSU codelist, they want to see only the units in the EX domain, not units across EX, LB, VS, etc.

DON’T

Create codelists with all values from CDISC CT when many values are irrelevant to your data package. When a reviewer clicks on the EXDOSU codelist, they do not want to see all 500+ units from CDISC CT.

DON’T

Complicate your define.xml with codelists meant for other data packages. Including ADaM codelists in an SDTM define.xml, or vice versa, leads to large, complex define.xml files that are hard to navigate.
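A shared unit codelist is one of the easier problems to spot automatically. The sketch below is a rough heuristic that assumes unit variables have names ending in "U" (for example EXDOSU or LBORRESU) and reports any codelist referenced by more than one of them. The sample fragment is simplified and hypothetical, as is the function name `shared_unit_codelists`.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"

# Simplified, hypothetical fragment (not a complete, valid define.xml).
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S1"><MetaDataVersion OID="MDV1" Name="Example">
    <ItemDef OID="IT.EX.EXDOSU" Name="EXDOSU" DataType="text">
      <CodeListRef CodeListOID="CL.UNIT"/>
    </ItemDef>
    <ItemDef OID="IT.LB.LBORRESU" Name="LBORRESU" DataType="text">
      <CodeListRef CodeListOID="CL.UNIT"/>
    </ItemDef>
    <ItemDef OID="IT.VS.VSORRESU" Name="VSORRESU" DataType="text">
      <CodeListRef CodeListOID="CL.VSRESU"/>
    </ItemDef>
  </MetaDataVersion></Study>
</ODM>"""


def shared_unit_codelists(xml_text):
    """Map each codelist OID to the unit variables sharing it (two or more)."""
    root = ET.fromstring(xml_text)
    users = defaultdict(list)
    for item in root.iter(ODM + "ItemDef"):
        ref = item.find(ODM + "CodeListRef")
        name = item.get("Name", "")
        # Assumption: unit variables are identified by a trailing "U".
        if ref is not None and name.endswith("U"):
            users[ref.get("CodeListOID")].append(name)
    return {oid: sorted(names) for oid, names in users.items() if len(names) > 1}


print(shared_unit_codelists(SAMPLE))
```

Here `CL.UNIT` would be reported because both EXDOSU and LBORRESU point to it, which suggests splitting it into one codelist per variable.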

Before you submit

DO

Look at your finalized define.xml file using the stylesheet. Incorporating this step into your internal review process is sure to surface some overlooked details.

DO

Create a separate PDF file for large derivations that require formatting. The define.xml standard does not account for formatting such as new line characters, numbered lists, or bullet points.

DON’T

Just hit the “generate define.xml” button in your software tool. Incorporating common-sense QC steps will go a long way. Spend time reviewing your define.xml through a web browser with the stylesheet applied.
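As one example of a common-sense QC step that can run alongside the visual review, the sketch below looks for references that point nowhere: a `CodeListRef` with no matching `CodeList`, or a `MethodOID` with no matching `MethodDef`. It is an illustrative check under simplified assumptions, not a replacement for a full validation tool; the function `dangling_references` and the sample fragment are hypothetical.

```python
import xml.etree.ElementTree as ET

ODM = "{http://www.cdisc.org/ns/odm/v1.3}"

# Simplified, hypothetical fragment (not a complete, valid define.xml).
SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S1"><MetaDataVersion OID="MDV1" Name="Example">
    <ItemGroupDef OID="IG.DM" Name="DM" Repeating="No">
      <ItemRef ItemOID="IT.DM.AGE" Mandatory="No" MethodOID="MT.AGE"/>
      <ItemRef ItemOID="IT.DM.SEX" Mandatory="No"/>
    </ItemGroupDef>
    <ItemDef OID="IT.DM.AGE" Name="AGE" DataType="integer"/>
    <ItemDef OID="IT.DM.SEX" Name="SEX" DataType="text">
      <CodeListRef CodeListOID="CL.SEX"/>
    </ItemDef>
    <ItemDef OID="IT.EX.EXDOSU" Name="EXDOSU" DataType="text">
      <CodeListRef CodeListOID="CL.EXDOSU"/>
    </ItemDef>
    <CodeList OID="CL.SEX" Name="Sex" DataType="text">
      <EnumeratedItem CodedValue="F"/>
      <EnumeratedItem CodedValue="M"/>
    </CodeList>
    <MethodDef OID="MT.AGE" Name="Age derivation" Type="Computation"/>
  </MetaDataVersion></Study>
</ODM>"""


def dangling_references(xml_text):
    """Return (kind, OID) pairs for references with no matching definition."""
    root = ET.fromstring(xml_text)
    codelists = {cl.get("OID") for cl in root.iter(ODM + "CodeList")}
    methods = {m.get("OID") for m in root.iter(ODM + "MethodDef")}
    problems = []
    for ref in root.iter(ODM + "CodeListRef"):
        if ref.get("CodeListOID") not in codelists:
            problems.append(("CodeList", ref.get("CodeListOID")))
    for ref in root.iter(ODM + "ItemRef"):
        oid = ref.get("MethodOID")
        if oid and oid not in methods:
            problems.append(("MethodDef", oid))
    return problems


print(dangling_references(SAMPLE))
```

In the sample, EXDOSU points to a `CL.EXDOSU` codelist that was never defined, which is the kind of gap a reviewer would otherwise only find by clicking a dead link in the rendered file.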

Want to learn more? Check out our blog post containing answers to Define.xml frequently asked questions.

Need more help with Define.xml?

If you have lingering questions or need additional support, we’re here to assist. Reach out to our team for expert guidance and ensure your submissions are seamless and compliant.

Trevor Mankus

Product Manager, Pinnacle 21 by Certara

As a Product Manager at Certara, Trevor Mankus helps deliver value to customers via improvements to Pinnacle 21 software. Trevor identifies problems and opportunities and then directs our engineering R&D efforts at these areas to help create the most value, optimizing complex trade-offs in the process. Product managers sit at the intersection of business, UX design, and technology. Prior to his time at Certara, he worked in the healthcare industry at various pharmaceutical companies and CROs as a clinical programmer.

Trevor has been an active CDISC member since 2009, focusing primarily on the ADaM project. He was selected to be the future CDISC ADaM team lead (2026-2027) and has been co-leading the CDISC ADaM Conformance Rules sub-team for many years. Notable accomplishments include multiple publications of the CDISC ADaM conformance rules catalog.

David Roulstone

CDISC Standards Governance Advisor at Boehringer Ingelheim

David Roulstone serves as CDISC Standards Governance Advisor at Boehringer Ingelheim. Prior to his current role, he was the Associate Director of Clinical Data Standards at Pinnacle 21 from September 2016 to April 2019.

This blog was originally written by David Roulstone in 2018, and was updated by Trevor Mankus in June 2025.
