Cost-effective Data Operations

The need for an E2E data standards ecosystem

Isabelle de Zegher

Vice President, Integrated Solutions at PAREXEL INFORMATICS, Belgium

Michael Goedde

Vice President Global Data Operations, PAREXEL, US

Benedikt Egersdörfer

Vice President Global Data Operations, PAREXEL, US


Successful organisations need to modify the reactive operating model and develop an E2E Data Standards ecosystem with a different organisational structure.


The main focus of Data Operations groups within biopharmaceutical companies is to ensure the highest possible quality of data for clinical studies. With the combined challenges of compliance with Data Standards, increasing diversity of data collection tools, near-real-time visualisation of safety data, and the opportunity of in-silico studies, Data Operations must now produce high-quality data throughout the entire execution of a study, not just at the end. The reactive operating model in place in many companies, which focuses on data standardisation at submission time, is unsustainable. Successful organisations will manage the burden of data standardisation proactively to solve these emerging challenges in a cost-effective way, through the establishment of an End-to-End (E2E) Data Standards ecosystem, from protocol to clinical study report and beyond.

Data Operations is Faced with a Combination of Challenges

Over the last five years, the number of challenges has grown for Data Operations.

The FDA has mandated the CDISC SDTM and ADaM standards for regulatory submission as of December 2016 [1]. Japanese authorities will require SDTM as of October 2016, with a transition period of 3.5 years [2]. In addition, regulatory agencies expect full traceability of data, from source data to analytic data sets.

Personalised medicine and adaptive trial design increase the diversity of data points to be collected. New data collection modalities, specifically patient mobile apps and wearables, are being introduced. They all have different implementation formats, yet they need to be integrated for analysis and reporting in the same way as data from classical Electronic Data Capture (EDC) systems.

Data surveillance and near-real-time visualisation of safety profiles throughout the execution of a clinical study have become regular practice. This requires periodic delivery of standardised data sets, integrated across all data collection tools, during the conduct of the study.

In-silico clinical studies are increasingly accepted as supporting evidence toward market authorisation and decrease the need for costly in-vivo studies. However, the predictive value of the models underpinning in-silico clinical studies is directly related to the quality of the data used to generate and test these models. While Electronic Health Record data is predominantly used, the availability of standardised clinical study data, comparable across studies, would contribute to higher-quality predictive models.

In the reactive operating model in place in many organisations, clinical study data is standardised at submission time. Contextual information on the study is frequently absent and implicit data is not collected. As a result, post-collection standardisation of study data is difficult and workload intensive, the quality of transformed data is questionable, and traceability from data sources to analysis results is suboptimal.

To be cost-effective, Data Operations needs to implement data standardisation prospectively within an E2E Data Standards ecosystem

Going forward, data gathered for a single clinical study should be collected with the end in mind: continuous delivery of data during study execution, integration across several data collection tools, and re-use of data beyond the clinical study report should all be considered. The current, reactive operating model must be replaced by a proactive approach: E2E Data Standards, starting at data collection.

Data Operations most often consists of the following groups (Figure 1 -left).

• Data Management coordinates data collection and monitors data quality. This group is usually responsible for Data Collection Standards – such as CDISC CDASH – and management of electronic Case Report Form (eCRF) libraries for EDC.

• Clinical Programming transforms collected data into the CDISC SDTM format required for submission; they maintain the SDTM Standards and related mapping programs.

• Statistical Programming and Biostatistics is responsible for analysis and reporting, and delivers submission-ready material to the Medical Writing group. They typically maintain the Analysis Data Standards, CDISC ADaM, synchronised with a set of re-usable statistical programs managed within a “macro library”.

In this environment, Data Standards are maintained through siloed, disjointed governance processes and tools. Generation of submission data is done sequentially: data collection standards are defined first as part of study set-up; once these are approved, SDTM mapping is specified; and finally, ADaM is defined after the SDTM mapping has been validated. This is inefficient, particularly in the case of changes or error tracking, and results in workload-intensive mapping exercises.

Successful organisations need to modify this operating model and develop an E2E Data Standards ecosystem with a different organisational structure (Figure 1 – right).

• A central Data Standards group coordinates integrated Data Standards, from data collection to reporting: there is one single set of Data Standards across the organisation, with different “views” for data collection, submission and analytics, maintained within a central Metadata Repository (MDR).

• Clinical Programming focuses on the implementation of EDC specific eCRF libraries, synchronised with the standards defined within the MDR. This enables maintenance of different EDC specific eCRF libraries from the same central data specifications and allows for flexibility when selecting EDC systems across trials. In addition, as other data collection tools are being added (central laboratory, eCOA, wearables, mobile apps, etc.), it is possible to implement “eCRF library” equivalents for these other tools in synchronisation with the MDR.

• Finally, Statistical Programming generates SDTM and ADaM through re-usable programmes, managed within an integrated Statistical Computing Environment (SCE) and synchronised with the standards definitions contained in the MDR.

The E2E data standards ecosystem requires new technologies: MDR and SCE

The new Data Standards paradigm requires organisational adaptations but also technology upgrades in the MDR and SCE space.

An E2E Clinical MDR must go beyond management of standard variables. It needs to support concepts, where each concept is a group of variables that must be managed together to ensure proper meaning [4]. A concept is defined through a semantic group – or “hub” – and has different context-specific views – or “spokes”: the data collection spoke includes CDASH and supports the definition of the eCRF forms synchronised with the EDC eCRF library, the SDTM spoke supports SDTM mapping, and the ADaM spoke provides the template for ADaM derivation.
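The hub-and-spoke structure described above can be sketched as a small data model. This is a minimal illustration in Python, not an actual MDR implementation; the class names and the example vital-signs variables are hypothetical, chosen only to show how one concept carries several standard-specific views.

```python
from dataclasses import dataclass, field

@dataclass
class Spoke:
    """A context-specific view of a concept (e.g. CDASH, SDTM, or ADaM)."""
    standard: str   # name of the standard this view belongs to
    variables: dict  # variable name -> definition/metadata

@dataclass
class Concept:
    """Hub: a semantic group of variables managed together for meaning."""
    name: str
    spokes: dict = field(default_factory=dict)  # standard name -> Spoke

    def view(self, standard: str) -> Spoke:
        """Return the view of this concept for one standard."""
        return self.spokes[standard]

# Hypothetical "systolic blood pressure" concept with three spokes
sbp = Concept(name="Systolic Blood Pressure")
sbp.spokes["CDASH"] = Spoke("CDASH", {"VSORRES": "Result as collected on the eCRF"})
sbp.spokes["SDTM"] = Spoke("SDTM", {"VSTESTCD": "SYSBP", "VSORRES": "Result as collected"})
sbp.spokes["ADaM"] = Spoke("ADaM", {"AVAL": "Analysis value derived from VSORRES"})

print(sbp.view("SDTM").variables["VSTESTCD"])
```

Because all three spokes hang off one hub, a change to the concept is made once and every standard-specific view stays synchronised.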

When the clinical programmer selects the data collection forms for a study, the Clinical MDR automatically identifies the related SDTM and ADaM spokes and generates the SDTM and ADaM specifications, eliminating sequential work and providing automatic traceability [5]. A Clinical MDR should also support structured entry of study design information – such as arms and epochs, visit schedule, phase, indication, and patient population – and make this information available in a standardised format, ready for re-use across all applications of the eClinical landscape, enabling the “Enter once, use everywhere” principle.
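The automatic derivation of SDTM and ADaM specifications from the selected forms can be illustrated with a toy lookup. This is a sketch under assumed data structures: the `MDR` dictionary, its concept names, and the spec fragments are all hypothetical stand-ins for a real metadata repository.

```python
# Toy MDR: concept name -> {standard -> spec fragment}; all names hypothetical
MDR = {
    "Systolic Blood Pressure": {
        "CDASH": {"field": "VSORRES"},
        "SDTM": {"domain": "VS", "VSTESTCD": "SYSBP"},
        "ADaM": {"dataset": "ADVS", "PARAMCD": "SYSBP"},
    },
    "Heart Rate": {
        "CDASH": {"field": "VSORRES"},
        "SDTM": {"domain": "VS", "VSTESTCD": "HR"},
        "ADaM": {"dataset": "ADVS", "PARAMCD": "HR"},
    },
}

def generate_specs(selected_concepts, mdr=MDR):
    """Derive SDTM and ADaM specifications from the concepts chosen for
    data collection, instead of authoring each standard sequentially."""
    specs = {"SDTM": [], "ADaM": []}
    for name in selected_concepts:
        spokes = mdr[name]
        for std in specs:
            if std in spokes:
                # Traceability: each fragment records its source concept
                specs[std].append({"concept": name, **spokes[std]})
    return specs

specs = generate_specs(["Systolic Blood Pressure", "Heart Rate"])
```

Selecting the forms once yields both downstream specifications, each fragment tagged with the concept it came from, which is the traceability the reactive model lacks.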

The SCE [5], supported by a set of macro libraries, is another technology that must go beyond the currently fragmented statistical programming tools. A common SCE should be used for SDTM mapping and ADaM generation, and should support the maintenance of macro libraries synchronised with the definitions of the standards contained in the MDR. Whenever there is a change in SDTM or ADaM, the change needs to be propagated to the SCE, where updates of the relevant macros are controlled through workflow. An SCE should also support recurrent execution of macros for a specific study, with version control and full traceability of the different runs required during study execution. Ultimately, an SCE should generate output that supports automation of the Clinical Study Report and electronic Common Technical Document (eCTD) publishing.
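The versioning and run-traceability requirements can be made concrete with a toy macro registry. This is an illustrative sketch only; a real SCE would sit on top of a statistical platform and a proper workflow engine, and the macro name `derive_aval` here is hypothetical.

```python
import datetime

class MacroLibrary:
    """Toy SCE macro registry: versioned macros plus a run log, so every
    execution during study conduct is traceable to a macro version."""
    def __init__(self):
        self.versions = {}  # macro name -> list of callables, v1 first
        self.run_log = []   # (timestamp, macro name, version) per run

    def register(self, name, func):
        """Add a new version of a macro (e.g. after a standards change)."""
        self.versions.setdefault(name, []).append(func)

    def run(self, name, *args):
        """Execute the latest version and record the run for traceability."""
        version = len(self.versions[name])
        func = self.versions[name][-1]
        self.run_log.append((datetime.datetime.now(), name, version))
        return func(*args)

lib = MacroLibrary()
lib.register("derive_aval", lambda orres: float(orres))            # v1
lib.register("derive_aval", lambda orres: round(float(orres), 1))  # v2, after an ADaM change
result = lib.run("derive_aval", "120.46")
```

Because each run records which macro version executed, the "different runs required during study execution" remain fully reconstructible after the fact.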

E2E data standards: impact on Sponsor / CRO partnership

The new E2E Data Standards approach impacts the way contract research organisations (CROs) collaborate with clinical trial sponsors.

Whether in a complete or partially outsourced model, sponsors must ensure central management of Data Standards as the “single source of truth” across the different stakeholders; this can be outsourced to a CRO with pre-existing Data Standards. Each partner CRO needs to ensure synchronisation with the “single source of truth” of the Sponsor Data Standards, and when delivering data back to the sponsor, CROs should issue a compliance report checking conformance of the data to those standards.
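A compliance report of the kind described above reduces, at its simplest, to checking delivered records against the sponsor's variable definitions. The sketch below is a minimal illustration; the `STANDARD` dictionary and its controlled-terminology values are hypothetical, not an actual sponsor specification.

```python
def conformance_report(records, standard):
    """List findings where delivered records violate the sponsor standard:
    required variables missing, or values outside controlled terminology."""
    findings = []
    for i, rec in enumerate(records):
        for var, allowed in standard.items():
            if var not in rec:
                findings.append((i, var, "missing"))
            elif allowed and rec[var] not in allowed:
                findings.append((i, var, f"value '{rec[var]}' not in CT"))
    return findings

# Hypothetical sponsor standard: variable -> allowed values (None = free text)
STANDARD = {"VSTESTCD": {"SYSBP", "DIABP", "HR"}, "VSORRES": None}

findings = conformance_report(
    [{"VSTESTCD": "SYSBP", "VSORRES": "120"},  # conformant record
     {"VSTESTCD": "PULSE"}],                   # bad CT value, missing result
    STANDARD,
)
```

An empty findings list means the delivery conforms; any entry pinpoints the record, the variable, and the nature of the deviation, which is exactly what the sponsor needs before accepting data back.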

E2E Data Standards require linkages across all standards. Many sponsors still maintain their Data Standards separately, in EDC-specific eCRF libraries and SAS-based tools. CROs will have to work with sponsors to enable a “Hub & Spoke” definition of Sponsor Data Standards toward the E2E ecosystem.

The choice of EDC – or any data collection tool – for a CRO becomes irrelevant; what matters is conformance of the EDC/data collection tool output to the Sponsor Data Standards, independently of the technology used.


Successful organisations must transform the burden of data standardisation into an opportunity to address emerging challenges most effectively.