Pharma Focus Asia

Cost-effective Data Operations

The need for an E2E data standards ecosystem

Michael Goedde, Vice President Global Data Operations, PAREXEL, US

Isabelle de Zegher, Vice President, Integrated Solutions at PAREXEL INFORMATICS, Belgium

Benedikt Egersdörfer, Vice President Global Data Operations, PAREXEL, US

Successful organisations need to modify the reactive operating model and develop an E2E data standards ecosystem with a different organisational structure.

The main focus of Data Operations groups within biopharmaceutical companies is to ensure the highest possible quality of data for clinical studies. With the combined challenges of compliance with Data Standards, the increasing diversity of data collection tools, near-real-time visualisation of safety data, and the opportunity of in-silico studies, Data Operations must now produce high-quality data throughout the entire execution of a study, not just at the end. The reactive operating model in place in many companies, which focuses on data standardisation at submission time, is unsustainable. Successful organisations will manage the burden of data standardisation proactively to solve these emerging challenges in a cost-effective way, through the establishment of an End-to-End (E2E) Data Standards ecosystem, from protocol to clinical study report and beyond.

Data Operations is Faced with a Combination of Challenges

Over the last five years, the number of challenges has grown for Data Operations.

The FDA has mandated the CDISC SDTM and ADaM standards for regulatory submission as of December 2016 [1]. Japanese authorities will require SDTM as of October 2016, with a transition period of 3.5 years [2]. In addition, regulatory agencies expect full traceability of data, from source data to analytic data sets.

Personalised medicine and adaptive trial design increase the diversity of data points to be collected. New data collection modalities, specifically patient mobile apps and wearables, are being introduced. They all have different implementation formats, yet they need to be integrated for analysis and reporting in the same way as classical Electronic Data Capture (EDC) systems.

Data surveillance and near-real time visualisation of safety profiles throughout execution of a clinical study have become regular practice. This requires periodic delivery of standardised data sets, integrated across all data collection tools, during the conduct of the study.

In-silico clinical studies are increasingly accepted as supporting evidence toward market authorisation and decrease the need for costly in-vivo studies. However, the predictive value of the models underpinning in-silico clinical studies is directly related to the quality of the data used to generate and test these models. While Electronic Health Record data is predominantly used, availability of standardised clinical study data, comparable across studies, would contribute to higher-quality predictive models.

In the reactive operating model in place in many organisations, clinical study data is standardised at submission time. Contextual information on the study is frequently absent and implicit data is not collected. As a result, post-collection standardisation of study data is difficult and workload intensive, the quality of transformed data is questionable, and traceability from data sources to analysis results is suboptimal.

To be cost effective, Data Operations needs to implement data standardisation prospectively within an E2E data standards ecosystem

Going forward, data gathered for a single clinical study should be collected with the end in mind: continuous delivery of data during study execution, integration across several data collection tools, and re-use of data beyond the clinical study report should all be considered. The current, reactive operating model must be replaced by a proactive approach: E2E Data Standards, starting at data collection.

Data Operations most often consists of the following groups (Figure 1 – left).

• Data Management coordinates data collection and monitors data quality. This group is usually responsible for Data Collection Standards – such as CDISC CDASH – and management of electronic Case Record Form (eCRF) libraries for EDC

• Clinical Programming transforms collected data into the CDISC SDTM format required for submission; they maintain the SDTM Standards and related mapping programs

• Statistical Programming and Biostatistics is responsible for analysis & reporting, and delivers submission ready material to the Medical Writing group. They typically maintain the Analysis Data Standards, CDISC ADaM, synchronised with a set of re-usable statistical programs managed within a “macro library”.

In this environment, Data Standards are maintained through siloed, disjointed governance processes and tools. Generation of submission data is done sequentially: data collection standards are defined first as part of study set-up; once these are approved, SDTM mapping is specified; and finally ADaM is defined after SDTM mapping has been validated. This is inefficient, particularly in the case of changes or error tracking, and results in workload-intensive mapping exercises.

Successful organisations need to modify this operating model and develop an E2E Data Standards ecosystem with a different organisational structure (Figure 1 – right).

• A central Data Standards group coordinates integrated Data Standards, from data collection to reporting: there is one single set of Data Standards across the organisation, with different “views” for data collection, submission and analytics, maintained within a central Metadata Repository (MDR).

• Clinical Programming focuses on the implementation of EDC specific eCRF libraries, synchronised with the standards defined within the MDR. This enables maintenance of different EDC specific eCRF libraries from the same central data specifications and allows for flexibility when selecting EDC systems across trials. In addition, as other data collection tools are being added (central laboratory, eCOA, wearables, mobile apps, etc.), it is possible to implement “eCRF library” equivalents for these other tools in synchronisation with the MDR.

• Finally, Statistical Programming generates SDTM and ADaM through re-usable programmes, managed within an integrated Statistical Computing Environment (SCE) and synchronised with the standards definitions contained in the MDR.

The E2E data standards ecosystem requires new technologies: MDR and SCE

The new Data Standards paradigm requires organisational adaptations but also technology upgrades in the MDR and SCE space.

An E2E Clinical MDR must go beyond management of standard variables. It needs to support concepts, where each concept is a group of variables that must be managed together to ensure proper meaning [4]. A concept is defined through a semantic group – or “hub” – and has different context-specific views – or “spokes”: the data collection spoke includes CDASH and supports the definition of the eCRF forms synchronised with the EDC eCRF library, the SDTM spoke supports SDTM mapping, and the ADaM spoke provides the template for ADaM derivation.

When the clinical programmer selects the data collection forms for a study, the Clinical MDR automatically identifies the related SDTM and ADaM spokes and generates the SDTM and ADaM specifications, eliminating sequential work and providing automatic traceability [5]. A Clinical MDR should also support structured entry of study design information – such as arms and epochs, visit schedule, phase, indication and patient population – and make this information available in a standardised format, ready for re-use across all applications of the eClinical landscape, enabling the “Enter once, use everywhere” principle.
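The hub-and-spoke idea can be illustrated in code. The sketch below is purely illustrative, not a real MDR API: the class names (Concept, ClinicalMDR), field names and the dictionary-based spokes are all assumptions made for this article. It shows how selecting a data collection form could pull the linked SDTM and ADaM spokes along automatically.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A semantic hub: a group of variables managed together (assumed structure)."""
    name: str
    variables: list                                   # e.g. ["SYSBP", "SYSBP_UNIT"]
    cdash_spoke: dict = field(default_factory=dict)   # data collection view
    sdtm_spoke: dict = field(default_factory=dict)    # submission view
    adam_spoke: dict = field(default_factory=dict)    # analysis view

class ClinicalMDR:
    """Toy metadata repository holding concepts keyed by name."""
    def __init__(self):
        self.concepts = {}

    def register(self, concept: Concept):
        self.concepts[concept.name] = concept

    def specs_for_forms(self, selected_forms):
        """Selecting data collection forms automatically yields the related
        SDTM and ADaM specifications, giving end-to-end traceability."""
        sdtm, adam = {}, {}
        for c in self.concepts.values():
            if c.cdash_spoke.get("form") in selected_forms:
                sdtm[c.name] = c.sdtm_spoke
                adam[c.name] = c.adam_spoke
        return sdtm, adam

# Example: one vital-signs concept; choosing the VITALS form derives both specs.
mdr = ClinicalMDR()
mdr.register(Concept(
    name="SystolicBloodPressure",
    variables=["SYSBP", "SYSBP_UNIT", "POSITION"],
    cdash_spoke={"form": "VITALS", "question": "Systolic blood pressure"},
    sdtm_spoke={"domain": "VS", "testcd": "SYSBP"},
    adam_spoke={"dataset": "ADVS", "paramcd": "SYSBP"},
))
sdtm_spec, adam_spec = mdr.specs_for_forms({"VITALS"})
```

In a real MDR the spokes would carry full variable-level metadata and controlled terminology; the point of the sketch is only that all three views hang off one hub, so nothing is specified twice.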

The SCE [5], supported by a set of macro libraries, is another technology that must go beyond the currently fragmented statistical programming tools. A common SCE should be used for SDTM mapping and ADaM generation, and should support the maintenance of macro libraries synchronised with the definition of the standards contained in the MDR. Whenever there is a change in SDTM or ADaM, the changes need to be propagated to the SCE, where updates of the relevant macros are controlled through workflow. An SCE should also support recurrent execution of macros for a specific study, with version control and full traceability of the different runs required during study execution. Ultimately, an SCE should generate output that supports automation of the Clinical Study Report and electronic Common Technical Document (eCTD) publishing.
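The recurrent, traceable execution requirement can be sketched as follows. This is a minimal illustration, not any vendor's SCE: the class name, log fields and the toy mapping macro are assumptions. Each run records the macro version and an input fingerprint, so any delivered output can be traced back to the run that produced it.

```python
import hashlib
from datetime import datetime, timezone

class SCE:
    """Toy Statistical Computing Environment: every macro execution is
    appended to an audit log with version and input fingerprint."""
    def __init__(self):
        self.run_log = []

    def execute(self, macro, macro_name, macro_version, records):
        fingerprint = hashlib.sha256(repr(records).encode()).hexdigest()[:12]
        output = [macro(r) for r in records]
        self.run_log.append({
            "run_id": len(self.run_log) + 1,
            "macro": macro_name,
            "version": macro_version,
            "input_sha": fingerprint,
            "when": datetime.now(timezone.utc).isoformat(),
        })
        return output

# A toy "SDTM mapping" macro, re-run as new data arrives during the study.
def map_vs(rec):
    return {"DOMAIN": "VS", "VSTESTCD": rec["test"], "VSORRES": rec["value"]}

sce = SCE()
batch1 = sce.execute(map_vs, "vs_mapping", "1.0", [{"test": "SYSBP", "value": 120}])
# After a standards change propagated from the MDR, version 1.1 is run instead:
batch2 = sce.execute(map_vs, "vs_mapping", "1.1", [{"test": "SYSBP", "value": 118}])
```

The workflow-controlled part (review and approval of macro updates) is omitted; the sketch only shows why run-level logging makes periodic SDTM deliveries reproducible.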

E2E data standards: impact on Sponsor / CRO partnership

The new E2E Data Standards approach impacts the way contract research organisations (CROs) collaborate with clinical trial sponsors.

Whether in a completely or partially outsourced model, sponsors must ensure central management of Data Standards as the “single source of truth” across the different stakeholders; this can be outsourced to a CRO with pre-existing Data Standards. Each partner CRO needs to ensure synchronisation with the “single source of truth” of the Sponsor Data Standards, and when delivering data back to the sponsor, CROs should issue a compliance report checking conformance of the data to the standards.
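A conformance report of the kind a CRO might return with a delivery can be sketched as below. The standard definition and rules are illustrative assumptions, far simpler than real sponsor standards: each record of an SDTM-like data set is checked for required variables and controlled terminology.

```python
# Hypothetical sponsor standard for the VS (vital signs) domain.
SPONSOR_STANDARD = {
    "VS": {
        "required": ["STUDYID", "USUBJID", "VSTESTCD", "VSORRES"],
        "controlled": {"VSTESTCD": {"SYSBP", "DIABP", "PULSE"}},
    },
}

def conformance_report(domain, records):
    """Check each record against the sponsor standard; list every finding."""
    spec = SPONSOR_STANDARD[domain]
    findings = []
    for i, rec in enumerate(records):
        for var in spec["required"]:
            if var not in rec or rec[var] in ("", None):
                findings.append(f"record {i}: missing required {var}")
        for var, allowed in spec["controlled"].items():
            if rec.get(var) not in allowed:
                findings.append(f"record {i}: {var}={rec.get(var)!r} not in controlled terminology")
    return {"domain": domain, "records": len(records),
            "findings": findings, "conformant": not findings}

# Example delivery: the second record is missing a result and uses an
# unexpected test code, so the report flags it.
report = conformance_report("VS", [
    {"STUDYID": "S1", "USUBJID": "P1", "VSTESTCD": "SYSBP", "VSORRES": 120},
    {"STUDYID": "S1", "USUBJID": "P2", "VSTESTCD": "TEMP"},
])
```

In practice such checks would cover FDA validation rules as well as sponsor-specific ones; the sketch shows only the principle of delivering data together with evidence of its conformance.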

E2E Data Standards require linkages across all standards. Many sponsors still maintain their Data Standards separately in EDC-specific eCRF libraries and SAS-based tools. CROs will have to work with sponsors to enable a “Hub & Spoke” definition of Sponsor Data Standards toward the E2E ecosystem.

The choice of EDC – or any data collection tool – by a CRO becomes irrelevant; what is important is conformance of the EDC/data collection tool output to the Sponsor Data Standards, independently of the technology used.


Successful organisations must transform the burden of data standardisation into an opportunity to address emerging challenges most effectively.

The introduction of the E2E Data Standards ecosystem is a challenge in itself, but is required to meet wider industry challenges in a cost effective and high quality manner.

• Compliance with Data Standards for regulatory submission – with traceability from data collection onwards – is the primary target and must start at data collection, in conformance with a central MDR. Additionally, conformance reports must be produced to measure compliance with FDA standards as well as conformance to sponsor-specific Data Standards.

• Generation of integrated SDTM data sets across a diverse set of data collection tools must be facilitated by implementing tool specific “eCRF libraries” conforming with the enterprise Data Standards managed within the MDR.

• Specification of SDTM must be automatically generated at study set-up, together with data collection, in order to enable near-real time data visualisation from first patient, first visit onwards.

• And finally, each clinical study must come with a rich layer of metadata. Clinical study data should be stored in the format in which they were generated, with the metadata layer enabling just-in-time transformation for modelling and simulation, in-silico studies, or other secondary data use.

The implementation and utilisation of an E2E Data Standards ecosystem is the most cost-effective and efficient way for Data Operations groups to remain competitive while providing the highest possible quality of data within each clinical study.


[1].“Providing Regulatory Submissions in Electronic Format — Submissions Under Section 745A(a) of the Federal Food, Drug, and Cosmetic Act”. Guidance for Industry. December 17, 2014.


[3]. Cantrell, Kusserow, James, Copeland, Patro, de Zegher. “Paper CD12. Pattern based Metadata Repository: toward high quality data standards”. PhUse Annual Conference, Barcelona 2016.

[4]. de Zegher, Gray, Sullivan, Goedde. “Paper DH01. Effective use of a Metadata Repository across data operations: the need for a machine readable form of (part of) the protocol”. PhUse Annual Conference, Barcelona 2016.

[5]. Hopkins, Duke, Dubman, “Statistical Computing Environments and the Practice of Statistics in the Biopharmaceutical Industry”, Drug Information Journal, Vol. 44, 2010.

Author Bio

Michael Goedde

Michael Goedde has been Vice President Global Data Operations, PAREXEL, since 2014. He has more than 25 years of experience in the Pharma and CRO industry, working in all areas of Clinical Data Management and Programming. Michael is a Certified Clinical Data Manager and holds a BS in Computer Sciences.

Isabelle de Zegher

Isabelle de Zegher is Vice President, Integrated Solutions at PAREXEL INFORMATICS. She has 12 years of experience in pharma through PAREXEL, Novartis, UCB and Cap Gemini, and 10 years in Health Care IT. She served on the CDISC Board of Directors for three years.

Benedikt Egersdörfer

Benedikt Egersdoerfer is Vice President Global Data Operations at PAREXEL. He has 25+ years of industry experience in the Pharma and CRO sector and has developed and led data-related functions from safety data processing through clinical Data Management, Medical Coding, Database Programming, Biostatistics and Medical Writing to Data Science.
