OpenStudyBuilder - Automating the Veeva EDC study setup
Katja Glass Consulting
/@katjaglassconsulting8982
Published: November 24, 2025
Insights
This video explores the first steps toward automating the Electronic Data Capture (EDC) study setup in Veeva EDC using OpenStudyBuilder, an open-source metadata repository. The automation aims to reduce cycle times substantially, align with the TransCelerate Digital Data Flow vision, and minimize both manual work and the human errors that come with it during clinical trial setup. The presentation details a proof-of-concept (PoC) implementation that lays the groundwork for a Veeva EDC integration release planned for 2026, stressing that although full automation remains a future goal, the current PoC already delivers substantial benefits.
The automation starts by synchronizing the Case Report Form (CRF) library between OpenStudyBuilder and Veeva EDC. This synchronization is currently semi-automated and uses existing Veeva APIs: a translation script extracts the Veeva EDC library content, converts it into the CDISC ODM (Operational Data Model) format, and populates the OpenStudyBuilder library by creating new forms. SDTM (Study Data Tabulation Model) and other annotations are still added manually, and work is ongoing to finalize the CRF model and to link activity instances to CRF items. The second step keeps the two libraries aligned: a script identifies and highlights the differences between them, currently writing the results to a CSV file, with a user interface in OpenStudyBuilder planned for the future.
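The translation step can be sketched as follows. This is a minimal illustration, not the script from the video: it assumes a hypothetical dict layout for one exported library form (Veeva's real export format is not described here), and the function name `form_to_odm` and the OID prefixes `F.`, `IG.` and `I.` are made up for illustration. It emits a CDISC ODM 1.3.2 document containing one `FormDef` with its `ItemGroupDef` and `ItemDef` elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# CDISC ODM 1.3 namespace (the document below declares ODMVersion 1.3.2).
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
ET.register_namespace("", ODM_NS)  # serialize ODM as the default namespace


def _q(tag: str) -> str:
    """Qualify a tag name with the ODM namespace."""
    return f"{{{ODM_NS}}}{tag}"


def form_to_odm(form: dict, study_oid: str = "CRF_LIBRARY") -> ET.Element:
    """Translate one library form (hypothetical dict layout) into an ODM document."""
    odm = ET.Element(_q("ODM"), {
        "FileOID": f"{study_oid}.{form['name']}",
        "FileType": "Snapshot",
        "ODMVersion": "1.3.2",
        "CreationDateTime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    })
    study = ET.SubElement(odm, _q("Study"), {"OID": study_oid})
    global_vars = ET.SubElement(study, _q("GlobalVariables"))
    for tag in ("StudyName", "StudyDescription", "ProtocolName"):
        ET.SubElement(global_vars, _q(tag)).text = study_oid
    mdv = ET.SubElement(study, _q("MetaDataVersion"), {"OID": "MDV.1", "Name": "CRF library"})

    # ODM expects FormDef, then ItemGroupDef, then ItemDef elements, so the
    # definitions are collected first and appended in that order at the end.
    form_def = ET.Element(_q("FormDef"), {"OID": f"F.{form['name']}", "Name": form["label"], "Repeating": "No"})
    group_defs, item_defs = [], []
    for group in form["item_groups"]:
        group_oid = f"IG.{form['name']}.{group['name']}"
        ET.SubElement(form_def, _q("ItemGroupRef"), {"ItemGroupOID": group_oid, "Mandatory": "Yes"})
        group_def = ET.Element(_q("ItemGroupDef"), {"OID": group_oid, "Name": group["name"], "Repeating": "No"})
        for item in group["items"]:
            item_oid = f"I.{form['name']}.{item['name']}"
            ET.SubElement(group_def, _q("ItemRef"), {"ItemOID": item_oid, "Mandatory": "No"})
            item_defs.append(ET.Element(_q("ItemDef"), {"OID": item_oid, "Name": item["name"], "DataType": item["data_type"]}))
        group_defs.append(group_def)
    mdv.extend([form_def, *group_defs, *item_defs])
    return odm


example_form = {  # hypothetical export of a Vital Signs library form
    "name": "VS",
    "label": "Vital Signs",
    "item_groups": [
        {"name": "VS_MAIN", "items": [
            {"name": "SYSBP", "data_type": "integer"},
            {"name": "DIABP", "data_type": "integer"},
        ]},
    ],
}

if __name__ == "__main__":
    print(ET.tostring(form_to_odm(example_form), encoding="unicode"))
```

Collecting the definitions before appending them keeps the output in schema order even when a form has several item groups.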
The video then demonstrates the automation of the EDC study setup itself, again using existing Veeva APIs. The automated tasks are: creating study event groups and events from the OpenStudyBuilder Schedule of Activities (SoA), importing standard forms from the synchronized Veeva EDC library into the newly created study, and setting up the study foundation in Veeva EDC from OpenStudyBuilder's operational data. Because the study-level data collection module in OpenStudyBuilder is still under development, the PoC uses NeoDash reports as a workaround. These reports link activity instances, which represent CDISC (Clinical Data Interchange Standards Consortium) biomedical concepts, to specific CRF items. Once forms have been selected based on these activity instances, the automation sequence runs and quickly populates the content of the first draft of the study in Veeva. The presentation closes by restating the goal of full automation once Veeva releases new API endpoints and OpenStudyBuilder's data collection module is finalized.
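The planning part of this sequence can be sketched as a pure function that turns the SoA into event groups, events and the forms to import into each event. The sketch assumes that each SoA visit is grouped by its epoch into one Veeva event group and that the activity-instance-to-form links (the NeoDash workaround in the video) arrive as a plain dictionary; `Visit`, `EventPlan` and `build_edc_plan` are hypothetical names. The Veeva API calls that would actually create these objects are deliberately left out rather than guessed.

```python
from dataclasses import dataclass, field


@dataclass
class Visit:
    """One SoA visit (hypothetical shape)."""
    name: str                      # becomes a Veeva event, e.g. "Week 1"
    epoch: str                     # assumed to become the event group, e.g. "Treatment"
    activity_instances: list[str]  # activity instances scheduled at this visit


@dataclass
class EventPlan:
    """An event to create and the library forms to import into it."""
    event: str
    forms: list[str] = field(default_factory=list)


def build_edc_plan(soa: list[Visit], instance_to_form: dict[str, str]) -> dict[str, list[EventPlan]]:
    """Group SoA visits into event groups and resolve the forms for each event."""
    plan: dict[str, list[EventPlan]] = {}  # dicts keep insertion order, so SoA order is preserved
    for visit in soa:
        forms: list[str] = []
        for instance in visit.activity_instances:
            form = instance_to_form.get(instance)
            # Instances without a CRF link are skipped; several instances may share one form.
            if form is not None and form not in forms:
                forms.append(form)
        plan.setdefault(visit.epoch, []).append(EventPlan(visit.name, forms))
    return plan


soa = [  # hypothetical SoA
    Visit("Screening", "Screening", ["Informed consent", "Eligibility criteria", "Systolic blood pressure"]),
    Visit("Week 1", "Treatment", ["Systolic blood pressure", "Diastolic blood pressure"]),
    Visit("Week 4", "Treatment", ["Body weight"]),
]
instance_to_form = {  # hypothetical activity instance -> library form links
    "Informed consent": "IC",
    "Systolic blood pressure": "VS",
    "Diastolic blood pressure": "VS",
    "Body weight": "VS",
}

for group, events in build_edc_plan(soa, instance_to_form).items():
    for ev in events:
        print(group, ev.event, ev.forms)
```

Keeping the plan separate from the API calls makes it easy to review or diff the intended study structure before anything is written to Veeva.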
Key Takeaways:
- Strategic Importance of Automation: Automating the Veeva EDC study setup is critical for achieving significant gains in cycle time, reducing manual tasks, minimizing errors, and aligning with industry initiatives like the TransCelerate Digital Data Flow vision.
- Open-Source Metadata Repository: OpenStudyBuilder serves as a central, open-source metadata repository and study metadata solution, providing the foundational data and structure necessary for automated EDC setup.
- Phased Automation Approach: The current implementation is a proof-of-concept demonstrating semi-automated processes, with a clear roadmap towards full automation contingent on the release of new API endpoints and the finalization of specific modules within OpenStudyBuilder.
- CRF Library Synchronization: A fundamental step in the automation is synchronizing the CRF libraries of OpenStudyBuilder and Veeva EDC, which keeps the data collection instruments consistent and accurate across both systems.
- Technical Workflow for Synchronization: The synchronization process involves extracting Veeva EDC content via existing APIs, translating it into the ODM format, and then populating the OpenStudyBuilder library with new forms, highlighting the need for data transformation capabilities.
- Managing Library Alignment and Updates: Ongoing management of CRF library alignment is crucial. A script is used to identify differences between the OpenStudyBuilder and Veeva EDC libraries, facilitating the implementation of necessary updates and maintaining data integrity.
- Specific Automated EDC Setup Tasks: The PoC successfully automates key aspects of EDC study setup, including the creation of study event groups and events, importing standard forms from the synchronized library, and initiating the study structure within Veeva EDC.
- Leveraging Schedule of Activities (SoA): The automation relies heavily on the Schedule of Activities defined within OpenStudyBuilder, which dictates the sequence and structure of events and forms within the clinical trial.
- Addressing Development Gaps with Workarounds: Where OpenStudyBuilder's study-level data collection module is still under development, the project bridges the gap with workarounds such as NeoDash reports, so progress does not have to wait for the module to be finished.
- Integration with Data Standards: The process involves linking activity instances, which are implementations of CDISC biomedical concepts, to CRF items, underscoring the importance of adhering to industry data standards for interoperability and data quality.
- Future Vision for Full Automation: The ultimate goal is complete automation of the EDC setup, which promises to further streamline clinical trial initiation and reduce the burden on clinical operations teams, once the necessary API enhancements and module developments are complete.
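The library alignment check described under "Managing Library Alignment and Updates" can be sketched as a diff over two flattened library views that writes one CSV row per difference. The flattened `form -> item -> data type` layout, the status labels, the column names and the functions `diff_libraries` and `rows_to_csv` are assumptions for illustration; the script shown in the video may compare more attributes.

```python
import csv
import io

# Hypothetical flattened view of a CRF library: form name -> item name -> data type.
Library = dict[str, dict[str, str]]
COLUMNS = ["form", "item", "status", "openstudybuilder", "veeva"]


def diff_libraries(osb: Library, veeva: Library) -> list[dict[str, str]]:
    """Return one row per item that is missing from one side or differs between sides."""
    rows = []
    for form in sorted(osb.keys() | veeva.keys()):
        osb_items, veeva_items = osb.get(form, {}), veeva.get(form, {})
        for item in sorted(osb_items.keys() | veeva_items.keys()):
            a, b = osb_items.get(item), veeva_items.get(item)
            if a == b:
                continue  # identical on both sides: nothing to report
            if a is None:
                status = "only_in_veeva"
            elif b is None:
                status = "only_in_osb"
            else:
                status = "changed"
            rows.append({"form": form, "item": item, "status": status,
                         "openstudybuilder": a or "", "veeva": b or ""})
    return rows


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize the difference rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


osb_library = {"VS": {"SYSBP": "integer", "DIABP": "integer"}, "IC": {"ICDAT": "date"}}
veeva_library = {"VS": {"SYSBP": "integer", "DIABP": "float", "PULSE": "integer"}}

print(rows_to_csv(diff_libraries(osb_library, veeva_library)))
```

Sorting forms and items makes the CSV stable between runs, so two reports can themselves be compared line by line.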
Tools/Resources Mentioned:
- Veeva EDC: The Electronic Data Capture system being automated.
- OpenStudyBuilder: An open-source metadata repository and study metadata solution.
- Veeva APIs: Application Programming Interfaces provided by Veeva, used for data extraction and system interaction.
- ODM Format: Operational Data Model, an XML-based standard for exchanging clinical trial metadata and data.
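- NeoDash Reports: Dashboards built with NeoDash, a dashboard tool for the Neo4j graph database that OpenStudyBuilder is built on, used as a workaround while the study-level data collection module is under development.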
Key Concepts:
- EDC (Electronic Data Capture): A system used in clinical research to collect and manage patient data electronically, replacing traditional paper-based methods.
- CRF Library (Case Report Form Library): A repository of standardized forms used for data collection in clinical trials, ensuring consistency across studies.
- Metadata Repository: A centralized system for storing and managing metadata (data about data), crucial for defining and standardizing clinical trial elements.
- Schedule of Activities (SoA): A detailed plan outlining all procedures, visits, and data collection points for each participant in a clinical trial.
- SDTM (Study Data Tabulation Model): A standard developed by CDISC for organizing and formatting clinical trial data for submission to regulatory authorities.
- CDISC (Clinical Data Interchange Standards Consortium): An organization that develops data standards to support the acquisition, exchange, submission, and archival of clinical research data and metadata.
- TransCelerate Digital Data Flow Vision: An industry initiative aimed at improving the efficiency and effectiveness of clinical trials through digital transformation and seamless data exchange.