Medidata is a leader in powering smarter treatments and healthier people through innovative digital solutions for clinical trials. The Data Science Intern will work with the Platform AI Team to help digitize clinical trial protocol documents and improve data extraction processes using advanced technologies such as machine learning and large language models.
The protocol is the foundational document for running a clinical trial. Medidata’s long-term goal is to turn this document into a form that is amenable to trial automation and prediction tasks. To that end, we are looking for individuals to help us improve the way we digitize protocols.
Goal: Digitizing the Schedule of Assessments (SoA), the master table in the clinical trial protocol that defines patient procedures and timings.
Problem: Crucial information is often buried in unstructured, non-standardized footnotes.
Solution: Engineering a pipeline using LLMs to extract these footnotes, resolve their references, and transform them into structured data.
Qualifications
Required
Current student in the quantitative sciences at a US university
Significant programming experience in a quantitative field such as Computer Science, Statistics, Computational Biology/Chemistry, Physics or Mathematics
Familiarity with standard software engineering practices such as using source control
Preferred
Benefits
Dassault Systèmes is a catalyst for human progress.