Corteva Agriscience is a company focused on building the future of agriculture through innovative technologies. It is seeking a Data Science Intern to join its AI and Data Science organization, apply AI technologies to accelerate formulation development, and collaborate with scientists across disciplines.
Responsibilities
Build LLM Agents: Create specialized LLM applications using domain-specific fine-tuning and retrieval-augmented generation (RAG) systems built on chemistry literature, patents, safety data sheets, regulatory documents, and internal research data
Apply AI/ML to Formulation Development: Develop and deploy generative AI and machine learning models to optimize formulations
Automate Workflows: Create AI-powered tools to automate data extraction from experimental reports, literature mining for chemical information, and analysis of analytical chemistry data (NMR, MS, HPLC)
Collaborate Across Disciplines: Partner with chemists, analytical scientists, formulation scientists, and agronomists to understand their challenges and develop AI solutions that address real-world crop protection problems
Present Impact: Deliver project reviews throughout the summer and a final presentation showcasing how your AI/ML work has advanced crop protection chemistry research
Qualifications
Required
Currently pursuing a Bachelor's, Master's, or Doctorate degree in Computer Science, Machine Learning, Artificial Intelligence, Computational Chemistry, Chemical Engineering, Chemistry, Materials Science, Polymer Science, or a related technical field
Must have completed at least three years of undergraduate work before the start of the internship
GPA of 3.0 or better
Must be able to relocate to Indianapolis, Indiana for the duration of the internship
Must be able to work full-time (40 hours per week) for at least 10 weeks between May and August
Strong proficiency in Python with experience in AI/ML libraries (PyTorch, TensorFlow, scikit-learn, pandas, NumPy)
Hands-on experience with large language model APIs (OpenAI, Anthropic, Google, or open-source models) and understanding of LLM applications
Demonstrated ability to design prompts for technical/scientific tasks and optimize model outputs for domain-specific applications
Understanding of Retrieval-Augmented Generation (RAG) concepts, including vector embeddings, semantic search, and chunking strategies for scientific literature
Proficiency with SQL and experience in data manipulation, cleaning, and preprocessing of complex technical datasets
Understanding of data structures, algorithms, statistical methods, and their application to experimental data
Strong ability to communicate with both AI/ML experts and chemistry/crop protection domain specialists
Analytical mindset with curiosity about both chemistry and AI, and willingness to learn domain-specific concepts
Preferred
Experience with LLM application frameworks (LangChain/LangGraph, LlamaIndex) and their application to scientific domains
Knowledge of fine-tuning techniques (LoRA, QLoRA) for domain adaptation
Familiarity with vector databases (Pinecone, Weaviate, Milvus, Chroma, FAISS)
Experience with cloud platforms (AWS, Azure, GCP) and their AI/ML services
Understanding of MLOps/LLMOps practices including CI/CD, version control (Git), and model deployment
Basic understanding of organic chemistry, analytical chemistry, or formulation science concepts
Familiarity with cheminformatics tools and libraries (RDKit, DeepChem, Open Babel, ChemPy)
Experience working with chemical structure representations (SMILES, InChI, molecular fingerprints, molecular graphs for graph neural networks)
Understanding of chemical databases (PubChem, ChEMBL, SciFinder) and scientific literature repositories
Benefits
Corteva Agriscience provides agronomic support and services to help increase farmer productivity and profitability.