Catalent, Inc. is a leading global contract development and manufacturing organization dedicated to improving health outcomes. They are seeking a part-time AI Operations Intern to assist scientists in research activities, focusing on designing and developing a Retrieval-Augmented Generation (RAG) system to enhance internal knowledge management.
Under the guidance, mentorship, and oversight of Catalent Emeryville’s AI sub-team, an intern will design and develop a Retrieval-Augmented Generation (RAG) system using open-source AI tools such as Mistral AI, Ollama, and ChromaDB
Building a RAG system with Python, ChromaDB, Ollama, and LLMs such as Mistral AI is well documented online on YouTube, Stack Overflow, and other developer communities. Recreating and evaluating these documented approaches will be a core part of the project, along with retooling them for our purposes
A RAG system will replace the manual work of combing through our internal literature, documents, reports, ELNs, etc. for answers, assays, and data. In short, it will solve a problem of internal knowledge management. Today we sometimes search old emails and notebooks by keyword to find old data; a RAG system will automate this process
Since a RAG system’s knowledge base is the set of documents it references, verifying an answer requires reading the referenced document to confirm that the information is actually there
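As a rough illustration of the retrieval step and of why answers can be traced back to a source, here is a minimal sketch in plain Python. It is not the project's implementation: a simple word-count cosine similarity stands in for the embedding model and vector database (such as ChromaDB), no LLM is called, and the document IDs, texts, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for internal documents; IDs and text are invented.
DOCUMENTS = {
    "eln-001": "Assay protocol for protein stability measured by thermal shift.",
    "report-2021": "Quarterly report on fill volume data for sterile manufacturing.",
    "email-042": "Notes on buffer preparation and pH adjustment for the binding assay.",
}


def vectorize(text):
    """Toy 'embedding': lowercase word counts (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, top_k=2):
    """Return up to top_k (doc_id, score) pairs, best match first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), doc_id)
              for doc_id, text in documents.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k] if score > 0]


def build_prompt(query, documents, hits):
    """Assemble the text an LLM would receive, tagging each source by ID."""
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id, _ in hits)
    return ("Answer using only the sources below and cite their IDs.\n"
            f"{context}\n\nQuestion: {query}")


hits = retrieve("thermal shift assay protocol", DOCUMENTS)
prompt = build_prompt("thermal shift assay protocol", DOCUMENTS, hits)
print(prompt)
```

Because every retrieved passage carries its document ID, a reader can open the cited document and check that the answer is really supported by it.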
The final deliverable will include a report and demonstration presented in the final week of the internship period
If there is time, further improvements on the RAG can be made, such as automated data mining and retrieval from online sources, integration of image processing, refined user interface, etc.
The AI council must approve the use of Mistral AI before the internship offer is made, and the final report will also be shared with the AI council
Other duties as assigned
Qualifications
Required
High school diploma minimum required
Must commit to a set weekly part-time schedule with work hours between 8:00 a.m. and 5:00 p.m., Monday-Friday (no nights or weekends). This is an ideal position for students or anyone looking for a flexible part-time job supporting a research team
Proficiency in Python and R, with fluency in at least one, for implementing AI-driven retrieval mechanisms
Familiarity with embedding models and vector databases, such as FAISS or ChromaDB, for optimizing information retrieval
Strong problem-solving skills, particularly in algorithm design, optimization, and subprocess management. Self-motivated with a keen interest in AI-driven knowledge retrieval, capable of working independently and iterating solutions
Must have a high degree of personal and professional integrity, thrive in a fast-paced environment, have strong interpersonal and communication skills, and be able to adjust to changing situations, providing ideas and solutions. Strong enthusiasm for developing new skills and expertise
Physical Requirements: On an average day, this position requires the ability to walk, sit, and stand; use hands to handle or feel; reach with hands and arms at or above shoulder height and below waist height; climb or balance; stoop, kneel, crouch, or crawl; talk, hear, and smell; and lift up to 25 pounds. Specific vision requirements include reading of written documents, visual inspection of materials, and frequent use of a computer monitor screen
Preferred
Benefits
Awesome team collaboration
Defined career path and annual performance review and feedback process
Several Employee Resource Groups focusing on Diversity & Inclusion fostering an inclusive culture
Dynamic, fast-paced work environment
Positive working environment focusing on continually improving processes to remain innovative
Potential for career growth on an expanding team within an organization dedicated to preserving and bettering lives
Community engagement and green initiatives
Catalent - Blow-Fill-Seal Sterile CDMO Business focuses on complex clinical- to commercial-stage formulation and manufacturing. It is a sub-organization of Catalent Pharma Solutions.