Ancestry is a web-based platform that helps its users create their own family trees and preserve and share their family history. The company is seeking a Data Science Co-Op to join its Content AI team, where the role involves developing innovative AI models for Document Understanding to extract and organize information from historical records.
Implement and experiment with cutting-edge transformer and generative AI solutions for key Document Understanding tasks, including OCR, handwriting recognition, transcription, Named Entity Recognition (NER), Relation Extraction (RE), Coreference Resolution, Summarization, and Knowledge Graphs, working with diverse genealogical and historical collections spanning newspapers, city directories, family history books, and vital records (birth, marriage, death).
Evaluate the performance of multi-modal models in zero-shot and few-shot learning scenarios for comprehensive document understanding.
Partner closely with ML Ops and Data Science Engineers to seamlessly deploy datasets, truth sets, models, and pipelines for training and inference in cloud environments.
Clearly and confidently present your findings, deliverables, and proposed solutions to technical and non-technical audiences, including teams, stakeholders, and executives.
Qualifications
Required
Currently pursuing an advanced degree in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering or related quantitative field with a strong data focus.
Specialization in generative AI and LLMs, embeddings, LoRA, QLoRA, vector databases, transformer models, and Natural Language Processing (NLP), along with software development expertise including data structures, distributed model training, and inference optimizations.
Strong proficiency in Python and relevant tools and libraries, including those for transformer models, multi-modal models, and general NLP (e.g., Hugging Face Transformers, agentic frameworks and workflows, LangChain, LangGraph, NLTK).
Preferred
Master's or PhD in Computer Science, Data Science, Statistics, Mathematics, Linguistics, Engineering or related quantitative field with a strong data focus.
Familiarity with cloud platforms and related AI/ML services such as Google Gemini API, Vertex AI, AWS EC2, S3, SageMaker, Model Registry, and Bedrock.
Benefits