Software Engineer, Fleet Health Instrumentation Intern - Fall 2025
US, CA, Santa Clara
Internship
Onsite
$18/hr - $71/hr
Intern
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. We are seeking an intern to design, prototype, and ship high-impact features that ensure the reliability of NVIDIA's GPU-accelerated platforms.
Responsibilities
Design and build software that collects, transforms, and publishes health data about our global GPU fleet.
Develop microservices and data pipelines in Go or Python that ingest and normalize data from many diverse sources, routing millions of records per day with tools such as Kafka, Airflow, and Kinesis.
Instrument production infrastructure and workloads running on Kubernetes and bare-metal clusters; add tracing and metrics hooks for deeper insights.
Automate deployments and testing with CI/CD (GitLab, Argo) and IaC (Terraform), ensuring repeatable, low-touch releases.
Participate in the full lifecycle of cloud services—from design docs and code reviews through deployment, monitoring, and continuous improvement.
Collaborate with other engineers to debug live issues and turn post-incident insights into durable code fixes.
Contribute to internal tooling and dashboards that help engineers visualize fleet health, utilization, and capacity trends.
Qualifications
Required
Actively pursuing a BS or MS in Computer Science, Computer Engineering, or a closely related quantitative field (e.g., Physics or Mathematics).
Solid understanding of distributed‑systems fundamentals, modern software‑engineering practices, and data‑modeling principles.
Proficiency in at least one programming language—preferably Python or Go.
Working knowledge of Linux, basic networking concepts, and Kubernetes container orchestration.
Preferred
A systematic, analytical problem‑solving approach paired with clear written and verbal communication skills and a strong sense of ownership.
Demonstrated ability to debug, optimize, and automate code or workflows with minimal guidance.
Hands‑on experience building, deploying, and operating services in a public‑cloud or large on‑prem environment.