NVIDIA · May 16, 2025
This job has closed.

Software Engineer, Fleet Health Instrumentation Intern - Fall 2025

US, CA, Santa Clara
Internship
Onsite
$18/hr - $71/hr
Intern
NVIDIA is a computing platform company operating at the intersection of graphics, HPC, and AI. The company is seeking an intern to design, prototype, and ship high-impact features that ensure the reliability of NVIDIA's GPU-accelerated platforms.

Responsibilities

  • Design and build software that collects, transforms, and publishes health data about our global GPU fleet.
  • Develop microservices and data pipelines in Go or Python that ingest and normalize data from diverse sources, routing millions of records per day with tools such as Kafka, Airflow, and Kinesis.
  • Instrument production infrastructure and workloads running on Kubernetes and bare-metal clusters; add tracing and metrics hooks for deeper insights.
  • Automate deployments and testing with CI/CD (GitLab, Argo) and IaC (Terraform), ensuring repeatable, low-touch releases.
  • Participate in the full lifecycle of cloud services—from design docs and code reviews through deployment, monitoring, and continuous improvement.
  • Collaborate with other engineers to debug live issues and turn post-incident insights into durable code fixes.
  • Contribute to internal tooling and dashboards that help engineers visualize fleet health, utilization, and capacity trends.

Qualifications

Required

  • Actively pursuing a BS or MS in Computer Science, Computer Engineering, or a closely related quantitative field (e.g., Physics or Mathematics).
  • Solid understanding of distributed-systems fundamentals, modern software-engineering practices, and data-modeling principles.
  • Proficiency in at least one programming language—preferably Python or Go.
  • Working knowledge of Linux, basic networking concepts, and Kubernetes container orchestration.

Preferred

  • A systematic, analytical problem-solving approach paired with clear written and verbal communication skills and a strong sense of ownership.
  • Demonstrated ability to debug, optimize, and automate code or workflows with minimal guidance.
  • Hands-on experience building, deploying, and operating services in a public-cloud or large on-prem environment.

Founded in 1993
Santa Clara, California, USA
10001+ employees
https://www.nvidia.com
