Summary
Senior Site Reliability Engineer with 10+ years architecting and operating hybrid cloud and bare-metal
infrastructure at scale. Deep experience building Kubernetes platforms across AWS, GCP, and private data
centers, with hands-on GPU/CUDA workload management, including bare-metal server provisioning via PXE boot
and SaltStack. Track record of direct collaboration with ML engineers and data scientists to build the
infrastructure foundation that accelerates AI development. Strong observability background (Prometheus,
Grafana, Mimir), disciplined infrastructure as code (Terraform), and an automation-first approach to
platform reliability and self-service.