Brett Michaelis

Summary

MLOps & Infrastructure Engineer with 10+ years building and operating cloud-native systems at scale, including hands-on experience designing data pipelines and GPU-accelerated CUDA compute infrastructure for production ML workloads. Skilled in Kubernetes, Terraform, and CI/CD automation across GCP and AWS, with a track record of close collaboration with ML engineers and data scientists to bridge research requirements and production infrastructure. Strong observability foundation (Prometheus, Grafana, Mimir) and a bias toward automation: replacing manual processes with reliable, repeatable systems that let ML teams ship and iterate with confidence.

Core Skills

Cloud Platforms: GCP, AWS, Azure, Hetzner, UpCloud, Tier.Net
Containers & Orchestration: Kubernetes, Helm, Docker, Nomad
ML Infrastructure: GPU/CUDA compute provisioning, GCP ML pipelines, large-scale data lake architecture, canary deployments
Infrastructure as Code & Automation: Terraform, Bash, Python, Go, SaltStack
CI/CD: GitLab CI, GitHub Actions, Bitbucket Pipelines, Jenkins
Observability & Reliability: Prometheus, Grafana, Mimir, Alloy, TICK stack; SLO/SLI practices; incident response using Five Whys and After Action Reviews (AAR)
Languages: Python, Go, JavaScript (Node.js)
Security & Compliance: Experience supporting SOC 2 and GDPR requirements in cloud-native deployments

Experience

Operations Engineer 03/2025 - Present

Smarty.com | Orem, UT

Lead migration of a legacy Grafana observability platform to a GitOps-managed deployment, auditing and rationalizing alerting across all production services.
Operate observability stack using Prometheus, Grafana, Mimir, and Alloy for metrics collection, long-term storage, and dashboarding.
Implement canary deployments via Nomad for progressive production rollouts, enabling confident releases with automated rollback.
Drive company-wide migration from Bitbucket to GitHub, re-implementing all CI/CD workflows in GitHub Actions.
Manage multi-cloud deployments (Tier.Net, UpCloud, GCP, AWS, Hetzner) with Terraform, Nomad, and Bitbucket Pipelines, improving uptime and deployment velocity.
Automate repetitive workflows with Bash and Go, reducing manual toil across operations.

Senior DevOps Engineer 04/2020 - 10/2024

Five9.com

Orchestrated cloud-native deployments on GCP using Kubernetes, Helm, and Terraform to support high-availability SaaS workloads.
Built self-service deployment tooling and automation, enabling engineering teams to provision and release independently while preserving platform standards.
Partnered with product and engineering teams to define and track SLOs/SLIs, supporting customer-facing uptime goals.
Streamlined incident response by applying Five Whys root-cause analysis in blameless postmortems, improving on-call processes and service reliability.

Software Engineer / DevOps Engineer 03/2017 - 04/2020

Vivint SmartHome

Designed and operated GCP-based ML pipeline infrastructure supporting data science and ML engineering teams, including large-scale data ingestion, processing, and storage across a 1.5 PB data lake.
Provisioned and managed GPU-accelerated CUDA compute clusters for ML model training workloads, optimizing resource allocation and job throughput.
Collaborated directly with ML engineers and data scientists to translate research requirements into scalable, production-ready infrastructure.
Developed and deployed Go-based microservices, optimizing them for performance and low latency.
Automated infrastructure operations with SaltStack and Jenkins, and established observability via the TICK stack across ML and application workloads.

Director, IT & Software Development 2011 - 2017

Unicity International

Led global infrastructure modernization, migrating legacy apps to containerized, cloud-native environments.
Standardized cloud deployments on AWS (EC2, S3) to improve scalability and global availability.
Introduced reliability practices, including error budgeting and deployment automation.

Assistant Director, Web Development 2005 - 2010

Utah Valley University

Directed university-wide web development projects, improving service reliability and scalability for mission-critical systems.

Counterintelligence Agent 1998 - 2006

U.S. Army – Utah National Guard

Conducted secure intelligence operations, leveraging structured incident response and AAR methods.