Dan Arbaugh

dan@danarbaugh.com · linkedin.com/in/danarbaugh · github.com/danarbaugh

Cloud Infrastructure · AI/LLM Integration · Observability · Hands-On Learning Environments

Summary

Senior engineer with 12+ years of experience, including 7+ years building and operating multi-cloud provisioning infrastructure at O’Reilly Media. I design, build, and lead the systems that deliver thousands of ephemeral cloud environments per week across GCP, AWS, and Azure, spanning workflow orchestration, distributed state management, observability, and LLM-powered hands-on learning experiences. Google Cloud certified. Open source contributor.

Experience

O’Reilly Media, Inc. | Senior Software Engineer | Dec 2018 – Present (promoted Apr 2023)
  • Integrated Katacoda’s hands-on lab technology into the O’Reilly learning platform after its acquisition, then designed and built the Cloud Labs provisioning infrastructure on top of it: GKE; Terraform for organizational scaffolding; async Python with Celery task orchestration and cloud provider SDKs (GCP, AWS, Azure); and Postgres and Redis for state management
  • Scaled Cloud Labs to deliver thousands of session-based cloud environments per week, many running concurrently, across a multi-tenant platform, managing provisioning workflow concurrency, state consistency, and fault tolerance
  • Built the observability layer with Datadog: custom metrics, dashboards, alerting, and monitoring across the full provisioning pipeline. Regularly use observability data to triage issues and improve system reliability
  • Led the Cloud Labs engineering team through its GCP capability build-out: hiring, technical interviews, architecture decisions, and cross-functional collaboration with product and editorial teams
  • Conceived and shipped O’Reilly’s first AI-powered Cloud Lab, an embedded coding agent (aider CLI + AWS Bedrock) enabling learners to interact with LLMs within seconds of launch. Co-developed with Editorial as a published content piece
  • Built hands-on LLM learning experiences using LiteLLM, Azure OpenAI, and GCP Vertex AI, including prompt engineering tutorials designed for non-technical audiences
  • Designed and managed IAM policies, access controls, and secrets rotation across multi-cloud environments
Ciholas, Inc. | Software Engineer | Apr 2017 – Dec 2018
  • Built high-performance front-end interfaces for a real-time Ultra-Wideband (UWB) network monitoring system with millisecond-latency data visualization
Springleaf Financial Services (OneMain Financial) | Programmer Analyst | Sep 2014 – Apr 2017 (promoted Aug 2015)
  • Developed and deployed a hybrid mobile application for secure financial services, managing end-to-end development through app store deployment in a regulated compliance environment

Technical Skills

Cloud & Infrastructure: GCP (GKE), AWS (Bedrock), Azure (OpenAI), Docker, Kubernetes, Terraform, CI/CD, multi-tenant SaaS, provisioning workflows

Observability: Datadog (metrics, dashboards, alerting, custom integrations), logging, distributed tracing

AI/ML: LLM integration, prompt engineering, agent workflows, RAG, LiteLLM, AWS Bedrock, GCP Vertex AI

Languages: Python, Go, TypeScript, JavaScript, SQL

Backend: REST APIs, async Python, Celery, workflow orchestration, FastAPI, Django, Flask, Node.js, Postgres, Redis

Open Source: Contributor to aws-nuke (Go), cloud infrastructure automation

Education & Certifications

  • B.S. Computer Science, University of Southern Indiana (2014)
  • Google Cloud Associate Cloud Engineer, Issued Dec 2023, Valid through Dec 2026