- Integrated Katacoda’s hands-on lab technology into the O’Reilly learning platform after its acquisition, then designed and built the Cloud Labs provisioning infrastructure on top of it: GKE, Terraform for organizational scaffolding, async Python with Celery task orchestration and cloud provider SDKs (GCP, AWS, Azure), and Postgres and Redis for state management
- Scaled Cloud Labs to deliver thousands of concurrent, session-based cloud environments per week across a multi-tenant platform, managing provisioning workflow concurrency, state consistency, and fault tolerance
- Built the observability layer with Datadog (custom metrics, dashboards, alerting, and monitoring across the full provisioning pipeline); regularly use this data to triage issues and improve system reliability
- Led the Cloud Labs engineering team through its GCP capability build-out: hiring, technical interviews, architecture decisions, and cross-functional collaboration with product and editorial teams
- Conceived and shipped O’Reilly’s first AI-powered Cloud Lab, an embedded coding agent (aider CLI + AWS Bedrock) enabling learners to interact with LLMs within seconds of launch. Co-developed with Editorial as a published content piece
- Built hands-on LLM learning experiences using LiteLLM, Azure OpenAI, and GCP Vertex AI, including prompt engineering tutorials designed for non-technical audiences
- Designed and managed IAM policies, access controls, and secrets rotation across multi-cloud environments
Summary
Senior engineer with 12+ years of experience, including 7+ years building and operating multi-cloud provisioning infrastructure at O’Reilly Media. I design, build, and lead the systems that deliver thousands of ephemeral cloud environments per week across GCP, AWS, and Azure, spanning workflow orchestration, distributed state management, observability, and hands-on learning experiences powered by LLMs. Google Cloud certified. Open source contributor.
Experience
- Built high-performance front-end interfaces for a real-time Ultra-Wideband (UWB) network monitoring system with millisecond-latency data visualization
- Developed and deployed a hybrid mobile application for secure financial services, managing end-to-end development through app store deployment in a regulated compliance environment
Technical Skills
Cloud & Infrastructure: GCP (GKE), AWS (Bedrock), Azure (OpenAI), Docker, Kubernetes, Terraform, CI/CD, multi-tenant SaaS, provisioning workflows
Observability: Datadog (metrics, dashboards, alerting, custom integrations), logging, distributed tracing
AI/ML: LLM integration, prompt engineering, agent workflows, RAG, LiteLLM, AWS Bedrock, GCP Vertex AI
Languages: Python, Go, TypeScript, JavaScript, SQL
Backend: REST APIs, async Python, Celery, workflow orchestration, FastAPI, Django, Flask, Node.js, Postgres, Redis
Open Source: Contributor to aws-nuke (Go), cloud infrastructure automation
Education & Certifications
- B.S. Computer Science, University of Southern Indiana (2014)
- Google Cloud Associate Cloud Engineer (issued Dec 2023, valid through Dec 2026)