Hector Castro's CV

Email: hector@castro.io
Location: Philadelphia, PA, USA
Website: hector.dev
LinkedIn: hectcastro
GitHub: hectcastro

Summary

Senior platform and reliability engineer with nearly two decades of experience designing cloud-native systems, improving observability, and building AWS infrastructure that helps teams ship safely at scale. I work hands-on in application code, AWS, CI/CD, and operational tooling to improve reliability and reduce risk. I also bring a track record of team building, process transformation, and driving engineering efficiency.

Skills

Languages: Python, Go, TypeScript, Ruby, Bash
AWS Services: API Gateway, Batch, ECS, EKS, ElastiCache, IAM, Kinesis, Lambda, RDS/Aurora, S3, VPC
Infrastructure as Code: Terraform, Ansible, AWS CloudFormation, AWS SAM
Delivery & Automation: GitHub Actions, Concourse CI, Docker, Packer
Observability & Security: Datadog, Snyk, Dependabot, Renovate
Databases & Caching: PostgreSQL, MySQL, Redis, Memcached, Riak

Experience

Umbra Space, Senior DevOps Engineer

Sept 2025 – Present
Santa Barbara, CA (Remote)
Built a cross-team service health framework on Datadog Scorecards that runs daily automated checks across 30+ services and four teams, surfacing conformance gaps and giving the organization a system it can extend as engineering standards mature.
Rolled out Datadog Data Streams Monitoring across 20+ Kafka-enabled services, improving observability and giving engineers a clearer view of Kafka topology and service dependencies across the platform.
Improved the organization’s security posture by rolling out Renovate, rebuilding SAST and SCA checks in CI, publishing findings to Datadog for security-team self-service, and tightening container-focused scans to improve signal quality across all services and libraries.

NBCUniversal, Principal Engineer

Nov 2021 – Aug 2025
New York, NY (Remote)
Cut mean time to recovery (MTTR) by an estimated 40% by introducing Datadog-based logs, metrics, and distributed tracing that gave engineers shared visibility into Lambda cold starts, database connection issues, and VPC misconfigurations.
Improved platform reliability and maintainability with a dedicated technical debt framework that enabled Angular upgrades off EOL versions and major Aurora MySQL database upgrades.
Created and rolled out an architectural decision-making framework across a 12-person engineering team spanning two squads, helping the team make faster, clearer decisions. Other teams in the broader organization later adopted the framework.
Turned security adoption into an engineering capability by piloting Snyk across five repositories, resolving hundreds of existing findings, and adding Dependabot plus CI security gates to maintain the baseline.

Opentrons Labworks, Inc., Senior Site Reliability Engineer

Apr 2021 – Oct 2021
Long Island City, NY (Remote)
Scaled critical COVID-19 testing infrastructure from a single site to a bi-coastal serverless architecture, doubling automated PCR processing capacity.
Built container-based tooling on Amazon ECS to automate database schema migrations safely and repeatably across development, staging, and bi-coastal production environments.
Built a serverless AWS Lambda pipeline to unify thousands of daily operational events into a shared data lake, giving the analytics team direct access to testing throughput data for staffing and planning.

Azavea, Inc., Vice President of Engineering

Jan 2017 – Jan 2021
Philadelphia, PA
Established an ADR-based technical decision framework across a 40-person engineering organization, giving teams a reusable process for navigating complex architecture choices.
Redesigned engineering hiring systems through apprenticeships, standardized technical assessments, and stronger interview practices, improving candidate-role fit and converting over 60% of apprentices into full-time engineers.
Built organizational systems for growth and retention by introducing a clear engineering career ladder and stronger performance processes for engineering teams.
Launched cross-functional working groups that translated organizational friction into systems changes, including salary band recalibration for equity and sharper focus on high-potential market areas.

Azavea, Inc., Senior DevOps Engineer

July 2014 – Jan 2017
Philadelphia, PA
Founded and grew the company's first infrastructure team from one engineer to four, creating the internal platform capability that supported engineering growth from 20 to 50 engineers.
Built a reproducible delivery pipeline with Terraform and Docker used by 5+ teams across 10+ projects, cutting deployment lead time from hours to minutes and improving production deployment reliability.
Maintained open-source infrastructure modules for Papertrail, Spark, and AWS Certificate Manager that each earned hundreds of GitHub stars and strengthened both delivery practices and recruiting pipelines.
Replaced ad hoc incident response with a blameless post-mortem process adopted by all engineering teams, reducing incidents by approximately 50% and turning outages into repeatable operational learning.

Basho Technologies, Developer Advocate

Jan 2013 – July 2014
Cambridge, MA (Remote)
Led technical pre-sales engagements with Fortune 500 and startup engineering teams, contributing to an estimated 20% improvement in sales efficiency through proof-of-concepts, technical presentations, and direct engineering conversations.
Built and promoted adoption tooling for Riak through official Chef cookbooks and Docker, Vagrant, Datomic, and Omnibus integrations that lowered the barrier to distributed-database adoption.

Education

Temple University, B.S. in Computer Science

Sept 2003 – May 2007
Philadelphia, PA