Senior Systems Operations Engineer
DistroKid
Compensation
$155,000 - $180,000/year
Job Description
At DistroKid, we help artists share their music with the world.
Location: Remote (USA, Canada, United Kingdom, Europe)
Sponsorship: Not available. We cannot support visas, work permits, or extensions in any country (including OPT/CPT, PGWP, Graduate Route, or similar programs).
Salary: Varies by region — see details below
Summary
We are seeking a highly skilled Senior Systems Operations Engineer with deep expertise in cloud infrastructure, Infrastructure-as-Code (IaC), and AI-enhanced operations. This role is a critical technical leadership position on the Systems Operations (SysOps) team, responsible for architecting and managing our cloud environment, driving IaC maturity, and integrating AI-powered practices that improve reliability, reduce toil, and scale our operational capabilities.
You will serve as a subject matter expert in infrastructure domains, own complex workstreams end-to-end, and partner strategically with peers, engineering teams, and leadership to deliver impactful outcomes across the organization. This is a fully remote position, and success in the role depends on clear, open, and proactive communication to keep distributed teammates informed, aligned, and unblocked.
What You’ll Do
Cloud & Infrastructure Architecture
- Design, deploy, and manage scalable and highly available cloud infrastructure on AWS, with deep expertise in core services (EC2, EKS, S3, RDS, IAM, VPC, and beyond).
- Develop and maintain disaster recovery plans leveraging AWS capabilities for backup and replication to ensure business continuity.
- Collaborate with engineering and security teams to improve infrastructure health, security, and long-term scalability.
Infrastructure as Code (IaC)
- Design reusable Terraform/OpenTofu modules following DRY principles and organizational standards; implement module versioning and lifecycle strategies.
- Direct the migration of manual infrastructure to code; establish patterns and best practices for IaC adoption across the team.
- Implement IaC testing strategies, including validation, linting, and integration testing, using tools such as terraform-compliance or Checkov.
- Architect and maintain complex pipeline configurations for multi-environment IaC deployments; implement pipeline security best practices.
AI-Enhanced Operations (AIOps)
- Implement AIOps practices, leveraging AI tools to enhance monitoring, incident response, and predictive alerting.
- Use AI-assisted development and operations tools to accelerate troubleshooting, code review, and documentation generation.
- Evaluate and implement AI-powered automation to reduce operational toil, improve repeatability, and scale platform capabilities.
Reliability & Observability
- Define and implement SLOs for services; guide and/or participate in incident response and conduct blameless postmortems.
- Implement chaos engineering practices to proactively identify system weaknesses before they impact production.
- Build and maintain comprehensive monitoring solutions using tools such as CloudWatch and Datadog to track performance and drive optimization.
Automation, Developer Experience & Internal Developer Portal
- Develop automation scripts and tools in Python, Bash, or similar languages to streamline operations and eliminate manual toil.
- Build self-service capabilities for development teams to reduce cognitive load and enable developer autonomy across the organization.
Cost Optimization
- Guide cost optimization initiatives; implement rightsizing recommendations, reserved-capacity strategies, and tagging standards for cost allocation.
- Monitor and optimize AWS resource usage; select appropriate services and configurations to meet performance requirements cost-effectively.
Technical Leadership & Collaboration
- Direct planning, decision-makin