Senior Platform Engineer (Cloud Workloads)
Veeam Software
Job Description
Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world’s biggest brands.
About the Role
We are looking for a Senior Platform Engineer to join the Workload team within the Veeam R&D Department. You will own critical observability infrastructure, drive incident response maturity, and help scale proactive support capabilities as part of your operational accountability.
What You’ll Do
- Design, build, and maintain observability pipelines using the Elastic Stack (Elasticsearch, Kibana, Fleet) across Azure and AWS workloads
- Develop and own SLO/SLI dashboards and error budget reporting for BaaS platform services
- Respond to incidents and lead the response for distributed, multi-tenant cloud workloads; own runbook creation, maintenance, and continuous improvement
- Build and refine proactive support tooling, including pattern analysis, tenant correlation dashboards, and baseline deviation alerting, to reduce reactive support burden
- Manage and maintain Elastic Fleet agent policies, enrollment health, and log streaming pipelines across Azure and AWS worker fleets
- Partner with SRE, R&D, and Proactive Support teams to close observability gaps, including tenant identification workflows and admin portal integrations
Technologies we work with
- Elastic Stack — Elasticsearch, Kibana, Elastic Fleet, KQL, Query DSL
- Azure Kubernetes Service (AKS), Azure Container Apps, VMs
- Azure Security — Entra ID, Managed Identities (user/system assigned), App Registrations, Key Vault
- Infrastructure as Code — Azure Bicep, Terraform, or Pulumi
- CI/CD — Azure DevOps, GitHub Actions
- ITSM tooling — ServiceNow, Salesforce, Jira, Incident.io (for tenant and incident workflows)
What You’ll Bring
- 5+ years of experience in cloud platform engineering, SRE, or infrastructure roles supporting commercial SaaS products
- Deep hands-on experience with Elastic Stack: building dashboards, writing KQL/Query DSL, managing Fleet
- Proven experience operating and troubleshooting distributed, multi-tenant workloads on Azure and/or AWS
- Strong understanding of Azure cloud services: AKS, Entra ID, Key Vault, Service Bus, Cosmos DB, Private Endpoints, etc.
- Experience with incident response in production cloud environments, including runbook development and post-incident review
- Experience with IaC tools (Azure Bicep, Terraform) and CI/CD pipelines (Azure DevOps, GitHub Actions)
- Strong scripting skills in Bash, Python, or PowerShell
- Ability to work cross-functionally with SRE, product, and customer-facing support teams