About the role
Mattermost is seeking an experienced and visionary Lead Site Reliability Engineer (SRE) to guide the architecture, reliability, and operational excellence of the infrastructure powering our secure, mission-critical collaboration platform.
In this role, you will provide technical leadership across our SRE function, driving strategic initiatives for scalability, observability, performance, and automation across cloud and hybrid environments. You will mentor engineers, establish best practices, and collaborate closely with development, security, and operations teams to ensure our customers in defense, government, and critical infrastructure sectors experience exceptional reliability and performance.
Responsibilities Include:
- Define the strategy, architecture, and roadmap for Mattermost’s site reliability engineering function, aligning infrastructure initiatives with product and business goals.
- Lead the design, deployment, and optimization of production-grade containerized workloads, infrastructure-as-code, and compliant cloud environments for regulated domains (e.g., FedRAMP, DoD).
- Establish and evolve observability, monitoring, and alerting frameworks to ensure performance, reliability, and capacity planning at scale.
- Drive incident management processes, including on-call rotations, root cause analysis, and systemic reliability improvements.
- Partner with security and compliance teams to meet data sovereignty, security, and regulatory requirements.
- Champion automation and operational excellence to improve efficiency, reduce risk, and scale operations.
- Oversee cloud cost management and capacity planning to optimize infrastructure spending while meeting performance targets.
- Build and maintain a developer platform that enables fast, secure software delivery and improves application stability in production.
- Mentor and coach SRE team members, fostering a culture of learning, collaboration, and technical excellence.
Requirements:
- BS in Computer Science, Cybersecurity, Software Engineering, or a related technical field, or equivalent experience, with 5+ years of relevant experience in site reliability engineering, DevOps, or cloud infrastructure roles.
- Proven expertise in container orchestration platforms, ideally Kubernetes.
- Extensive experience with infrastructure-as-code, ideally Terraform.
- Strong background in cloud platforms, ideally AWS.
- Demonstrated experience designing and implementing monitoring, alerting, and performance optimization strategies.
- Exceptional troubleshooting and incident management skills for distributed systems.
- Proficiency in at least one scripting or programming language for automation.
- Excellent communication skills with a track record of influencing cross-functional teams.
- Experience leading globally distributed teams in a remote-first environment.
Preferences:
- Familiarity with observability stacks such as Grafana and Prometheus.
- Experience designing high-availability, disaster recovery, and scaling architectures.
- Exposure to GCP and Azure cloud environments.
- Leadership experience in highly regulated industries such as defense, finance, or critical infrastructure.
- Experience with U.S. federal compliance frameworks and authorization processes, including FedRAMP, DoD ATO, NIST 800-53, and related government standards.
- Experience preparing, delivering, and maintaining software offerings through AWS Marketplace and other cloud provider marketplaces (e.g., Azure Marketplace, Google Cloud Marketplace), including packaging, compliance validation, and ongoing operational support.
- Open-source contributions in reliability, DevOps, or infrastructure tooling.
- Certifications in cloud infrastructure, reliability, or DevOps engineering (e.g., CKA, CKAD, AWS Certified Solutions Architect).
Compensation
Salary range: $145,000 – $200,000
Mattermost takes a market-based approach to pay. Compensation is determined based on skills, experience, qualifications, and work location. Ranges may be updated as market conditions evolve.
U.S. Eligibility & Compliance
This role may require obtaining and maintaining a U.S. government security clearance. Candidates must meet federal eligibility requirements to be considered. For more information, visit the Security Clearances page of the United States Department of State.
Applicants must meet eligibility requirements for access to export-controlled information as defined by U.S. export control laws, including EAR and ITAR. For more information, visit the Bureau of Industry and Security and the Directorate of Defense Trade Controls.
Aplyr's read
Mattermost is an open-source collaboration platform attracting tech-savvy professionals focused on secure, customizable team communication and project management solutions.
What's promising
- Mattermost's open-source nature allows for high customization and flexibility.
- The platform is designed with a strong focus on security, appealing to defense and government sectors.
- Mattermost integrates seamlessly with a wide array of tools, enhancing productivity.
What to watch
- The open-source model may require more technical expertise from users.
- Competition from established platforms like Slack and Microsoft Teams is intense.
- Limited public information about the company's financial health and growth trajectory.
Why Mattermost
- Mattermost offers a self-hosted option, ensuring data control and privacy.
- The platform is particularly tailored for high-security environments, such as government and defense.
- Mattermost's open-source community actively contributes to its development and innovation.
Aplyr’s read is generated by AI from public sources.
About Mattermost
Mattermost is an open-source collaboration platform designed for team communication and project management. It provides messaging, file sharing, and integrations with various tools to enhance productivity.