About the Role

Work Mode: Flex, with 1+ days per week in our Dublin office

Department: Engineering

About the Company

LearnUpon partners with over 1,600 organisations globally to unlock the potential of employees, customers & members through learning that’s easy, scalable and focused on results.

About the Team

Our Engineering organisation builds robust, scalable infrastructure that meets the demands of a global platform. As part of the Site Reliability Engineering (SRE) team, we focus on system architecture, performance, and technical innovation. Operating with high ownership and deep technical expertise, we are responsible for scaling out the LearnUpon infrastructure, championing internal self-service tooling, and embedding a culture of observability and shared operational responsibility across all engineering squads.

About the Opportunity

As a Staff Site Reliability Engineer, you will be a principal technical leader and a key catalyst for our infrastructure's evolution. In this role, you will take ownership of our core platform resilience, driving the strategy to build out an advanced, cost-effective observability function spanning metrics, logs, and transaction tracking. This opportunity requires a strategic thinker who can design cross-team SLO/SLI frameworks, navigate complex distributed system requirements, and mentor talent to ensure LearnUpon scales efficiently to support our ambitious global goals.

In addition, you’ll be responsible for:

Infrastructure Optimization: Identify opportunities to improve and scale our infrastructure's performance, observability, maintainability, and cost efficiency, and deliver innovative solutions to realise them.
Observability Function Strategy: Lead our efforts to build an observability function that incorporates application metrics, application transaction tracking, and event log management.
Resilience & Scaling: Drive the processes to maintain resilient, scalable, and cost-effective infrastructure while working with other Engineering teams to provide solutions that meet their ongoing requirements.
Tooling & Self-Service: Build measurement, monitoring, and alerting tools with a focus on self-service, so that engineers take ownership of the observability of their own services.
Operational Agility & Support: React quickly to changing customer and business needs and actively participate in the team's on-call rota.
Team Up-Leveling: Mentor junior talent and effectively communicate complex technical ideas to both technical and non-technical peers.

Skills & Experience

Must-Haves

7+ years of experience in a software or Ops role.
5+ years of cloud engineering experience, including at least 2 years with AWS.
Experience deploying microservice environments using container technologies such as Docker and Kubernetes.
Experience designing and implementing observability tech stacks, championing their benefits to Engineering teams, and managing the associated cost analysis of metrics gathering, effort, and tooling.
Ability to design SLO/SLI implementations that balance the needs of different teams.
Experience building and supporting large-scale distributed systems that power a consumer-facing app or website, with the associated performance, security, and disaster recovery requirements.
Experience implementing IaC (e.g., CloudFormation, Terraform), automation tooling (e.g., Puppet, Ansible), and CI/CD (e.g., Jenkins, Travis CI, GitLab).
Experience using AI tools to streamline tasks and improve efficiency.

Nice-to-Haves

Experience with database scaling would be a strong plus.
Certification in AWS, any PaaS, and/or related technologies.

If you don’t tick every box but believe you would be a good fit for this role, please don’t hesitate to apply. We’d love to hear from you.

Why choose LearnUpon?

From comprehensive rewards and generous time off to meaningful investment in your growth and development, LearnUpon gives you the support, trust, and opportunity to do the most impactful work of your career.

Hiring Process

Qualified applicants may be invited to an initial screening call with a member of our Talent Acquisition (TA) team.
Successful candidates will be invited to a series of practical interviews.
Finally, candidates will have an interview with our CTO.
Successful candidates will be contacted with an offer to join our team.

Note: At LearnUpon, we utilise AI to enhance the speed and quality of our screening and assessment practices, but our hiring decisions are always human.

If you need any accommodations during the hiring process, please reach out to us at peopleops@learnupon.com.

LearnUpon is an Equal Opportunities Employer.

We do not discriminate on the basis of gender, marital status, family status, age, disability, sexual orientation, race, religion, membership of the Traveller community, or any other legally protected status.

Check out our Careers site and Instagram to learn more about working at LearnUpon.

By submitting your application, you agree to LearnUpon's Privacy Policy.