Senior Platform Engineer (f/m/d)
Moss
Job Description
At Moss, we give finance professionals the power to automate their day-to-day and make forward-thinking decisions.
Our team and culture make us unique: we’re driven by impact and growth, and every one of us strives to learn and excel. Recognised by Sifted’s Rising 100 and LinkedIn’s Top Startups, we’re here to help propel your career and, together, make Moss a lasting success.
As a Senior Platform Engineer, you will join our core Platform team that designs, builds, and maintains the infrastructure powering Moss. You will work on critical systems that must be updated without downtime, ensuring our services remain secure, scalable, and resilient. You’ll collaborate closely with product, data, and security teams, balancing planned initiatives with incident response, cloud engineering, and regular maintenance.
Your responsibilities
Design, build, and operate cloud-native infrastructure (GKE, Kubernetes, networking, databases) supporting a high-availability, low-latency FinTech platform processing real-time payments across Europe.
Own the reliability and scalability of 100+ microservices - including defining and enforcing SLOs, managing autoscaling strategies, and driving resilience patterns like circuit breakers, bulkheads, and graceful degradation.
Lead safe, continuous deployment practices across a fully automated CD pipeline - including rollout strategies, rollback mechanisms, and deployment observability at scale.
Drive observability across the platform - metrics, distributed tracing, and structured logging - with a focus on reducing MTTR and enabling engineers to self-serve incident diagnosis.
Manage and evolve infrastructure-as-code (Terraform, Helm) with a no-ClickOps discipline - every change peer-reviewed, version-controlled, and auditable.
Champion security and compliance practices including Zero Trust architecture, Workload Identity, dynamic secrets via Vault, network policies, and audit readiness (ISO 27001, SOC 2).
Own incident response across networking, load balancing, Kubernetes, and cloud services - and drive post-incident improvements that prevent recurrence.
Raise the engineering bar - actively contribute to architectural decisions, review platform changes, and help grow the early-senior engineers on the team.
About you
7+ years of total experience, with at least 4 years in platform, infrastructure, or SRE roles in a cloud-native environment.
Deep Kubernetes expertise - scheduling internals, autoscaling (HPA/VPA/KEDA), pod lifecycle, network policies, PodDisruptionBudgets, and multi-zone topology. Beyond operational familiarity, you understand what breaks and why.
Strong grasp of microservices operational challenges at scale - service mesh, inter-service resilience patterns, connection pool management, graceful shutdown, and database migration safety in a continuous deployment model.
Solid CI/CD experience - designing pipelines for 100+ services, immutable artefact management, Workload Identity Federation, and automated rollback. GitHub Actions experience is a plus.
Hands-on observability experience - building platforms covering metrics, logs, and distributed traces, including across async boundaries (e.g. Kafka). Able to connect instrumentation to incident workflows, not just set up tooling.
Proficiency in infrastructure-as-code - Terraform and Helm as primary tools, with a strong IaC-first mindset.
Programming proficiency in Go and/or shell scripting for platform tooling; familiarity with the operational characteristics of Java/Spring Boot services is a plus.
Proven troubleshooting skills across distributed systems - latency contagion, cascading failures, connection exhaustion, and autoscaling lag under traffic spikes.