Lead Systems Operations Engineer - Platform Reliability Engineering, SRE, Observability and Monitoring, Platform Support
Wells Fargo
Job Description
About this role:
Wells Fargo is seeking a Lead Systems Operations Engineer for Platform Reliability Engineering (PRE) within the CTO Platform organization. This role is aligned to modern Site Reliability Engineering (SRE) practices and is responsible for driving reliability, resiliency, observability, and operational excellence across critical platform and application services. The role is intended for senior engineers with deep expertise in one core platform domain, applying that expertise to proactively improve platform stability, scalability, and availability.
In this role, you will:
- Lead complex, broad-impact initiatives, including providing high-level systems consultation for the technology teams
- Work as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
- Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
- Make decisions on technical changes and enhancements
- Consult with engineering team on change design requiring solid understanding of technical process controls or standards that influence and drive new initiatives
- Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals
Required Qualifications:
- 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
Desired Qualifications:
- 5+ years of experience in Systems Operations, SRE, Platform Engineering, or Production Support with deep expertise in at least one platform domain: Database, Cloud, Network, Compute/Storage, Middleware, or Enterprise Application Support
- Strong hands-on experience applying SRE practices, including SLI/SLO definition, error budgets, and reliability metrics.
- Proven experience troubleshooting and resolving large-scale, distributed production systems.
- Hands-on experience with observability and monitoring tools such as Grafana, Splunk, Prometheus, Cribl, ThousandEyes, AppDynamics, or equivalent, including dashboards, alerting, logs, and metrics.
- Strong scripting and automation skills using Python, Bash, and/or PowerShell to reduce operational toil.
- Experience building automation or reliability tooling using APIs, Git-based workflows, and modern engineering practices.
- Solid understanding of incident, problem, and change management in enterprise production environments.
- Strong communication and influencing skills across engineering teams and senior leadership.
- Experience with capacity management, performance engineering, and resiliency design (HA, fault tolerance, RTO/RPO).
- Experience operating in hybrid environments (on‑prem + cloud) with complex enterprise dependencies.
- Familiarity with infrastructure automation / IaC tools such as Ansible or Terraform.
- Ability to drive technical debt remediation for critical legacy platforms using structured backlogs.
- Experience mentoring or leading senior engineers in reliability, operations, or SRE-focused roles.
Job Expectations:
- Strong collaboration and partnering skills across platform, application, and support teams.
- Ability to manage multiple priorities in a fast-paced, high-impact production environment.
- Consistent delivery of high-quality reliability outcomes within expected timelines.
- High attention to detail, data-driven problem-solving, and operational rigor.
- Prior project or initiative leadership experience is highly desirable.
Posting End Date:
17 Apr 2026
*Job posting may come down early due to volume of applicants.
We Value Equal Opportunity
Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.
Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit’s risk appetite and all risk and compliance program requirements.
Candidates applying to job openings posted in Canada: Applications for employment are encouraged from all qualified candidates, including women, persons with disabilities, aboriginal peoples and visible minorities. Accommodation for applicants with disabilities is available upon request in connection with the recruitment process.
Applicants with Disabilities
To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.
Drug and Alcohol Policy
Wells Fargo maintains a drug free workplace. Please see our Drug and Alcohol Policy to learn more.
Wells Fargo Recruitment and Hiring Requirements:
a. Third-Party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.