Manager, AI Operations & Evaluation

Chime

Remote, USA

Posted March 11, 2026

Job Description

About the Role

AI Operations (AIOPS) defines how AI is governed, evaluated, and continuously improved across OMX. We ensure every model in Operations is accurate, fair, and aligned with Chime’s standards for operational excellence and member trust.

As Manager, AI Evaluation & Insights, you’ll lead the team responsible for operationalizing and executing AI evaluation standards across OMX. You’ll run human and automated evaluation systems, manage model health monitoring, and apply testing and simulation frameworks that detect hallucinations, bias, or drift before they impact members or agents.

You’ll manage a team of TPMs and evaluation specialists who measure AI performance across risk, compliance, agent experience, and bot experience domains. You’ll ensure AI deployments meet the standards set by the AI Governance pillar and deliver measurable value to Operations.

The base salary offered for this role and level of experience will range from $150,000.00 to $208,000.00. Full-time employees are also eligible for a bonus, competitive equity package, and benefits. The actual base salary offered may be higher, depending on your location, skills, qualifications, and experience.

In This Role, You Will

Lead the AI Evaluation team, owning staffing, coaching, performance management, and delivery of evaluation and testing frameworks.
Manage the AI evaluation lifecycle — including pre-launch testing, simulation, and post-deployment health monitoring — ensuring alignment with governance standards and expectations.
Create domain-specific evaluation tracks (e.g., Compliance & Risk, Bot Experience, Agent Experience) to assess AI quality from multiple perspectives.
Operationalize human-in-the-loop testing, integrating reviewer feedback into continuous improvement loops.
Oversee simulation environments (3rd-party tools) for stress-testing LLMs and identifying hallucinations or performance regressions.
Partner closely with AI Platform & Governance to implement evaluation metrics, reporting, and health signals in alignment with Responsible AI principles.
Develop dashboards and reporting frameworks to track evaluation coverage, accuracy, and confidence scores across models.
Collaborate with Enablement, Speech Analytics, and Data Operations to ensure AI evaluation results inform retraining, policy, and member impact analysis.
Coach and develop TPMs to become domain experts in responsible AI measurement. Foster a high-performing, collaborative team culture, ensuring career development and continuous skill enhancement for all team members.

To Thrive in This Role, You Have

7+ years in AI/ML operations, quality, or evaluation, including at least 2 years of people leadership experience.
Deep understanding of LLM behavior, prompt testing, and evaluation methodologies.
Familiarity with human-in-the-loop frameworks and prompt testing tools.
Strong program management and stakeholder communication skills.
Technical proficiency in SQL, Python (preferred), or data visualization platforms (Looker, Snowflake).
Experience collaborating with Engineering, Data Science, and Risk/Compliance partners on AI-related initiatives.
A passion for operational excellence and responsible innovation.

Why This Role Matters

This role creates the execution layer between AI experimentation and operational reality — ensuring governance standards are consistently applied and AI systems are safe, fair, and high-performing in production. You’ll lead the teams that deliver the evaluation signals Operations relies on to trust every AI model deployed.

A little about us

At Chime, we believe that everyone can achieve financial progress. We created Chime—a financial technology company, not a bank*—on the premise that core banking services should be helpful, easy, and free. Through our user-friendly tools and intuitive platforms, we empower our members to take control of their finances and work towards their goals. Whether it's starting a savings account, purchasing a first car or home, launching a business, or pursuing higher education, we're proud to have helped millions unlock their financial potential.

We're
