Data Engineer
CAI
Job Description
Req number: R7299
Employment type: Full time
Worksite flexibility: Hybrid
Who we are
CAI is a global services firm with over 9,000 associates worldwide and a yearly revenue of $1.3 billion+. We have over 40 years of excellence in uniting talent and technology to power the possible for our clients, colleagues, and communities. As a privately held company, we have the freedom and focus to do what is right—whatever it takes. Our tailor-made solutions create lasting results across the public and commercial sectors, and we are trailblazers in bringing neurodiversity to the enterprise.
Job Summary
We are looking for a motivated Data Engineer to join our dynamic team. As a Data Engineer, you will play a crucial role in data quality and data governance, ensuring the team has the right data, at the right quality, with the right controls, so that model outcomes are dependable. You will own the end-to-end AI data lifecycle, from governed ingestion to training/evaluation datasets, data quality gates, lineage, reproducibility, and run-time monitoring, using AWS + Databricks as the production backbone. If you are looking for your next career move, apply now.
Job Description
We are seeking a Data Engineer to own the end-to-end AI data lifecycle, from governed ingestion to training/evaluation datasets, data quality gates, lineage, reproducibility, and run-time monitoring, using AWS + Databricks. This is a full-time, hybrid position.
What You’ll Do
AI Data Strategy & Ownership (Operating Model)
Translate AI use cases into data requirements
Features, labels, context documents, metadata, refresh cadence, retention rules.
Define the “AI data products” needed for each solution (training set, evaluation set, inference inputs, reference corpora)
Develop and maintain an AI data roadmap aligned to the data product roadmap, specific to the TE Sensors BU
Develop a data strategy to move from a dashboard-oriented organization to an AI-first model
Collaborate with our DIA Dashboard organization (Philippine spoke team)
Develop a data strategy for our TE Sensors internal databases (e.g., SBI)
Data Ingestion & Curation on AWS + Databricks
Build and operate robust ingestion pipelines from enterprise sources into AWS + Databricks
Ensure data pipelines are:
Incremental (cost-aware)
Observed (metrics & logs)
Reliable (SLAs for freshness and completeness)
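To make the three pipeline properties above concrete, here is a minimal plain-Python sketch, not the team's actual pipeline. It assumes a watermark-based incremental load (only rows with an `updated_at` newer than the last processed timestamp are read) and simple freshness and completeness SLA checks. `Record`, `incremental_batch`, `freshness_ok`, `completeness_ok`, and the 0.99 ratio are hypothetical names and values chosen for illustration; in production the same logic would typically run in Spark on Databricks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    key: str
    updated_at: datetime


def incremental_batch(records, watermark):
    """Select only records changed since the last processed watermark."""
    batch = [r for r in records if r.updated_at > watermark]
    # Advance the watermark only when new data arrived; otherwise keep the old one.
    new_watermark = max((r.updated_at for r in batch), default=watermark)
    return batch, new_watermark


def freshness_ok(last_loaded_at, now, max_lag):
    """Freshness SLA: the most recent load may be at most max_lag old."""
    return now - last_loaded_at <= max_lag


def completeness_ok(loaded_rows, expected_rows, min_ratio=0.99):
    """Completeness SLA: loaded row count must reach min_ratio of the expected count."""
    if expected_rows == 0:
        return True
    return loaded_rows / expected_rows >= min_ratio


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [Record("a", t0), Record("b", t0 + timedelta(hours=1)), Record("c", t0 + timedelta(hours=2))]
batch, watermark = incremental_batch(records, t0)
print([r.key for r in batch], watermark.isoformat())  # prints ['b', 'c'] 2024-01-01T02:00:00+00:00
```

Reprocessing the same source with the returned watermark yields an empty batch, which is what keeps repeated runs cost-aware.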
Establish BU-oriented AI Data Governance (Unity Catalog + AWS controls)
Leverage Databricks Unity Catalog for table, column, and row-level controls
Implement classification & handling standards
PII/PCI/Confidential tagging
Retention and deletion rules (e.g., right-to-delete)
Audit trails and access logging
Define and maintain data contracts with source owners for schema, semantics, quality SLAs, and change processes
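A data contract can be as simple as a versioned record of the agreed schema plus its quality and delivery SLAs. The sketch below is an illustrative assumption, not a prescribed format: `DataContract`, `breaking_changes`, the table name `sensor_readings`, the owner `plant-ops`, and the SLA defaults are all hypothetical. It treats removed or retyped columns as breaking and newly added columns as non-breaking, which is one common convention among several.

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Hypothetical contract between a source owner and the AI data team."""
    table: str
    owner: str
    schema: dict = field(default_factory=dict)  # column name -> type name
    max_null_fraction: float = 0.01             # quality SLA
    freshness_hours: int = 24                   # delivery SLA


def breaking_changes(contract, observed_schema):
    """Compare an observed schema to the contract; removed or retyped columns are breaking."""
    problems = []
    for column, expected_type in contract.schema.items():
        if column not in observed_schema:
            problems.append(f"missing column: {column}")
        elif observed_schema[column] != expected_type:
            problems.append(f"type change on {column}: {expected_type} -> {observed_schema[column]}")
    # Columns added by the source are treated as non-breaking and ignored here.
    return problems


contract = DataContract(
    table="sensor_readings",
    owner="plant-ops",
    schema={"sensor_id": "string", "reading": "double", "ts": "timestamp"},
)
print(breaking_changes(contract, {"sensor_id": "string", "reading": "string", "ts": "timestamp", "site": "string"}))
```

Any non-empty result would be routed into the agreed change process with the source owner rather than silently absorbed downstream.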
Data Quality Engineering (Hard Gates for AI Readiness)
Define data quality dimensions and SLAs (AI-specific):
Completeness, consistency, timeliness, uniqueness
Distribution stability (for drift-sensitive features)
Implement automated quality checks:
Schema validation (breaking changes)
Null/missingness thresholds
Referential integrity
Distribution checks (mean/variance, quantiles, KL divergence where appropriate)
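Three of the automated checks listed above (null thresholds, referential integrity, and a KL-divergence distribution check) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production framework: the function names, the binned `reference`/`current` counts, and the thresholds (0.05 null fraction, 0.1 KL) are hypothetical. `kl_divergence` computes KL(P || Q) with P as the current batch and Q as the reference, substituting a small `eps` for empty reference bins so the ratio stays finite.

```python
import math


def null_fraction(values):
    """Share of missing (None) values in a column."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def orphan_keys(child_keys, parent_keys):
    """Referential integrity: child foreign keys with no matching parent key."""
    parents = set(parent_keys)
    return sorted({k for k in child_keys if k not in parents})


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) over binned counts: P is the current batch, Q the reference."""
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    kl = 0.0
    for bin_ in set(p_counts) | set(q_counts):
        p = p_counts.get(bin_, 0) / p_total
        q = q_counts.get(bin_, 0) / q_total
        if p > 0:  # bins absent from P contribute nothing
            # eps stands in for an empty reference bin so the ratio stays finite.
            kl += p * math.log(p / max(q, eps))
    return kl


reference = {"low": 50, "mid": 40, "high": 10}
current = {"low": 20, "mid": 40, "high": 40}
checks = {
    "nulls_ok": null_fraction([1.2, None, 3.4, 5.0]) <= 0.05,
    "ref_integrity_ok": not orphan_keys(["s1", "s2", "s9"], ["s1", "s2", "s3"]),
    "drift_ok": kl_divergence(current, reference) <= 0.1,
}
print(checks)
```

KL divergence is asymmetric and sensitive to binning, so the bin edges and threshold would need to be fixed per feature and kept stable across runs for the check to be reproducible.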
Consider data quality dashboards & alerting:
Pipeline failures and/or data freshness breaches
Quality test failures (e.g., block training or deployment when critical checks fail)
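The "hard gate" idea above can be sketched as a function that raises when any critical check fails and otherwise returns non-critical failures for alerting. This is an assumption-laden illustration: `CheckResult`, `QualityGateError`, `enforce_gate`, and the check names are hypothetical. The design relies on the fact that an uncaught exception fails the task it runs in, so in an orchestrated job the dependent training or deployment tasks would typically not run.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    critical: bool  # critical checks block training/deployment; others only alert


class QualityGateError(RuntimeError):
    """Raised to stop the pipeline when a critical check fails."""


def enforce_gate(results):
    """Fail hard on critical failures; return non-critical failures for alerting."""
    blocking = [r.name for r in results if r.critical and not r.passed]
    if blocking:
        raise QualityGateError("blocked by failed checks: " + ", ".join(blocking))
    return [r.name for r in results if not r.passed]


results = [
    CheckResult("schema_compatible", passed=True, critical=True),
    CheckResult("freshness_sla", passed=False, critical=False),
]
alerts = enforce_gate(results)  # ["freshness_sla"]: alert, but training may proceed
```

Keeping the critical/non-critical split explicit in each check's definition makes it auditable which failures stop a model run and which only page the on-call engineer.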
Performance & Cost Optimization (AWS + Databricks economics)
Optimize data storage and compute:
Partitioning strategies and file sizing
Delta optimization/compaction strategy
Cluster sizing, autoscaling, job scheduling
Ensure cost transparency
Production Operations & Support Readiness (Run Phase)
Provide operational artifacts and support:
Runbooks (pipeline recovery, backfills, reprocessing)
On-call / escalation participation for data incidents
Root cause analysis for quality issues
Ensure observability via SLAs/health checks for critical pipelines
What You'll Need
Required:
5+ years of experience in data engineering and data management
AI / ML Data Foundations
Data Quality Engineering
Cloud & Platform Fundamentals
Platform-Specific Qualifications (Databricks + AWS)
Preferred:
Certifications (Optional but highly valuable)
Databricks Machine Learning Professional
Databricks Data Engineer Professional
AWS Solutions Architect (Associate/Professional)
AWS Certified Data Analytics – Specialty
Physical Demands
Sedentary work that involves sitting or remaining stationary most of the time, with an occasional need to move around the office to attend meetings, etc.
Ability to conduct repetitive tasks on a computer, utilizing a mouse, keyboard, and monitor.
Reasonable accommodation statement
If you require a reasonable accommodation in completing this application, interviewing, completing any pre-employment testing, or otherwise participating in the employment selection process, please direct your inquiries to application.accommodations@cai.io or (888) 824-8111.
Similar Jobs
BlackRock
Vice President, Data Quality Lead Engineer
Autodesk
Senior Software Data Engineer
Brightfield
Senior Data Integration Engineer
Bristol-Myers Squibb
Senior Data & DevOps Engineer
Conagra
Platform Engineer – Analytics Platform (Databricks)