About the role
Why RoboForce
RoboForce is an AI robotics company developing Physical AI–powered Robo-Labor for dull, dirty, and dangerous work. The company's robots are engineered for demanding industrial environments, with a focus on real-world deployment and scalability.
We are looking for a Senior / Staff AI Research Engineer, Data Infrastructure to build the data and learning engine behind RoboForce's Physical AI stack. In this role, you will own the full pipeline — from raw teleoperation and UMI device data collection through curation, annotation, and storage, to post-training infrastructure that scores demonstrations, identifies failure patterns, and closes the loop back into model retraining.
Responsibilities
- Design and maintain end-to-end data collection pipelines ingesting multimodal demonstration data from teleoperation devices and UMI hardware, including synchronization, versioning, and distributed storage at scale.
- Build annotation tooling and data curation workflows (quality filtering, deduplication, episode scoring, and domain reweighting) to produce high-quality training datasets for robot policy learning.
- Develop post-SFT reinforcement learning infrastructure: implement reward scoring on demonstrations, mine and categorize failure patterns, and feed curated failure data back into the retraining loop.
- Build evaluation and test infrastructure to log policy rollouts on-robot, capture structured results, and surface actionable diagnostics for the research team.
- Collaborate with ML researchers to define data schemas, episode formats, and pipeline interfaces that support rapid iteration on VLA and manipulation policy training.
- Architect scalable storage and retrieval systems for heterogeneous robot data (vision, proprioception, action, language) across both cloud and on-prem environments.
Requirements
- Bachelor's or Master's degree in Computer Science, Robotics, or a related field, with 5+ years of experience.
- Strong proficiency in Python and experience building production-grade data pipelines and ETL systems.
- Hands-on experience with large-scale dataset management, including versioning, deduplication, quality filtering, and distributed storage (e.g., S3, GCS, HDF5, WebDataset, Zarr).
- Experience building or working with post-training infrastructure, such as SFT pipelines, reward modeling, or RL training loops (e.g., PPO, DPO, rejection sampling).
- Familiarity with deep learning frameworks (PyTorch, JAX) and ML training workflows sufficient to collaborate tightly with research teams.
- In-office collaboration with the teams 5 days per week is required.
Bonus Qualifications
- Experience with robotics data collection hardware (teleoperation devices, UMI, GELLO, or similar) and the synchronization and preprocessing challenges they introduce.
- Familiarity with robot learning pipelines: imitation learning, behavior cloning, or VLA/VLM fine-tuning workflows.
- Experience building evaluation or experiment tracking infrastructure (e.g., Weights & Biases, MLflow, custom rollout loggers).
- Proven ability to design annotation tooling or human-in-the-loop labeling systems for structured or multimodal data.
Benefits
- Competitive stock options/equity programs.
- Health, dental, and vision insurance, 401(k) plan.
- Visa sponsorship and green card support for qualified candidates.
- Lunches and dinners, a fully stocked kitchen, and regular team-building events.
Aplyr's read
RoboForce is a cutting-edge robotics company attracting top talent in AI and engineering, focusing on advanced robotic systems and real-time AI applications.
What's promising
- RoboForce is at the forefront of AI-driven robotics innovation.
- The company offers roles in advanced AI and robotics, appealing to specialists.
- Strong focus on real-time inference and motion planning in robotics.
What to watch
- High specialization may limit opportunities for generalists.
- Potentially high-pressure environment due to cutting-edge project demands.
- Limited public information about company culture and work-life balance.
Why RoboForce
- RoboForce specializes in AI research for perception and manipulation.
- Focus on foundation models and real-time data infrastructure sets it apart.
- Emphasis on embedded software for robotics platforms and devices.
Aplyr’s read is generated by AI from public sources.