About the role
Who We Are
OKX is a leading crypto exchange and the developer of OKX Wallet, giving millions access to crypto trading and decentralized crypto applications (dApps). OKX is also trusted by hundreds of large institutions seeking access to crypto markets. We are safe and reliable, backed by our Proof of Reserves.
Across our offices worldwide, we are united by our core principles: We Before Me, Do the Right Thing, and Get Things Done. These shared values drive our culture, shape our processes, and foster a friendly, rewarding, and diverse environment for every OK-er.
About the Opportunity
We are seeking a highly skilled and hands-on Machine Learning Engineer specializing in large model post-training and alignment. This role focuses on designing, executing, and optimizing post-training pipelines to improve model performance, controllability, domain adaptation, and reasoning capabilities.
You will work across the full lifecycle of post-training—from data strategy and reward modeling to reinforcement learning–based optimization and production-grade inference deployment.
What You’ll Be Doing
- Lead and execute the full post-training pipeline for large language models (LLMs), including supervised fine-tuning, preference optimization, and reinforcement learning–based methods.
- Design and implement advanced training paradigms such as DPO (Direct Preference Optimization) and GRPO (Group Relative Policy Optimization).
- Develop domain-specific data recipes, curation strategies, and augmentation pipelines to optimize task performance.
- Train and post-train specialized small models from scratch, including architecture selection, dataset construction, and optimization strategy.
- Build and refine Reward Models to support alignment and downstream optimization.
- Design and implement RLAIF (Reinforcement Learning from AI Feedback) closed-loop systems.
- Optimize inference efficiency and deploy models using low-latency serving frameworks such as vLLM and SGLang.
- Evaluate model performance using both automated benchmarks and human/AI feedback loops.
- Collaborate with research and infrastructure teams to productionize training and deployment workflows.
What We Look For In You
- Bachelor's degree in Computer Science, AI, Machine Learning, or a related field, with at least 8 years of industry experience.
- Strong hands-on experience across the full post-training pipeline for large models.
- Deep familiarity with preference learning and alignment techniques, including DPO, GRPO, and RL-based post-training methodologies.
- Proven experience designing domain-specific data strategies and training methodologies.
- Experience training and post-training specialized small models from scratch.
- Solid understanding of reinforcement learning fundamentals and their application to model alignment.
- Experience deploying models in low-latency production environments using frameworks such as vLLM, SGLang, or similar.
Perks & Benefits
- Competitive total compensation package
- L&D programs and education subsidies for employees' growth and development
- Various team building programs and company events
- Wellness and meal allowances
- Comprehensive healthcare schemes for employees and dependants
- More that we would love to tell you about along the way!
Aplyr's read
OKX is a dynamic cryptocurrency exchange attracting tech-savvy professionals focused on digital finance innovation and security.
What's promising
- OKX offers a robust platform for trading a wide range of digital assets.
- The company emphasizes security, crucial in the volatile crypto market.
- OKX is expanding globally, offering diverse career opportunities.
What to watch
- Cryptocurrency markets are highly volatile, posing inherent risks.
- Regulatory scrutiny on crypto exchanges can impact operations.
- Competition among crypto exchanges is intense, requiring constant innovation.
Why OKX
- OKX integrates advanced security measures to protect user assets.
- The platform supports a variety of digital asset derivatives.
- OKX's global reach includes roles in diverse regions, enhancing cultural diversity.
Aplyr’s read is generated by AI from public sources.
About OKX
OKX is a leading cryptocurrency exchange that provides a platform for trading various digital assets, including cryptocurrencies and derivatives. With a focus on security and user experience, OKX aims to empower users to trade and invest in the digital economy effectively.