About the role

Aplyr's Quick Take

This role is for an ML Research Scientist focused on developing advanced video understanding models. You'll be working on creating structured metadata from video content, collaborating closely with product teams to deliver customer-facing solutions. It's an individual contributor role that requires deep technical expertise in machine learning and video analysis.

Good fit

Ideal candidates will have a strong background in machine learning, particularly in video or multimodal AI, with several years of experience in research or applied settings. A collaborative mindset and a passion for innovative technology will help you thrive here.

Worth noting

The company is well-funded and has strong partnerships with major tech players like NVIDIA and AWS, which could provide access to cutting-edge resources. However, the spec is vague on day-to-day responsibilities, which may lead to uncertainty in the role's expectations.

Who we are

At TwelveLabs, we are pioneering the development of cutting-edge multimodal foundation models that have the ability to comprehend videos just like humans do. Our models have redefined the standards in video-language modeling, empowering us with more intuitive and far-reaching capabilities, and fundamentally transforming the way we interact with and analyze various forms of media.

With $110+ million in Seed and Series A funding, our company is backed by top-tier venture capital firms such as NVIDIA’s NVentures, NEA, Radical Ventures, and Index Ventures, and prominent AI visionaries and founders such as Fei-Fei Li, Silvio Savarese, Alexandr Wang and more. Headquartered in San Francisco, with an influential APAC presence in Seoul, our global footprint underscores our commitment to driving worldwide innovation.

Our partnership with NVIDIA and AWS gives us access to the most advanced chips, including B300s, enabling us to push the boundaries of what's possible in video AI.

We are a global company that values the uniqueness of each person’s journey. It is the differences in our cultural, educational, and life experiences that allow us to constantly challenge the status quo. We are looking for individuals who are motivated by our mission and eager to make an impact as we push the bounds of technology to transform the world. Join us as we revolutionize video understanding and multimodal AI.

About Pegasus

Pegasus is TwelveLabs' core video understanding product, turning video into useful analysis by reasoning over visuals, speech, audio, and on-screen text. The team is not building a generic Video LLM in isolation; we build customer-facing video intelligence workflows that require temporal understanding, structured outputs, and production-grade reliability.

A key example is Segment, our time-based metadata capability. Instead of asking the model a broad question about a video, customers define the exact segment types they care about and the metadata fields they want back. Pegasus then finds the relevant start and end times and returns structured metadata for each segment, such as titles, summaries, topics, people, visual subjects, confidence, or domain-specific labels. This is designed for workflows where “what happened” is not enough; customers need to know when it happened and receive metadata that can flow directly into search, archive, editing, compliance, or content management systems.

For example, a news archive customer can define a segment type like editorial_narratives and ask Pegasus to split a long broadcast into individual stories. For each story, Pegasus can return a timestamped segment with fields such as segment_title, description, editorial_subjects, visual_subjects, names, and confidence. The output is not just a summary of the full video; it is a structured timeline of the video, aligned to the customer's schema.
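As a rough illustration of the news archive example above, the following Python sketch models one possible shape for such a timeline. It is not the TwelveLabs API: the `Segment` class, the `start_sec` and `end_sec` field names, the `validate_timeline` helper, and the sample values are all hypothetical, and the checks (confidence between 0 and 1, stories that do not overlap) are assumptions made for the example. Only the metadata field names come from the description above.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class Segment:
    """One timestamped story; metadata field names follow the example above."""
    start_sec: float  # hypothetical name for the segment start time
    end_sec: float    # hypothetical name for the segment end time
    segment_title: str
    description: str
    editorial_subjects: list[str] = field(default_factory=list)
    visual_subjects: list[str] = field(default_factory=list)
    names: list[str] = field(default_factory=list)
    confidence: float = 0.0  # assumed to lie in [0, 1]


def validate_timeline(segments: list[Segment]) -> list[Segment]:
    """Return segments sorted by start time; reject invalid or overlapping spans."""
    ordered = sorted(segments, key=lambda s: s.start_sec)
    for seg in ordered:
        if seg.end_sec <= seg.start_sec:
            raise ValueError(f"empty or negative span: {seg.segment_title!r}")
        if not 0.0 <= seg.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {seg.segment_title!r}")
    # Assumption: individual stories in a broadcast do not overlap in time.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start_sec < prev.end_sec:
            raise ValueError(
                f"{cur.segment_title!r} overlaps {prev.segment_title!r}"
            )
    return ordered


# Made-up sample data, for illustration only.
timeline = validate_timeline([
    Segment(312.0, 540.5, "Harbor expansion vote", "Council approves the plan.",
            editorial_subjects=["local politics"], names=["Jane Doe"],
            confidence=0.91),
    Segment(0.0, 312.0, "Storm warning", "Forecast and preparations.",
            visual_subjects=["weather map"], confidence=0.87),
])
print(json.dumps([asdict(s) for s in timeline], indent=2))
```

Serializing each segment with `asdict` yields plain records that a search index, archive, or content management system could ingest, which is the point of aligning the output to a customer-defined schema.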

This is the distinction that matters for Pegasus: general video analysis answers questions about video, while Segment turns video into time-based, structured data tailored to a specific business workflow.

Learn more about Pegasus!

Building Structured Video Assets: A Time-Based Metadata (TBM) Pipeline
Quick Shorts demo: YouTube Shorts

About the Team

The Pegasus team sits at the core of TwelveLabs’ video understanding capabilities and is responsible for driving Pegasus, our Video Analysis product. Our focus is on developing multimodal video analysis systems that follow instructions reliably and produce highly complex, hierarchically structured outputs. We focus on shipping products with real-world value rather than doing research in isolation, and we work in a goal-oriented, cross-functional team that encompasses both ML researchers and engineers.

Our work covers a broad range of challenges: large-scale distributed training of multimodal LLMs, from pre-training through RL; accurate temporal segmentation and structured metadata extraction for real-world use cases; extending temporal context length to multiple hours; and data curation processes that enable well-aligned evaluation and performance improvements through training data enhancements.

Our team has access to the most advanced chips in the world, including NVIDIA B300s, which lets us push the boundaries of video analysis systems and shorten our research-to-production cycle.

In this role, you will

Define and drive research problems that advance Pegasus’s video analysis capabilities, from hypothesis formulation through experimentation and iteration.
Design and run rigorous experiments across model architecture, training strategy, data curation, and evaluation to improve the quality of our multimodal systems.
Build evaluation methods and data curation processes that translate real-world use cases into reliable research signals and measurable model improvements.
Work closely with ML Engineers to turn research outcomes into robust systems with real product impact.
Communicate research findings clearly and use them to inform technical direction across the team.

Even if you don't check every box, we encourage you to apply.

If you're a zero-to-one achiever, a ferocious learner, and a kind team player who motivates others, you'll find a home at TwelveLabs.

You may be a good fit if you have

Strong research experience in one or more relevant areas such as multimodal or unimodal LLMs, large-scale distributed training, data-centric model development, computer vision, or vision-language modeling.
A track record of independently driving research from ideation to execution, demonstrated through projects, technical contributions, or research outputs.
Strong proficiency in Python and PyTorch.
Strong experimental judgment, including the ability to design evaluations, run rigorous ablations, and draw clear conclusions from empirical results.
The ability to communicate effectively and collaborate closely with both researchers and engineers.

Preferred qualifications

Experience working on multimodal systems involving video, vision, language, or structured output generation.
Experience improving model quality through data curation, evaluation design, or training data enhancements.
Experience with large-scale distributed training in high-performance GPU environments.
Experience translating research advances into production ML systems.
MS, PhD, or equivalent practical experience in Machine Learning, Computer Science, or a related technical field.

Others

Work Location: Seoul Itaewon office + Pangyo satellite office
Additional Info: Enrollment or transfer as Technical Research Personnel (전문연구요원) is possible.

Hiring Process

Application Review → Recruiter Interview (remote, 30 min) → Loop Interview [Hiring Manager Interview & Live Coding Test Interview] (in person, approx. 90 min) → System Design Interview (in person, approx. 90 min) → Final Round Interview (remote, approx. 30 min) → Reference Check → Offer

Benefits and Perks

Growth & Tools
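- A global team that grows alongside global B2B customers
- Hybrid work that combines autonomy and collaboration
- Latest MacBook and home-office equipment worth KRW 700,000, with equipment replaced every 3 years
- Tokens never sleep: unlimited LLM tokens for tech roles
- Self-development allowance worth KRW 1.4 million per year, usable for courses, conferences, memberships, and more
- English education program and global buddy program
- Taxi fare support for late-night and weekend commutes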
Meal & Snack
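- Corporate card worth KRW 7.2 million per year, freely usable for meals, transportation, and more
- In-office snack bar (snacks, coffee, seasonal fruit, and more)
- Dinner allowance after 7 p.m. when working in the office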
Wellness & Family
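- Annual health checkup for you and one family member
- Group insurance enrollment (choose one: accident insurance, dental insurance, or family accident insurance)
- Flu vaccination cost support
- Two-week paid Holiday Break at year end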

Skills & Tags

python go aws machine learning ai data product design

Aplyr's read

Twelve Labs is at the forefront of AI-driven video technology, attracting talent passionate about redefining how we interact with video content.
Synthesized from recent postings & public sources

What's promising

• Pioneers in AI video analysis, offering cutting-edge technology.
• Strong focus on innovation attracts top-tier talent in AI and video engineering.
• Rapidly expanding roles indicate robust growth and investment in new technologies.

What to watch

• Highly competitive industry with rapid technological advancements.
• Dependence on AI models requires continuous improvement and adaptation.
• Limited public information about financial stability and long-term viability.

Why Twelve Labs

• Specializes in enabling interactive video content search and analysis.
• Focus on AI-driven video understanding sets it apart from traditional video tech companies.
• Diverse roles in AI and video cognition highlight a comprehensive approach to video technology.

Aplyr’s read is generated by AI from public sources.

About Twelve Labs

Twelve Labs

twelvelabs.io

Twelve Labs is a technology company specializing in AI-driven video analysis and understanding, enabling users to search and interact with video content in innovative ways.