
Principal Engineer, AI Serving Framework Architect (Software)

Confirmed live in the last 24 hours

Samsung Semiconductor

Compensation

$219,000 - $351,000/year

San Jose, California, United States
On-site
Posted April 8, 2026

Job Description

Please Note:

To provide the best candidate experience amidst our high application volumes, each candidate is limited to 10 applications across all open jobs within a 6-month period. 

Advancing the World’s Technology Together

Our technology solutions power the tools you use every day--including smartphones, electric vehicles, hyperscale data centers, IoT devices, and so much more. Here, you’ll have an opportunity to be part of a global leader whose innovative designs are pushing the boundaries of what’s possible and powering the future. 

We believe innovation and growth are driven by an inclusive culture and a diverse workforce. We’re dedicated to empowering people to be their true selves. Together, we’re building a better tomorrow for our employees, customers, partners, and communities.

Job Title: Principal Engineer, AI Serving Framework Architect (Software)

What You’ll Do    

The Architecture Research Lab (ARL) focuses on addressing fundamental system-level bottlenecks in modern AI, particularly in memory capacity/bandwidth and system-scale communication. By leveraging Samsung’s world-class memory technologies, ARL explores and defines next-generation AI system architectures that deliver step-function improvements in performance, efficiency, and scalability.
We are seeking a Principal AI System Architect who will play a key role in bridging AI workloads, system architecture, and hardware design. In this role, you will develop system-level performance models, drive architecture-level design decisions, and propose forward-looking AI system architectures that shape Samsung’s long-term AI platform strategy.

Location: Daily onsite presence at our San Jose office in alignment with our Flexible Work policy

Job ID: 42853

  • Serve as Tech Lead for research teams in Korea and propose technical direction
  • Research dynamic scheduling methodologies for maximizing AI inference performance in multi-rack-scale, memory-centric systems composed of heterogeneous compute-capable memory and hierarchical memory
  • Investigate methods to accelerate search operations in RAG vector DBs and AI agent knowledge graphs by leveraging compute-capable memory
  • Study strategies for optimally placing the KVCache and a vector DB in hierarchical memory to minimize SSD accesses and reduce I/O stalls
  • Propose software designs for implementing the derived optimization algorithms on open-source platforms such as vLLM

What You Bring

  • PhD in Computer Science or a related field with 10+ years of experience in AI serving frameworks for large-scale computing, with a focus on AI workloads.
  • Experience leading a project to build and optimize a Large Language Model (LLM) inference software stack on a multi-rack-scale system, delivering AI inference services to over 100,000 users.
  • Extensive experience in designing AI inference software stacks for heterogeneous devices.
  • In-depth understanding of the internal architecture and operation mechanisms of inference engines such as vLLM.
  • Proficiency in AI Inference System Profiling and optimization. 
  • Knowledge and practical experience with future AI workloads, including reasoning models, multi-modal solutions, AI agents, and world models. 
  • Strong understanding of compute, memory, and networking bottlenecks in AI systems. 
  • Required skillsets: PyTorch, Python, and C++ 
  • A collaborative mindset, curiosity, and resilience in solving complex challenges. 
  • Excellent verbal, presentation, and written communication skills. 
  • (Nice to have) Native or fluent Korean.
  • You’re inclusive, adapting your style to the situation and diverse global norms of our people. 
  • You approach challenges with curiosity and resilience, seeking data to help build understanding.