Member of Technical Staff - ML Systems & Inference

Gimlet Labs · Artificial Intelligence

Apply effort: ~7 min · Ashby

Posted: 97 days ago

01

About the role

About Us

Gimlet is building the next generation of AI infrastructure: large-scale AI datacenters and the orchestration platform that coordinates them.

The future of AI will require vastly more compute than exists today. But as AI workloads become more complex and new hardware architectures emerge, simply deploying more GPUs isn't enough. The challenge is making increasingly diverse compute work together.

Gimlet's platform intelligently partitions and routes workloads across heterogeneous hardware, enabling step-function improvements in performance and efficiency. Customers deploy through production-grade APIs without needing to think about hardware selection, placement, or optimization.

We work with foundation labs, hyperscalers, and AI-native companies to power production workloads at massive scale and help define the infrastructure layer for the future of AI.

About the role

Gimlet Labs is seeking a Member of Technical Staff focused on ML systems and inference. In this role, you will design and build the inference systems that execute full models end-to-end under real production constraints. You will work at the intersection of model architecture, runtime behavior, and system performance to ensure inference is fast, predictable, and scalable.

This role is ideal for engineers who deeply understand how modern models execute in practice and who care about latency, throughput, and memory behavior across the full inference lifecycle.

What you will work on

  • Design and optimize end-to-end inference pipelines from request ingestion through execution and response

  • Build and evolve inference runtimes that balance latency, throughput, and concurrency under real-world load

  • Reason about batching, queuing, and scheduling tradeoffs, including their impact on tail latency and fairness

  • Manage KV cache allocation, placement, reuse, and eviction across models and requests

  • Optimize prefill and decode paths, including attention mechanisms and memory usage

  • Profile and debug inference performance issues across model, runtime, and system boundaries

  • Work closely with compilers, kernels, networking, and distributed systems to deliver end-to-end performance improvements

You may be a good fit if you have

  • Strong software engineering fundamentals

  • Experience building or operating ML inference or model serving systems

  • Comfort reasoning about performance, memory usage, and system behavior under load

Strong candidates may also have

  • Experience with inference runtimes such as TensorRT-LLM, vLLM, or custom serving systems

  • Deep understanding of modern model architectures and attention mechanisms

  • Experience with batching, scheduling, and concurrency control in inference systems

  • Familiarity with KV cache management and memory placement strategies

  • Experience profiling and tuning latency- and throughput-critical systems

  • Software development experience in Python and C++

What Makes Gimlet Different

At Gimlet, you will work on infrastructure problems that span the full stack of modern AI systems. Our team operates across datacenters, networking, distributed systems, compilers, runtimes, orchestration, and performance engineering to build the foundation for the next generation of AI infrastructure.

As an early member of the team, you will have significant ownership, work alongside highly technical engineers, and help shape both the systems we build and how we scale the company.

We value people who are excited to work across domains, take ownership of meaningful problems, and build technology that enables the next generation of AI.

02

Aplyr's read

Gimlet Labs is an AI infrastructure company building large-scale AI datacenters and the orchestration platform that coordinates them, attracting talent passionate about cutting-edge technology and innovation.

Synthesized from recent postings & public sources

What's promising

  • Gimlet Labs builds AI infrastructure for foundation labs, hyperscalers, and AI-native companies, which points to significant industry impact.
  • The company offers diverse roles in AI research, attracting top technical talent.
  • Strong focus on innovation and creative solutions in AI applications.

What to watch

  • High competition in the AI sector may challenge Gimlet Labs' market position.
  • Rapid technological changes require constant adaptation and innovation.
  • Limited public information about company culture and work-life balance.

Why Gimlet Labs

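  • Gimlet Labs specializes in partitioning and routing workloads across heterogeneous hardware, setting it apart from general AI firms.
  • Customers deploy through production-grade APIs without handling hardware selection, placement, or optimization, which differentiates its product offering.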
  • Emphasis on technical roles indicates a strong commitment to research and development.

Aplyr’s read is generated by AI from public sources.

03

About Gimlet Labs

Gimlet Labs is a company building AI infrastructure: large-scale AI datacenters and an orchestration platform that partitions and routes workloads across heterogeneous hardware.
