Senior AI Inference Engineer

Monks

LATAM; NAMER

Hybrid

Posted February 24, 2026

Job Description

Please note that we will never request payment or bank account information at any stage of the recruitment process. As we continue to grow our teams, we urge you to be cautious of fraudulent job postings or recruitment activities that misuse our company name and information. Please protect your personal information during any recruitment process. While Monks may contact potential candidates via LinkedIn, all applications must be submitted through our official website (monks.com/careers).

About the Role

As a Senior AI Inference Engineer, you’ll play a pivotal role in designing and delivering advanced agentic and visual AI systems for leading organizations across Media, Entertainment, Gaming, and Sport. You’ll partner directly with major professional sports leagues and global media brands, transforming ambiguous business problems into scalable, high-performance AI architectures that can “see” and reason about video in real time. From early-stage discovery and pre-sales through architecture, implementation, and optimization on modern GPU and cloud infrastructure, you’ll own the full lifecycle of complex AI inference solutions.

Responsibilities

Architect, implement, and optimize end-to-end AI inference services and agentic pipelines in Python.
Design autonomous agents that can interpret, reason about, and act on video and multi-modal content.
Integrate Vision Language Models (e.g., GPT-4o, Gemini Pro Vision, LLaVA) into robust, production-grade workflows.
Leverage LLM/agent orchestration frameworks (e.g., LangGraph, AutoGen, Semantic Kernel or similar) to coordinate complex visual AI tasks.
Deploy and operate services on Kubernetes (and potentially OpenShift or NVIDIA Holoscan), ensuring reliability and scalability under heavy media workloads.
Architect distributed systems on AWS, making informed trade-offs across performance, cost, and resilience.
Optimize workloads for modern NVIDIA GPU architectures (Ampere, Hopper, Blackwell), focusing on real-time and high-throughput media use cases.
Collaborate directly with clients in Media, Entertainment, Gaming, and Sport (MEGS), including participating in pre-sales discussions to validate feasibility, shape solutions, and clarify the “why” behind requirements.
Create clear architecture diagrams and technical documentation that align both technical and non-technical stakeholders.
Provide technical leadership to project teams, guiding implementation to stay true to the intended architecture and product value.
(Nice to have) Work with video tooling such as FFmpeg, GStreamer, NVENC/NVDEC, and modern codecs (H.264/H.265), and explore emerging tools such as Mojo or NVIDIA Holoscan for Media.
(Nice to have) Design and deploy AI solutions to edge devices and on-premise or hybrid clusters.

About You

Qualifications & Skills

Significant professional experience (senior level) building and shipping AI/ML systems in production, with strong Python skills and experience with a modern data/ML stack.
Proven track record of taking models from notebooks or prototypes into robust, low-latency inference services.
Extensive hands-on experience building agentic systems, especially those involving computer vision or multi-modal inputs.
Demonstrated experience architecting autonomous agents that can “see” and reason about video content.
Practical experience integrating Vision Language Models (e.g., GPT-4o, Gemini Pro Vision, LLaVA) into complex workflows.
Familiarity with LLM/agent orchestration frameworks (e.g., LangGraph, AutoGen, Semantic Kernel or equivalents) applied to visual or multi-modal tasks.
Strong practical experience with Kubernetes in production.
Experience architecting distributed systems on AWS beyond simply provisioning basic instances.
Understanding of modern NVIDIA GPU architectures (e.g., Ampere, Hopper, Blackwell) and how to optimize workloads for them.
Product-minded and value-driven: able to align technical decisions with business outcomes and ROI.
Excellent communication skills, with the ability to explain complex architectures to both CTO-level and non