Infrastructure Monitoring Engineer (On Contract)
PubMatic
Job Description
About the Role
At PubMatic, we are building the future of the digital advertising supply chain powered by large-scale, real-time systems. As an Infra Monitoring Engineer, you will be part of the Network Operations Center (NOC), ensuring high availability, performance, and reliability of critical infrastructure. In addition to core monitoring responsibilities, this role encourages the use of AI-driven tools and automation to improve incident detection, troubleshooting efficiency, and operational productivity. Candidates will have the opportunity to leverage modern AI assistants for faster root cause analysis (RCA), log analysis, and operational insights.
What You'll Do
- Monitor infrastructure, applications, and network systems using tools such as Grafana, Nagios, and internal dashboards.
- Handle alerts and incidents (P1/P2), perform initial triage, and ensure timely escalation and resolution.
- Provide Tier-1 support for production systems and services.
- Collaborate with Engineering, AdOps, and DevOps teams for troubleshooting and issue resolution.
- Support real-time systems involved in ad serving, bidding, and traffic flow.
- Leverage AI tools (e.g., log analysis assistants, alert summarization tools) to speed up debugging and root cause analysis.
- Use structured prompts to extract insights from logs, metrics, and incident data.
- Participate in deployment monitoring and post-release validation.
- Document incidents, contribute to RCA, and maintain operational runbooks/Wiki.
- Ensure effective shift handovers and adherence to NOC processes.
- Identify recurring issues and suggest automation or AI-assisted solutions.
We'd Love for You to Have
- 1–3 years of experience in NOC / Infra Monitoring / Production Support roles.
- Basic understanding of Linux (CLI, processes, memory, disk, networking basics).
- Familiarity with monitoring tools like Grafana, Nagios, or similar.
- Basic knowledge of networking concepts (TCP/IP, DNS, HTTP/HTTPS).
- Understanding of incident management and alerting tools (Jira, Zenduty, etc.).
- Exposure to AI tools (ChatGPT or similar) for troubleshooting, documentation, or analysis.
- Basic understanding of prompt engineering (ability to ask structured questions to get relevant technical insights).
- Strong analytical and troubleshooting skills.
- Ability to prioritize incidents based on severity and impact.
- Good communication skills and ability to coordinate across teams.
- Willingness to work in 24/7 rotational shifts.
Good to Have (Optional but Preferred)
- Experience using AI tools for:
  - Log analysis
  - RCA summarization
  - Writing scripts or queries
- Basic scripting knowledge (Python/Shell) with AI assistance
- Exposure to automation in monitoring or alerting systems
Qualification
- Bachelor’s degree in Computer Science / IT or related field (B.E / B.Tech / MCA / BCA, etc.)
Additional Information
Return to Office: PubMatic employees throughout the globe have returned to our offices via a hybrid work schedule (3 days “in office” and 2 days “working remotely”) that is intended to maximize collaboration, innovation, and productivity among teams and across functions.
Benefits: Our benefits package includes the best of what leading organizations provide, such as paternity/maternity leave, healthcare insurance, and broadband reimbursement. In addition, when we’re in the office, we all benefit from a kitchen loaded with healthy snacks and drinks, catered lunches, and much more!
Diversity and Inclusion: PubMatic is proud to be an equal opportunity employer; we don’t just value diversity, we promote and celebrate it. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
About PubMatic
PubMatic is one of the world’s leading scaled digital advertising platforms, offering more transparent advertising solutions to publishers, media buyers, commerce companies and data owners, allowing them to harness the power and potential of the open internet to drive better business outcomes.
Founded in 2006 with the vision that data-driven decisioning would be the future of digital advertising, we enable content creators to run a more profitable advertising business, which in turn allows them to invest back into the multi-screen and multi-format content that consumers demand.