About the role

Aplyr's Quick Take

This internship focuses on evaluating AI agents used in marketing, specifically assessing their reasoning and tool usage. You'll be building evaluation pipelines, analyzing performance data, and designing metrics to improve AI reliability. It's an individual contributor role aimed at those interested in AI evaluation and infrastructure.

Good fit

Ideal candidates are likely to have a background in AI, data analysis, or software engineering, with a strong interest in machine learning and evaluation methodologies. A proactive, detail-oriented approach will help you thrive in this role.

Worth noting

This position offers a chance to work at the intersection of gaming and AI, but expect a steep learning curve in a highly technical field. The internship could lead to future roles in AI safety and evaluation, which are emerging areas in tech.

About the Hiring Team

Level Infinite is Tencent’s global gaming brand. It is a global game publisher offering a comprehensive network of services for games, development teams, and studios around the world.

We are dedicated to delivering engaging and original gaming experiences to a worldwide audience, whenever and wherever they choose to play, while building a community that fosters inclusivity, connection, and accessibility. Level Infinite also provides a wide range of services and resources to our network of developers and partner studios around the world to help them unlock the true potential of their games.

What the Role Entails

We are hiring an intern to work on evaluation and reliability infrastructure for a real-world LLM agent system used in user acquisition (UA) performance marketing. The agent performs multi-step reasoning, retrieves context, selects tools, executes actions, handles user confirmations, and interacts with external services.

The goal of this internship is to build transferable expertise in agent evaluation engineering: evaluating tool use, measuring trajectory quality, designing benchmarks, analyzing traces, comparing model and prompt variants, and improving the reliability of agentic AI systems.

This role is ideal for someone interested in future opportunities in LLM agent evaluation, AI safety evaluation, research engineering, LLMOps, or applied AI infrastructure.

Survey state-of-the-art evaluation frameworks for agentic workflows, in both industry and academic research.
Apply what you learn to build automated evaluation pipelines that can run agent scenarios, capture execution artifacts, score results, and detect regressions.
Evaluate tool-use behavior, including whether the agent selects the right tool, passes correct arguments, avoids unnecessary calls, and handles tool errors appropriately.
Analyze agent trajectories using traces, logs, intermediate steps, and final outputs to identify reasoning failures, context misuse, hallucinated assumptions, and brittle workflow patterns.
Design metrics for agent reliability, including success rate, tool-call precision, argument accuracy, recovery rate, retry count, latency, cost, and safety-related failure rates.
Create reusable evaluation datasets from synthetic cases, golden workflows, and real anonymized executions.
Support experiments comparing prompts, model providers, tool descriptions, memory strategies, context construction methods, and execution modes.
Help build human evaluation workflows and rubrics for judging agent correctness, faithfulness, usefulness, and risk awareness.
Work with engineers to translate evaluation findings into better tests, monitoring signals, tool interfaces, prompts, and guardrails.
Potentially write research papers and publish them at scientific conferences.
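
As a rough illustration of two of the metrics named above, tool-call precision and argument accuracy, here is a minimal Python sketch. It is not the team's actual pipeline: the `ToolCall` structure, the matching rules, and the tool names `get_campaign` and `update_budget` are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One tool invocation in an agent trajectory (hypothetical structure)."""
    name: str
    args: dict = field(default_factory=dict)


def tool_call_precision(predicted, expected):
    """Fraction of predicted calls whose tool name matches an expected call.

    Each expected call can be matched at most once, so repeated
    unnecessary calls lower the score.
    """
    if not predicted:
        return 0.0
    remaining = [call.name for call in expected]
    hits = 0
    for call in predicted:
        if call.name in remaining:
            remaining.remove(call.name)
            hits += 1
    return hits / len(predicted)


def argument_accuracy(predicted, expected):
    """Fraction of expected calls the agent issued with identical arguments."""
    if not expected:
        return 1.0
    remaining = list(predicted)
    hits = 0
    for want in expected:
        for got in remaining:
            if got.name == want.name and got.args == want.args:
                remaining.remove(got)
                hits += 1
                break
    return hits / len(expected)


# Hypothetical golden workflow: read a campaign, then update its budget.
expected = [
    ToolCall("get_campaign", {"id": 7}),
    ToolCall("update_budget", {"id": 7, "amount": 500}),
]
# Hypothetical agent run: one redundant read and one wrong argument value.
predicted = [
    ToolCall("get_campaign", {"id": 7}),
    ToolCall("get_campaign", {"id": 7}),
    ToolCall("update_budget", {"id": 7, "amount": 50}),
]

print(f"tool-call precision: {tool_call_precision(predicted, expected):.2f}")
print(f"argument accuracy:   {argument_accuracy(predicted, expected):.2f}")
```

In this made-up run the redundant read lowers precision and the wrong budget amount lowers argument accuracy. A real evaluation would likely also need order-aware matching and tolerance rules for arguments.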

Who We Look For

Currently pursuing, or recently graduated with, a Master’s or PhD degree in Computer Science, Artificial Intelligence, Machine Learning, Software Engineering, Data Science, or a related field.
Strong Python fundamentals and interest in AI systems.
Curious about how LLM agents work, fail, and improve.
Interested in evaluation methodology, not just application building.
Comfortable reading logs, traces, test cases, and structured data.
Detail-oriented and able to define clear, measurable criteria for ambiguous agent behavior.
Prior experience with LLMs, LangChain-like agents, tool calling, pytest, data analysis, or observability tools is helpful but not required.

Equal Employment Opportunity at Tencent

As an equal opportunity employer, we firmly believe that diverse voices fuel our innovation and allow us to better serve our users and the community. We foster an environment where every employee of Tencent feels supported and inspired to achieve individual and common goals.

Aplyr's read

Tencent is a tech giant shaping digital landscapes with a diverse portfolio, attracting talent in gaming, AI, and internet services.
Synthesized from recent postings & public sources

What's promising

• Tencent's vast digital ecosystem offers diverse career paths in gaming, AI, and cloud services.
• Strong global presence provides opportunities for international career growth and collaboration.
• Investment in cutting-edge technology fosters innovation and skill development.

What to watch

• Regulatory scrutiny in China poses challenges for business operations and strategy.
• High competition in the tech sector may limit rapid career advancement.
• Complex organizational structure can lead to bureaucratic decision-making processes.

Why Tencent

• Tencent's WeChat platform integrates social, payment, and service functionalities uniquely.
• Pioneering investments in AI and gaming set it apart in tech innovation.
• Extensive partnerships and investments in global tech companies enhance its influence.

Aplyr’s read is generated by AI from public sources.

About Tencent

Tencent

tencent.com

Tencent is a Chinese multinational conglomerate holding company with subsidiaries in various internet-related services and products, entertainment, AI, and technology.