Senior Data Scientist
Roche
Job Description
At Roche you can show up as yourself, embraced for the unique qualities you bring. Our culture encourages personal expression, open dialogue, and genuine connections, where you are valued, accepted and respected for who you are, allowing you to thrive both personally and professionally. This is how we aim to prevent, stop and cure diseases and ensure everyone has access to healthcare today and for generations to come. Join Roche, where every voice matters.
The Position
Role Overview
We are seeking a highly motivated and talented Data Scientist with a passion for leveraging data to drive innovation and solve complex business challenges. This role is central to our digital transformation efforts, applying advanced statistical modeling, machine learning, and Generative AI (GenAI) techniques to analyze large-scale datasets, identify key insights, and build scalable AI solutions. The ideal candidate possesses a strong quantitative background, excellent problem-solving skills, and the ability to communicate technical findings to both technical and non-technical stakeholders.
What you will do:
Design, develop, and implement statistical models, machine learning algorithms, and GenAI techniques to analyze large and complex biological datasets (e.g., multi-omics, proteomics, imaging, clinical trial data, real-world evidence, and China-specific datasets).
Collaborate closely with RDT internal functions, such as global/local AI teams, and with stakeholders to understand business challenges and translate them into actionable data science projects.
Develop and maintain data pipelines and infrastructure for efficient data processing, storage, and analysis.
Champion best practices for the development, validation, deployment, and lifecycle management of GenAI/LLM models within the Informatics organization, collaborating with global/local AI experts.
Perform exploratory data analysis to identify trends, patterns, and anomalies in biological data.
Evaluate the performance of models and algorithms, and iterate to improve their accuracy and robustness.
Visualize and communicate complex data insights and findings through reports, presentations, and interactive dashboards.
Stay up-to-date with the latest advancements in data science, machine learning, and relevant areas of biotechnology.
Drive the evaluation and adoption of novel GenAI/LLM tools, frameworks, and MLOps practices to maintain our competitive edge.
Contribute to the development of best practices for data analysis and interpretation within the Informatics and science and research organization.
Document methodologies, code, and results clearly and comprehensively.
Who you are:
Master's or Ph.D. degree in a quantitative field (e.g., Data Science, Computer Science, Statistics, Mathematics, Physics, or Engineering).
5+ years of proven experience applying data science techniques to real-world business problems.
Proven track record of leading and delivering end-to-end GenAI/LLM/NLP projects from conception to production.
AI & Machine Learning Expertise
Proven experience (typically 5+ years) applying data science techniques to real-world problems, preferably within the biotechnology, pharmaceutical, or life sciences industries.
Proven track record of leading and delivering end-to-end GenAI/LLM/NLP projects from conception to production, preferably within the biotechnology, pharmaceutical, or life sciences sectors.
Knowledge of LLM architectures and various fine-tuning methodologies (e.g., LoRA, full fine-tuning, instruction tuning) and their practical applications.
Extensive hands-on experience with prompt engineering, and developing robust RAG systems using frameworks like LangChain, LlamaIndex, or similar.
Strong proficiency in leveraging and customizing models from platforms like Hugging Face, OpenAI, Cohere, or equivalent, for specific NLP tasks in the biomedical domain.
Experience with the MLOps lifecycle for LLMs, including versioning, deployment (e.g., containerization, API development), scaling, monitoring, and continuous improvement in a production environment, is a plus.
Familiarity with vector database technology and its application in semantic search and RAG pipelines.
Solid understanding of NLP fundamentals including tokenization, embeddings, attention mechanisms, and model evaluation metrics specific to GenAI/LLM (e.g., ROUGE, BLEU, perplexity, human evaluations).
Experience in addressing challenges unique to LLMs, such as hallucination mitigation, bias detection, data privacy concerns, and ensuring responsible AI practices.
Strong proficiency in at least one programming language commonly used in data science (e.g., Python, R) and relevant libraries (e.g., scikit-learn, TensorFlow, PyTorch, Bioconductor).
Solid understanding of statistical inference, machine learning algorithms (e.g., regression, classification, clustering, deep learning), and their application.
Experience working with large datasets and cloud computing platforms (e.g., AWS, Azure, GCP) is a must.
Excellent data visualization skills using tools such as Matplotlib, Seaborn, ggplot2, or Tableau.
Excellent written and verbal communication skills, with the ability to effectively present technical information to both technical and non-technical audiences.
Experience with developing and deploying machine learning models in a production environment is a plus.
Knowledge of data privacy regulations (e.g., GDPR, PIPL) and responsible AI practices is a plus.
Publications in peer-reviewed journals or presentations at relevant AI/Data Science conferences are a plus.
Soft Skills
Communication: Fluent in English and Mandarin (written and verbal), with the ability to present technical information to non-technical audiences.
Problem-Solving: Independent, self-driven person with a high sense of accountability and critical thinking skills.
Teamwork: Ability to work effectively in a collaborative, interdisciplinary, and fast-paced environment.
Who we are
A healthier future drives us to innovate. Together, more than 100,000 employees across the globe are dedicated to advancing science, ensuring everyone has access to healthcare today and for generations to come. Our efforts result in more than 26 million people treated with our medicines and over 30 billion tests conducted using our Diagnostics products. We empower each other to explore new possibilities, foster creativity, and keep our ambitions high, so we can deliver life-changing healthcare solutions that make a global impact.
Let’s build a healthier future, together.
Roche is an Equal Opportunity Employer.