About the role
Responsibilities
Linux Systems & Automation (Core)
- Manage large-scale Linux environments: troubleshooting and root-cause analysis
- Write maintainable, hand-off-ready Bash / Ansible / Python automation
- Serve on-call for infrastructure, CI/CD, and production service incidents
HPC Cluster & Storage
- Operate HPC clusters (Slurm) along with usage analytics, auditing, and monitoring tools
- Maintain and plan storage for compute environments (Lustre, NAS)
Cloud & Hybrid Infrastructure
- Manage multi-cloud environments (AWS, Alibaba Cloud, GCP) with Terraform / AWS CDK
- Build and operate Docker (ECS) / Kubernetes (EKS) environments and their deployment workflows
CI/CD & Developer Experience
- Operate self-hosted GitLab server and Runner fleet
- Operate CI/CD systems and design deployment pipelines for research and other projects
GenAI / Internal Platform
- Build internal AI platforms (LangChain / LangGraph / Bedrock, Elasticsearch RAG)
- Develop MCP servers, chatbots, AI agents, and similar services
Requirements
- **5+ years** of hands-on Linux systems administration and infrastructure operations experience
- Solid Linux internals knowledge (process / memory / filesystem / networking / systemd / cgroup); able to localize issues even without complete logs
- Strong Bash / Shell scripting skills; able to write maintainable scripts that others can pick up
- Programming ability for data processing, CLI tools, and API services; Python proficiency preferred
- Solid storage fundamentals with hands-on experience: RAID levels and rebuild trade-offs, filesystem selection, snapshot and backup planning; NAS / shared storage (NFS / SMB) operations experience
- Experience with at least one major public cloud (AWS / GCP / Alibaba Cloud) and IaC tooling (Terraform / CDK / Ansible)
- Familiar with containerization and orchestration (Docker, Kubernetes)
- CI/CD pipeline design and operations experience (GitLab CI / Jenkins / Airflow)
- Able to own a cross-service subsystem end-to-end: design, implementation, documentation, handoff
- **Strong autonomy**: can drive a problem from discovery through root-cause investigation and decision-making to delivery with minimal supervision; able to make judgment calls under incomplete information and proactively communicate progress, risks, and rationale
- **Self-directed**: doesn't wait for tickets; identifies problems worth solving and prioritizes them independently
Nice to Have
- HPC scheduler experience (Slurm / PBS / LSF)
- Parallel filesystem operations experience (Lustre / GPFS / BeeGFS)
- Advanced Linux performance analysis (perf, eBPF, ftrace) and kernel parameter tuning
- DB operations experience (MySQL, ClickHouse)
- Low-latency network tuning and cross-datacenter link optimization
- LLM application development (LangChain, RAG, Agent, MCP)
- Self-managed Kubernetes experience (Kubespray, kubeadm)
- GPU server operations (single-node): NVIDIA driver / CUDA toolkit version management, `nvidia-smi` / DCGM monitoring, nvidia-container-toolkit integration, troubleshooting XID / ECC errors and thermal throttling
- Experience or familiarity with integrating GPU resources into Slurm: GRES configuration, cgroup-based GPU isolation, user/job-level resource limits
Aplyr's read
Kronos Research is a cutting-edge quantitative trading firm where data-driven innovation meets high-frequency trading, attracting analytical minds and tech enthusiasts.
What's promising
- Kronos Research is at the forefront of algorithmic trading, offering exposure to advanced quantitative techniques.
- The firm provides opportunities to work with diverse asset classes, enhancing professional growth and expertise.
- Kronos Research invests in talent through programs like the Quantitative Research Graduate Program, nurturing future leaders.
What to watch
- The high-pressure environment of quantitative trading may not suit everyone.
- Market volatility can impact trading strategies and job stability.
- Limited public information about company culture and work-life balance.
Why Kronos Research
- Kronos Research specializes in high-frequency trading, requiring cutting-edge technology and rapid decision-making.
- The firm focuses on both crypto and traditional asset classes, offering a broad market perspective.
- Kronos Research's emphasis on machine learning positions it at the intersection of finance and technology.
Aplyr's read is generated by AI from public sources.
About Kronos Research
Kronos Research is a quantitative trading firm that specializes in algorithmic trading and market making across various asset classes.