About the role
Agency Notice: We are not currently working with recruiting agencies for this role. Please do not contact Vizcom employees regarding this position. Any resumes submitted without a prior agreement will be considered unsolicited.
About Vizcom
Vizcom is a visual creation platform that combines modern web tooling with AI-powered workflows. Our stack includes a React/TypeScript frontend, Node/Koa + PostGraphile API services, PostgreSQL, Redis, BullMQ queues, and Kubernetes-based production infrastructure.
We’re hiring a senior owner of stability and infrastructure to ensure the platform is reliable, fast, and resilient as we scale.
Role Mission
Own service reliability end-to-end: prevent incidents, reduce blast radius when failures happen, and lead fast, high-quality recovery when production degrades.
This is a hands-on technical leadership role with authority to set reliability standards and enforce production guardrails.
Compensation
$200,000 – $250,000 base salary + meaningful equity
What You’ll Own
Reliability bar: Set and enforce SLIs/SLOs/error budgets for critical user flows.
Production architecture resilience: Drive failure isolation across API, workers, queues, and dependencies so one subsystem cannot take down core access.
Kubernetes runtime reliability: Define probe contracts, rollout/rollback standards, graceful shutdown behavior, scaling/resource policies, and startup safety.
Queue + job safety (BullMQ/Redis): Own poison pill containment and workload isolation.
Incident command quality: Lead Sev1/Sev2 response end-to-end (containment, communications, technical direction, RCA, corrective action execution).
Reliability operating system: Own observability quality (signals over noise), on-call effectiveness, runbooks, and postmortem discipline.
Release safety authority: Gate risky deploys and enforce reliability guardrails when production health is at risk.
Traits We’re Looking For
Calm, structured incident commander under pressure.
Thinks in failure modes and blast radius by default.
Pragmatic: can stabilize quickly, then implement durable fixes.
High ownership and strong written communication.
First 90 Days
Establish baseline reliability metrics and identify top platform risks.
Tighten incident response mechanics (roles, comms cadence, runbooks, status updates).
Deliver high-impact hardening fixes across probes/startup paths/queue safety.
Publish a prioritized 6–12 month reliability roadmap with clear ownership and milestones.
If possible, please include a write-up of one incident you personally led and send it to Jordan@vizcom.com:
1) what failed,
2) how you contained it,
3) what permanent fixes you shipped and measured.
Aplyr's read
Vizcom is at the forefront of AI-driven visual communication, ideal for tech enthusiasts passionate about design innovation.
What's promising
- Vizcom is pioneering AI tools that transform design processes, offering cutting-edge solutions.
- The company is actively hiring for senior technical roles, indicating growth and investment in talent.
- Employees work on innovative projects that push the boundaries of visual communication technology.
What to watch
- The niche focus on AI for visual communication may limit broader industry applicability.
- Rapid technological advancements could outpace current AI solutions, requiring constant adaptation.
- Limited public information about company culture and employee satisfaction.
Why Vizcom
- Vizcom uniquely integrates AI with design, setting it apart in the tech industry.
- The company emphasizes innovation in visual communication, a relatively specialized field.
- Vizcom's focus on AI-enhanced design processes offers distinct career opportunities for tech-savvy creatives.
Aplyr’s read is generated by AI from public sources.
About Vizcom
Vizcom is a technology company focused on leveraging artificial intelligence to enhance visual communication and design processes.