Inference Engineer
Adaption
Location: San Francisco
Employment Type: Full time
Location Type: Hybrid
Department: Platform
About Us
We believe the future is adaptable, not one-size-fits-all. We aim to lead in real-time, efficient adaptation that combines algorithmic advances with innovative interface design. Our global team, based in SF and beyond, brings together top talent in AI innovation. Backed by world-class investors, we're building Adaptable Intelligence.
The Role
You'll be one of our first engineering hires, working directly with our founders to build the core inference systems that will power our product, including deployment pipelines, model serving, observability, and cloud infrastructure for large language models. You thrive in zero-to-one environments and enjoy owning the full lifecycle of LLM inference design and implementation. You're comfortable wearing multiple hats, making pragmatic technical decisions, and laying down scalable, secure foundations for the future growth of our LLM inference capabilities.
Responsibilities
Build from zero to one: design and implement our entire LLM inference infrastructure, making critical architectural decisions for scalability and performance.
Own the inference stack: deploy, optimize, and maintain high-throughput, low-latency inference systems serving millions of requests.
Framework expertise: leverage modern inference frameworks such as vLLM and SGLang to maximize inference efficiency and cost-effectiveness.
Performance optimization: fine-tune model serving configurations, implement batching strategies, and optimize GPU utilization.
Infrastructure scaling: design auto-scaling systems that can handle variable traffic patterns while controlling costs.
Monitoring & reliability: build comprehensive observability into our inference pipeline with proper alerting and incident response.
Cross-functional collaboration: work closely with our ML and product teams to understand requirements and deliver optimal serving solutions.
Qualifications
Proven 0→1 experience: You've previously built LLM inference systems from scratch in a production environment
Framework proficiency: Hands-on experience with modern inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)
Infrastructure expertise: Strong background in distributed systems, containerization (Docker/Kubernetes), and cloud platforms (AWS/GCP/Azure)
Performance mindset: Experience optimizing inference latency, throughput, and cost at scale
Production experience: You've deployed and maintained ML systems serving real users in production
Nice to have
Experience in a fast-paced startup environment
Contributions to open-source inference tools and frameworks
Experience with model quantization, pruning, or other optimization techniques
Knowledge of CUDA programming and GPU optimization
Experience serving multi-modal models (vision, audio, etc.)
What We Offer
Competitive salary + meaningful equity
Learning and development budget to support your growth as you adapt
Comprehensive medical benefits and generous PTO
Annual travel stipend to explore somewhere new—because building global technology means staying adaptable to new places and perspectives
Mission-driven team shaping the future of intelligence, where you'll enjoy high ownership and the opportunity to make a career-defining impact