Inference Engineer

Adaption

Software Engineering
San Francisco, CA, USA
Posted on Nov 4, 2025

Location: San Francisco
Employment Type: Full time
Location Type: Hybrid
Department: Platform

About Us

We believe the future is adaptable, not one-size-fits-all. We will lead in efficient, real-time adaptation that combines algorithms with innovative interface design. Our global team, based in San Francisco and beyond, brings together top AI talent. Backed by world-class investors, we're building Adaptable Intelligence.

The Role

You'll be one of our first engineering hires, working directly with our founders to build the core inference systems that power our product: deployment pipelines, model serving, observability, and cloud infrastructure for large language models. You thrive in zero-to-one environments and enjoy owning the full lifecycle of LLM inference design and implementation. You're comfortable wearing multiple hats, making pragmatic technical decisions, and laying scalable, secure foundations for the future growth of our inference capabilities.

Responsibilities

  • Build from zero to one: design and implement our entire LLM inference infrastructure, making critical architectural decisions for scalability and performance.

  • Own the inference stack: deploy, optimize, and maintain high-throughput, low-latency inference systems serving millions of requests.

  • Framework expertise: leverage frameworks like vLLM, SGLang, or similar to maximize inference efficiency and cost-effectiveness.

  • Performance optimization: fine-tune model serving configurations, implement batching strategies, and optimize GPU utilization.

  • Infrastructure scaling: design auto-scaling systems that can handle variable traffic patterns while controlling costs.

  • Monitoring & reliability: build comprehensive observability into our inference pipeline with proper alerting and incident response.

  • Cross-functional collaboration: work closely with our ML and product teams to understand requirements and deliver optimal serving solutions.

Qualifications

  • Proven 0→1 experience: You've previously built LLM inference systems from scratch in a production environment

  • Framework proficiency: Hands-on experience with modern inference frameworks (vLLM, SGLang, TensorRT-LLM, or similar)

  • Infrastructure expertise: Strong background in distributed systems, containerization (Docker/Kubernetes), and cloud platforms (AWS/GCP/Azure)

  • Performance mindset: Experience optimizing inference latency, throughput, and cost at scale

  • Production experience: You've deployed and maintained ML systems serving real users in production


Nice to have

  • Experience in a fast-paced startup environment

  • Contributions to open-source inference tools and frameworks

  • Experience with model quantization, pruning, or other optimization techniques

  • Knowledge of CUDA programming and GPU optimization

  • Experience serving multi-modal models (vision, audio, etc.)

What We Offer

  • Competitive salary + meaningful equity

  • Learning and development budget to support your growth as you adapt

  • Comprehensive medical benefits and generous PTO

  • Annual travel stipend to explore somewhere new—because building global technology means staying adaptable to new places and perspectives

  • Mission-driven team shaping the future of intelligence, where you'll enjoy high ownership and the opportunity to make a career-defining impact