Machine Learning Researcher, Audio
Bland AI
Software Engineering, Data Science
San Francisco, CA, USA
USD 140k-260k / year + Equity
Employment Type
Full time
Department
Engineering
Machine Learning Researcher / Engineer, Multimodal LLMs
Location: San Francisco, CA or Remote (US)
About Bland
At Bland.com, our mission is to empower enterprises to build AI phone agents at scale. Voice is quickly becoming the primary interface between businesses and their customers, and we are building the models and infrastructure that make those interactions feel natural, reliable, and genuinely human.
We’ve raised $65M from leading investors including Emergence Capital, Scale Venture Partners, Y Combinator, and founders of Twilio, Affirm, and ElevenLabs.
The Role: Research Engineer
We are looking for someone to spearhead the development of our next-generation multimodal LLM stack, combining speech, text, tools, and real-time reasoning into one unified system. You'll be responsible for building industry-leading conversational AI models that power Bland's agents, and for taking them all the way from idea to production.
At Bland, we're not just thinking about text modeling. You will define how our agents listen, think, and act in real time, integrating streaming audio, tool execution, and dynamic context into a single coherent system.
This role sits at the intersection of:
LLM architecture and fine-tuning
real-time speech systems
agent design (prompting + tools + policies)
multimodal reasoning (audio + text + actions)
You will take ideas from research through production systems serving millions of calls per day.
What Makes You a Great Fit
Strong LLM / Multimodal Background
Experience with LLMs, multimodal models, or speech-language systems
Deep understanding of prompting, fine-tuning, and alignment techniques
Familiarity with streaming or real-time inference is a strong plus
Systems Thinking
Ability to reason about full systems, not just models
Comfortable designing the interactions between models, tools, prompts, and runtime constraints
Fast Experimental Loop
You can go from idea → dataset → experiment → conclusion in days
You know how to design experiments that actually answer the question
Product Intuition
Strong sense for what makes an interaction feel natural rather than robotic
Ability to translate abstract modeling ideas into user-facing improvements
Builder Mentality
You take ownership from research through deployment
You thrive in ambiguous, fast-moving environments
You care about impact, not just elegance
How You Show Up
You think in systems, not just models
You obsess over latency, correctness, and real-world behavior
You are comfortable discarding ideas quickly when data disagrees
You push toward simple abstractions for complex problems
Bonus Points
Experience with real-time voice systems or conversational AI
Background in tool-using agents or agent frameworks
Experience with multimodal datasets (audio + text + actions)
Contributions to LLM or speech-related research or open source
Why This Role Matters
Your work will define how our agents:
understand users in real time
decide when to respond
choose what tools to call
balance speed against correctness
behave under complex policies
This is the core intelligence layer of the product.
Compensation & Benefits
Competitive salary: $180,000 – $260,000
Meaningful equity
Full healthcare, dental, vision
Office in Jackson Square, SF
High autonomy, high impact