Find a career with Emergence Capital Partners companies

Explore career opportunities across the Emergence Capital portfolio.

Member of Technical Staff - ML Infra

Causal Labs

Software Engineering, IT, Data Science

San Francisco, CA, USA

Posted 6+ months ago

Responsibilities

Design, deploy, and maintain large distributed ML training and inference clusters
Develop efficient, scalable end-to-end pipelines to manage petabyte-scale datasets and model training throughout the entire ML lifecycle
Research and test various training approaches including parallelization techniques and numerical precision trade-offs across different model scales
Analyze, profile, and debug low-level GPU operations to optimize performance
Stay up to date on research and bring new ideas into the work

What we’re looking for

We value a relentless approach to problem-solving, rapid execution, and the ability to quickly learn in unfamiliar domains.

Strong grasp of state-of-the-art techniques for optimizing training and inference workloads
Demonstrated proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed) to train large foundation models
Knowledge of cloud platforms (GCP, AWS, or Azure) and their ML/AI service offerings
Familiarity with containerization and orchestration frameworks (e.g., Kubernetes, Docker)
Background working on distributed task management systems and scalable model serving & deployment architectures
Understanding of monitoring, logging, observability, and version control best practices for ML systems

You don’t have to meet every single requirement above.