Find a career with Emergence Capital Partners companies

Explore career opportunities across the Emergence Capital portfolio.

Member of Technical Staff, AI Supercomputing

Radical Numerics

Software Engineering, IT, Data Science

San Francisco, CA, USA · Tokyo, Japan

Posted on Jun 9, 2026

Apply now

About Us

Radical Numerics is an AI research lab building general biological intelligence. Our mission is to master the code of life, and our purpose is to reduce human suffering.

Our team created Evo and started the field of generative genomics. Our work was featured on the cover of Science and presented by our CEO on the main stage of TED2025. Evo was used to create the first AI-designed gene therapy tool (CRISPR-Cas9) and the first AI-generated whole genome from scratch. Evo 2, featured in Nature, is the largest fully open-source AI project across any domain.

Radical Numerics is bringing the rigor of distributed systems, model architecture, and numerics research to the challenges of biology. We’ve redesigned the foundation model training stack to turn the world’s raw scientific data (e.g., biological sequences, experiments, and physical processes) into intelligible, generative models that can expand and accelerate what humanity can understand, design, and cure.

The same generative breakthroughs that enable life-saving cures also lower the barrier to creating engineered threats and AI-generated bioweapons. We believe these forces are inseparable. Radical Numerics was founded to develop both the power to design and the responsibility to defend.

About the Role

As a Member of Technical Staff, AI Supercomputing at Radical Numerics, you will design, build, and operate the GPU supercomputing environment that powers our large-scale training and inference. You will deliver high-performance, reliable, and cost-efficient compute so our researchers can move fast at scale, turning frontier infrastructure into the foundation for the next generation of biological world models.

This role is ideal for someone who combines deep operational instincts with an interest in modern machine learning. You should care about how every layer of the cluster affects research velocity: provisioning and capacity, scheduling and multi-tenancy, storage and lineage, communication overhead, observability, and the reliability of long-running jobs across thousands of accelerators.

What You'll Do

Operate and automate large GPU clusters. Own provisioning, imaging, and capacity planning across large distributed compute systems, with a focus on uptime, utilization, and cost efficiency.
Build a unified compute interface. Write software that abstracts cluster management and presents a single, ergonomic interface for training and inference, so researchers spend their time on science rather than infrastructure.
Extend scheduling and orchestration. Adapt systems like Kubernetes or Slurm for topology-aware placement, preemption, quotas, and fair-share multi-tenancy across competing workloads.
Maximize throughput and hardware efficiency. Profile and tune performance across the stack, including communication patterns, memory efficiency, custom kernels, compilation paths, and systems instrumentation, to ensure training compute is used effectively.
Improve reliability and recovery. Establish standards and mechanisms for robustness and error recovery, including monitoring, fault tolerance, checkpointing, and incident analysis for fast-moving research infrastructure.
Build reliable storage and artifact paths. Design durable paths for datasets, checkpoints, and logs, with clear retention and lineage that support reproducible, large-scale experimentation.
Collaborate across research and engineering. Partner closely with model researchers and training scientists to unblock large-scale runs, advise on parallelism and performance trade-offs, and design systems that support new scientific directions rather than constrain them.

What We're Looking For

Proven track record operating large-scale GPU clusters and orchestration or scheduling systems such as Kubernetes or Slurm.
Proficiency in building maintainable software in at least one backend language (we use Python and Rust), with a focus on performance and reliability.
Strong systems background spanning Linux, networking, and infrastructure-as-code.
Strong understanding of modern deep learning frameworks and their systems internals (e.g., PyTorch, Triton, CUDA, C++).
Ability to debug complex, multi-layered systems involving distributed training, memory and performance regressions, and reliability issues in large codebases.
Comfort operating across the stack and owning projects end to end, with a bias toward initiative and execution.
Excellent written and verbal communication skills bridging technical and scientific domains.

Nice to Have

Familiarity with CUDA/NCCL and performance profiling for distributed training and inference.
Experience supporting large-scale distributed training for frontier or foundation models.
Contributions to open-source ML systems or infrastructure such as PyTorch, Torchtitan, or Megatron-LM.
Familiarity with ML runtimes, compilers, numerics, communication libraries, and custom kernel development.
Experience improving researcher productivity through infrastructure design, developer tooling, or workflow improvements.
Background in applied math, systems, computational biology, or related quantitative sciences.

Why Radical Numerics

Help build the computational foundation for multimodal biological world models aimed at rapid detection, response, and countermeasures across global health.
Work on systems problems at the frontier of distributed training, architecture, and numerics, in service of real biological applications.
Join a collaborative culture that values rigor, creativity, and cross-disciplinary partnership across AI labs, biotechs, hospital systems, and research institutes.
Competitive compensation, comprehensive benefits, and support for continual learning.

Radical Numerics is committed to equal employment opportunity and does not discriminate in any employment opportunities or practices based on an individual's race, color, creed, gender (including gender identity and gender expression), religion (all aspects of religious beliefs, observance or practice, including religious dress or grooming practices), marital status, registered domestic partner status, age, national origin or ancestry (including language use restrictions and possession of a driver’s license issued under California Vehicle Code section 12801.9), natural hair, physical or mental disability, political affiliation, medical condition (including cancer or a record or history of cancer, and genetic characteristics), sex (including pregnancy, childbirth, breastfeeding or related medical condition), genetic information, sexual orientation, military and veteran status or any other consideration made unlawful by federal, state, or local laws. It also prohibits unlawful discrimination based on the perception that anyone has any of those characteristics, or is associated with a person who has or is perceived as having any of those characteristics.

Radical Numerics participates in E-Verify and will provide the federal government with your Form I-9 information to confirm that you are authorized to work in the U.S.
