Machine Learning Engineer

Santa Rosa, California

Upcoming AI startup
About the Company

An innovative AI startup is transforming weather data into actionable insights by building safe, steerable physics foundation models. These models empower industries such as aviation, emergency management, and renewable energy while addressing critical climate risks. Fresh off a $6M seed round, the company is expanding its impact and scale.

Member of Technical Staff - ML Infrastructure

Responsibilities

Design, deploy, and maintain large, distributed ML training and inference clusters.

Develop scalable, end-to-end pipelines for managing petabyte-scale datasets and model training.

Research and test advanced training approaches, including parallelization techniques and numerical precision trade-offs.

Analyze, profile, and debug low-level GPU operations to optimize system performance.

Qualifications

Expertise in state-of-the-art techniques for optimizing training and inference workloads.

Proficiency with distributed training frameworks (e.g., FSDP, DeepSpeed).

Familiarity with cloud platforms (GCP, AWS, or Azure) and their ML/AI service offerings.

Experience with containerization and orchestration frameworks (e.g., Kubernetes, Docker).

Background in distributed task management systems and scalable model serving architectures.

Understanding of best practices in monitoring, logging, observability, and version control for ML systems.

Member of Technical Staff - ML Research

Responsibilities

Operate across the full ML stack: data processing, model development, evaluation, and infrastructure.

Develop and implement novel model architectures and training algorithms.

Build robust data pipelines and training infrastructure for massive, multimodal datasets.

Rapidly iterate on experiments and conduct ablation studies to improve model performance.

Qualifications

Strong foundation in machine learning fundamentals with depth in at least one core domain (e.g., Computer Vision, Sensor Fusion, Language Models, Physics-informed Neural Networks).

Proven experience in training models and analyzing experimental results through detailed ablation studies.

Expertise in developing and optimizing large-scale data pipelines.

Familiarity with distributed training methodologies.

Bonus: Experience with meteorology, computational fluid dynamics, or numerical simulations.

Date Posted: 02 May 2025