Join us in creating the next-generation compute platform for AGI. Are you passionate about developing hardware and software to train and manage the largest machine learning workloads for AGI? We are looking for skilled Machine Learning Performance Engineers to join our team, which is dedicated to revolutionizing AI with powerful, specialized hardware for Large Language Models (LLMs).
Our company has secured over $80 million in Series A funding to advance the capabilities of AI. With our hardware, researchers can train 7B-class models from scratch every day and train 70B-class models multiple times a month.
As an ML Performance Engineer, your responsibilities will include developing production-level libraries for efficient distributed training and serving, as well as creating performance models and tools to validate and direct scheduling decisions.
Key Requirements:
- Bachelor's degree in Computer Science or a related field
- Proficiency in Python and familiarity with ML frameworks such as JAX, PyTorch, or TensorFlow
- In-depth understanding of the Transformer architecture and experience in distributed computing, high-performance networking, or large-scale ML systems
This position is hybrid in the Bay Area.