Machine Learning Systems Engineer

San Francisco, California

Acceler8 Talent
Join Us to Transform AI and Enhance Productivity.

Become part of our exciting mission to revolutionize human-computer collaboration and streamline workflows through innovative AI products. Join a team that is at the forefront of shaping the future of enterprise operations by harnessing the power of Large Language Models (LLMs) to maximize organizational impact.

Your Contributions:
  • Collaborate to deliver engaging experiences leveraging Large Language Models.
  • Design Robust ML Systems: Create and implement scalable machine learning and distributed systems specifically for LLMs.
  • Enhance Performance: Innovate and optimize at the foundational levels of the stack, developing high-performance infrastructure with custom kernels.
  • Utilize Parallelism Techniques: Develop advanced parallelism methods to enable efficient large-scale distributed training of LLMs.

Your Qualifications:
  • Experience training LLMs with frameworks such as Megatron and DeepSpeed, and deploying them with serving stacks such as vLLM, TGI, or TensorRT-LLM.
  • A solid understanding of the architectures of leading AI accelerators such as TPUs, IPUs, and HPUs, along with their tradeoffs.
  • Proficiency with kernel languages such as OpenAI Triton and Pallas, and with compilers such as XLA.
  • Demonstrated experience tuning LLM workloads; familiarity with MLPerf or production workloads is a plus.
If you are driven by a passion for AI innovation and excited to explore the limits of technology, we welcome you to be part of our collaborative, forward-thinking team.

Date Posted: 11 May 2025