Inference Performance Engineer

United States

Acceler8 Talent

Join Our Inference Performance Team: Optimizing Foundation Models On-Device


We're building the future of on-device AI by making foundation models smarter, faster, and more efficient. As part of the Inference Performance Team, you'll work on challenging, high-impact projects to push the limits of what's possible with foundation model inference.


What You'll Do

  • Pinpoint performance bottlenecks and navigate quality-performance trade-offs in reference implementations (e.g., openai/whisper) and our optimized frameworks.
  • Design, prototype, and test performance improvements tailored to meet enterprise customer needs.
  • Drive innovation in our open-source inference frameworks by pitching and delivering new ideas.
  • Help expand support to new platforms: currently focused on Apple, with active growth into Android, Linux, and soon Windows.
  • Collaborate with ML Research Engineers to turn theoretical advances into practical, real-world optimizations.

Core Qualifications

  • 3+ years of industry experience working on technically challenging problems.
  • Proficiency in Python or C/C++.
  • Experience with CUDA, OpenCL, or Metal.
  • A strong understanding of hardware acceleration (GPUs, NPUs, TPUs, CPUs).
  • Familiarity with modern ML frameworks like TensorFlow, PyTorch, Core ML, or ONNX.
  • Expertise in GPU kernel programming.
  • Contributions to major ML frameworks or open-source projects.

Why This Role?

You'll play a critical role in advancing the performance of foundation models across platforms like Apple, Android, and Linux, shaping the future of efficient, scalable on-device AI.

Date Posted: 02 May 2025