Inference Performance Engineer

United States

Acceler8 Talent

Join Our Inference Performance Team: Optimizing Foundation Models On-Device


We're building the future of on-device AI by making foundation models smarter, faster, and more efficient. As part of the Inference Performance Team, you'll work on challenging, high-impact projects to push the limits of what's possible with foundation model inference.


What You'll Do

  • Pinpoint performance bottlenecks and navigate quality-performance trade-offs in reference implementations (e.g., openai/whisper) and our optimized frameworks.
  • Design, prototype, and test performance improvements tailored to meet enterprise customer needs.
  • Drive innovation in our open-source inference frameworks by pitching and delivering new ideas.
  • Help expand support to new platforms: currently focused on Apple, with active growth into Android, Linux, and soon Windows.
  • Collaborate with ML Research Engineers to turn theoretical advances into practical, real-world optimizations.

Core Qualifications

  • 3+ years of industry experience working on technically challenging problems.
  • Proficiency in Python or C/C++.
  • Experience with CUDA, OpenCL, or Metal.
  • A strong understanding of hardware acceleration (GPUs, NPUs, TPUs, CPUs).
  • Familiarity with modern ML frameworks like TensorFlow, PyTorch, Core ML, or ONNX.
  • Expertise in GPU kernel programming.
  • Contributions to major ML frameworks or open-source projects.

Why This Role?

You'll play a critical role in advancing the performance of foundation models across platforms like Apple, Android, and Linux, shaping the future of efficient, scalable on-device AI.

Date Posted: 02 May 2025