AI Inference Software Engineer

Palo Alto, California

Acceler8 Talent

Are you a skilled AI Inference Software Engineer eager to make an impact in the world of cutting-edge AI technology? Join our innovative team focused on developing specialized AI chips that redefine performance capabilities, enabling groundbreaking applications like real-time video generation and advanced reasoning. We are searching for an AI Inference Software Engineer who is passionate about enhancing our high-performance software stack.

As a pioneering hardware startup, we are committed to designing ASICs tailored to specific model architectures. Our flagship product stands out with unparalleled throughput and latency, making it suitable for applications with extraordinary performance demands. We operate in an energetic in-person environment in Cupertino, where all technical staff contribute across both engineering and research disciplines.

In your role as an AI Inference Software Engineer, you will contribute to the architecture, design, and implementation of our host software stack. You will work closely with firmware, driver, and AI model teams to construct a robust and high-performance hardware-software ecosystem. Your contributions will significantly impact the deployment of new AI products and cater to the specific requirements of our model-centric hardware.

What we offer:
  • Comprehensive medical, dental, and vision packages with full premium coverage.
  • $2,000/month housing subsidy for those residing within walking distance of the office.
  • Complimentary daily lunch and dinner at our Cupertino office.
  • Relocation assistance for candidates moving to the area.
Key responsibilities:
  • Design and develop high-performance software for our host stack.
  • Implement modular code in Rust, C++, and Python.
  • Collaborate with firmware and driver teams to enhance the HW/SW stack.
  • Engage with AI researchers and product teams for front-end serving.
  • Create scheduling logic for real-time inference and continuous batching (sketched after this list).
  • Implement acceleration techniques such as speculative decoding and KV cache sharing (speculative decoding is also sketched after this list).
  • Develop distributed networking primitives for multi-server inference.
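For context on the scheduling responsibility above, here is a minimal sketch of a continuous-batching decode loop in Python. It is illustrative only: the Request and ContinuousBatcher names, the step_fn callback, and the EOS constant are all hypothetical, and a production scheduler (e.g., vLLM's) also manages prefill, KV cache memory, and preemption.

    # Minimal, illustrative continuous-batching loop. All names here
    # (Request, ContinuousBatcher, step_fn, EOS) are hypothetical; a real
    # scheduler also handles prefill, KV cache memory, and preemption.
    from collections import deque
    from dataclasses import dataclass, field

    EOS = 0  # hypothetical end-of-sequence token id

    @dataclass
    class Request:
        prompt: list[int]                  # input token ids
        max_new_tokens: int
        generated: list[int] = field(default_factory=list)

    class ContinuousBatcher:
        def __init__(self, step_fn, max_batch: int):
            self.step_fn = step_fn         # runs one decode step for a batch
            self.max_batch = max_batch
            self.waiting: deque = deque()
            self.running: list = []

        def submit(self, req: Request) -> None:
            self.waiting.append(req)

        def step(self) -> None:
            # Admit waiting requests into free slots every step, instead of
            # draining the whole batch first -- the core of continuous batching.
            while self.waiting and len(self.running) < self.max_batch:
                self.running.append(self.waiting.popleft())
            if not self.running:
                return
            next_tokens = self.step_fn(self.running)   # one token per request
            still_running = []
            for req, tok in zip(self.running, next_tokens):
                req.generated.append(tok)
                done = tok == EOS or len(req.generated) >= req.max_new_tokens
                if not done:
                    still_running.append(req)
            # Finished requests retire immediately, freeing slots next step.
            self.running = still_running

In practice, step_fn would dispatch a batched decode step to the accelerator; a toy step_fn returning one token per request is enough to exercise the admission and retirement logic.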
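Likewise, a toy sketch of the greedy variant of speculative decoding: a cheap draft model proposes k tokens and the target model keeps the longest agreeing prefix. The draft_model and target_model functions below are hypothetical stand-ins; a real implementation verifies all k draft positions in a single batched target forward pass and uses a probabilistic accept/reject rule over token distributions rather than exact matching.

    # Toy sketch of greedy speculative decoding. draft_model and
    # target_model are hypothetical stand-ins for a small draft model and
    # the full target model.

    def draft_model(ctx: list[int]) -> int:
        return (ctx[-1] + 1) % 100         # cheap, sometimes disagrees

    def target_model(ctx: list[int]) -> int:
        return (ctx[-1] + 1) % 50          # the model whose output must win

    def speculative_step(ctx: list[int], k: int = 4) -> list[int]:
        # 1. Draft k tokens cheaply, autoregressively.
        draft, tmp = [], list(ctx)
        for _ in range(k):
            tok = draft_model(tmp)
            draft.append(tok)
            tmp.append(tok)
        # 2. Verify against the target, keeping the longest agreeing prefix;
        #    the first disagreement is replaced by the target's own token.
        accepted, tmp = [], list(ctx)
        for tok in draft:
            expected = target_model(tmp)
            if tok != expected:
                accepted.append(expected)
                break
            accepted.append(tok)
            tmp.append(tok)
        return accepted                    # up to k tokens per target pass

    print(speculative_step([1, 2, 3]))     # [4, 5, 6, 7] until the models diverge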
The ideal candidate will bring strong expertise in C++ and Python, experience with transformer models and inference stacks (e.g., vLLM, SGLang), and a solid understanding of distributed systems, networking, and parallel programming. Familiarity with Rust, CUDA, and hardware accelerators is a bonus.

If you are an engineer who flourishes in a dynamic and collaborative environment, this AI Inference Software Engineer position presents an exciting opportunity to work on technology that pushes the boundaries of AI hardware and software integration. Apply today to be part of our mission to create model-specific hardware that shapes the future of AI.

Date Posted: 21 May 2025
Apply for this Job