Machine Learning Engineer

San Jose, California

Koda Staff

Machine Learning Engineer - Speech & Audio AI

Location: San Francisco, CA (Hybrid)

Employment Type: Full-time

Experience Level: Mid to Senior


Are you passionate about shaping the future of voice and sound technology? Join a cutting-edge AI startup in San Francisco that's building the next generation of speech and audio intelligence products.


We're looking for a Machine Learning Engineer who enjoys solving complex problems and working across multiple areas of AI and data-driven technology in a dynamic environment.


What You'll Do

  • Design, train, and optimize ML models for speech recognition, audio classification, speaker diarization, or text-to-speech (TTS).
  • Collaborate with product and research teams to bring state-of-the-art models into production.
  • Develop scalable pipelines for model training, evaluation, and deployment.
  • Apply techniques like self-supervised learning, transformers, or diffusion models to real-world audio data.
  • Analyze and clean large-scale voice datasets (structured and unstructured).
  • Monitor and improve inference performance in real-time audio systems.

What We're Looking For

  • 2-6 years of experience in machine learning, with a focus on speech/audio.
  • Strong background in deep learning (PyTorch or TensorFlow).
  • Hands-on experience with tools and frameworks such as:
      • Hugging Face Transformers
      • torchaudio, librosa, Kaldi, ESPnet
      • Neural vocoders (e.g., WaveGlow, WaveNet, HiFi-GAN)
      • Voice conversion frameworks (e.g., RVC, DiffVC, YourTTS)
      • TTS engines like Coqui TTS
      • Self-supervised learning tools like S3PRL
  • Solid understanding of digital signal processing and acoustic modeling, with experience using tools such as FFmpeg, SoX, NumPy/SciPy, and Praat.
  • Experience deploying ML models in cloud environments (AWS, GCP, or Azure).
  • BS or MS in CS, EE, ML, or related field (or equivalent industry experience).

Date Posted: 07 June 2025