Machine Learning Engineer

San Jose, California

Koda Staff

Machine Learning Engineer - Speech & Audio AI

Location: San Francisco, CA (Hybrid)

Employment Type: Full-time

Experience Level: Mid to Senior

Are you passionate about shaping the future of voice and sound technology? Join a cutting-edge AI startup in San Francisco that's building the next generation of speech and audio intelligence products.

We're looking for a Machine Learning Engineer who enjoys solving complex problems and working across multiple areas of AI and data-driven technology in a dynamic environment.

What You'll Do

Design, train, and optimize ML models for speech recognition, audio classification, speaker diarization, or text-to-speech (TTS).
Collaborate with product and research teams to bring state-of-the-art models into production.
Develop scalable pipelines for model training, evaluation, and deployment.
Apply techniques like self-supervised learning, transformers, or diffusion models to real-world audio data.
Analyze and clean large-scale voice datasets (structured and unstructured).
Monitor and improve inference performance in real-time audio systems.

What We're Looking For

2-6 years of machine learning experience, with a focus on speech and audio.
Strong background in deep learning (PyTorch or TensorFlow).

Hands-on experience with tools and frameworks such as:

Hugging Face Transformers
torchaudio, librosa, Kaldi, ESPnet
Neural vocoders (e.g., WaveGlow, WaveNet, HiFi-GAN)
Voice conversion frameworks (e.g., RVC, DiffVC, YourTTS)
TTS engines like Coqui TTS
Self-supervised learning tools like S3PRL

Solid understanding of digital signal processing and acoustic modeling, with hands-on experience using FFmpeg, SoX, NumPy/SciPy, and Praat.
Experience deploying ML models in cloud environments (AWS, GCP, or Azure).
BS or MS in CS, EE, ML, or related field (or equivalent industry experience).

Date Posted: 07 June 2025
