Machine Learning Engineer - Speech & Audio AI
Location: San Francisco, CA (Hybrid)
Employment Type: Full-time
Experience Level: Mid to Senior
Are you passionate about shaping the future of voice and sound technology? Join a cutting-edge AI startup in San Francisco that's building the next generation of speech and audio intelligence products.
We're looking for a Machine Learning Engineer who enjoys solving complex problems and working across multiple areas of AI and data-driven technology in a dynamic environment.
What You'll Do
- Design, train, and optimize ML models for speech recognition, audio classification, speaker diarization, or text-to-speech (TTS).
- Collaborate with product and research teams to bring state-of-the-art models into production.
- Develop scalable pipelines for model training, evaluation, and deployment.
- Apply techniques like self-supervised learning, transformers, or diffusion models to real-world audio data.
- Analyze and clean large-scale voice datasets (structured and unstructured).
- Monitor and improve inference performance in real-time audio systems.
What We're Looking For
- 2-6 years of experience in machine learning, with a focus on speech/audio.
- Strong background in deep learning (PyTorch or TensorFlow).
- Hands-on experience with tools and frameworks such as:
  - Hugging Face Transformers
  - torchaudio, librosa, Kaldi, ESPnet
  - Neural vocoders (e.g., WaveGlow, WaveNet, HiFi-GAN)
  - Voice conversion frameworks (e.g., RVC, DiffVC, YourTTS)
  - TTS engines like Coqui TTS
  - Self-supervised learning tools like S3PRL
- Solid understanding of digital signal processing and acoustic modeling, including experience with FFmpeg, SoX, NumPy/SciPy, and Praat.
- Experience deploying ML models in cloud environments (AWS, GCP, or Azure).
- BS or MS in CS, EE, ML, or related field (or equivalent industry experience).