Research Engineer

Palo Alto, California

Tykhe Inc
Job Expired

We are seeking a highly skilled and experienced Research Lead for Speech, Audio, and Conversational AI to join our innovative team. In this role, you will spearhead the research and development of cutting-edge technologies in speech processing, text-to-speech (TTS), audio analysis, and real-time conversational AI. You will push the boundaries of what's possible in automatic speech recognition (ASR), speaker identification, diarization, speech synthesis, voice cloning, dubbing, and audio generation.


Key Responsibilities:

  • Apply state-of-the-art advances in audio/speech and large language models to develop advanced Audio Language Models and Speech Language Models.
  • Research, architect, and deploy new generative AI methods such as autoregressive models, causal models, and diffusion models.
  • Design and implement low-latency end-to-end models with multilingual speech/audio as both input and output.
  • Conduct experiments to evaluate and improve the performance of these models, focusing on accuracy, naturalness, efficiency, and real-time capabilities across multiple languages.
  • Stay at the forefront of advancements in speech processing, audio analysis, and large language models, integrating new techniques into our foundation models.
  • Collaborate with cross-functional teams to integrate these foundation models into Krutrim's AI stack and products.
  • Publish research findings in top-tier conferences and journals such as INTERSPEECH, ICASSP, ICLR, ICML, NeurIPS, and IEEE/ACM Transactions on Audio, Speech, and Language Processing.
  • Mentor and guide junior researchers and engineers, fostering a collaborative and innovative team environment.
  • Drive the adoption of best practices in model development, including rigorous testing, documentation, and ethical considerations in multilingual AI.

Qualifications:

  • Ph.D. in Computer Science, Electrical Engineering, or a related field with a focus on speech processing, audio analysis, and machine learning.
  • Experience training speech/audio models for representation (e.g., W2V-BERT, SONAR, AST), generation (e.g., HiFi-GAN, VQ-GAN, AudioLDM), Conformers, and multilingual multitask models (e.g., SeamlessM4T).
  • Expertise with Audio Language Models such as AudioPaLM, Moshi, and SeamlessM4T.
  • Proven track record of developing and applying novel neural network architectures such as Transformers, Mixture of Experts, Diffusion Models, and State Space Models (Mamba, Samba).
  • Extensive experience in developing and optimizing models for low-latency, real-time applications.
  • Strong background in multilingual speech recognition, voice cloning, dubbing, and synthesis, with an understanding of the challenges specific to different language families.
  • Proficiency in deep learning frameworks (e.g., TensorFlow, PyTorch) and experience deploying large-scale speech and audio models.
  • Demonstrated expertise in high-performance computing with proficiency in Python, C/C++, CUDA, and kernel-level programming for AI applications.
  • Experience with audio signal processing techniques and their application in end-to-end neural models.
Date Posted: 05 May 2025