ML Ops Engineer

San Francisco, California

Jobot
Job Expired - Click here to search for similar jobs
Seed-stage startup offering salary and generous equity - work remotely from anywhere.

This Jobot Job is hosted by: Karyn Spies
Are you a fit? Easy Apply now by clicking the "Apply Now" button and sending us your resume.
Salary: $150,000 - $200,000 per year

A bit about us:

We are a company at the intersection of artificial intelligence (AI) and gaming, offering innovative solutions and immersive experiences to our users. Our AI character studio allows users to create, interact with, and share AI characters with varying skills, from characters that can see everything on your screen to characters you can interact with in AR/VR. We have been growing steadily, and now it's time to 100x our growth.

Why join us?

You'll be part of a creative team that's passionate about pushing the boundaries of AI and gaming, and our work has been covered by the leading press.

We offer a collaborative environment where your ideas and leadership can shape the future of our products and community. With us, you'll have the opportunity to make a significant impact in an exciting, fast-growing industry.

Job Details

Responsibilities:
  • Architecting and deploying open source models at scale.
  • Optimizing models and inference, and configuring and deploying servers, with an emphasis on stability, scalability, and speed of inference/generation.
  • Evaluating open source models.
  • Building fine-tuning datasets and LoRAs.
  • Providing best practices and executing proofs of concept (POCs) for automated and efficient model operations at scale.

Qualifications:
  • 5+ years of experience in ML Ops or a related field.
  • 3+ years of experience in managing machine learning projects end-to-end.
  • Recent focus (at least 18 months) on text generation models at scale.
  • Experience monitoring build and production systems using automated monitoring and alarm tools.
  • Knowledge of machine learning frameworks: TensorFlow, PyTorch, Keras, Scikit-Learn.
  • Experience running large-scale inference services: NVIDIA Triton, TGI, vLLM.
  • Experience with container technologies (Docker, Kubernetes, EKS, ECS).
  • Experience with multiple cloud providers (AWS, GCP, Azure, etc.).
  • Experience in distributed computing.
  • Experience with a wide range of ML models (text generation, classification, OCR, object detection, Stable Diffusion).
  • Proven track record of building and managing ML pipelines, including data preparation, model training, deployment, and monitoring.
  • Solid foundation in DevOps principles and practices with experience in CI/CD pipelines for ML deployments.
  • Solid programming skills with at least some of the following: C, C++, Rust, Python, JavaScript.
  • Experience with model performance monitoring and troubleshooting.
  • Passion for ML and its potential to solve real-world problems.


Interested in hearing more? Easy Apply now by clicking the "Apply Now" button.
Date Posted: 01 May 2024