Senior Data Engineer

San Francisco, California

Gloo
Job Expired - Click here to search for similar jobs

My client is redefining how video is captured, processed, and understood. Its products power everything from cutting-edge AI video generation to large-scale video analytics. As the company scales, it is investing heavily in its data engineering team, which builds robust, efficient, and intelligent pipelines that handle petabytes of video data every day.

The company is seeking a Senior Data Engineer with deep experience in building data systems for large-scale video ingestion, processing, and analysis. You'll be at the heart of its video data stack, enabling everything from ML model training to real-time video analytics.

What You'll Do

  • Architect and build high-throughput pipelines for video ingestion, transcoding, feature extraction, and metadata generation.
  • Design distributed storage systems optimized for video formats (e.g., MP4, AV1, HEVC) and derived video data (embeddings, frames, metadata).
  • Implement ETL/ELT processes for raw video ingestion into analytics-ready datasets.
  • Develop feature stores and data APIs that serve structured video data to downstream ML and product teams.
  • Partner with ML engineers, researchers, and platform teams to optimize data access patterns for training and inference on video datasets.
  • Work with real-time and batch systems to stream or schedule video processing jobs (e.g., via Kafka, Spark, Flink).
  • Build monitoring, observability, and data quality checks to ensure video pipelines are reliable and scalable.
  • Drive data modeling best practices for complex video metadata and annotation schemas.

What We're Looking For

  • 5+ years of professional experience in data engineering, preferably with a focus on unstructured data (video, audio, images).
  • Strong experience with distributed data processing frameworks (e.g., Apache Spark, Beam, Flink).
  • Deep understanding of video formats, codecs, and transcoding pipelines.
  • Proficiency in Python, SQL, and either Scala or Java.
  • Experience designing streaming pipelines using Kafka, Pulsar, or equivalent.
  • Solid understanding of cloud infrastructure (AWS, GCP, or Azure), including storage systems like S3, GCS, or specialized video storage.
  • Experience working with metadata stores, feature stores, and data lakes.
  • Familiarity with ML/AI workflows involving video (e.g., extracting frames, generating embeddings, preprocessing for model training) is a strong plus.
  • Excellent problem-solving skills, ownership mentality, and a passion for working with video data at scale.

Date Posted: 02 May 2025