Senior Data Engineer

San Francisco, California

Gloo

Job Expired - Click here to search for similar jobs

My client is redefining how video is captured, processed, and understood. Our products power everything from cutting-edge AI video generation to large-scale video analytics. As we scale, we're investing heavily in our data engineering team, building robust, efficient, and intelligent pipelines that handle petabytes of video data every day.

We are seeking a Senior Data Engineer with deep experience in building data systems for large-scale video ingestion, processing, and analysis. You'll be at the heart of our video data stack, enabling everything from ML model training to real-time video analytics.

What You'll Do

Architect and build high-throughput pipelines for video ingestion, transcoding, feature extraction, and metadata generation.
Design distributed storage systems optimized for video formats (e.g., MP4, AV1, HEVC) and derived video data (embeddings, frames, metadata).
Implement ETL/ELT processes that turn raw ingested video into analytics-ready datasets.
Develop feature stores and data APIs that serve structured video data to downstream ML and product teams.
Partner with ML engineers, researchers, and platform teams to optimize data access patterns for training and inference on video datasets.
Work with real-time and batch systems to stream or schedule video processing jobs (e.g., via Kafka, Spark, Flink).
Build monitoring, observability, and data quality checks to ensure video pipelines are reliable and scalable.
Drive data modeling best practices for complex video metadata and annotation schemas.

What We're Looking For

5+ years of professional experience in data engineering, preferably with a focus on unstructured data (video, audio, images).
Strong experience with distributed data processing frameworks (e.g., Apache Spark, Beam, Flink).
Deep understanding of video formats, codecs, and transcoding pipelines.
Proficiency in Python, SQL, and either Scala or Java.
Experience designing streaming pipelines using Kafka, Pulsar, or equivalent.
Solid understanding of cloud infrastructure (AWS, GCP, or Azure), including storage systems like S3, GCS, or specialized video storage.
Experience working with metadata stores, feature stores, and data lakes.
Familiarity with ML/AI workflows involving video (e.g., extracting frames, generating embeddings, preprocessing for model training) is a strong plus.
Excellent problem-solving skills, ownership mentality, and a passion for working with video data at scale.

Date Posted: 02 May 2025
