My client is redefining how video is captured, processed, and understood. Their products power everything from cutting-edge AI video generation to large-scale video analytics. As they scale, they are investing heavily in their data engineering team, building robust, efficient, and intelligent pipelines that handle petabytes of video data every day.
They are seeking a Senior Data Engineer with deep experience in building data systems for large-scale video ingestion, processing, and analysis. You'll be at the heart of their video data stack, enabling everything from ML model training to real-time video analytics.
What You'll Do
- Architect and build high-throughput pipelines for video ingestion, transcoding, feature extraction, and metadata generation.
- Design distributed storage systems optimized for video containers and codecs (e.g., MP4, AV1, HEVC) and derived video data (embeddings, frames, metadata).
- Implement ETL/ELT processes that turn raw ingested video into analytics-ready datasets.
- Develop feature stores and data APIs that serve structured video data to downstream ML and product teams.
- Partner with ML engineers, researchers, and platform teams to optimize data access patterns for training and inference on video datasets.
- Work with real-time and batch systems to stream or schedule video processing jobs (e.g., via Kafka, Spark, Flink).
- Build monitoring, observability, and data quality checks to ensure video pipelines are reliable and scalable.
- Drive data modeling best practices for complex video metadata and annotation schemas.
What We're Looking For
- 5+ years of professional experience in data engineering, preferably with a focus on unstructured data (video, audio, images).
- Strong experience with distributed data processing frameworks (e.g., Apache Spark, Beam, Flink).
- Deep understanding of video formats, codecs, and transcoding pipelines.
- Proficiency in Python, SQL, and either Scala or Java.
- Experience designing streaming pipelines using Kafka, Pulsar, or equivalent.
- Solid understanding of cloud infrastructure (AWS, GCP, or Azure), including object storage such as S3 or GCS and specialized video storage.
- Experience working with metadata stores, feature stores, and data lakes.
- Familiarity with ML/AI workflows involving video (e.g., extracting frames, generating embeddings, preprocessing for model training) is a strong plus.
- Excellent problem-solving skills, ownership mentality, and a passion for working with video data at scale.