REMOTE:
Please send only the top two profiles for now, along with their test reports, to this email thread.
Prior experience with the client is a plus.
Key Responsibilities:
- Design and implement scalable data processing pipelines for ML training and validation
- Build and maintain feature stores with support for both batch and real-time features
- Develop data quality monitoring, validation, and testing frameworks
- Create systems for dataset versioning, lineage tracking, and reproducibility
- Implement automated data documentation and discovery tools
- Design efficient data storage and access patterns for ML workloads
- Partner with data scientists to optimize data preparation workflows
Technical Requirements:
- 7+ years of software engineering experience, with 3+ years in data infrastructure
- Strong expertise in GCP's data and ML infrastructure:
  - BigQuery for data warehousing
  - Dataflow for data processing
  - Cloud Storage for data lakes
  - Vertex AI Feature Store
  - Cloud Composer (managed Airflow)
  - Dataproc for Spark workloads
- Deep expertise in data processing frameworks (Spark, Beam, Flink)
- Experience with feature stores (Feast, Tecton) and data versioning tools
- Proficiency in Python and SQL
- Experience with data quality and testing frameworks
- Knowledge of data pipeline orchestration (Airflow, Dagster)
Nice to Have:
- Experience with streaming systems (Kafka, Kinesis, Pub/Sub, Dataflow)
- Experience with GCP-specific security and IAM best practices
- Knowledge of Cloud Logging and Cloud Monitoring for data pipelines
- Familiarity with Cloud Build and Cloud Deploy for CI/CD
- Knowledge of ML metadata management systems
- Familiarity with data governance and security requirements
- Experience with dbt or similar data transformation tools