ML Data Infrastructure Engineer

Sunnyvale, California

Bayone
REMOTE

Please send only the top two profiles for now, with test reports attached, to this email thread.

Prior experience with the client is a plus.

Key Responsibilities:
  • Design and implement scalable data processing pipelines for ML training and validation
  • Build and maintain feature stores with support for both batch and real-time features
  • Develop data quality monitoring, validation, and testing frameworks
  • Create systems for dataset versioning, lineage tracking, and reproducibility
  • Implement automated data documentation and discovery tools
  • Design efficient data storage and access patterns for ML workloads
  • Partner with data scientists to optimize data preparation workflows
Technical Requirements:
  • 7+ years of software engineering experience, with 3+ years in data infrastructure
  • Strong expertise in GCP's data and ML infrastructure:
    • BigQuery for data warehousing
    • Dataflow for data processing
    • Cloud Storage for data lakes
    • Vertex AI Feature Store
    • Cloud Composer (managed Airflow)
    • Dataproc for Spark workloads
  • Deep expertise in data processing frameworks (Spark, Beam, Flink)
  • Experience with feature stores (Feast, Tecton) and data versioning tools
  • Proficiency in Python and SQL
  • Experience with data quality and testing frameworks
  • Knowledge of data pipeline orchestration (Airflow, Dagster)
Nice to Have:
  • Experience with streaming systems (Kafka, Kinesis, Pub/Sub, Dataflow)
  • Experience with GCP-specific security and IAM best practices
  • Knowledge of Cloud Logging and Cloud Monitoring for data pipelines
  • Familiarity with Cloud Build and Cloud Deploy for CI/CD
  • Knowledge of ML metadata management systems
  • Familiarity with data governance and security requirements
  • Experience with dbt or similar data transformation tools
Date Posted: 28 April 2025