REMOTE:
Please send only the top two profiles for now, along with their test reports, to this email thread.
Prior experience with the client is a plus.
Key Responsibilities:
- Design and implement scalable data processing pipelines for ML training and validation
- Build and maintain feature stores with support for both batch and real-time features
- Develop data quality monitoring, validation, and testing frameworks
- Create systems for dataset versioning, lineage tracking, and reproducibility
- Implement automated data documentation and discovery tools
- Design efficient data storage and access patterns for ML workloads
- Partner with data scientists to optimize data preparation workflows
Technical Requirements:
- 7+ years of software engineering experience, with 3+ years in data infrastructure
- Strong expertise in GCP's data and ML infrastructure:
  - BigQuery for data warehousing
  - Dataflow for data processing
  - Cloud Storage for data lakes
  - Vertex AI Feature Store
  - Cloud Composer (managed Airflow)
  - Dataproc for Spark workloads
- Deep expertise in data processing frameworks (Spark, Beam, Flink)
- Experience with feature stores (Feast, Tecton) and data versioning tools
- Proficiency in Python and SQL
- Experience with data quality and testing frameworks
- Knowledge of data pipeline orchestration (Airflow, Dagster)
Nice to Have:
- Experience with streaming systems (Kafka, Kinesis, Pub/Sub, Dataflow)
- Experience with GCP-specific security and IAM best practices
- Knowledge of Cloud Logging and Cloud Monitoring for data pipelines
- Familiarity with Cloud Build and Cloud Deploy for CI/CD
- Knowledge of ML metadata management systems
- Familiarity with data governance and security requirements
- Experience with dbt or similar data transformation tools