Job Description

Job Summary: We are looking for an experienced Senior Data Engineer with a strong background in Google Cloud Platform (GCP) and Apache Spark to join our dynamic team. You will be responsible for designing, building, and optimizing scalable data pipelines, leveraging GCP services and Spark to handle large-scale data processing and analytics. You will play a key role in shaping the architecture of our data platform and work closely with cross-functional teams to enable data-driven decision-making.
Key Responsibilities:
- Design & Build Scalable Data Pipelines: Architect, build, and optimize highly efficient data pipelines using Apache Spark and Google Cloud Platform (GCP) services such as BigQuery, Dataflow, Dataproc, and Pub/Sub.
- Data Processing & Transformation: Work with large volumes of structured and unstructured data, developing data processing and transformation workflows that support business intelligence and analytics use cases.
- Collaborate with Cross-Functional Teams: Work closely with Data Scientists, Business Intelligence teams, and Product teams to understand business requirements and deliver scalable data solutions.
- Big Data Engineering: Utilize Spark to process and analyze large datasets in distributed computing environments, ensuring data processing tasks are efficient and scalable.
- Optimize Performance & Cost Efficiency: Fine-tune the performance of data workflows and reduce processing costs through the effective use of GCP services and Spark performance optimizations (e.g., partitioning, caching, memory management).
- Cloud Infrastructure Management: Manage and optimize cloud resources in GCP, ensuring high availability, scalability, and reliability of data pipelines and processing jobs.
- ETL & Data Integration: Design and implement complex ETL workflows, including data extraction, transformation, and loading from multiple source systems into cloud-based data warehouses or data lakes.
- Data Quality & Governance: Ensure data quality and consistency across pipelines and adhere to data governance, security, and privacy standards.
- Mentorship & Leadership: Provide technical leadership and mentorship to junior data engineers and foster a culture of best practices in data engineering.
- Monitoring & Troubleshooting: Implement monitoring solutions to track pipeline performance, set up alerting for failures, and troubleshoot issues in data processing workflows.
- Documentation & Reporting: Create detailed technical documentation and reports to communicate data pipeline designs, performance metrics, and optimizations to stakeholders.
Skills & Qualifications:
- Proven Experience: 5+ years of hands-on experience in data engineering, with strong expertise in Google Cloud Platform (GCP) and Apache Spark.
- GCP Services Expertise: Experience with GCP services such as BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage, Cloud Composer, and Cloud Functions.
- Big Data Technologies: Proficiency in working with Apache Spark (PySpark, Scala, or Java), Hadoop, and Kafka for building distributed data processing pipelines.
- ETL Process Design: Expertise in designing and implementing complex ETL workflows and understanding of data ingestion, transformation, and storage.
- Programming Skills: Strong programming skills in Python, Scala, or Java, with hands-on experience in big data frameworks (e.g., Apache Spark).
- SQL & NoSQL Databases: Expertise in SQL (BigQuery, PostgreSQL, etc.) and knowledge of NoSQL databases (e.g., MongoDB, Cassandra).
- Data Warehousing: Experience building and managing data warehouses, especially using BigQuery or similar cloud-based data warehouse platforms.
- Performance Optimization: Expertise in optimizing Spark jobs and cloud-based data workflows for performance, scalability, and cost efficiency.
- DevOps & Automation: Familiarity with cloud-native DevOps practices, containerization (e.g., Docker), and CI/CD pipelines.
- Data Governance & Security: Strong knowledge of data privacy, governance, and security best practices in cloud environments.
- Version Control & Collaboration: Proficient in using version control tools (e.g., Git) and agile development practices.
- Education: Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related field. Certifications in GCP (e.g., Google Cloud Professional Data Engineer) are a plus.
Preferred Qualifications:
- Real-Time Data Processing: Knowledge of real-time data processing tools such as Apache Kafka or Google Pub/Sub.
Personal Attributes:
- Leadership: Strong leadership skills with a track record of leading data engineering teams and driving initiatives that improve data workflows.
- Problem-Solving: Excellent analytical and problem-solving skills, particularly in distributed computing and large-scale data processing.
- Collaboration: Effective communicator who can collaborate with technical and non-technical stakeholders.
- Adaptability: Ability to thrive in a fast-paced, constantly evolving environment and embrace new technologies.
- Mentorship: Passion for coaching and mentoring junior team members to develop their technical skills.
Why Join Us:
- Innovative Work Environment: Join a team working with cutting-edge technologies to build scalable data solutions.
- Career Growth: Opportunities to expand your expertise in GCP and Spark, and work on exciting and complex data engineering projects.
- Competitive Compensation: Attractive salary, benefits, and opportunities for career advancement.