Lead Data Engineer

San Francisco, California

Compunnel
Job Summary:

We are looking for a dynamic and experienced Lead Data Engineer to oversee the development and optimization of our data pipelines and infrastructure. This role combines hands-on engineering with leadership responsibilities, focusing on high-volume data processing, real-time analytics, and scalable data solutions. You will work closely with cross-functional teams, including product managers, business stakeholders, and technical leads, to drive data-driven decisions and enhance business outcomes.

Key Responsibilities:

Development Tasks:
  • Collect and analyze metrics based on user interactions to support data-driven decisions.
  • Visualize data effectively for business teams, ensuring insights are accessible and actionable.
  • Develop and redesign data pipelines using Kafka Streams for real-time data processing.
  • Implement scalable data solutions using Spring Boot (Java) and Databricks Spark Streaming.
Leadership Duties:
  • Lead measurement processes from requirements gathering through to production delivery.
  • Collaborate with team leads, business partners, and product managers to align data strategies with business goals.
  • Balance hands-on engineering (50%) with leadership duties, mentoring senior developers and data engineers.
Required Qualifications:
  • 8–10 years of experience in data processing, data platforms, data lakes, big data, or data warehousing.
  • 5+ years of strong proficiency in Python and Spark (mandatory).
  • 3+ years of hands-on experience in ETL workflows using Spark and Python.
  • 4+ years of experience with large-scale data loads, feature extraction, and data processing pipelines (batch, near real-time, real-time).
  • Solid understanding of data quality, data accuracy, and best practices in data governance.
  • 3+ years of experience in building and deploying ML models in production environments.
  • Experience with Python deep learning libraries (PyTorch, TensorFlow, Keras) is preferred.
  • Hands-on experience with Large Language Models (LLMs) and transformers is a plus.
  • Proficiency in integrating with data stores such as SQL/NoSQL databases, in-memory stores like Redis, and data lakes (e.g., Delta Lake).
  • Strong experience with Kafka Streams, including producers and consumers.
  • Proficiency in Databricks or similar data lake platforms.
  • Experience with Java and Spring Boot for data processing tasks.
  • Familiarity with notebook environments like Jupyter Notebook.
Preferred Qualifications:
  • Ability to develop creative, unconventional solutions to complex data challenges.
  • Strong problem-solving skills with the ability to adapt to rapidly evolving technologies.
  • Initiative and self-motivation to take ownership of tasks and drive projects independently.
Certifications:
  • Databricks Certified Developer (preferred).
  • Cloud certifications (e.g., Azure, AWS) are a plus.
Work Environment:
  • Collaborative, fast-paced environment with a focus on data-driven innovation.
  • Opportunity to lead a team of data professionals while staying hands-on with cutting-edge technologies.
Primary Skills:
  • Python and PySpark
  • Kafka and Kafka Streams
  • MySQL and MySQL HeatWave
  • Azure Delta Lake
  • ETL Processes and Data Streaming with Spark
  • Kafka Integrations Using Spring Boot (Java)

Education: Bachelor's Degree

Certification: Databricks Certified Developer
Date Posted: 04 April 2025