Lead Data Engineer

San Francisco, California

Compunnel
Job Summary:

We are looking for a dynamic and experienced Lead Data Engineer to oversee the development and optimization of our data pipelines and infrastructure. This role combines hands-on engineering with leadership responsibilities, focusing on high-volume data processing, real-time analytics, and scalable data solutions. You will work closely with cross-functional teams, including product managers, business stakeholders, and technical leads, to drive data-driven decisions and enhance business outcomes.

Key Responsibilities:

Development Tasks:
  • Collect and analyze metrics based on user interactions to support data-driven decisions.
  • Visualize data effectively for business teams, ensuring insights are accessible and actionable.
  • Develop and redesign data pipelines using Kafka Streams for real-time data processing.
  • Implement scalable data solutions using Spring Boot (Java) and Databricks Spark Streaming.
Leadership Duties:
  • Lead measurement processes from requirements gathering through to production delivery.
  • Collaborate with team leads, business partners, and product managers to align data strategies with business goals.
  • Balance hands-on engineering (50%) with leadership duties, mentoring senior developers and data engineers.
Required Qualifications:
  • 8–10 years of experience in data processing, data platforms, data lakes, big data, or data warehousing.
  • 5+ years of strong proficiency in Python and Spark (mandatory).
  • 3+ years of hands-on experience in ETL workflows using Spark and Python.
  • 4+ years of experience with large-scale data loads, feature extraction, and data processing pipelines (batch, near real-time, real-time).
  • Solid understanding of data quality, data accuracy, and best practices in data governance.
  • 3+ years of experience in building and deploying ML models in production environments.
  • Experience with Python deep learning libraries (PyTorch, TensorFlow, Keras) is preferred.
  • Hands-on experience with Large Language Models (LLMs) and transformers is a plus.
  • Proficiency in integrating with data stores such as SQL/NoSQL databases, in-memory stores like Redis, and data lakes (e.g., Delta Lake).
  • Strong experience with Kafka Streams, including producers and consumers.
  • Proficiency in Databricks or similar data lake platforms.
  • Experience with Java and Spring Boot for data processing tasks.
  • Familiarity with notebook environments like Jupyter Notebook.
Preferred Qualifications:
  • Ability to develop creative, unconventional solutions to complex data challenges.
  • Strong problem-solving skills with the ability to adapt to rapidly evolving technologies.
  • Initiative and self-motivation to take ownership of tasks and drive projects independently.
Certifications:
  • Databricks Certified Developer (preferred).
  • Cloud certifications (e.g., Azure, AWS) are a plus.
Work Environment:
  • Collaborative, fast-paced environment with a focus on data-driven innovation.
  • Opportunity to lead a team of data professionals while staying hands-on with cutting-edge technologies.
Primary Skills:
  • Python and PySpark
  • Kafka and Kafka Streams
  • MySQL and MySQL HeatWave
  • Azure Delta Lake
  • ETL Processes and Data Streaming with Spark
  • Kafka Integrations Using Spring Boot (Java)

Education: Bachelor's Degree

Certification: Databricks Certified Developer
Date Posted: 04 April 2025