Job Summary: We are looking for a dynamic and experienced Lead Data Engineer to oversee the development and optimization of our data pipelines and infrastructure. This role combines hands-on engineering with leadership responsibilities, focusing on high-volume data processing, real-time analytics, and scalable data solutions. You will work closely with cross-functional teams, including product managers, business stakeholders, and technical leads, to drive data-driven decisions and enhance business outcomes.
Key Responsibilities:
Development Tasks:
- Collect and analyze metrics based on user interactions to support data-driven decisions.
- Visualize data effectively for business teams, ensuring insights are accessible and actionable.
- Develop and redesign data pipelines using Kafka Streams for real-time data processing.
- Implement scalable data solutions using Spring Boot (Java) and Databricks Spark Streaming.
Leadership Duties:
- Lead measurement processes from requirements gathering through to production delivery.
- Collaborate with team leads, business partners, and product managers to align data strategies with business goals.
- Balance hands-on engineering (50%) with leadership duties, mentoring senior developers and data engineers.
Required Qualifications: - 8–10 years of experience in data processing, data platforms, data lakes, big data, or data warehousing.
- 5+ years of strong proficiency in Python and Spark (mandatory).
- 3+ years of hands-on experience in ETL workflows using Spark and Python.
- 4+ years of experience with large-scale data loads, feature extraction, and data processing pipelines (batch, near real-time, real-time).
- Solid understanding of data quality, data accuracy, and best practices in data governance.
- 3+ years of experience in building and deploying ML models in production environments.
- Experience with Python deep learning libraries (PyTorch, TensorFlow, Keras) is preferred.
- Hands-on experience with Large Language Models (LLMs) and transformers is a plus.
- Proficiency in integrating with data stores such as SQL/NoSQL databases, in-memory stores like Redis, and data lakes (e.g., Delta Lake).
- Strong experience with Kafka Streams, including producers and consumers.
- Proficiency in Databricks or similar data lake platforms.
- Experience with Java and Spring Boot for data processing tasks.
- Familiarity with notebook environments like Jupyter Notebook.
Preferred Qualifications: - Ability to develop creative, unconventional solutions to complex data challenges.
- Strong problem-solving skills with the ability to adapt to rapidly evolving technologies.
- Initiative and self-motivation to take ownership of tasks and drive projects independently.
Certifications: - Databricks Certified Developer (preferred).
- Cloud certifications (e.g., Azure, AWS) are a plus.
Work Environment: - Collaborative, fast-paced environment with a focus on data-driven innovation.
- Opportunity to lead a team of data professionals while staying hands-on with cutting-edge technologies.
Primary Skills: - Python and PySpark
- Kafka and Kafka Streams
- MySQL and MySQL HeatWave
- Azure Delta Lake
- ETL Processes and Data Streaming with Spark
- Kafka Integrations Using Spring Boot (Java)
Education: Bachelor's Degree
Certification: Databricks Certified Developer