Data Engineer

Philadelphia, Pennsylvania

eTeam
Role: Data Engineer
Location: Philadelphia, PA
Duration: 7 months

Job Description:


• Our client is seeking a highly skilled Data Engineer to design, build, and maintain scalable data platforms that enable large-scale ingestion, storage, processing, and analysis of structured and unstructured data.

• This role will focus on constructing data products (data lake / data warehouse), optimizing data pipelines, and implementing robust ETL workflows to support analytics, machine learning, and operational reporting.

• The ideal candidate will be proficient in distributed computing, cloud-based data architectures (GCP), and modern data processing frameworks.

• Experience with real-time data streaming (Kafka, Apache Beam), MLOps, and infrastructure automation (Terraform, Jenkins) is highly preferred.

Data Platform & Architecture Development:

• Design, implement, and maintain scalable data platforms for efficient data storage, processing, and retrieval.

• Build cloud-native and distributed data systems that enable self-service analytics, real-time data processing, and AI-driven decision-making.

• Develop data models, schemas, and transformation pipelines that support evolving business needs while ensuring operational stability.

• Apply best practices in data modeling, indexing, and partitioning to optimize query performance and cost efficiency, while also following sustainability best practices.

ETL, Data Pipelines & Streaming Processing:

• Build and maintain highly efficient ETL pipelines using SQL and Python to process large-scale datasets.

• Implement real-time data streaming pipelines using Kafka, Apache Beam, or equivalent technologies.

• Develop reusable internal data processing tools to streamline operations and empower teams across the organization.

• Write advanced SQL queries for extracting, transforming, and loading (ETL) data with a focus on execution efficiency.

• Ensure data validation, quality monitoring, and governance using automated processes and dashboards.

MLOps & Cloud-Based Data Infrastructure:

• Deploy machine learning pipelines with MLOps best practices to support AI and predictive analytics applications.

• Optimize data pipelines for ML models, ensuring seamless integration between data engineering and machine learning workflows.

• Work with cloud platforms (GCP) to manage data storage, processing, and security.

• Use Terraform, Jenkins, and other CI/CD tools to automate data pipeline deployments and infrastructure management.

Collaboration & Agile Development:

• Work in Agile/DevOps teams, collaborating closely with data scientists, software engineers, and business stakeholders.

• Advocate for data-driven decision-making, educating teams on best practices in data architecture and engineering.

• 5+ years of experience as a Data Engineer working with large-scale data processing.

• Strong proficiency in SQL for data transformation, optimization, and analytics.

• Expertise in programming languages (Python, Java, Scala, or Go) with an understanding of functional and object-oriented programming paradigms.

• Experience with distributed computing frameworks.

• Proficiency in cloud-based data engineering on AWS, GCP, or Azure.

• Strong knowledge of data modeling, data governance, and schema design.

• Experience with CI/CD and infrastructure-as-code tools (Jenkins, Terraform) for infrastructure automation.

• Experience with real-time data streaming (Kafka or equivalent).

• Strong understanding of MLOps and integrating data engineering with ML pipelines.

• Familiarity with knowledge graphs and GraphQL APIs for data relationships.

• Background in retail, customer classification, and personalization systems.

• Knowledge of business intelligence tools and visualization platforms.

• Retail industry experience, specifically in the production of consumer goods.
Date Posted: 12 May 2025