Role: Data Engineer
Location: Philadelphia, PA
Duration: 7 Months
Job Description:
• Our client is seeking a highly skilled Data Engineer to design, build, and maintain scalable data platforms that enable large-scale ingestion, storage, processing, and analysis of structured and unstructured data.
• This role will focus on constructing data products (data lake / data warehouse), optimizing data pipelines, and implementing robust ETL workflows to support analytics, machine learning, and operational reporting.
• The ideal candidate will be proficient in distributed computing, cloud-based data architectures (GCP), and modern data processing frameworks.
• Experience with real-time data streaming (Kafka, Apache Beam), MLOps, and infrastructure automation (Terraform, Jenkins) is highly preferred.
Data Platform & Architecture Development:
• Design, implement, and maintain scalable data platforms for efficient data storage, processing, and retrieval.
• Build cloud-native and distributed data systems that enable self-service analytics, real-time data processing, and AI-driven decision-making.
• Develop data models, schemas, and transformation pipelines that support evolving business needs while ensuring operational stability.
• Apply best practices in data modeling, indexing, and partitioning to optimize query performance and cost efficiency, while following sustainability best practices.
ETL, Data Pipelines & Streaming Processing:
• Build and maintain highly efficient ETL pipelines using SQL and Python to process large-scale datasets.
• Implement real-time data streaming pipelines using Kafka, Apache Beam, or equivalent technologies.
• Develop reusable internal data processing tools to streamline operations and empower teams across the organization.
• Write advanced SQL queries for extracting, transforming, and loading (ETL) data with a focus on execution efficiency.
• Ensure data validation, quality monitoring, and governance using automated processes and dashboards.
MLOps & Cloud-Based Data Infrastructure:
• Deploy machine learning pipelines with MLOps best practices to support AI and predictive analytics applications.
• Optimize data pipelines for ML models, ensuring seamless integration between data engineering and machine learning workflows.
• Work with cloud platforms (GCP) to manage data storage, processing, and security.
• Utilize Terraform, Jenkins, and other CI/CD tools to automate data pipeline deployments and infrastructure management.
Collaboration & Agile Development:
• Work in Agile/DevOps teams, collaborating closely with data scientists, software engineers, and business stakeholders.
• Advocate for data-driven decision-making, educating teams on best practices in data architecture and engineering.
Required Skills & Experience:
• 5+ years of experience as a Data Engineer working with large-scale data processing.
• Strong proficiency in SQL for data transformation, optimization, and analytics.
• Expertise in programming languages (Python, Java, Scala, or Go) with an understanding of functional and object-oriented programming paradigms.
• Experience with distributed computing frameworks (e.g., Apache Beam).
• Proficiency in cloud-based data engineering on AWS, GCP, or Azure.
• Strong knowledge of data modeling, data governance, and schema design.
• Experience with CI/CD and infrastructure-as-code tools (Jenkins, Terraform) for infrastructure automation.
• Experience with real-time data streaming (Kafka, or equivalent).
• Strong understanding of MLOps and integrating data engineering with ML pipelines.
• Familiarity with knowledge graphs and GraphQL APIs for data relationships.
• Background in retail, customer classification, and personalization systems.
• Knowledge of business intelligence tools and visualization platforms.
• Retail industry experience, specifically in the production of consumer goods.
Date Posted: 12 May 2025