Job Title: Data Quality Engineer
Location: Onsite in either Bethlehem, PA (preferred) or Holmdel, NJ
Job Type: Hybrid
Duration: 6 months
Openings: 1
We are seeking an experienced Data Engineer to join our Data and Analytics organization. You will play a key role in building and delivering best-in-class data and analytics solutions that create value and impact for the organization.
Responsibilities:
- Architect, build, and maintain scalable, reliable data pipelines with robust data quality checks built in, so the analytics and BI layers can consume the data with confidence.
- Design, develop, and implement low-latency, high-availability, performant data applications, and recommend and implement innovative engineering solutions.
- Design, develop, test, and debug code in Python, SQL, PySpark, and Bash in accordance with company standards.
- Design and implement a data quality framework and apply it to critical data pipelines to make the data layer robust and trustworthy for downstream consumers (a minimal sketch follows this list).
- Design and develop an orchestration layer for data pipelines written in SQL, Python, and PySpark.
- Apply and provide guidance on software engineering techniques such as design patterns, code refactoring, framework design, code reusability, code versioning, performance optimization, and continuous integration and delivery (CI/CD) to make the data analytics team robust and efficient.
- Perform all job functions consistent with company policies and procedures, including those that govern the handling of PHI and PII.
- Develop relationships with business team members by being proactive, demonstrating a growing understanding of business processes, and recommending innovative solutions.
- Communicate project output in terms of customer value, business objectives, and product opportunity.
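For a flavor of the pipeline-embedded quality gate described above, here is a minimal PySpark sketch. The dataset path, column names, and rules (non-null/unique keys, non-negative amounts) are illustrative assumptions, not the team's actual framework.

```python
# Minimal sketch of a homegrown, pipeline-embedded data quality gate.
# Path, columns, and rules are hypothetical, for illustration only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq-check-sketch").getOrCreate()

# Illustrative input: the batch the pipeline has just loaded (hypothetical path).
orders_df = spark.read.parquet("/data/lake/orders/latest")

total = orders_df.count()

# Rule 1: primary key must be non-null and unique.
null_keys = orders_df.filter(F.col("order_id").isNull()).count()
dupe_keys = total - orders_df.select("order_id").distinct().count()

# Rule 2: amounts must be non-negative.
bad_amounts = orders_df.filter(F.col("amount") < 0).count()

failures = {"null_keys": null_keys, "dupe_keys": dupe_keys, "bad_amounts": bad_amounts}
if any(v > 0 for v in failures.values()):
    # Failing fast keeps bad records out of the analytics/BI layer.
    raise ValueError(f"Data quality checks failed: {failures}")
```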
Requirements:
- 5+ years of experience, with a Bachelor's or Master's degree in Computer Science, Engineering, Applied Mathematics, or a related field.
- Extensive hands-on development experience in Python, SQL, Bash, and pytest (a sample test is sketched after this list).
- Extensive experience in performance optimization of data pipelines.
- Extensive hands-on experience working with cloud data warehouse and data lake platforms such as Databricks, Redshift, or Snowflake.
- Familiarity with building and deploying scalable data pipelines and data solutions using Python, SQL, and PySpark.
- Extensive experience in all stages of software development and expertise in applying software engineering best practices.
- Experience in designing and implementing a data quality framework, either homegrown or built on open-source frameworks such as Great Expectations, Soda, or Deequ.
- Extensive experience in developing an end-to-end orchestration layer for data pipelines using frameworks such as Apache Airflow, Prefect, or Databricks Workflows (an Airflow sketch follows this list).
- Familiarity with RESTful web services (REST APIs) for integrating with other services.
- Familiarity with API gateways such as Apigee for securing web service endpoints.
- Familiarity with concurrency and parallelism.
- Familiarity with data pipelines and the Client's development cycle.
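To illustrate the pytest requirement above, here is a minimal sketch of a unit test for a pipeline transform. normalize_amount is a hypothetical helper invented for this example, not a known company function.

```python
# Minimal pytest sketch for a hypothetical pipeline transform.
import pytest

def normalize_amount(raw: str) -> float:
    """Hypothetical transform: parse '$1,234.50' into 1234.50."""
    return float(raw.replace("$", "").replace(",", ""))

@pytest.mark.parametrize(
    "raw, expected",
    [("$1,234.50", 1234.50), ("0", 0.0), ("$10", 10.0)],
)
def test_normalize_amount(raw, expected):
    # Each parametrized case asserts the parsed value matches expectations.
    assert normalize_amount(raw) == expected

def test_normalize_amount_rejects_garbage():
    # Non-numeric input should fail loudly rather than pass bad data along.
    with pytest.raises(ValueError):
        normalize_amount("not-a-number")
```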
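And to illustrate the orchestration requirement, a minimal Apache Airflow sketch (assuming Airflow 2.4+, which accepts the schedule parameter) that sequences an extract, a transform, and a data quality gate. The DAG name and task callables are illustrative assumptions.

```python
# Minimal Airflow 2.4+ sketch of an orchestration layer for a data pipeline.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():        # placeholder for a SQL/ingestion step
    ...

def transform():      # placeholder for a PySpark transform
    ...

def run_dq_checks():  # placeholder for the data quality gate
    ...

with DAG(
    dag_id="daily_orders_pipeline",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_dq = PythonOperator(task_id="dq_checks", python_callable=run_dq_checks)

    # The quality gate runs last, so failures block downstream BI consumers.
    t_extract >> t_transform >> t_dq
```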