Data Engineer
About the Team
The Enterprise Data Management (EDM) team at RA Capital is committed to delivering high-quality,
reliable data across the organization. With a focus on data integrity, accessibility, and compliance, EDM
ensures that data assets are optimized to support strategic business goals. The team manages the
end-to-end data lifecycle, collaborates closely with both internal stakeholders and external vendors, and plays a
vital role in developing robust data infrastructure to support RA Capital's decision-making processes.
As a Data Engineer within the Enterprise Data Management team, you will be responsible for:
• Developing, maintaining, and optimizing end-to-end data pipelines to support vendor data
ingestion, quality assurance, and integration into RA Capital's internal systems.
• Creating robust data integration mechanisms between the data warehouse and downstream
applications, ensuring seamless data flow and accessibility for end users.
• Creating and maintaining production-level code to match, standardize, and reconcile data from
multiple vendors, ensuring consistency and accuracy across data sources, and utilizing natural
language processing and other machine learning techniques to improve data matching and
validation.
• Designing and optimizing data models in Snowflake, ensuring performance, scalability, and ease of
access for analytics and reporting.
• Writing clean, maintainable, and scalable code to manage data flows, build APIs, and integrate
systems as part of data pipeline solutions, applying software development best practices such as
version control (e.g., Git) and CI/CD.
• Implementing data quality checks and controls to ensure that data meets integrity and
compliance standards.
• Collaborating with vendors on data formats, transformations, and delivery schedules to ensure
high-quality, timely data integration.
• Coordinating with IT and business teams to align data solutions with strategic goals, proactively
managing communication and issue resolution.
• Documenting data processes, pipeline architecture, and data management protocols, translating
technical concepts effectively for non-technical stakeholders.
• Providing Tier 1 monitoring and support for the data platform, including data pipelines.
• Maintaining comprehensive documentation and adhering to software and data engineering best
practices, including data governance, security, and compliance.
Key Skills
• Data Engineering & Integration: Proficient in designing, building, and managing data pipelines
and ETL processes, including the use of frameworks for production-grade code.
• Software Development: Strong foundation in software development, including experience with
object-oriented programming, modular design, and API development. Familiarity with version
control (Git) and CI/CD processes.
• Data Matching & Standardization: Skilled in developing algorithms and methods to match and
standardize data across multiple vendor sources.
• Data Quality Management: Experience in implementing and monitoring data quality checks and
controls to maintain data accuracy.
• Technical Proficiency: Advanced SQL, Python, and Spark skills, with experience in Databricks or
Snowflake; familiarity with cloud-based data solutions is a plus.
• Vendor Management: Ability to coordinate with external data providers, track deliverables, and
address data discrepancies promptly.
• Communication & Stakeholder Management: Strong communication skills to convey technical
concepts and align data solutions with broader business objectives.
Requirements
We are looking for a highly skilled and detail-oriented Data Engineer with a passion for data quality,
integration, and vendor management. All applicants must meet the required qualifications below to be
considered:
Required:
• Must be authorized to work in the United States.
• Must hold a Bachelor's degree (or higher) in Computer Science, Data Science, Information
Technology, Software Engineering, or a related field.
• Must have 3+ years of relevant work experience using SQL, Java, Python, and Spark; experience
with Databricks or Snowflake required.
• Must have 3+ years of relevant work experience in software development, data integration, vendor
data management, and writing production-level code for data matching and standardization.
• Strong documentation skills and the ability to communicate technical concepts effectively.
• Must be flexible and willing to work off-hours as needed.
• Must be based in Massachusetts or willing to relocate.
Preferred:
• Experience working with unstructured as well as structured data.
• Experience working with AWS-related technologies such as S3, EC2, and EBS.
• Experience managing, supporting, and developing pipelines in both Snowflake and Databricks.
• Experience developing and preparing data for use by AI/ML applications.
Date Posted: 21 December 2024