Senior Backend Data Infrastructure Engineer

San Francisco, California

Integral Privacy Technologies

This role involves building innovative data pipelines from scratch for data we have never encountered before, all while adhering to strict privacy guidelines. We are looking for someone eager to work in person in San Francisco four days a week. As a startup, our priorities can change rapidly, so being adaptable, resourceful, and comfortable with uncertainty is crucial. If you are seeking a predictable role with well-established data processes, this isn't the position for you.

About Integral

At Integral, we are transforming how organizations handle sensitive data, making it both efficient and secure. Our platform provides compliant infrastructure for processing terabytes of sensitive information daily.

The Role

We are on the lookout for a Senior Backend Data Infrastructure Engineer to help architect and maintain our core data processing engine. You'll be responsible for designing and implementing high-performance data pipelines that handle massive datasets while adhering to rigorous compliance and security standards.

You will collaborate closely with our CTO and platform team in San Francisco, focusing on building and optimizing data pipelines that ingest data from various brokers, apply compliant schemas, and deliver this data to clean rooms and data lakes.

The challenge lies in supporting both scale and the dynamic nature of the data: we process diverse data types to power claims workflows and unstructured AI applications. Our aim is to make minimal assumptions about data at ingestion while guaranteeing well-defined, clearly documented data at delivery.

Technology Stack
  • PySpark for large-scale data processing
  • Delta Lake and streaming solutions
  • Data warehouses (BigQuery, Snowflake)
  • Data lakes (S3, GCS)
  • Databricks for compute management
Your Contributions
  • Design and implement scalable data pipelines processing over 1TB of data daily
  • Optimize performance and cost efficiency of the data pipelines
  • Develop robust schema validation and data quality assessments (see the illustrative sketch after this list)
  • Design efficient data storage and retrieval strategies
  • Maintain and enhance our PySpark codebase
  • Establish best practices for data engineering
  • Collaborate effectively with platform and customer-facing teams
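As a purely illustrative sketch of the schema validation and data quality work above, the snippet below validates required fields, quarantines failing rows, and enforces a rejection threshold. The table paths, rules, and the 1% threshold are hypothetical placeholders.

    # Quality gate: validate required fields, quarantine failures, enforce a threshold.
    # Paths, rules, and the 1% threshold are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("quality-gate").getOrCreate()

    df = spark.read.format("delta").load("s3://example-bucket/clean-room/records/")

    # Simple row-level rules; real checks would also cover types, ranges, and duplicates.
    valid = df.filter(F.col("record_id").isNotNull() & F.col("received_at").isNotNull())
    rejected = df.subtract(valid)

    total, bad = df.count(), rejected.count()
    if total and bad / total > 0.01:
        raise ValueError(f"Quality gate failed: {bad}/{total} rows rejected")

    rejected.write.format("delta").mode("append").save("s3://example-bucket/quarantine/records/")
    valid.write.format("delta").mode("append").save("s3://example-bucket/curated/records/")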
What We Value
  • 5+ years of backend software engineering experience
  • Expertise in PySpark with large-scale data processing
  • Strong experience with Delta Lake and streaming paradigms
  • Proven ability to build pipelines processing >1TB of data
  • Familiarity with various data warehouses and lakes
  • Solid software engineering fundamentals
  • Mastery of optimization techniques in PySpark
  • A security-first mindset when handling data
Location & Job Arrangement
  • Based in San Francisco
  • In-office 4 days a week
What Makes This Role Unique
  • Engage with complex technical challenges at scale
  • Have a direct impact on core product functionalities
  • Opportunity to shape and enhance data engineering practices
  • Work with cutting-edge privacy technology
Please share your most significant experiences optimizing large-scale data pipelines, focusing on performance enhancements, reliability improvements, and cost efficiency while upholding high standards of data quality.

Date Posted: 06 May 2025