Senior Backend Data Infrastructure Engineer

San Francisco, California

Integral Privacy Technologies

This role involves building innovative data pipelines from scratch for data we have never encountered before, all while adhering to strict privacy guidelines. We are looking for someone eager to work in person in San Francisco four days a week. As a startup, our priorities can change rapidly, so being adaptable, resourceful, and comfortable with uncertainty is crucial. If you are seeking a predictable role with well-established data processes, this isn't the position for you.

About Integral

At Integral, we are transforming how organizations handle sensitive data, making it both efficient and secure. Our platform provides compliant infrastructure for processing terabytes of sensitive information daily.

The Role

We are on the lookout for a Senior Backend Data Infrastructure Engineer to help architect and maintain our core data processing engine. You'll be responsible for designing and implementing high-performance data pipelines that handle massive datasets while adhering to rigorous compliance and security standards.

You will collaborate closely with our CTO and platform team in San Francisco, focusing on building and optimizing data pipelines that ingest data from various brokers, apply compliant schemas, and deliver this data to clean rooms and data lakes.

The challenge lies in supporting both scale and the dynamic nature of the data: we process diverse data types to power claims workflows and unstructured AI applications. Our aim is to make minimal assumptions about data at ingestion while guaranteeing well-defined, clearly documented data at delivery.

Technology Stack
  • PySpark for large-scale data processing
  • Delta Lake and streaming solutions
  • Data warehouses (BigQuery, Snowflake)
  • Data lakes (S3, GCS)
  • Databricks for compute management
Your Contributions
  • Design and implement scalable data pipelines processing over 1TB of data daily
  • Optimize performance and cost efficiency of the data pipelines
  • Develop robust schema validation and data quality assessments (see the illustrative sketch after this list)
  • Design efficient data storage and retrieval strategies
  • Maintain and enhance our PySpark codebase
  • Establish best practices for data engineering
  • Collaborate effectively with platform and customer-facing teams
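As a purely illustrative sketch of the schema validation and data quality work above, the snippet below validates required fields, quarantines failing rows, and enforces a rejection threshold. The table paths, rules, and the 1% threshold are hypothetical placeholders.

    # Quality gate: validate required fields, quarantine failures, enforce a threshold.
    # Paths, rules, and the 1% threshold are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("quality-gate").getOrCreate()

    df = spark.read.format("delta").load("s3://example-bucket/clean-room/records/")

    # Simple row-level rules; real checks would also cover types, ranges, and duplicates.
    valid = df.filter(F.col("record_id").isNotNull() & F.col("received_at").isNotNull())
    rejected = df.subtract(valid)

    total, bad = df.count(), rejected.count()
    if total and bad / total > 0.01:
        raise ValueError(f"Quality gate failed: {bad}/{total} rows rejected")

    rejected.write.format("delta").mode("append").save("s3://example-bucket/quarantine/records/")
    valid.write.format("delta").mode("append").save("s3://example-bucket/curated/records/")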
What We Value
  • 5+ years of backend software engineering experience
  • Expertise in PySpark with large-scale data processing
  • Strong experience with Delta Lake and streaming paradigms
  • Proven ability to build pipelines processing >1TB of data
  • Familiarity with various data warehouses and lakes
  • Solid software engineering fundamentals
  • Mastery of optimization techniques in PySpark
  • A security-first mindset when handling data
Location & Job Arrangement
  • Based in San Francisco
  • In-office 4 days a week
What Makes This Role Unique
  • Engage with complex technical challenges at scale
  • Have a direct impact on core product functionalities
  • Opportunity to shape and enhance data engineering practices
  • Work with cutting-edge privacy technology
Please share your most significant experiences optimizing large-scale data pipelines, focusing on performance enhancements, reliability improvements, and cost efficiency while upholding high standards of data quality.

Date Posted: 06 May 2025