Senior Data Engineer

Denver, Colorado

SAILTEAM.io
Team: ML
Job Description

Job Title: Senior Data Engineer
Location: Remote (Eastern Time Zone Hours)

Employment Type: Full-Time
About Us
We are a Computer Vision Product Company on a mission to dramatically increase the operational safety of critical rope applications by delivering real-time data to the right people before catastrophic failures occur.

Failures in critical rope applications are often due to inadequate visual inspection, a standard practice in industries such as Construction, Maritime Mooring, Mining, and Oil & Gas/Drilling. When these ropes fail, lives are lost, and company reputations suffer.

At Scope, we are leveraging the latest advancements in technology to solve this problem. Our current focus is Electric Utility Construction and Maintenance, where we equip operators with the ability to assess the break strength of their stringing lines without destructive testing. This eliminates reliance on "educated guesses" and allows companies to confidently ensure their lines are fit for service.
What You'll Do
  • Architect and build scalable data pipelines and workflows using Dagster to move, transform, and make data available for machine learning and analytics (see the Dagster sketch after this list).
  • Design and optimize storage solutions for large-scale industrial and vision data, ensuring efficient retrieval and accessibility for ML engineers.
  • Develop robust data ingestion frameworks for consuming live production images, video, and metadata in an extensible and scalable manner.
  • Collaborate with ML engineers to ensure data, both computer vision and ancillary metadata, is structured and processed optimally for experimentation and model training.
  • Work with Kubernetes-based environments to orchestrate and deploy data processing jobs.
  • Enhance CI/CD for data workflows, ensuring automated deployment and testing via GitLab CI/CD. We deploy on merge, and you'll make that better, faster, safer, and cheaper.
  • Own and maintain AWS-based data infrastructure, leveraging Terraform for Infrastructure as Code.
  • Implement data governance best practices, including data quality validation, lineage tracking, and metadata management.
  • Optimize batch and real-time processing frameworks, incorporating best practices for performance, scalability, and reliability.
  • Act as a technical leader in data engineering, defining best practices and guiding future scaling efforts.
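To give a flavor of this work, here is a minimal Dagster sketch of a two-asset pipeline. The asset, column, and value names are hypothetical illustrations, not our actual pipeline:

    import pandas as pd
    from dagster import Definitions, asset

    @asset
    def raw_inspection_frames() -> pd.DataFrame:
        # Hypothetical ingestion step: collect per-frame metadata from the field.
        return pd.DataFrame({"frame_id": [1, 2], "line_id": ["A", "A"]})

    @asset
    def training_ready_frames(raw_inspection_frames: pd.DataFrame) -> pd.DataFrame:
        # Downstream transform; Dagster wires the dependency from the argument name.
        return raw_inspection_frames.dropna()

    defs = Definitions(assets=[raw_inspection_frames, training_ready_frames])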
What We're Looking For

Must-Have Skills
  • 5+ years of experience in data engineering, with a focus on scalable, production-grade data infrastructure.
  • Strong Python skills with an emphasis on type safety, functional programming patterns, and modern Python practices. The ideal candidate has used Rust, Scala, Kotlin, F#, and/or a Lisp dialect before.
  • Experience with data processing frameworks such as Pandas (with Pandera), PyArrow, or Dask (see the validation sketch after this list).
  • Deep expertise in data orchestration tools, preferably Dagster (experience with Prefect, Airflow, NiFi, or similar tools is acceptable).
  • Experience with streaming and event-driven architectures such as Ray Core, Kafka, Kinesis, Pulsar, Storm, or Dempsy, or with real-time processing frameworks like Flink or Spark Streaming.
  • Hands-on experience with Kubernetes, particularly for data pipeline orchestration.
  • Experience deploying infrastructure via Terraform (or similar IaC tools).
  • Proficiency with cloud services, preferably AWS: S3, EKS, Lambda, Glue, and RDS (or equivalents on other clouds).
  • Strong database skills, including SQL, NoSQL, and columnar storage (e.g., Postgres, BigQuery, ClickHouse).
  • Experience with strongly typed ORMs (e.g., SQLAlchemy/SQLModel, Hibernate, Diesel) and data validation frameworks (e.g., Pydantic, Great Expectations).
  • Comfortable with hybrid storage, combining databases and blob storage for large objects such as videos and computer vision datasets.
  • CI/CD expertise, preferably with GitLab for managing automated data pipeline deployments.
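As a concrete illustration of the validation bar we have in mind, a minimal Pandera sketch might look like this (the column names and checks are illustrative assumptions, not our real schema):

    import pandas as pd
    import pandera as pa

    # Hypothetical schema for break-strength records; real columns will differ.
    schema = pa.DataFrameSchema({
        "line_id": pa.Column(str),
        "break_strength_kn": pa.Column(float, checks=pa.Check.gt(0)),
    })

    df = pd.DataFrame({"line_id": ["A", "B"], "break_strength_kn": [42.0, 37.5]})
    validated = schema.validate(df)  # raises SchemaError if any check fails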
Nice-to-Have Skills
  • Familiarity with ML experiment tracking, metadata management, and data lineage tracking.
  • Understanding of ML workflows and how data engineering enables efficient model training/deployment.
  • Experience with embedding management, particularly for inference stores, using tools such as Chroma or pgvector (see the sketch after this list).
  • Experience with video processing pipelines and efficient storage/retrieval of large media files.
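For the embedding work, a nearest-neighbor lookup against a pgvector-backed inference store could be sketched as below; the connection string, table, and column names are hypothetical:

    import psycopg2

    # Assumes the pgvector extension is installed in the target database.
    conn = psycopg2.connect("dbname=inference")
    with conn.cursor() as cur:
        probe = "[0.1, 0.2, 0.3]"  # vector literal for the query embedding
        cur.execute(
            "SELECT frame_id FROM frame_embeddings "
            "ORDER BY embedding <-> %s::vector LIMIT 5",  # <-> is L2 distance
            (probe,),
        )
        nearest = [row[0] for row in cur.fetchall()]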
What We Offer
  • A chance to own and shape the data infrastructure at a fast-growing computer vision AI company.
  • A highly collaborative, fast-paced environment working with cutting-edge ML and data engineering.
  • Competitive salary, annual incentive plan, and benefits.
  • Opportunities for growth and leadership as we scale our data team.
Date Posted: 28 February 2025
Apply for this Job