Senior Data Engineer
Team: ML
Job Description
Job Title: Senior Data Engineer
Location: Remote (Eastern Time Zone Hours)
Employment Type: Full-Time
About Us
We are a Computer Vision Product Company on a mission to dramatically increase the operational safety of critical rope applications by delivering real-time data to the right people before catastrophic failures occur.
Failures in critical rope applications are often due to inadequate visual inspection, a standard practice in industries such as Construction, Maritime Mooring, Mining, and Oil & Gas/Drilling. When these ropes fail, lives are lost, and company reputations suffer.
At Scope, we are leveraging the latest advancements in technology to solve this problem. Our current focus is Electric Utility Construction and Maintenance, where we equip operators with the ability to assess the break strength of their stringing lines without destructive testing. This eliminates reliance on "educated guesses" and allows companies to confidently ensure their lines are fit for service.
What You'll Do
- Architect and build scalable data pipelines and workflows using Dagster to move, transform, and make data available for machine learning and analytics.
- Design and optimize storage solutions for large-scale industrial and vision data, ensuring efficient retrieval and accessibility for ML engineers.
- Develop robust data ingestion frameworks for consuming live production images, video, and metadata in an extensible and scalable manner.
- Collaborate with ML engineers to ensure data (both computer vision and ancillary metadata) is structured and processed optimally for experimentation and model training.
- Work with Kubernetes-based environments to orchestrate and deploy data processing jobs.
- Enhance CI/CD for data workflows, ensuring automated deployment and testing via GitLab CI/CD. We deploy on merge, and you'll make that better, faster, safer, and cheaper.
- Own and maintain AWS-based data infrastructure, leveraging Terraform for Infrastructure as Code.
- Implement data governance best practices, including data quality validation, lineage tracking, and metadata management.
- Optimize batch and real-time processing frameworks, incorporating best practices for performance, scalability, and reliability.
- Act as a technical leader in data engineering, defining best practices and guiding future scaling efforts.
What We're Looking For
Must-Have Skills
- 5+ years of experience in data engineering, with a focus on scalable, production-grade data infrastructure.
- Strong Python skills with emphasis on type safety, functional programming patterns, and modern Python practices. The ideal candidate has used Rust, Scala, Kotlin, F#, and/or a Lisp dialect before.
- Experience with data processing frameworks such as Pandas (with Pandera), PyArrow, or Dask.
- Deep expertise in data orchestration tools, preferably Dagster (experience with Prefect, Airflow, NiFi, or similar tools is acceptable).
- Experience with streaming and event-driven architectures such as Ray Core, Kafka, Kinesis, Pulsar, Storm, or Dempsy, or real-time data processing frameworks like Flink or Spark Streaming.
- Hands-on experience with Kubernetes, particularly in data pipeline orchestration.
- Experience deploying infrastructure via Terraform (or similar IaC tools).
- Proficiency in cloud services, preferably AWS: S3, EKS, Lambda, Glue, and RDS (or equivalents on other clouds).
- Strong database skills, including SQL, NoSQL, and columnar storage (e.g., Postgres, BigQuery, ClickHouse).
- Experience with strongly-typed ORMs (e.g., SQLAlchemy/SQLModel, Hibernate, Diesel) and data validation frameworks (e.g., Pydantic, Great Expectations).
- Comfortable with hybrid storage, combining databases and blob storage for large objects such as videos and computer vision datasets.
- CI/CD expertise, preferably with GitLab for managing automated data pipeline deployments.
Nice-to-Have Skills
- Familiarity with ML experiment tracking, metadata management, and data lineage tracking.
- Understanding of ML workflows and how data engineering enables efficient model training/deployment.
- Experience with embedding management, particularly for inference stores, using tools such as Chroma or pgvector.
- Experience with video processing pipelines and efficient storage/retrieval of large media files.
What We Offer
- A chance to own and shape the data infrastructure at a fast-growing computer vision AI company.
- A highly collaborative, fast-paced environment working with cutting-edge ML and data engineering.
- Competitive salary, annual incentive plan, and benefits.
- Opportunities for growth and leadership as we scale our data team.