Senior Data/ML Engineering Position

Canada

eDNA Explorer

Job Expired - Click here to search for similar jobs

eDNA Explorer is expanding through a partnership with iTrackDNA in Canada to create eDNA Explorer Canada. We are building a cutting-edge software platform for processing and analyzing environmental DNA (eDNA) data. Our system processes biological samples to identify species based on their genetic material, integrates environmental data, and provides insights into biodiversity and ecological patterns. We're using modern cloud-native data engineering principles and AI to build robust, scalable pipelines for scientific data analysis.

Position Overview

We are seeking a Senior Data/ML Engineer to join our team working on the eDNA Explorer data processing pipelines. The ideal candidate will have strong Python development skills, experience with data orchestration frameworks, and a background in building cloud-based data processing systems. Knowledge of bioinformatics or genomics is a plus but not required.

This is a grant-funded position, with the possibility of being hired as an employee when the grant ends.

Technology Stack

Our platform leverages the following technologies:

Core Technologies

Python: Primary development language (version 3.12)
Dagster: Data orchestration framework
Docker/Kubernetes: Containerization and orchestration
Google Cloud Platform: Primary cloud provider (Google Cloud Storage, BigQuery, Secret Manager)
PostgreSQL: Relational database
SQLAlchemy: ORM for database interactions
Polars: High-performance data processing library

Data Science & Bioinformatics

scikit-learn: Machine learning library
plotly: Data visualization
Earth Engine API: Environmental data collection
Bioinformatics tools: DADA2, custom taxonomic classification tools

DevOps & Infrastructure

Poetry: Dependency management
Ruff: Linting and code quality
Helm/ArgoCD: Kubernetes deployment and CD
GitHub Actions: CI pipelines

Key Responsibilities

As a Senior Data/ML Engineer, you will:

Design, develop, and maintain data processing pipelines using Dagster
Implement high-performance data transformation operations using Python and Polars
Optimize cloud resource usage and cost-efficiency in GCP
Collaborate with bioinformaticians to implement scientific algorithms
Build and improve our ML features for taxonomic classification and feature importance
Develop robust error handling and logging systems for pipeline monitoring
Create comprehensive tests for pipeline components
Contribute to deployment and CI/CD processes
Participate in code reviews and technical design discussions

Current Projects

You will have the opportunity to work on several exciting initiatives:

Improved Sequence Analysis Pipeline: Enhance our taxonomic classification system to handle complex eDNA samples with greater accuracy
Feature Importance Framework: Further develop our machine learning approach to identifying the key environmental factors that influence species distribution and ecosystem health, building a deeper understanding of system biodiversity that supports better restoration and management efforts
Terradactyl Integration: Expand our environmental data collection capabilities with additional data sources
Pipeline Performance Optimization: Improve processing speed and resource efficiency for large-scale sequence data
Developer Experience Improvements: Enhance testing, monitoring, and deployment systems

Required Qualifications

5+ years of professional software development experience, with at least 3 years focused on data engineering
Strong proficiency in Python development, including testing and performance optimization
Experience with data orchestration frameworks (Dagster, Airflow, Prefect, etc.)
Demonstrated experience with cloud platforms, preferably GCP
Knowledge of SQL and relational database design
Experience with containerization technologies (Docker, Kubernetes)
Comfort working in a collaborative, fast-paced environment
Ability to understand and implement complex data workflows

Preferred Qualifications

Experience with scientific or bioinformatics data processing
Background in machine learning, particularly scikit-learn
Knowledge of genomics or related biological fields
Experience with Polars or other high-performance data processing libraries
Familiarity with geographic information systems (GIS) or Earth Engine
Experience with CI/CD systems and automated testing

Education

Bachelor's degree in Computer Science, Data Science, Bioinformatics, or a related field
Advanced degree (MS/PhD) in a relevant field is a plus

Skills That Will Help You Succeed

Problem-solving: The ability to tackle complex data processing challenges
Adaptability: Comfort with learning new technologies and scientific concepts
Attention to detail: Precision is critical when working with scientific data
Communication: The ability to explain technical concepts to team members with diverse backgrounds
Initiative: Self-direction to identify improvements and implement solutions

Our Development Environment

You'll be working in a modern development environment with:

Git-based workflow with pull requests and code reviews
Cloud-based development environments
Containerized testing and deployment
Comprehensive CI/CD pipelines
Collaborative team with both engineers and scientists

Why Join Our Team?

Working at eDNA Explorer offers the opportunity to:

Apply cutting-edge data engineering to solve real environmental challenges
Work with a diverse team of engineers, data scientists, and biologists
Develop skills across the full stack of modern data technologies
Build systems that directly contribute to environmental research and biodiversity monitoring
Grow your career in an expanding field at the intersection of technology and biology

Location

This is a remote position within Canada, with some preference for candidates who can occasionally visit our offices at the University of Victoria on Vancouver Island in beautiful British Columbia. Applicants must be Canadian citizens or hold a permit to work in Canada.

The Helbing lab is situated in the Department of Biochemistry & Microbiology at the University of Victoria. The rest of the eDNA Explorer team is in the United States. Check out the lab website here: The eDNA Explorer platform can be viewed here: .

How to Apply

Please submit your resume and a brief cover letter explaining your interest in eDNA Explorer and this role. Include examples of relevant projects you've worked on, particularly those involving data pipelines, cloud infrastructure, or scientific computing.

Submit your application by email with the header "eDNA Explorer Canada SDML position" to Dr. Caren Helbing at . Applications will be evaluated on a rolling basis until the position is filled.

Date Posted: 23 May 2025
