eDNA Explorer is expanding through a partnership with iTrackDNA in Canada to create eDNA Explorer Canada. We are building a cutting-edge software platform for processing and analyzing environmental DNA (eDNA) data. Our system processes biological samples to identify species based on their genetic material, integrates environmental data, and provides insights into biodiversity and ecological patterns. We're using modern cloud-native data engineering principles and AI to build robust, scalable pipelines for scientific data analysis.
Position Overview
We are seeking a Senior Data/ML Engineer to join our team working on the eDNA Explorer data processing pipelines. The ideal candidate will have strong Python development skills, experience with data orchestration frameworks, and a background in building cloud-based data processing systems. Knowledge of bioinformatics or genomics is a plus but not required.
This is a grant-funded position, with the possibility of being hired as an employee when the grant ends.
Technology Stack
Our platform leverages the following technologies:
Core Technologies
- Python: Primary development language (version 3.12)
- Dagster: Data orchestration framework
- Docker/Kubernetes: Containerization and orchestration
- Google Cloud Platform: Primary cloud provider
  - Google Cloud Storage
  - BigQuery
  - Secret Manager
- PostgreSQL: Relational database
- SQLAlchemy: ORM for database interactions
- Polars: High-performance data processing library
Data Science & Bioinformatics
- scikit-learn: Machine learning library
- plotly: Data visualization
- Earth Engine API: Environmental data collection
- Bioinformatics tools: DADA2, custom taxonomic classification tools
DevOps & Infrastructure
- Poetry: Dependency management
- Ruff: Linting and code quality
- Helm/ArgoCD: Kubernetes deployment and CD
- GitHub Actions: CI pipelines
Key Responsibilities
As a Senior Data/ML Engineer, you will:
- Design, develop, and maintain data processing pipelines using Dagster
- Implement high-performance data transformation operations using Python and Polars
- Optimize cloud resource usage and cost-efficiency in GCP
- Collaborate with bioinformaticians to implement scientific algorithms
- Build and improve our ML features for taxonomic classification and feature importance
- Develop robust error handling and logging systems for pipeline monitoring
- Create comprehensive tests for pipeline components
- Contribute to deployment and CI/CD processes
- Participate in code reviews and technical design discussions
Current Projects
You will have the opportunity to work on several exciting initiatives:
- Improved Sequence Analysis Pipeline: Enhance our taxonomic classification system to handle complex eDNA samples with greater accuracy
- Feature Importance Framework: Further develop our machine learning approach to identifying the key environmental factors that influence species distribution and ecosystem health, building a deeper understanding of system biodiversity that informs better restoration and management efforts.
- Terradactyl Integration: Expand our environmental data collection capabilities with additional data sources
- Pipeline Performance Optimization: Improve processing speed and resource efficiency for large-scale sequence data
- Developer Experience Improvements: Enhance testing, monitoring, and deployment systems
Required Qualifications
- 5+ years of professional software development experience, with at least 3 years focused on data engineering
- Strong proficiency in Python development, including testing and performance optimization
- Experience with data orchestration frameworks (Dagster, Airflow, Prefect, etc.)
- Demonstrated experience with cloud platforms, preferably GCP
- Knowledge of SQL and relational database design
- Experience with containerization technologies (Docker, Kubernetes)
- Comfort working in a collaborative, fast-paced environment
- Ability to understand and implement complex data workflows
Preferred Qualifications
- Experience with scientific or bioinformatics data processing
- Background in machine learning, particularly scikit-learn
- Knowledge of genomics or related biological fields
- Experience with Polars or other high-performance data processing libraries
- Familiarity with geographic information systems (GIS) or Earth Engine
- Experience with CI/CD systems and automated testing
Education
- Bachelor's degree in Computer Science, Data Science, Bioinformatics, or a related field
- Advanced degree (MS/PhD) in a relevant field is a plus
Skills That Will Help You Succeed
- Problem-solving: The ability to tackle complex data processing challenges
- Adaptability: Comfort with learning new technologies and scientific concepts
- Attention to detail: Precision is critical when working with scientific data
- Communication: The ability to explain technical concepts to team members with diverse backgrounds
- Initiative: Self-direction to identify improvements and implement solutions
Our Development Environment
You'll be working in a modern development environment with:
- Git-based workflow with pull requests and code reviews
- Cloud-based development environments
- Containerized testing and deployment
- Comprehensive CI/CD pipelines
- Collaborative team with both engineers and scientists
Why Join Our Team?
Working at eDNA Explorer offers the opportunity to:
- Apply cutting-edge data engineering to solve real environmental challenges
- Work with a diverse team of engineers, data scientists, and biologists
- Develop skills across the full stack of modern data technologies
- Build systems that directly contribute to environmental research and biodiversity monitoring
- Grow your career in an expanding field at the intersection of technology and biology
Location
This position is remote within Canada, with some preference for candidates who can occasionally visit our offices at the University of Victoria on Vancouver Island in beautiful British Columbia. Applicants must be Canadian citizens or hold a permit to work in Canada.
The Helbing lab is situated in the Department of Biochemistry & Microbiology at the University of Victoria. The rest of the eDNA Explorer team is based in the United States. Check out the lab website here: . The eDNA Explorer platform can be viewed here: .
How to Apply
Please submit your resume and a brief cover letter explaining your interest in eDNA Explorer and this role. Include examples of relevant projects you've worked on, particularly those involving data pipelines, cloud infrastructure, or scientific computing.
Submit your application by email with the subject line "eDNA Explorer Canada SDML position" to Dr. Caren Helbing at . Applications will be evaluated on a rolling basis until the position is filled.