Role Overview
We are seeking an experienced AI Infrastructure Engineer to spearhead the development, deployment, and ongoing optimization of machine learning and artificial intelligence systems. This individual will work cross-functionally to design scalable solutions that bridge cutting-edge research and critical business applications.
What You'll Do
- Lead the design, implementation, and maintenance of AI systems from prototype to production.
- Partner closely with engineers, quantitative researchers, traders, and data scientists to identify high-value opportunities for AI and ML integration across the organization.
- Build automated pipelines for model retraining, validation, and monitoring to ensure system stability and minimal operational disruption.
- Act as a key contributor to the selection and integration of AI/ML frameworks, optimizing usage across diverse compute environments.
- Architect and manage robust ML infrastructure, including the development of feature stores, feature engineering pipelines, and model registries.
- Develop internal platforms that support efficient, reproducible machine learning experimentation at scale.
- Diagnose and resolve computational inefficiencies related to GPU and CPU resource utilization.
- Stay at the forefront of AI advancements and bring innovative techniques into the technology stack.
What We're Looking For
- Degree in Computer Science, Artificial Intelligence, Machine Learning, or a closely related field; advanced degrees are a plus.
- Minimum of 3 years of professional experience building AI/ML-driven applications.
- Strong foundation in machine learning principles, algorithms, and real-world applications.
- Expertise in Python and familiarity with best practices in large-scale software development.
- Proven experience delivering and maintaining machine learning models in live production environments.
- Deep familiarity with MLOps workflows, including model versioning, deployment automation, and monitoring.
- Proficiency with ML frameworks and runtimes such as TensorFlow, PyTorch, ONNX, or TensorRT.
- Hands-on experience with Large Language Models (LLMs), including techniques such as retrieval-augmented generation (RAG) and model fine-tuning.
- Solid understanding of the compute and storage architectures necessary to support AI/ML initiatives.
- Strong problem-solving mindset, with the ability to independently troubleshoot and optimize complex systems.
- Excellent communication skills and a collaborative approach to working across technical and business teams.