About Us:
At GMI, we are at the forefront of scalable AI infrastructure solutions. Our platforms power state-of-the-art machine learning, enabling cutting-edge applications in the generative AI domain. As a fast-moving and innovative team, we thrive on leveraging open-source solutions and industry best practices to deliver robust, high-performance AI systems for our clients.
About the Role:
We are seeking a Software Engineering Intern who will focus on adapting and optimizing open-source foundation models for our GPU inference platform. You will work closely with experienced engineers and AI researchers, gaining hands-on exposure to large-scale model deployment techniques. This is an opportunity to build valuable skills in model optimization, GPU acceleration, and systems-level engineering while contributing to the next generation of AI-powered products.
Key Responsibilities:
- Model Adaptation & Integration: Adapt open-source foundation models (e.g., LLMs, vision transformers, multimodal models) to run efficiently on our custom GPU inference infrastructure.
- Performance Optimization: Identify bottlenecks in model inference pipelines, implement GPU kernels, and optimize code to reduce latency and improve throughput.
- Platform Tooling & Automation: Develop scripts and tooling for automating model conversion, quantization, and configuration processes to streamline deployment workflows.
- Testing & Validation: Implement benchmarking tests and validation suites to ensure model accuracy, reliability, and performance meet internal standards.
- Cross-Functional Collaboration: Work closely with machine learning researchers, MLOps engineers, and infrastructure teams to refine performance strategies and ensure smooth integration of foundation models into production environments.
- Documentation & Knowledge Sharing: Document adaptation procedures, best practices, and lessons learned. Contribute to internal knowledge bases and present findings in team meetings.
Qualifications:
- Educational Background: Currently pursuing a graduate degree in Computer Science, Electrical Engineering, or a related technical field.
- Programming Skills: Proficiency in Python; familiarity with Go and CUDA is a plus.
- Foundational Knowledge in Machine Learning: Understanding of attention-based models, PyTorch, and GPU-accelerated computing.
- Problem-Solving Mindset: Strong analytical skills, with the ability to troubleshoot performance issues and propose innovative optimization strategies.
- Team Player: Excellent communication skills, eagerness to learn, and the ability to collaborate effectively with diverse teams.
What You'll Gain:
- Real-world exposure to large-scale, production-grade AI deployments.
- Hands-on experience with state-of-the-art models and GPU acceleration techniques.
- Mentorship from experienced engineers and researchers.
- Opportunities to impact performance-critical aspects of cutting-edge AI products.
If you're passionate about AI systems engineering and excited to work at the intersection of machine learning and high-performance computing, we encourage you to apply.