AI Engineer

Santa Clara, California

Diverse Lynx
Job Expired
AI+HPC infra requirement

Looking for someone with architecture and design experience, along with experience handling clusters of 1000+ nodes.

Technical/Functional Skills -

Proficiency in RoCEv2, K8s, KVM, Ubuntu, Python, Shell, Go, Rust, GPU drivers, and Cluster interconnect with 200G/400G networking.

Managing GPU clusters and optimizing GPU-based services, tools, and software.

Roles & Responsibilities -

Develop, implement, and maintain GPU-based clusters of 10 to 1000 nodes, ensuring optimal performance and availability.

Administer ML/AI platforms (distributed ML services, LLMs, vector databases, and AI inferencing) by managing deployments, resource allocation, monitoring, and security.

Collaborate with cross-functional teams to address AI infrastructure requirements, support AI-related projects, and provide technical expertise.

Monitor and evaluate the performance of AI systems and clusters, ensuring that they adhere to industry best practices and meet company standards.

Compile reports, document procedures, and publish recommendations for improving AI infrastructure and solutions.

Use AI/ML to continuously improve the internal processes and tools used in the end-to-end delivery of this team's services.

Diverse Lynx LLC is an Equal Employment Opportunity employer. All qualified applicants will receive due consideration for employment without any discrimination. All applicants will be evaluated solely on the basis of their ability, competence and their proven capability to perform the functions outlined in the corresponding role. We promote and support a diverse workforce across all levels in the company.

Date Posted: 16 May 2024