Computer Vision Engineer

San Jose, California

PEOPLE FORCE CONSULTING INC
We are seeking a Senior Data Scientist with expertise in Vision-Language Models (VLMs) and related technologies to lead the development of efficient, cost-effective multimodal AI solutions. The ideal candidate will have experience with advanced VLM frameworks such as VILA, Isaac, and VSS, and a proven track record of implementing production-grade VLMs for training and testing in real-world environments. A background in healthcare, particularly medical devices, is highly desirable. This role will focus on exploring and deploying state-of-the-art VLM methodologies on cloud platforms such as AWS and Azure.

Experience: 10+ years

Location: San Jose, CA or Waukesha, WI (100% onsite required)

Educational Qualifications: Master's or Ph.D. in Computer Science, Data Science, Machine Learning, or a related field.

Responsibilities:

VLM Development & Deployment:
  • Design, train, and deploy efficient Vision-Language Models (e.g., VILA, Isaac Sim) for multimodal applications.
  • Explore cost-effective methods such as knowledge distillation, modal-adaptive pruning, and LoRA fine-tuning to optimize training and inference (see the LoRA sketch after this list).
  • Implement scalable pipelines for training/testing VLMs on cloud platforms (AWS SageMaker, Azure ML).
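As an illustration of the kind of work involved, here is a minimal sketch of LoRA fine-tuning using the Hugging Face peft library. The base checkpoint, target modules, and hyperparameters below are illustrative assumptions, not specifics of this role.

    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor
    from peft import LoraConfig, get_peft_model

    model_name = "llava-hf/llava-1.5-7b-hf"  # illustrative base VLM, not mandated by this role
    model = AutoModelForVision2Seq.from_pretrained(model_name, torch_dtype=torch.float16)
    processor = AutoProcessor.from_pretrained(model_name)

    # LoRA trains small low-rank adapter matrices instead of the full weights,
    # cutting trainable parameters (and training cost) by orders of magnitude.
    lora_config = LoraConfig(
        r=16,                                  # adapter rank (assumed)
        lora_alpha=32,                         # adapter scaling factor
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of total parameters

Because only the adapters are trained, fine-tuning of this kind fits on far smaller GPU budgets than full-parameter training, which is the cost lever this role is expected to pull.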
Multimodal AI Solutions:
  • Develop solutions that integrate vision and language capabilities for applications like image-text matching, visual question answering (VQA), and document data extraction.
  • Leverage interleaved image-text datasets and advanced techniques such as cross-attention layers to enhance model performance (a cross-attention sketch follows this list).
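The cross-attention mechanism mentioned above is illustrated by the following minimal PyTorch sketch, in which text tokens attend to projected image features; all dimensions are illustrative.

    import torch
    import torch.nn as nn

    class CrossAttentionBlock(nn.Module):
        def __init__(self, text_dim: int = 768, vision_dim: int = 1024, num_heads: int = 8):
            super().__init__()
            # Project image features into the text embedding space.
            self.vision_proj = nn.Linear(vision_dim, text_dim)
            self.attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(text_dim)

        def forward(self, text_tokens, image_feats):
            # Queries come from text; keys/values come from projected image patches.
            kv = self.vision_proj(image_feats)
            attended, _ = self.attn(query=text_tokens, key=kv, value=kv)
            # Residual connection preserves the original language signal.
            return self.norm(text_tokens + attended)

    # Usage: a batch of 16 text tokens attending to 196 image patch embeddings.
    text = torch.randn(2, 16, 768)
    patches = torch.randn(2, 196, 1024)
    out = CrossAttentionBlock()(text, patches)  # shape: (2, 16, 768)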
Healthcare Domain Expertise:
  • Apply VLMs to healthcare-specific use cases such as medical imaging analysis, position detection, and motion detection and measurement.
  • Ensure compliance with healthcare standards while handling sensitive data.
Efficiency Optimization:
  • Evaluate trade-offs between model size, performance, and cost using techniques like elastic visual encoders or lightweight architectures.
  • Benchmark different VLMs (e.g., GPT-4V, Claude 3.5) for accuracy, speed, and cost-effectiveness on specific tasks (see the benchmarking sketch after this list).
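A minimal sketch of such a benchmark harness is below; run_model and the task format are hypothetical stand-ins for whatever API or local inference calls are actually used.

    import time

    def benchmark(model_name, tasks, run_model):
        # tasks: list of {"image": ..., "question": str, "expected": str} (assumed format)
        correct, latencies = 0, []
        for task in tasks:
            start = time.perf_counter()
            answer = run_model(model_name, task["image"], task["question"])  # hypothetical caller
            latencies.append(time.perf_counter() - start)
            correct += int(answer.strip().lower() == task["expected"].lower())
        return {
            "model": model_name,
            "accuracy": correct / len(tasks),
            "p50_latency_s": sorted(latencies)[len(latencies) // 2],
        }

Running the same task list through each candidate model yields directly comparable accuracy and latency numbers, which can then be weighed against per-call cost.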
Collaboration & Leadership:
  • Collaborate with cross-functional teams including engineers and domain experts to define project requirements.
  • Mentor junior team members and provide technical leadership on complex projects.
Experience:
  • 10+ years of experience in machine learning or data science roles with a focus on vision-language models.
  • Proven expertise in deploying production-grade multimodal AI solutions.
  • Experience in healthcare or medical devices is highly preferred.
Technical Skills:
  • Proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow).
  • Hands-on experience with VLMs such as VILA, Isaac Sim, or VSS.
  • Familiarity with cloud platforms like AWS SageMaker or Azure ML Studio for scalable AI deployment.
Domain Knowledge:
  • Understanding of medical datasets (e.g., imaging data) and healthcare regulations.
Soft Skills:
  • Strong problem-solving skills with the ability to optimize models for real-world constraints.
  • Excellent communication skills to explain technical concepts to diverse stakeholders.
Good to have skills:
  • Vision-Language Models: VILA, Isaac Sim, EfficientVLM
  • Cloud Platforms: AWS SageMaker, Azure ML
  • Optimization Techniques: LoRA fine-tuning, modal-adaptive pruning
  • Multimodal Techniques: Cross-attention layers, interleaved image-text datasets
  • MLOps Tools: Docker, MLflow (an MLflow tracking sketch follows this list)
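As one concrete example of the MLOps tooling listed above, a fine-tuning run might be tracked with MLflow roughly as follows; the experiment name, parameters, and metric values are placeholders.

    import mlflow

    mlflow.set_experiment("vlm-finetuning")  # illustrative experiment name
    with mlflow.start_run(run_name="lora-r16"):
        mlflow.log_params({"base_model": "llava-1.5-7b", "lora_rank": 16})
        mlflow.log_metric("vqa_accuracy", 0.81)     # placeholder value
        mlflow.log_metric("p50_latency_s", 0.42)    # placeholder value
        mlflow.log_artifact("adapter_config.json")  # hypothetical artifact file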
Date Posted: 28 April 2025