Computer Vision Engineer

San Jose, California

PEOPLE FORCE CONSULTING INC
We are seeking a Senior Data Scientist with expertise in Vision-Language Models (VLMs) and related technologies to lead the development of efficient, cost-effective multimodal AI solutions. The ideal candidate will have experience with advanced VLM frameworks such as VILA, Isaac, and VSS, and a proven track record of implementing production-grade VLMs for training and testing in real-world environments. A background in healthcare, particularly medical devices, is highly desirable. This role will focus on exploring and deploying state-of-the-art VLM methodologies on cloud platforms such as AWS and Azure.

Experience: 10+ years

Location: San Jose, CA or Waukesha, WI (100% onsite required)

Educational Qualifications: Master's or Ph.D. in Computer Science, Data Science, Machine Learning, or a related field.

Responsibilities:

VLM Development & Deployment:
  • Design, train, and deploy efficient Vision-Language Models (e.g., VILA, Isaac Sim) for multimodal applications.
  • Explore cost-effective methods such as knowledge distillation, modal-adaptive pruning, and LoRA fine-tuning to optimize training and inference (see the LoRA sketch after this list).
  • Implement scalable pipelines for training/testing VLMs on cloud platforms (AWS SageMaker, Azure ML).
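As an illustration of the kind of work involved, here is a minimal sketch of LoRA fine-tuning using the Hugging Face peft library. The base checkpoint, target modules, and hyperparameters below are illustrative assumptions, not specifics of this role.

    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor
    from peft import LoraConfig, get_peft_model

    model_name = "llava-hf/llava-1.5-7b-hf"  # illustrative base VLM, not mandated by this role
    model = AutoModelForVision2Seq.from_pretrained(model_name, torch_dtype=torch.float16)
    processor = AutoProcessor.from_pretrained(model_name)

    # LoRA trains small low-rank adapter matrices instead of the full weights,
    # cutting trainable parameters (and training cost) by orders of magnitude.
    lora_config = LoraConfig(
        r=16,                                  # adapter rank (assumed)
        lora_alpha=32,                         # adapter scaling factor
        target_modules=["q_proj", "v_proj"],   # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically well under 1% of total parameters

Because only the adapters are trained, fine-tuning of this kind fits on far smaller GPU budgets than full-parameter training, which is the cost lever this role is expected to pull.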
Multimodal AI Solutions:
  • Develop solutions that integrate vision and language capabilities for applications like image-text matching, visual question answering (VQA), and document data extraction.
  • Leverage interleaved image-text datasets and advanced techniques such as cross-attention layers to enhance model performance (a cross-attention sketch follows this list).
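The cross-attention mechanism mentioned above is illustrated by the following minimal PyTorch sketch, in which text tokens attend to projected image features; all dimensions are illustrative.

    import torch
    import torch.nn as nn

    class CrossAttentionBlock(nn.Module):
        def __init__(self, text_dim: int = 768, vision_dim: int = 1024, num_heads: int = 8):
            super().__init__()
            # Project image features into the text embedding space.
            self.vision_proj = nn.Linear(vision_dim, text_dim)
            self.attn = nn.MultiheadAttention(text_dim, num_heads, batch_first=True)
            self.norm = nn.LayerNorm(text_dim)

        def forward(self, text_tokens, image_feats):
            # Queries come from text; keys/values come from projected image patches.
            kv = self.vision_proj(image_feats)
            attended, _ = self.attn(query=text_tokens, key=kv, value=kv)
            # Residual connection preserves the original language signal.
            return self.norm(text_tokens + attended)

    # Usage: a batch of 16 text tokens attending to 196 image patch embeddings.
    text = torch.randn(2, 16, 768)
    patches = torch.randn(2, 196, 1024)
    out = CrossAttentionBlock()(text, patches)  # shape: (2, 16, 768)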
Healthcare Domain Expertise:
  • Apply VLMs to healthcare-specific use cases such as medical imaging analysis, position detection, and motion detection and measurement.
  • Ensure compliance with healthcare standards while handling sensitive data.
Efficiency Optimization:
  • Evaluate trade-offs between model size, performance, and cost using techniques like elastic visual encoders or lightweight architectures.
  • Benchmark different VLMs (e.g., GPT-4V, Claude 3.5) for accuracy, speed, and cost-effectiveness on specific tasks (see the benchmarking sketch after this list).
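A minimal sketch of such a benchmark harness is below; run_model and the task format are hypothetical stand-ins for whatever API or local inference calls are actually used.

    import time

    def benchmark(model_name, tasks, run_model):
        # tasks: list of {"image": ..., "question": str, "expected": str} (assumed format)
        correct, latencies = 0, []
        for task in tasks:
            start = time.perf_counter()
            answer = run_model(model_name, task["image"], task["question"])  # hypothetical caller
            latencies.append(time.perf_counter() - start)
            correct += int(answer.strip().lower() == task["expected"].lower())
        return {
            "model": model_name,
            "accuracy": correct / len(tasks),
            "p50_latency_s": sorted(latencies)[len(latencies) // 2],
        }

Running the same task list through each candidate model yields directly comparable accuracy and latency numbers, which can then be weighed against per-call cost.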
Collaboration & Leadership:
  • Collaborate with cross-functional teams including engineers and domain experts to define project requirements.
  • Mentor junior team members and provide technical leadership on complex projects.
Experience:
  • 10+ years of experience in machine learning or data science roles with a focus on vision-language models.
  • Proven expertise in deploying production-grade multimodal AI solutions.
  • Experience in healthcare or medical devices is highly preferred.
Technical Skills:
  • Proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow).
  • Hands-on experience with VLMs such as VILA, Isaac Sim, or VSS.
  • Familiarity with cloud platforms like AWS SageMaker or Azure ML Studio for scalable AI deployment.
Domain Knowledge:
  • Understanding of medical datasets (e.g., imaging data) and healthcare regulations.
Soft Skills:
  • Strong problem-solving skills with the ability to optimize models for real-world constraints.
  • Excellent communication skills to explain technical concepts to diverse stakeholders.
Good to have skills:
  • Vision-Language Models: VILA, Isaac Sim, EfficientVLM
  • Cloud Platforms: AWS SageMaker, Azure ML
  • Optimization Techniques: LoRA fine-tuning, modal-adaptive pruning
  • Multimodal Techniques: Cross-attention layers, interleaved image-text datasets
  • MLOps Tools: Docker, MLflow (an MLflow tracking sketch follows this list)
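As one concrete example of the MLOps tooling listed above, a fine-tuning run might be tracked with MLflow roughly as follows; the experiment name, parameters, and metric values are placeholders.

    import mlflow

    mlflow.set_experiment("vlm-finetuning")  # illustrative experiment name
    with mlflow.start_run(run_name="lora-r16"):
        mlflow.log_params({"base_model": "llava-1.5-7b", "lora_rank": 16})
        mlflow.log_metric("vqa_accuracy", 0.81)     # placeholder value
        mlflow.log_metric("p50_latency_s", 0.42)    # placeholder value
        mlflow.log_artifact("adapter_config.json")  # hypothetical artifact file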
Date Posted: 28 April 2025