Join our dynamic team for a fully remote opportunity, with occasional meetings in San Francisco for collaboration. At Datagrid, we believe everyone deserves their own personal army of AI helpers with deep access to company data to automate any task. Our technology ingests business data continuously from 100+ sources, streamlining processes like categorizing thousands of support tickets in just minutes.
We are a growing Series-A startup based in San Francisco but operate as a distributed company. We offer competitive salaries, comprehensive health benefits, equity, and a strong commitment to work/life balance.
Become part of our close-knit team that ships quickly and innovates in the AI space. Recently, our Agents have learned to navigate Microsoft Teams, write SQL queries, and run intricate tasks on complex schedules, such as 9:30 AM every Monday, Wednesday, and Friday. They integrate seamlessly into platforms like Slack and Microsoft Teams and take meaningful actions, such as generating safety reports from worksite photos.
Key Responsibilities
Your role will involve:
- Collaborating closely with an experienced team member to develop a framework for evaluating Agent performance, making it accessible for local development and CI/CD pipelines, and establishing alert mechanisms for Agent inconsistencies.
- Influencing and enhancing the capabilities of Datagrid's Agents.
- Selecting the most suitable open and closed source components to build a robust testing infrastructure.
- Integrating publicly available benchmarks like RAGBench into our testing framework.
- Empowering subject matter experts to contribute to the test library by utilizing customer queries, manually created cases, and synthetically generated questions.
- Tracking and exposing evaluation performance to monitor improvement over time.
Desired Qualifications
- Proven experience in building test harnesses for Chat Agents from the ground up.
- Ten or more years of B2B software engineering experience.
- Ability to write effective LLM prompts independently.
- Proficiency in Node.js and server-side frameworks like NestJS or Next.js.
- Familiarity with JavaScript frameworks such as React or AngularJS.
- Experience with databases such as Weaviate and BigQuery.
- Background in working with GCP or similar cloud platforms.
Compensation:
- Salary range: $200k - $240k
- Equity included
- Comprehensive medical, dental, and vision coverage fully provided
- 401k plan available
All candidates will be asked to tackle the following interview question: "Work with me to design a system to evaluate the Agent's performance at SQL queries." While we don't expect a perfect answer, we value your ability to articulate your thought process.