Remote
Part time

QA Lead – AI Conversational Systems

Job description

An AI assistant is only as good as its reliability. We are building Ink'd, an AI real estate assistant that automates contract workflows. To win users' trust, it must perform accurately and quickly.

We’re hiring a QA Lead who will specialize in analyzing conversation traces, identifying breakdowns, and building automated evaluations to ensure the AI performs at its best.

What You’ll Do

  • Review conversation traces to detect failures in data extraction, misunderstandings of user requests, and latency issues.
  • Lead both manual QA processes and automated evaluation pipelines.
  • Build frameworks for LLM-as-judge evaluations and automated test cases.
  • Create dashboards and metrics to track AI performance (accuracy, latency, user satisfaction proxies).
  • Collaborate with engineering to proactively resolve recurring issues.

What We’re Looking For

  • 5+ years of QA experience (AI/NLP preferred).
  • Strong skills in both manual and automated QA.
  • Familiarity with Python scripting and test automation.
  • Ability to create metrics dashboards (Langfuse, Grafana, Kibana, or custom).
  • Analytical, detail-oriented, and passionate about reliability.

Bonus Points

  • Familiarity with LangChain/LangGraph tracing tools.
  • Experience with synthetic data generation for QA.
  • Background in human-in-the-loop evaluation systems.

Job requirements

    Core Technical Requirements
    • Experience in QA for AI/ML products (LLMs, conversational AI, NLP systems).
    • Ability to analyze conversation traces to identify:
      • Failures in data extraction.
      • Misunderstandings of user requests.
      • Pipeline slowdowns or latency issues.
    • Strong background in manual QA processes.
    • Ability to create automated evaluation frameworks:
      • LLM-as-judge style evaluation.
      • Automated metrics collection for accuracy, latency, and conversation quality.
    • Experience building dashboards (Langfuse, Grafana, Kibana, or custom) with KPIs such as:
      • Success metrics.
      • Extraction accuracy.
      • User satisfaction proxies.
      • Latency breakdowns across pipeline stages.
    Soft Skills
    • Highly analytical mindset; able to spot subtle issues in AI behavior.
    • Strong communication skills to document findings and guide engineers.
    • Detail-oriented, with a passion for system reliability and accuracy.
    • Proactive in suggesting improvements, not just reporting bugs.
    Bonus Skills
    • Familiarity with LangChain/LangGraph tracing tools.
    • Experience with synthetic data generation for test coverage.
    • Knowledge of CI/CD integration for test automation pipelines.
    • Understanding of human-in-the-loop evaluation processes.
    Experience
    • 5+ years in QA, ideally with AI/NLP systems.
    • Strong experience in both manual and automated QA processes.
Posted on: Aug 28, 2025