Rubrics tasks are a type of Evaluation task in which models are assessed against a dynamic rubric defined by the contributor during the task. These tasks provide quantitative and qualitative insight into how well a model can follow arbitrary criteria.

Overview

Each rubrics task is designed to test specific aspects of model performance, from basic comprehension to complex reasoning, against natural, human-defined criteria.

Task Structure

A Rubrics task consists of:

  • A prompt (question or instruction)
  • A list of model evaluation criteria (contributor-defined, based on the prompt)
  • Multiple model-generated responses
  • Human evaluations of the model responses against the criteria

Additional annotations can be included when they are needed for standard evaluation of the responses.
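
To make this structure concrete, here is a minimal sketch of how the payload might be modeled in Python. The class and field names are illustrative choices that mirror the JSON samples below; they are not part of the API.

# Illustrative data model; names mirror the JSON samples in this document.
from dataclasses import dataclass, field

@dataclass
class Annotation:
    """A rubric criterion definition or an evaluation result."""
    id: str
    value: str
    title: str = ""
    metadata: dict = field(default_factory=dict)

@dataclass
class Message:
    """A user prompt or a model response within a turn."""
    role: str       # "user" or "assistant"
    source_id: str  # "user", or a model identifier such as "model_1"
    text: str
    annotations: list[Annotation] = field(default_factory=list)

@dataclass
class RubricsTask:
    """One prompt plus the evaluated model responses."""
    prompt: Message
    responses: list[Message]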

Messages

Each turn consists of sequential messages representing a user prompt, multiple model responses, and the corresponding human evaluations of those responses against the criteria.

User Message

Contains the initial prompt or instruction (role: user).

{
  "content": {
    "text": "Put the following data into a table and sort by name:\n\nSue,$20\nJared,$40\nMike,$60"
  },
  "role": "user",
  "source_id": "user",
  "annotations": []
}

Model Responses

Contains responses from different models (role: assistant).

{
  "content": {
    "text": "This is model 1 response"
  },
  "role": "assistant",
  "source_id": "model_1",
  "annotations": [
    {
      "id": "rubric_0_criteria_0_rating",
      "title": "The model must respond with a formatted table.",
      "value": "no_issues",
      "metadata": { "criteria": "rubric_0_criteria_0" }
    },
    {
      "id": "rubric_0_criteria_1_rating",
      "title": "The response must sort the names in alphabetical order.",
      "value": "major_issues",
      "metadata": { "criteria": "rubric_0_criteria_1" }
    },
    {
      "id": "rubric_0_criteria_2_rating",
      "title": "The response must include bold headers.",
      "value": "minor_issues",
      "metadata": { "criteria": "rubric_0_criteria_2" }
    }
  ]
}

Each model response includes a source_id that uniquely identifies the model that generated the response.
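
Because source_id is unique per model, a consumer can use it to group ratings. The sketch below assumes the turn's messages list has already been parsed into Python dicts; ratings_by_model is a hypothetical helper, not part of the API.

def ratings_by_model(messages: list[dict]) -> dict[str, dict[str, str]]:
    """Index each assistant message's ratings by its source_id."""
    results: dict[str, dict[str, str]] = {}
    for message in messages:
        if message.get("role") != "assistant":
            continue  # skip the user prompt; it carries no ratings
        results[message["source_id"]] = {
            ann["metadata"]["criteria"]: ann["value"]
            for ann in message.get("annotations", [])
        }
    return results

# e.g. {"model_1": {"rubric_0_criteria_0": "no_issues", ...}}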

Message Annotations

Each model response is evaluated across the contributor-defined criteria. For example:

  • “The response must display information in a table.”
  • “The response must sort the names in alphabetical order.”
  • “The response must use metric units.”

A message's annotations include the evaluation results for each dimension.

[
  {
    "id": "rubric_0_criteria_0_rating",
    "title": "The model must respond with a formatted table.",
    "value": "no_issues",
    "metadata": {
      "criteria": "rubric_0_criteria_0" // rubric criteria defined at turn-level annotation
    }
  }
]

The rubric criteria are unique to the prompt and differ from task to task. Additional evaluation dimensions can be included based on project requirements and evaluation objectives.
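
The metadata.criteria value is the join key back to the rubric definition in the turn-level annotations (shown in the next section). A minimal sketch of that lookup, assuming both the message annotation and the turn-level annotations are parsed dicts; resolve_criterion is a hypothetical helper.

def resolve_criterion(rating: dict, turn_annotations: list[dict]) -> dict:
    """Find the turn-level rubric entry that a message rating points to."""
    key = rating["metadata"]["criteria"]  # e.g. "rubric_0_criteria_0"
    for ann in turn_annotations:
        if ann.get("key") == key:
            return ann
    raise KeyError(f"no turn-level criterion named {key!r}")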

Turn-Level Annotations

Annotations at the turn level specify contributor-defined rubrics, preference-related data, or aggregated information. Some common examples are:

  • Selected model identifier: selected_model_id
  • Likert scale rating: likert_value
  • Detailed justification for the selection: justification
  • Any other comparative analysis between model responses

[
  {
    "key": "selected_model_id",
    "value": "model_2"
  },
  {
    "key": "rubric_0_criteria_0",
    "title": "The model must respond with a formatted table.",
    "value": "objective"
  },
  {
    "key": "rubric_0_criteria_1",
    "title": "The response must sort the names in alphabetical order.",
    "value": "objective"
  },
  {
    "key": "rubric_0_criteria_2",
    "title": "The response must include bold headers.",
    "value": "implicit"
  }
]
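
For example, a consumer might split this list into the preference signal and the rubric definitions in one pass. The sketch below iterates over the list above; summarize_turn and the "rubric_" key-prefix check are assumptions about naming, not API guarantees.

def summarize_turn(annotations: list[dict]) -> tuple[str | None, dict[str, str]]:
    """Return the preferred model (if any) and each criterion's type."""
    selected = None
    criteria_types: dict[str, str] = {}
    for ann in annotations:
        if ann["key"] == "selected_model_id":
            selected = ann["value"]
        elif ann["key"].startswith("rubric_"):  # assumed naming convention
            criteria_types[ann["key"]] = ann["value"]  # e.g. "objective"
    return selected, criteria_types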

Expanded Rubrics Task Output

This is a sample expanded Rubrics task output returned by /v2/task.

Sample Rubrics Task Output
{
  "task_id": "task_123",
  "project": "project_123",
  "batch": "batch_123",
  "status": "completed",
  "created_at": "2025-01-01T08:31:03.169Z",
  "completed_at": "2025-01-02T04:00:39.923Z",
  "threads": [
    {
      "id": "thread_0",
      "turns": [
        {
          "id": "turn_0",
          "messages": [
            {
              "content": {
                "text": "Put the following data into a table and sort by name:\n\nSue,$20\nJared,$40\nMike,$60"
              },
              "role": "user",
              "source_id": "user",
              "annotations": []
            },
            {
              "content": {
                "text": "This is model 1 response"
              },
              "role": "assistant",
              "source_id": "model_1",
              "annotations": [
                {
                  "id": "rubric_0_criteria_0_rating",
                  "title": "The model must respond with a formatted table.",
                  "value": "no_issues",
                  "metadata": { "criteria": "rubric_0_criteria_0" }
                },
                {
                  "id": "rubric_0_criteria_1_rating",
                  "title": "The response must sort the names in alphabetical order.",
                  "value": "major_issues",
                  "metadata": { "criteria": "rubric_0_criteria_1" }
                },
                {
                  "id": "rubric_0_criteria_2_rating",
                  "title": "The response must include bold headers.",
                  "value": "minor_issues",
                  "metadata": { "criteria": "rubric_0_criteria_2" }
                }
              ]
            },
            {
              "content": {
                "text": "This is model 2 response"
              },
              "role": "assistant",
              "source_id": "model_2",
              "annotations": [
                {
                  "id": "rubric_0_criteria_0_rating",
                  "title": "The model must respond with a formatted table.",
                  "value": "no_issues",
                  "metadata": { "criteria": "rubric_0_criteria_0" }
                },
                {
                  "id": "rubric_0_criteria_1_rating",
                  "title": "The response must sort the names in alphabetical order.",
                  "value": "no_issues",
                  "metadata": { "criteria": "rubric_0_criteria_1" }
                },
                {
                  "id": "rubric_0_criteria_2_rating",
                  "title": "The response must include bold headers.",
                  "value": "major_issues",
                  "metadata": { "criteria": "rubric_0_criteria_2" }
                }
              ]
            }
          ],
          "annotations": [
            {
              "key": "selected_model_id",
              "value": "model_2"
            },
            {
              "key": "rubric_0_criteria_0",
              "title": "The model must respond with a formatted table.",
              "value": "objective"
            },
            {
              "key": "rubric_0_criteria_1",
              "title": "The response must sort the names in alphabetical order.",
              "value": "objective"
            },
            {
              "key": "rubric_0_criteria_2",
              "title": "The response must include bold headers.",
              "value": "implicit"
            }
          ]
        }
      ],
      "annotations": []
    }
  ]
}
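
Putting the pieces together, the sketch below walks a parsed task payload of this shape and prints every model's rating against each rubric criterion. Only the payload structure follows the sample above; the task_123.json file name is a stand-in for however you retrieve and store the /v2/task response.

import json

def print_ratings(task: dict) -> None:
    """Print each model's rating for every rubric criterion in the task."""
    for thread in task["threads"]:
        for turn in thread["turns"]:
            # Turn-level annotations carry the rubric definitions.
            titles = {
                ann["key"]: ann.get("title", ann["key"])
                for ann in turn.get("annotations", [])
            }
            for message in turn["messages"]:
                if message["role"] != "assistant":
                    continue
                print(f"--- {message['source_id']} ---")
                for rating in message.get("annotations", []):
                    title = titles.get(rating["metadata"]["criteria"], "?")
                    print(f"  {title}: {rating['value']}")

with open("task_123.json") as fh:  # a saved /v2/task response (assumed)
    print_ratings(json.load(fh))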