Reinforcement Learning from Human Feedback
RLHF tasks are structured datasets used to train and fine-tune Large Language Models (LLMs) using human feedback. These tasks help align model behavior with human preferences by providing paired responses with clear preference indicators.
RLHF tasks consist of prompts and multiple model-generated responses, where human annotators provide explicit preferences between responses. This feedback helps models learn which outputs are more desirable according to human judgment across various dimensions such as Helpfulness, Accuracy, Safety, Writing quality, and Task completion.
An RLHF task contains one or more conversation turns. RLHF tasks can be followed by an SFT stage in which the preferred model’s response is rewritten.
Each turn consists of sequential messages that represent a user prompt, multiple model responses, and the corresponding human preference annotations.
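As a rough sketch of that layout, the snippet below shows one possible shape of a single turn as a Python dictionary; the field names and nesting (messages, annotations, text) are assumptions based on the description here, not the platform’s exact schema.

```python
# Illustrative only: one possible shape of a single RLHF turn.
# Field names and nesting are assumptions, not the exact /v2/task schema.
turn = {
    "messages": [
        {"role": "user", "text": "Summarize the attached article in two sentences."},
        {"role": "assistant", "source_id": "model_a", "text": "Response from model A ..."},
        {"role": "assistant", "source_id": "model_b", "text": "Response from model B ..."},
    ],
    # Turn-level preference annotations (described later in this section).
    "annotations": {
        "selected_model_id": "model_a",
        "likert_value": 2,
        "justification": "Model A is more accurate and complete.",
    },
}
```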
The user message contains the initial prompt or instruction (role: user).
The assistant messages contain the responses from different models (role: assistant). They can optionally include rewrite annotations that store human-rewritten responses.
Each model response includes a source_id that uniquely identifies the model that generated the response.
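As an example of how source_id might be used downstream, the sketch below groups a turn’s candidate responses by the model that produced them. It assumes the hypothetical turn layout from the earlier sketch.

```python
def responses_by_model(turn: dict) -> dict[str, str]:
    """Map each source_id to the response text that model generated.

    Assumes the hypothetical layout sketched above, where model responses
    are the messages with role == "assistant".
    """
    return {
        message["source_id"]: message["text"]
        for message in turn["messages"]
        if message["role"] == "assistant"
    }

# e.g. {"model_a": "Response from model A ...", "model_b": "Response from model B ..."}
```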
Each model response is evaluated across multiple dimensions, which may include Helpfulness, Accuracy, Safety, Writing quality, and Task completion.
The message’s annotations include the ratings for each dimension; if the model response is rewritten, a rewrite annotation is added to the same list.
The rating dimensions are flexible and can be customized based on project requirements and objectives.
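As a minimal sketch of how such message-level annotations might look, the snippet below attaches per-dimension ratings and an optional rewrite to one model response; the annotation names and the rating scale are assumptions, not the exact delivery format.

```python
# Illustrative message-level annotations for one model response.
# Annotation names and rating scales are assumptions; real projects
# define their own dimensions and scales.
assistant_message = {
    "role": "assistant",
    "source_id": "model_a",
    "text": "Response from model A ...",
    "annotations": [
        {"name": "helpfulness", "rating": 4},
        {"name": "accuracy", "rating": 5},
        {"name": "safety", "rating": 5},
        # Present only when a human rewrote this response in the SFT stage.
        {"name": "rewrite", "text": "A human-improved version of the response ..."},
    ],
}
```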
The annotations at the turn level specify preference-related or aggregated information. Some common examples, used in the sketch that follows this list, are:
selected_model_id
likert_value
justification
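For illustration, the sketch below shows one way these turn-level fields could be consumed to build (chosen, rejected) preference pairs for preference-based fine-tuning (e.g., reward modeling). It assumes the same hypothetical turn layout as the earlier sketches rather than the platform’s exact schema.

```python
def to_preference_pairs(turn: dict) -> list[dict]:
    """Build (chosen, rejected) pairs from a turn's preference annotations.

    A rough sketch over the hypothetical layout used above; real pipelines
    depend on the project's actual delivery schema.
    """
    prompt = next(m["text"] for m in turn["messages"] if m["role"] == "user")
    responses = {
        m["source_id"]: m["text"]
        for m in turn["messages"]
        if m["role"] == "assistant"
    }
    preference = turn["annotations"]
    chosen_id = preference["selected_model_id"]
    return [
        {
            "prompt": prompt,
            "chosen": responses[chosen_id],
            "rejected": rejected_text,
            "strength": preference.get("likert_value"),
            "rationale": preference.get("justification"),
        }
        for source_id, rejected_text in responses.items()
        if source_id != chosen_id
    ]
```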
This is a sample expanded RLHF task output returned by /v2/task.