Reinforcement Learning from Human Feedback
user
).
assistant
). Optionally includes rewrite
annotations to store human rewritten responses.
source_id
that uniquely identifies the model
that generated the response.rewrite
annotation is added to the list.
Message’s annotations
include the ratings for each dimension.
annotations
at the turn level, specifies preference related or aggregated information. Some common examples are:
selected_model_id
likert_value
justification
/v2/task
.