dialoguekit.utils.dialogue_evaluation¶
Evaluation module.
Classes¶
Evaluator | Dialogue evaluator.
Module Contents¶
- class dialoguekit.utils.dialogue_evaluation.Evaluator(dialogues: List[dialoguekit.core.dialogue.Dialogue], reward_config: Dict[str, Any])¶
Dialogue evaluator.
Evaluates a set of dialogues using standard metrics.
- Parameters:
dialogues – A list of Dialogue objects to be evaluated.
reward_config – A dictionary with reward settings. For an example config, consult the documentation.
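A minimal construction sketch. The reward config keys shown below (full_set_points, intents, repeat_penalty, cost) are illustrative assumptions modeled on example configs; consult the documentation for the exact schema expected by your DialogueKit version.

```python
from dialoguekit.utils.dialogue_evaluation import Evaluator

# Illustrative reward config -- the keys below are assumptions;
# consult the documentation for a reference config.
reward_config = {
    "full_set_points": 20,
    "intents": {
        "DISCLOSE": 4,
        "INQUIRE": 4,
    },
    "repeat_penalty": 1,
    "cost": 1,
}

# `dialogues` is assumed to be a List[Dialogue], e.g., loaded from
# exported dialogue JSON files or collected from an agent-user simulation.
evaluator = Evaluator(dialogues=dialogues, reward_config=reward_config)
```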
- avg_turns() → float¶
Calculates the AvgTurns for the dialogues.
AvgTurns reflects the average number of system-user turn pairs in a list of dialogues.
- Returns:
The computed metric as a float value.
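A short usage sketch, assuming `evaluator` was constructed as shown above:

```python
avg = evaluator.avg_turns()
print(f"Average system-user turn pairs per dialogue: {avg:.2f}")
```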
- user_act_ratio() → Dict[str, float]¶
Computes the UserActRatio for the dialogues.
UserActRatio is computed per dialogue as the ratio of actions performed by each participant (e.g., user actions relative to agent actions).
- Returns:
A dictionary with participant and ActRatio as key-value pairs.
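A short usage sketch; the dictionary keys depend on the participants present in the evaluated dialogues:

```python
ratios = evaluator.user_act_ratio()
# Keys are participant labels (and derived ratios); values are floats.
for participant, ratio in ratios.items():
    print(f"{participant}: {ratio:.2f}")
```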
- reward() → Dict[str, List[Dict[str, float]]]¶
Computes reward for the dialogues, according to the reward config.
Reward is used to penalize agents that do not support the set of intents defined in the config file, as well as agents that produce long dialogues.
- Raises:
TypeError – if utterances are not annotated.
- Returns:
A dictionary with the following structure (the most important field is "reward"):

{
    "missing_intents": [],
    "dialogues": [{
        "reward": int,
        "user_turns": int,
        "repeats": int,
    }]
}
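A hedged usage sketch; note that reward() raises TypeError unless utterances carry intent annotations:

```python
results = evaluator.reward()  # Raises TypeError if utterances lack annotations.
print("Intents never observed:", results["missing_intents"])
for i, scores in enumerate(results["dialogues"]):
    print(
        f"Dialogue {i}: reward={scores['reward']}, "
        f"user_turns={scores['user_turns']}, repeats={scores['repeats']}"
    )
```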
- satisfaction(satisfaction_classifier: dialoguekit.nlu.models.satisfaction_classifier.SatisfactionClassifierSVM) → List[int]¶
Classifies dialogue-level satisfaction.
Satisfaction is scored using a SatisfactionClassifier model, which computes a satisfaction score based on the last n turns of each dialogue.
- Returns:
A list with a satisfaction score for each dialogue.
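A minimal sketch, assuming the bundled SVM classifier can be instantiated without arguments (check the SatisfactionClassifierSVM constructor for required parameters):

```python
from dialoguekit.nlu.models.satisfaction_classifier import (
    SatisfactionClassifierSVM,
)

classifier = SatisfactionClassifierSVM()
scores = evaluator.satisfaction(classifier)  # One score per dialogue.
print(scores)
```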