dialoguekit.utils.dialogue_evaluation

Evaluation module.

Classes

Evaluator

Dialogue evaluator.

Module Contents

class dialoguekit.utils.dialogue_evaluation.Evaluator(dialogues: List[dialoguekit.core.dialogue.Dialogue], reward_config: Dict[str, Any])

Dialogue evaluator.

Evaluates a set of dialogues using standard metrics.

Parameters:
  • dialogues – A list of Dialogue objects to be evaluated.

  • reward_config – A dictionary with reward settings. For an example config, consult the documentation; a construction sketch follows below.
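
A minimal construction sketch. The dialogues are assumed to be available as a list of Dialogue objects (loading them is out of scope here), and the reward_config keys shown are illustrative placeholders rather than the library's canonical config:

from dialoguekit.utils.dialogue_evaluation import Evaluator

# Hypothetical reward settings for illustration only; consult the
# documentation for the keys the library actually expects.
reward_config = {
    "full_set_points": 20,
    "intents": {"DISCLOSE": 4, "INQUIRE": 4},  # hypothetical intent weights
    "repeat_penalty": 1,
    "cost": 1,
}

# `dialogues` is assumed to be a List[Dialogue] obtained elsewhere.
evaluator = Evaluator(dialogues=dialogues, reward_config=reward_config)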

avg_turns() float

Calculates the AvgTurns for the dialogues.

AvgTurns reflects the average number of system-user turn pairs in a list of dialogues.

Returns:

The computed metric as a float value.
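
A short usage sketch, assuming an evaluator constructed as shown above:

# Average number of system-user turn pairs across the dialogues.
avg_turns = evaluator.avg_turns()
print(f"AvgTurns: {avg_turns:.2f}")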

user_act_ratio() Dict[str, float]

Computes the UserActRatio for the dialogues.

UserActRatio is computed per dialogue as the ratio of user actions observed in the dialogue.

Returns:

A dictionary with participants as keys and their ActRatio as values.
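
A usage sketch, again assuming the evaluator from above; the exact keys depend on the participants present in the dialogues:

ratios = evaluator.user_act_ratio()
for participant, ratio in ratios.items():
    print(f"{participant}: {ratio:.2f}")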

reward() Dict[str, List[Dict[str, float]]]

Computes the reward for each dialogue, according to the reward config.

The reward penalizes agents that do not support the set of intents defined in the config file, as well as overly long dialogues.

Returns:

A dictionary with the following structure (the most important field is "reward"):

{
    "missing_intents": [],
    "dialogues": [{
        "reward": int,
        "user_turns": int,
        "repeats": int,
    }]
}
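
A sketch of consuming the returned structure, assuming the evaluator from above:

results = evaluator.reward()

# Intents from the reward config that the agent did not support.
print("Missing intents:", results["missing_intents"])

# Per-dialogue breakdown, in the same order as the input dialogues.
for i, scores in enumerate(results["dialogues"]):
    print(
        f"Dialogue {i}: reward={scores['reward']}, "
        f"user_turns={scores['user_turns']}, repeats={scores['repeats']}"
    )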

satisfaction(satisfaction_classifier: dialoguekit.nlu.models.satisfaction_classifier.SatisfactionClassifierSVM) List[int]

Classifies the dialogue-level satisfaction score.

Satisfaction is scored with a SatisfactionClassifier model, which computes a score for each dialogue based on its last n turns.

Returns:

A list with the satisfaction score for each dialogue.
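
A usage sketch; it assumes SatisfactionClassifierSVM can be instantiated without arguments, which may not hold in practice:

from dialoguekit.nlu.models.satisfaction_classifier import (
    SatisfactionClassifierSVM,
)

classifier = SatisfactionClassifierSVM()  # assumed no-arg constructor
scores = evaluator.satisfaction(classifier)
for i, score in enumerate(scores):
    print(f"Dialogue {i} satisfaction: {score}")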