dialoguekit.utils.dialogue_evaluation

Evaluation module.

Classes

Evaluator

Dialogue evaluator.

Module Contents

class dialoguekit.utils.dialogue_evaluation.Evaluator(dialogues: List[dialoguekit.core.dialogue.Dialogue], reward_config: Dict[str, Any])

Dialogue evaluator.

Evaluates a set of dialogues using standard metrics.

Parameters:
  • dialogues – A list of Dialogue objects to be evaluated.

  • reward_config – A dictionary with reward settings. For an example config, consult the documentation.
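A minimal usage sketch (not part of the module itself): it assumes that dialogues is a list of Dialogue objects obtained elsewhere, e.g. parsed from exported dialogue files, and that reward_config follows the example config referenced above; both names are placeholders.

from dialoguekit.utils.dialogue_evaluation import Evaluator

# Assumptions: dialogues is a List[Dialogue] loaded elsewhere, and
# reward_config is a Dict[str, Any] matching the documented example config.
evaluator = Evaluator(dialogues=dialogues, reward_config=reward_config)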

avg_turns() → float

Calculates the AvgTurns for the dialogues.

AvgTurns reflects the average number of system-user turn pairs in a list of dialogues.

Returns:

The computed metric as a float value.
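Illustrative usage, reusing the evaluator instance from the sketch above:

avg_turns = evaluator.avg_turns()
print(f"Average number of system-user turn pairs per dialogue: {avg_turns:.2f}")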

user_act_ratio() → Dict[str, float]

Computes the UserActRatio for the dialogues.

The UserActRatio of each dialogue is computed as the ratio of user actions observed in that dialogue.

Returns:

A dictionary with participants as keys and their ActRatio as values.
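Illustrative usage, based on the documented return format:

act_ratios = evaluator.user_act_ratio()
for participant, ratio in act_ratios.items():
    print(f"{participant}: {ratio:.2f}")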

reward() → Dict[str, List[Dict[str, float]]]

Computes the reward for the dialogues according to the reward config.

The reward penalizes agents that do not support the set of intents defined in the config file, and it also penalizes long dialogues.

Raises:

TypeError – if utterances are not annotated.

Returns:

A dictionary with the following structure (the most important field is "reward"):

{
    "missing_intents": [],
    "dialogues": [{
        "reward": int,
        "user_turns": int,
        "repeats": int,
    }]
}
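Illustrative usage, reading the structure shown above; it assumes the utterances in the evaluated dialogues are annotated, since reward() raises a TypeError otherwise:

results = evaluator.reward()
print("Missing intents:", results["missing_intents"])
for dialogue_result in results["dialogues"]:
    print(
        dialogue_result["reward"],
        dialogue_result["user_turns"],
        dialogue_result["repeats"],
    )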

satisfaction(satisfaction_classifier: dialoguekit.nlu.models.satisfaction_classifier.SatisfactionClassifierSVM) → List[int]

Classifies the dialogue-level satisfaction score.

Satisfaction is scored using a SatisfactionClassifier model, which computes a satisfaction score based on the last n turns of each dialogue.

Returns:

A list with the satisfaction score for each dialogue.
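Illustrative usage; it assumes that SatisfactionClassifierSVM can be instantiated with default arguments, which should be verified against the classifier's own documentation:

from dialoguekit.nlu.models.satisfaction_classifier import (
    SatisfactionClassifierSVM,
)

# Assumption: the pretrained SVM classifier loads with default arguments.
classifier = SatisfactionClassifierSVM()
satisfaction_scores = evaluator.satisfaction(classifier)
print(satisfaction_scores)  # One satisfaction score per dialogue.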