Evaluations

Evaluating model performance with AI Quick Actions

For a deployed model, you can create a model evaluation to measure its performance. Select a dataset from Object Storage or upload one from the storage of the notebook session you're working in. To upload a dataset from your notebook, you must first set up policies that let the notebook session write files to Object Storage (a sample policy statement is shown below).

You can label the model evaluation with an experiment name, either by selecting an existing experiment or by creating a new one. The available evaluation metrics are BERTScore, BLEU Score, Perplexity Score, Text Readability, and ROUGE, and you can save the model evaluation result in Object Storage.

You can also set the model evaluation parameters. Under advanced options, you can select the compute instance shape for the evaluation and optionally enter the Stop sequence. In addition, you can set up logging for the model evaluation to monitor it. Logging is optional, but we recommend it to help troubleshoot evaluation errors; you need the necessary policy to enable logging. For more information on logging, see the Logs section.

Before creating the evaluation, you can review its configuration and parameters.
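For illustration, a policy along the lines of the following lets notebook sessions that belong to a dynamic group write objects to a bucket. This is only a sketch: the dynamic group, compartment, and bucket names are placeholders you must replace with your own values, and the exact statements your tenancy requires may differ.

allow dynamic-group <your-dynamic-group> to read objectstorage-namespaces in compartment <your-compartment>
allow dynamic-group <your-dynamic-group> to manage objects in compartment <your-compartment> where target.bucket.name='<your-bucket>'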
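AI Quick Actions computes the evaluation metrics for you, but if you want a feel for what two of them measure, the following minimal Python sketch scores a single prediction against a reference locally. It assumes the open-source nltk and rouge-score packages are installed; the reference and prediction strings are placeholder examples only.

# Minimal local illustration of two evaluation metrics (BLEU and ROUGE).
# Assumes: pip install nltk rouge-score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "The quick brown fox jumps over the lazy dog"   # ground-truth text (placeholder)
prediction = "A quick brown fox jumped over the lazy dog"   # model output (placeholder)

# BLEU compares n-gram overlap between the prediction and the reference.
bleu = sentence_bleu(
    [reference.split()],
    prediction.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE measures overlap as well; ROUGE-L is based on the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
rouge = scorer.score(reference, prediction)

print(f"BLEU: {bleu:.3f}")
print(f"ROUGE-1 F1: {rouge['rouge1'].fmeasure:.3f}, ROUGE-L F1: {rouge['rougeL'].fmeasure:.3f}")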

If you go back to the Evaluation tab, the evaluation's lifecycle state shows Succeeded when the model evaluation is complete. You can view the evaluation result and download a copy of the model evaluation report to your local machine.
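If you saved the evaluation result to Object Storage, you can also retrieve it programmatically. The following is a minimal sketch using the OCI Python SDK from inside a notebook session with resource principal authentication; the bucket name and object name are placeholders you need to replace with the values from your own evaluation.

# Minimal sketch: fetch a saved evaluation artifact from Object Storage.
# Assumes the oci package is installed and the notebook session can use resource principals.
import oci

signer = oci.auth.signers.get_resource_principals_signer()
client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

namespace = client.get_namespace().data
bucket_name = "<your-evaluation-bucket>"      # placeholder
object_name = "<path/to/evaluation-report>"   # placeholder

response = client.get_object(namespace, bucket_name, object_name)
with open("evaluation_report", "wb") as f:
    f.write(response.data.content)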

See Evaluation on GitHub for more information and tips about evaluations.
Note

Evaluations can't be run on ARM-based shapes.
