Evaluations
Evaluating model performance with AI Quick Actions
With deployed models, you can create a model evaluation to measure model performance. You can select a dataset from Object Storage or upload one from the storage of the notebook session you're working in. To upload datasets from your notebook, you must first set up policies that let the notebook session write files to Object Storage.

You can label the model evaluation with an experiment name, either by selecting an existing experiment or by creating a new one. BERTScore, BLEU Score, Perplexity Score, Text Readability, and ROUGE are the evaluation metrics available for measuring model performance. You can save the model evaluation result in Object Storage.

You can set the model evaluation parameters. Under advanced options, you can select the compute instance shape for the evaluation and optionally enter the Stop sequence. You can also set up logging with the model evaluation to monitor it. Logging is optional, but we recommend it to help troubleshoot errors with the evaluation. You need the necessary policy to enable logging. For more information on logging, see the Logs section.

Review the configurations and parameters of your evaluation before creating it.
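If you want to stage an evaluation dataset in Object Storage yourself before creating the evaluation, a minimal sketch from the notebook session using the OCI CLI might look like the following. The bucket, file, and object names are placeholders, and the notebook session's dynamic group needs a policy (for example, allow dynamic-group <your-dynamic-group> to manage objects in compartment <your-compartment>) that permits the write.

# Illustrative only: bucket, file, and object names are placeholders.
# Uses the notebook session's resource principal for authentication.
oci os object put \
  --auth resource_principal \
  --bucket-name <your-evaluation-bucket> \
  --file ./eval_dataset.jsonl \
  --name eval_dataset.jsonl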
If you go back to the Evaluation tab, you see that the evaluation lifecycle state is Succeeded when the model evaluation is completed. You can view the evaluation result and download a copy of the model evaluation report to your local machine.
Evaluations can't be run on ARM-based shapes.
For a complete list of parameters and values for AI Quick Actions CLI commands, see AI Quick Actions CLI.
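As a hedged sketch of what creating an evaluation from the CLI looks like, the command below shows the general shape only; the option names and values are assumptions for illustration and can differ between versions, so verify them against AI Quick Actions CLI before running.

# Illustrative sketch only -- confirm option names in the AI Quick Actions CLI reference.
ads aqua evaluation create \
  --evaluation_source_id "ocid1.datasciencemodeldeployment.oc1..<unique_id>" \
  --evaluation_name "my_evaluation" \
  --experiment_name "my_experiment" \
  --dataset_path "oci://<bucket>@<namespace>/eval_dataset.jsonl" \
  --report_path "oci://<bucket>@<namespace>/evaluation_report/" \
  --model_parameters '{"max_tokens": 500, "temperature": 0.7}' \
  --shape_name "VM.Standard.E4.Flex"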
This task can't be performed using the API.