Evaluations

Evaluating model performance with AI Quick Actions

After you deploy a model, you can create a model evaluation to measure its performance. You can select a dataset from Object Storage or upload one from the storage of the notebook session you're working in. To upload a dataset from your notebook, you must first set up policies that let the notebook session write files to Object Storage.

You can label the model evaluation with an experiment name, either by selecting an existing experiment or by creating a new one. The available evaluation metrics for measuring model performance are BERTScore, BLEU Score, Perplexity Score, Text Readability, and ROUGE. The evaluation result is saved to an Object Storage location that you choose, and you can adjust the model evaluation parameters. Under advanced options, you can select the compute instance shape for the evaluation and, optionally, enter a stop sequence.

You can also set up logging to monitor the model evaluation. Logging is optional, but we recommend it to help troubleshoot evaluation errors. You need the necessary policy to enable logging; for more information, see the Logs section. Before creating the evaluation, you can review its configuration and parameters.
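As a rough illustration of the Object Storage policy mentioned above for uploading datasets from notebook storage, the statements below grant a notebook session's dynamic group write access to a compartment's buckets (the dynamic group and compartment names are placeholders; your tenancy's statements can differ):

    allow dynamic-group <notebook-sessions-dynamic-group> to read buckets in compartment <compartment-name>
    allow dynamic-group <notebook-sessions-dynamic-group> to manage objects in compartment <compartment-name>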

When the model evaluation is completed, its lifecycle state on the Evaluation tab changes to Succeeded. You can then view the evaluation result and download a copy of the model evaluation report to your local machine.
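As a rough intuition for what scores such as ROUGE measure, the toy Python sketch below computes a ROUGE-1 F1 score (unigram overlap) by hand. It is illustrative only: the evaluation computes the real metrics for you, and metrics such as BERTScore and Perplexity are model-based rather than simple token overlap.

    from collections import Counter

    def rouge1_f1(candidate: str, reference: str) -> float:
        # Unigram overlap between the model's answer and the reference answer.
        cand = Counter(candidate.lower().split())
        ref = Counter(reference.lower().split())
        overlap = sum((cand & ref).values())
        if overlap == 0:
            return 0.0
        precision = overlap / sum(cand.values())
        recall = overlap / sum(ref.values())
        return 2 * precision * recall / (precision + recall)

    # Example: ~0.67 for these two short answers.
    print(rouge1_f1("the cat sat on the mat", "a cat sat on a mat"))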

See Evaluation on GitHub for more information and tips about evaluations.
Note

Evaluations can't be run on ARM-based shapes.
    1. Under AI Quick Actions, click Evaluations.
      The Evaluations page is shown.
    2. Click Create evaluations.
    3. Enter the name of the evaluation.
    4. Select the model deployment name.
    5. (Optional) Enter a description of the evaluation.
    6. To specify a dataset, click Choose an existing dataset or Upload dataset from notebook storage.
    7. (Optional) If you clicked Choose an existing dataset in step 6, select the compartment.
    8. (Optional) If you clicked Choose an existing dataset in step 6, select the Object Storage location of the dataset.
    9. (Optional) If you clicked Choose an existing dataset in step 6, specify the Object Storage path.
    10. To specify an experiment, click Choose an existing experiment or Create a new experiment. Use experiments to group similar models together for evaluation.
    11. (Optional) If you clicked Choose an existing experiment, select the experiment.
    12. (Optional) If you clicked Create a new experiment:
      1. Enter the experiment name.
      2. (Optional) Give the experiment a description.
    13. Specify the Object Storage bucket to store the results in.
      1. Select the compartment.
      2. Select the Object Storage location.
      3. (Optional) Specify the Object Storage path.
    14. Click Next.
    15. (Optional) Under Parameters, update the model evaluation parameters from the default values.
    16. Click Show advanced options.
    17. Specify the instance shape and, optionally, the stop sequence to use.
    18. (Optional) Under Logging, specify the log group and log to use.
    19. Click Next.
      The review page is shown for the evaluation you want to create.
    20. Click Submit to start the evaluation.
    21. When the evaluation completes and the Lifecycle state is Succeeded, click the arrow next to the evaluation.
      The evaluation metrics and model parameters are shown. Click Download to download the report in HTML format.
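If you prefer to pull the result files written to the Object Storage bucket from step 13 into a notebook session rather than downloading the report from the console, a minimal sketch using the OCI Python SDK Object Storage client follows (the bucket and object names are placeholders, and it assumes the notebook session's dynamic group has the Object Storage policy described earlier):

    import oci

    # In a Data Science notebook session, resource principals are the usual
    # way to authenticate.
    signer = oci.auth.signers.get_resource_principals_signer()
    os_client = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

    namespace = os_client.get_namespace().data
    bucket = "evaluation-results"          # placeholder: bucket chosen in step 13
    object_name = "my-eval/report.html"    # placeholder: evaluation output object

    response = os_client.get_object(namespace, bucket, object_name)
    with open("report.html", "wb") as f:
        f.write(response.data.content)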
  • For a complete list of parameters and values for AI Quick Actions CLI commands, see AI Quick Actions CLI.

  • This task can't be performed using the API.