Troubleshooting guide
Troubleshoot ML Monitoring Managed containers
Jobs UI shows "Service managed container name and/or version are not supported"
The CONTAINER_CUSTOM_IMAGE variable contains an invalid managed container URL.
Ensure that the container URL has the form dsmc://odsc-ml-monitoring-application:<version>. Supported versions are listed here.
Ensure that the ML Monitoring Application is being used in the OC1 realm. It is not available in non-OC1 realms.
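The URL form above can be checked before the job is submitted. The following is a minimal sketch, not part of the product: the function name `parse_managed_image` is hypothetical, and the version pattern is an assumption, since the list of supported versions is not reproduced in this guide. It only confirms that the value has the expected shape, not that the version is supported.

```python
import re

# Shape taken from this guide: dsmc://odsc-ml-monitoring-application:<version>
# ASSUMPTION: the version is any non-empty run of characters without
# whitespace, ':' or '/'. The supported versions are not validated here.
_IMAGE_RE = re.compile(
    r"dsmc://odsc-ml-monitoring-application:(?P<version>[^\s:/]+)"
)


def parse_managed_image(url: str) -> str:
    """Return the version part of a managed container URL.

    Raises ValueError when the value does not match the expected shape.
    """
    match = _IMAGE_RE.fullmatch(url.strip())
    if match is None:
        raise ValueError(
            "CONTAINER_CUSTOM_IMAGE must look like "
            "dsmc://odsc-ml-monitoring-application:<version>, got: " + repr(url)
        )
    return match.group("version")
```

A value that passes this check can still be rejected by the Jobs UI if the version is not one of the supported versions.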
Unable to read application config specified in CONFIG_LOCATION variable
The ML job logs show:
FileNotFoundError: Either the bucket named <bucket_name> does not exist in the namespace <namespace> or you are not authorized to access it
Ensure the application config location is set using the environment variable CONFIG_LOCATION.
Ensure the application config is available in the customer-provided Object Storage.
Ensure the URL provided is valid and the object it points to exists.
Ensure that the Dynamic Group and Policies that give the ML job access to Object Storage are configured appropriately. Refer to the setup section.
Job run fails because the baseline or prediction input dataset cannot be read
The data reader specified in the application config does not have permission to read from the input data location. This is evident from the exception in the ML job logs:
"Data reader baseline_reader read permissions": "(‘Invalid application configuration passed for : OciObjectStorageResourceValidation, Error: Read permission for file path <file_path> is unauthorized
Ensure the baseline_reader or prediction_reader section is set in the monitor config. See here for details.
Ensure the input dataset is available in the customer-provided Object Storage.
Ensure the URL provided is valid and the object it points to exists.
Ensure that the Dynamic Group and Policies that give the ML job read access to the Object Storage data location specified in the data reader are configured appropriately. Refer to the setup section.
Ensure the right reader type is set. Supported reader types are listed here.
Run the ML job with the action type RUN_CONFIG_VALIDATION to check that all the configs are correct. If they are, the job run succeeds.
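A validation run differs from a normal run only in its action type. The sketch below assembles the environment variables named in this guide for such a run. The helper name `build_validation_env` is hypothetical, and the placeholder values are assumptions: the exact format expected for CONFIG_LOCATION is described in the Input Contracts, not here.

```python
from typing import Dict


def build_validation_env(config_location: str, image_url: str) -> Dict[str, str]:
    """Assemble job environment variables for a config validation run.

    Only variable names mentioned in this guide are used. The values are
    passed through unchanged; nothing is checked against the service.
    """
    if not config_location:
        raise ValueError("CONFIG_LOCATION must not be empty")
    if not image_url:
        raise ValueError("CONTAINER_CUSTOM_IMAGE must not be empty")
    return {
        "CONTAINER_CUSTOM_IMAGE": image_url,
        "CONFIG_LOCATION": config_location,
        "ACTION_TYPE": "RUN_CONFIG_VALIDATION",
    }


if __name__ == "__main__":
    # Placeholder values (assumptions); substitute your own.
    env = build_validation_env(
        config_location="<config_location>",
        image_url="dsmc://odsc-ml-monitoring-application:<version>",
    )
    for name, value in env.items():
        print(name + "=" + value)
```

Running validation first keeps configuration errors separate from data or permission errors that only show up in a full run.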
Output JSON is not generated even though the job run is successful
This happens when the post processor configuration is missing from the application config.
Ensure that a valid post processor is configured in the application config.
The supported post processor is SaveMetricOutputAsJsonPostProcessor.
The namespace and bucket_name parameters are mandatory.
Run the ML job with the action type RUN_CONFIG_VALIDATION to check that all the configs are correct. If they are, the job run succeeds.
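The mandatory-parameter rule can be checked locally before a run. This is a rough sketch under stated assumptions: it takes the post processor parameters as a plain mapping, and the key spellings `namespace` and `bucket_name` are assumed from the names used above. How those parameters are nested inside the application config is not shown in this guide, so extracting them is left to the caller. The function name is hypothetical.

```python
from typing import List, Mapping

# ASSUMPTION: key spellings follow the parameter names in this guide.
MANDATORY_PARAMS = ("namespace", "bucket_name")


def missing_post_processor_params(params: Mapping[str, object]) -> List[str]:
    """Return the mandatory parameter names that are absent or empty."""
    return [name for name in MANDATORY_PARAMS if not params.get(name)]


if __name__ == "__main__":
    # Hypothetical parameters for SaveMetricOutputAsJsonPostProcessor.
    example = {"namespace": "<namespace>", "bucket_name": ""}
    print(missing_post_processor_params(example))
```

An empty result only means both parameters are present. It does not confirm that the bucket exists or that the job can write to it.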
Miscellaneous run failures
Several issues can cause runtime failures:
The ApplicationActionType is not valid.
The value of the required parameter ACTION_TYPE is empty.
The configured DATE_RANGE is not valid. Refer to the Input Contracts.
The application config does not contain a valid monitor ID.
Invalid storage details are provided.
The metric, transformer, input schema, or post processor configuration is missing or invalid.
Possible solutions are as follows:
Run the ML job with the action type RUN_CONFIG_VALIDATION to check that all the configs are correct. If they are, the job run succeeds.
Refer to the Input Contracts.
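Empty or missing variables, one of the failure causes listed above, can be caught with a quick pre-flight check. The sketch below is illustrative only. The function name is hypothetical, and treating exactly ACTION_TYPE and CONFIG_LOCATION as required is an assumption based on this guide; the Input Contracts remain the authoritative list.

```python
import os
from typing import List, Mapping

# ASSUMPTION: these two names are the ones this guide treats as required.
REQUIRED_VARS = ("ACTION_TYPE", "CONFIG_LOCATION")


def find_empty_required_vars(env: Mapping[str, str]) -> List[str]:
    """Return required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    # Check the current process environment.
    problems = find_empty_required_vars(os.environ)
    if problems:
        print("Empty or missing: " + ", ".join(problems))
    else:
        print("All required variables are set")
```

This check does not validate the values themselves. An unsupported action type or a malformed DATE_RANGE is still only reported by the job run.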