Analyzing a Stored Video Using a Custom Model
Identify scene-based features and objects, and detect faces and label frames in a video by calling a video analysis custom model.
The maximum size and duration of each video are shown in the Limits section.
For more information about video analysis, see the section on Stored Video Analysis.
Create the Dataset
Vision custom models are intended for users without a data science background. By creating a dataset, and instructing Vision to train a model based on the dataset, you can have a custom model ready for your scenario.
Data labeling is the process of identifying properties of records, such as documents, text, and images, and annotating them with labels to identify those properties. The caption of an image and the identification of an object in an image are both examples of a data label. You can use Oracle Cloud Infrastructure Data Labeling to do the data labeling. For more information, see the Data Labeling service guide. Here is an outline of the steps to take:
- Collect enough images that match the distribution of the intended application.
When choosing how many images are needed for your dataset, use as many images as you can in your training dataset. For each label to be detected, provide at least 10 images for the label. Ideally, provide 50 or more images per label. The more images you provide, the better the detection robustness and accuracy. Robustness is the ability to generalize to new conditions such as view angle or background.
- Collect a few varieties of other images to capture different camera angles, lighting conditions, backgrounds, and other variations.
Collect a dataset that's representative of the problem and space you intend to apply the trained model to. While data from other domains might work, a dataset generated from the same intended devices, environments, and conditions of use outperforms any other.
Provide enough perspectives for the images, as the model uses not only the annotations to learn what is correct, but also the background to learn what is wrong. For example, provide views from different sides of the object detected, with different lighting conditions, from different image capture devices, and so on.
- Label all instances of the objects that occur in the sourced dataset.
Keep the labels consistent. If you label many apples together as one apple, do so consistently in each image. Don't leave space between the objects and the bounding box. The bounding boxes must closely match the objects labeled.
Important
Verify each of these annotations as they're important for the model's performance.
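The per-label image counts recommended above (at least 10, ideally 50 or more) can be checked before training. The following is a minimal sketch, not part of Vision or Data Labeling: the function names and the (image name, label) input format are assumptions made for illustration, and you would need to export your own annotations into that shape.

```python
from collections import Counter

MIN_IMAGES_PER_LABEL = 10    # minimum stated in the guidance above
IDEAL_IMAGES_PER_LABEL = 50  # "ideally 50 or more" images per label


def images_per_label(annotations):
    """Count how many distinct images carry each label.

    `annotations` is a hypothetical iterable of (image_name, label) pairs.
    """
    unique_pairs = set(annotations)  # one image counts once per label
    return Counter(label for _, label in unique_pairs)


def review_label_coverage(annotations):
    """Map each label to 'too few', 'acceptable', or 'ideal'."""
    report = {}
    for label, count in images_per_label(annotations).items():
        if count < MIN_IMAGES_PER_LABEL:
            report[label] = "too few"
        elif count < IDEAL_IMAGES_PER_LABEL:
            report[label] = "acceptable"
        else:
            report[label] = "ideal"
    return report
```

A label reported as "too few" needs more images before training. This check covers only counts, so it doesn't replace verifying the annotations themselves.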
Building a Custom Model
Build custom models in Vision to extract insights from images without needing data scientists.
- A paid tenancy account in Oracle Cloud Infrastructure.
- Familiarity with Oracle Cloud Infrastructure Object Storage.
- The correct policies.
Using the Console, learn how to create a Vision project, and how to train an image classification or object detection model.
Use the create command and required parameters to create a project:
oci ai-vision project create [OPTIONS]
Use the create command and required parameters to create a model:
oci ai-vision model create [OPTIONS]
For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
First, run the CreateProject operation to create a project.
Then run the CreateModel operation to create a model.
Train the Custom Model
After creating your dataset, you can train your custom model.
- Recommended training: Vision automatically selects the training duration to create the best model. The training might take up to 24 hours.
- Quick training: This option produces a model that's not fully optimized but is available in about an hour.
- Custom duration: This option lets you set your own maximum training duration.
The best training duration depends on the complexity of your detection problem, the typical number of objects in an image, the resolution, and other factors. Consider these needs, and allocate more time as the training complexity increases. The minimum recommended training time is 30 minutes. A longer training time gives greater accuracy, but with diminishing returns over time. Use the quick training mode to get an idea of the smallest amount of time it takes to get a model that provides reasonable performance. Use the recommended mode to get a base optimized model. If you want a better result, increase the training time.
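The three training options and the 30-minute minimum can be summarized in a small planning helper. This is only a sketch of the guidance above: the mode names, constants, and function are hypothetical, aren't Vision API parameters, and the quick and recommended values are the approximate figures quoted in this section.

```python
MIN_CUSTOM_HOURS = 0.5        # 30-minute recommended minimum
QUICK_HOURS = 1.0             # quick training: "about an hour"
RECOMMENDED_MAX_HOURS = 24.0  # recommended training: "up to 24 hours"


def planned_max_hours(mode, custom_hours=None):
    """Return the longest training time to plan for, in hours.

    `mode` is one of the hypothetical strings "quick", "recommended",
    or "custom"; `custom_hours` is required only for "custom".
    """
    if mode == "quick":
        return QUICK_HOURS
    if mode == "recommended":
        return RECOMMENDED_MAX_HOURS
    if mode == "custom":
        if custom_hours is None or custom_hours < MIN_CUSTOM_HOURS:
            raise ValueError("custom duration must be at least 0.5 hours")
        return float(custom_hours)
    raise ValueError("unknown training mode: %r" % (mode,))
```

For example, `planned_max_hours("custom", 6)` plans for six hours, while a custom value under 30 minutes is rejected.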
Call the Custom Model
Call a custom model the same way you call a pretrained model.
Use the analyze-video command and required parameters to analyze the video:
oci ai-vision analyze-video [OPTIONS]
For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
Run the AnalyzeVideo operation to analyze a video.
Custom Model Metrics
The following metrics are provided for custom models in Vision.
- mAP@0.5 score
- The mean Average Precision (mAP) score with a threshold of 0.5 is provided only for custom object detection models. It's calculated by taking the mean of the Average Precision over all classes. It ranges from 0.0 to 1.0, where 1.0 is the best result.
- Precision
- The fraction of relevant instances among the retrieved instances.
- Recall
- The fraction of relevant instances that were retrieved.
- Threshold
- The decision threshold to make a class prediction for the metrics.
- Total images
- The total number of images used for training and testing.
- Test images
- The number of images from the dataset that were used for testing and not used for training.
- Training duration
- The length of time in hours that the model was trained.
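To make the precision, recall, and mAP definitions above concrete, here is a minimal sketch that computes them from counts. It illustrates the standard formulas only and isn't how Vision reports its metrics. The function names are hypothetical, and returning 0.0 when nothing was retrieved or nothing was relevant is a convention chosen for this example.

```python
def precision(true_positives, false_positives):
    """Fraction of retrieved instances that are relevant."""
    retrieved = true_positives + false_positives
    return true_positives / retrieved if retrieved else 0.0


def recall(true_positives, false_negatives):
    """Fraction of relevant instances that were retrieved."""
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0


def mean_average_precision(ap_per_class):
    """Mean of the per-class Average Precision values.

    `ap_per_class` is a hypothetical dict of class name to an Average
    Precision value already computed at the chosen threshold.
    """
    if not ap_per_class:
        raise ValueError("at least one class is required")
    return sum(ap_per_class.values()) / len(ap_per_class)
```

With 8 correct detections, 2 false detections, and 8 missed objects, precision is 8 / 10 and recall is 8 / 16, which shows how a model can be precise while still missing half of the objects.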