Analyzing a Stored Video Using a Custom Model

Identify scene-based features and objects, detect faces, and label frames in a video by calling a video analysis custom model.

The maximum size and duration of each video are shown in the Limits section.

For more information about video analysis, see the section on Stored Video Analysis.

Follow these steps to use a custom model in Vision. Metrics are available to evaluate the custom model's performance.

Create the Dataset

Vision custom models are intended for users without a data science background. By creating a dataset and instructing Vision to train a model based on it, you can have a custom model ready for your scenario.

The key to building a useful custom model is preparing and training it with a good dataset.

Data labeling is the process of identifying properties of records, such as documents, text, and images, and annotating them with labels that identify those properties. The caption of an image and the identification of an object in an image are both examples of data labels. You can use Oracle Cloud Infrastructure Data Labeling to label your data. For more information, see the Data Labeling service guide. Here is an outline of the steps to take:

  1. Collect enough images to match the distribution of the intended application.

    Use as many images as you can in the training dataset. For each label to be detected, provide at least 10 images, and ideally 50 or more. The more images you provide, the better the detection robustness and accuracy. Robustness is the ability to generalize to new conditions, such as viewing angle or background. A quick per-label count, like the sketch after this list, can help confirm coverage.

  2. Collect a variety of other images to capture different camera angles, lighting conditions, backgrounds, and so on.

    Collect a dataset that's representative of the problem and space you intend to apply the trained model to. While data from other domains might work, a dataset generated from the same intended devices, environments, and conditions of use outperforms any other.

    Provide enough perspectives for the images, as the model uses not only the annotations to learn what is correct, but also the background to learn what is wrong. For example, provide views from different sides of the object detected, with different lighting conditions, from different image capture devices, and so on.
  3. Label all instances of the objects that occur in the sourced dataset.
    Keep the labels consistent. For example, if you label several apples together as one apple, do so consistently in each image. Don't leave space between an object and its bounding box; each bounding box must closely match the object it labels.
    Important: Verify each annotation, because annotation quality directly affects the model's performance.
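
As a quick sanity check before training, you can count the images available per label. The following is a minimal sketch that assumes a hypothetical directory-per-label layout (for example, dataset/apple/*.jpg); adapt the counting logic to however your dataset is actually organized.

    # Count images per label to check the suggested minimums
    # (at least 10 images per label, ideally 50 or more).
    # Assumes a hypothetical directory-per-label layout:
    #   dataset/apple/*.jpg, dataset/orange/*.jpg, ...
    from pathlib import Path

    IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

    def count_images_per_label(dataset_root):
        counts = {}
        for label_dir in Path(dataset_root).iterdir():
            if label_dir.is_dir():
                counts[label_dir.name] = sum(
                    1 for f in label_dir.iterdir()
                    if f.suffix.lower() in IMAGE_SUFFIXES
                )
        return counts

    for label, n in sorted(count_images_per_label("dataset").items()):
        status = "OK" if n >= 50 else ("low" if n >= 10 else "too few")
        print(f"{label}: {n} images ({status})")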

Building a Custom Model

Build custom models in Vision to extract insights from images without needing data scientists.

You need the following before building a custom model:
  • A paid tenancy account in Oracle Cloud Infrastructure.
  • Familiarity with Oracle Cloud Infrastructure Object Storage.
  • The correct policies.
  • Using the Console, create a Vision project, and then train an image classification or object detection model:

    1. Open the navigation menu and click Analytics & AI. Under AI Services, click Vision.
    2. Create a project.
      1. From the Vision home page, under Custom Models, click Projects.
      2. Click Create project.
      3. Select the compartment for the project.
      4. Enter a Name and description for the project. Avoid entering confidential information.
      5. Click Create project.
    3. In the list of projects, click the name of the project that you just created.
    4. On the project details page, click Create Model.
    5. Select the Model type to train: Image classification or Object detection.
    6. Select the training data.
      • If you don't have any annotated images, select Create a new dataset.

        You're taken to OCI Data Labeling, where you can create a dataset and add labels or draw bounding boxes over the image content. For more information, see Creating a Dataset and the section on labeling images in the Data Labeling documentation.

      • If you have an existing annotated dataset, select Choose existing dataset and then select the data source:
        • If you annotated the dataset in Data Labeling, click Data labeling service and then select the dataset.
        • If you annotated the images by using a third-party tool, click Object storage and then select the bucket that contains the images.
    7. Click Next.
    8. Enter a display name for the custom model.
    9. (Optional) Give the model a description to help you find it.
    10. Select the Training duration.
      • Recommended training: Vision automatically selects the training duration to create the best model. The training might take up to 24 hours.
      • Quick training: This option produces a model that's not fully optimized but is available in about an hour.
      • Custom: This option lets you set your own maximum training duration (in hours).
    11. Click Next.
    12. Review the information you provided in the previous steps. To make any changes, click Previous.
    13. When you want to start training the custom model, click Create and train.
  • Use the create command and required parameters to create a project:

    oci ai-vision project create [OPTIONS]

    Use the create command and required parameters to create a model:

    oci ai-vision model create [OPTIONS]
    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
  • First, run the CreateProject operation to create a project.

    Then run the CreateModel operation to create a model.
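
    If you're scripting these steps, the OCI SDK for Python exposes the same two operations. Here's a minimal sketch, assuming default config-file authentication and a dataset already annotated in Data Labeling; the OCIDs are placeholders, and you should confirm the model classes and field names against the SDK reference for your SDK version.

        # Minimal sketch: create a Vision project, then train a model on an
        # existing Data Labeling dataset. All OCIDs are placeholders.
        import oci

        config = oci.config.from_file()  # default ~/.oci/config profile
        client = oci.ai_vision.AIServiceVisionClient(config)

        project = client.create_project(
            oci.ai_vision.models.CreateProjectDetails(
                compartment_id="ocid1.compartment.oc1..example",
                display_name="fruit-detection",
            )
        ).data

        model = client.create_model(
            oci.ai_vision.models.CreateModelDetails(
                compartment_id="ocid1.compartment.oc1..example",
                project_id=project.id,
                display_name="fruit-detector-v1",
                model_type="OBJECT_DETECTION",  # or "IMAGE_CLASSIFICATION"
                training_dataset=oci.ai_vision.models.DataScienceLabelingDataset(
                    dataset_id="ocid1.datalabelingdataset.oc1..example",
                ),
            )
        ).data
        print(model.id, model.lifecycle_state)

    Training is asynchronous, so the model starts in a creating state; poll it with get_model until it reaches an active lifecycle state.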

Train the Custom Model

After creating your dataset, you can train your custom model.

Train your model using one of Vision's custom model training modes. The training modes are:
  • Recommended training: Vision automatically selects the training duration to create the best model. The training might take up to 24 hours.
  • Quick training: This option produces a model that's not fully optimized but is available in about an hour.
  • Custom duration: This option lets you set your own maximum training duration.

The best training duration depends on the complexity of your detection problem, the typical number of objects in an image, the resolution, and other factors. Consider these needs, and allocate more time as the training complexity increases. The minimum recommended training time is 30 minutes. A longer training time gives greater accuracy, but with diminishing returns over time. Use quick training to get an idea of the smallest amount of time it takes to get a model with reasonable performance. Use recommended training to get a base optimized model. If you want a better result, increase the training time.
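
When you create the model programmatically, the training mode maps to fields on the create-model request. The sketch below continues the earlier example; the field names is_quick_mode and max_training_duration_in_hours are assumptions based on the SDK's model definition, so verify them against your SDK version.

    # Training mode on CreateModelDetails (verify field names against
    # your SDK version):
    #   recommended training -> leave both fields unset (service decides)
    #   quick training       -> is_quick_mode=True
    #   custom duration      -> max_training_duration_in_hours=<hours>
    details = oci.ai_vision.models.CreateModelDetails(
        compartment_id="ocid1.compartment.oc1..example",
        project_id=project.id,
        display_name="fruit-detector-quick",
        model_type="OBJECT_DETECTION",
        training_dataset=oci.ai_vision.models.DataScienceLabelingDataset(
            dataset_id="ocid1.datalabelingdataset.oc1..example",
        ),
        is_quick_mode=True,  # about an hour, not fully optimized
    )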

Call the Custom Model

You call a custom model in the same way that you call a pretrained model.

You can call the custom model to analyze videos as a single request, or as a batch request. You must first have created and trained the model, as described in the previous sections. Then follow these steps:
    1. Open the navigation menu and click Analytics & AI. Under AI Services, click Vision.
    2. On the Vision page, click Video Analysis.
    3. Select the compartment where you want to store the results.
    4. Select the location of the video:
      • Demo
      • Local file
      • Object storage
        1. (Optional) If you selected Demo, click Analyze demo video to start the analysis.
        2. (Optional) If you selected Local file:
          1. Select a bucket from the list. If the bucket is in a different compartment, click Change compartment.
          2. (Optional) Enter a prefix in the Add prefix text field.
          3. Drag the video file to the Select file area, or click select one... and browse to the video.
          4. Click Upload and analyze. The Pre-Authenticated URL for video dialog box is displayed.
          5. (Optional) Copy the URL.
          6. Click Close.
        3. If you selected Object storage, enter the video URL and click Analyze.

      The analyzeVideo API is invoked, and the model immediately analyzes the video. The status of the job is displayed.

      The Results area has tabs for Label detection, Object detection, Text detection, and Face detection, each with confidence scores, and for the request and response JSON.

    5. (Optional) To stop the running job, click Cancel.
    6. (Optional) To change the output location, click Change output location.
    7. (Optional) To select what is analyzed, click Video analysis capabilities, and select as appropriate from:
      • Label detection
      • Object detection
      • Text detection
      • Face detection
    8. (Optional) To generate code for video inferencing, click Code for video inferencing.
    9. (Optional) To analyze videos again, click Video job tracker, and select Recently uploaded videos from the menu.
      1. Click the video you want to analyze.
      2. Click Analyze.
    10. To see the status of a video analysis job, click Video job tracker, and select Get job status from the menu.
      1. Enter the job OCID.
      2. Click Get job status.
      3. (Optional) To stop the running job, click Cancel.
      4. (Optional) To get the status of another job, click Get another video job status.
      5. (Optional) To get the JSON response, click Fetch response data.
      6. (Optional) To remove a job status, click Remove.
  • Use the analyze-video command and required parameters to analyze the video:

    oci ai-vision analyze-video [OPTIONS]
    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
  • Run the AnalyzeVideo operation to analyze a video.
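
    From the SDK, video analysis is submitted as an asynchronous job. The following is a hypothetical Python sketch: the binding names (create_video_job, CreateVideoJobDetails, VideoObjectDetectionFeature, and the input and output location models) are assumptions based on the SDK's naming conventions, so confirm the exact names in the SDK reference before using them. The OCIDs, namespace, and bucket names are placeholders.

        # Hypothetical sketch: submit a stored video for analysis with a
        # custom model. Names marked below are assumptions.
        import oci

        config = oci.config.from_file()
        client = oci.ai_vision.AIServiceVisionClient(config)

        job = client.create_video_job(  # assumed method name
            oci.ai_vision.models.CreateVideoJobDetails(  # assumed model name
                compartment_id="ocid1.compartment.oc1..example",
                input_location=oci.ai_vision.models.ObjectListInlineInputLocation(
                    object_locations=[
                        oci.ai_vision.models.ObjectLocation(
                            namespace_name="my-namespace",
                            bucket_name="videos",
                            object_name="warehouse.mp4",
                        )
                    ]
                ),
                features=[
                    oci.ai_vision.models.VideoObjectDetectionFeature(  # assumed
                        model_id="ocid1.aivisionmodel.oc1..example",  # custom model
                    )
                ],
                output_location=oci.ai_vision.models.OutputLocation(
                    namespace_name="my-namespace",
                    bucket_name="vision-results",
                    prefix="video-jobs/",
                ),
            )
        ).data
        print(job.id, job.lifecycle_state)  # results land in the output bucket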

Custom Model Metrics

The following metrics are provided for custom models in Vision.

mAP@0.5 score
The mean Average Precision (mAP) score at an intersection-over-union threshold of 0.5, provided only for custom object detection models. It's calculated by taking the mean of the Average Precision over all classes, and it ranges from 0.0 to 1.0, where 1.0 is the best result.
Precision
The fraction of relevant instances among the retrieved instances.
Recall
The fraction of relevant instances that were retrieved.
Threshold
The decision threshold used to make a class prediction when computing the metrics.
Total images
The total number of images used for training and testing.
Test images
The number of images from the dataset that were used for testing and not used for training.
Training duration
The length of time in hours that the model was trained.
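
To make the precision and recall definitions concrete, here's a small worked example. The counts of true positives, false positives, and false negatives at a given threshold are illustrative only.

    # Illustrative numbers only: at a given threshold the model returned
    # 40 detections, of which 32 matched a labeled object (true positives)
    # and 8 didn't (false positives); 8 labeled objects were missed
    # (false negatives).
    tp, fp, fn = 32, 8, 8

    precision = tp / (tp + fp)  # relevant among retrieved: 32/40 = 0.80
    recall = tp / (tp + fn)     # retrieved among relevant: 32/40 = 0.80
    print(f"precision={precision:.2f} recall={recall:.2f}")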