Vision provides pretrained image analysis AI models
that allow you to locate and tag objects, text, and entire scenes in images.
Pretrained models let you use AI with no data science experience. Simply provide an image to
the Vision service and get back information about the objects, text, and scenes in the image
without having to create your own model.
Use Cases
There are a number of use cases for pretrained image analysis models.
Digital asset management
Tag digital media like images for better indexing and retrieval.
Scene monitoring
Detect whether items are on retail shelves, whether vegetation is growing in the surveillance
image of a power line, or whether trucks are available at a lot for delivery or shipment.
Supported Formats
Vision analyzes images in the JPG and PNG formats.
Images can be uploaded either from local storage or Oracle Cloud Infrastructure Object Storage. The images can be in the following
formats:
JPG
PNG
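As a rough illustration of the format restriction above, the following sketch checks a file name before upload. The function name is hypothetical, the check looks only at the file extension (not the file contents), and treating `.jpeg` as an alias for JPG is an assumption.

```python
from pathlib import Path

# Extensions taken from the list above; ".jpeg" is an assumed alias for JPG.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def is_supported_image(filename: str) -> bool:
    """Return True if the file extension suggests a JPG or PNG image."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_image("shelf.JPG")` returns `True`, while `is_supported_image("scan.gif")` returns `False`.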
Pretrained Models
Vision has three types of pretrained image analysis models: object detection, image classification, and text detection.
Object detection is used to locate and identify objects within an image. For example,
if you have an image of a living room, Vision locates the
objects there, such as a chair, a sofa, and a TV. It then provides bounding boxes for each of
the objects and identifies them.
Vision provides a confidence score for each object
identified. The confidence score is a decimal number from 0 to 1. Scores closer to 1 indicate
higher confidence in the object's classification, while scores closer to 0 indicate lower
confidence.
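One common way to use the confidence score is to discard low-confidence detections. The sketch below assumes each detected object has already been parsed into a dictionary with `name` and `confidence` keys. That shape, the sample data, and the function name are hypothetical and do not represent the Vision response format.

```python
from typing import Dict, List


def filter_by_confidence(objects: List[Dict], threshold: float = 0.5) -> List[Dict]:
    """Keep objects scoring at or above the threshold, highest confidence first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    kept = [obj for obj in objects if obj["confidence"] >= threshold]
    return sorted(kept, key=lambda obj: obj["confidence"], reverse=True)


# Hypothetical detections for the living room example.
detections = [
    {"name": "sofa", "confidence": 0.88},
    {"name": "chair", "confidence": 0.93},
    {"name": "tv", "confidence": 0.41},
]

confident = filter_by_confidence(detections, threshold=0.8)
print([obj["name"] for obj in confident])  # prints ['chair', 'sofa']
```

The right threshold depends on the use case: a higher value returns fewer but more reliable objects, while a lower value returns more objects and more false positives.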
Image classification can be used to identify scene-based features and objects in an
image. You can have one classification or many classifications, depending on the use case and
the number of items in an image. For example, if you have an image of a person running, Vision identifies the person, the clothing, and the
footwear.
Vision provides a confidence score for each label. The
confidence score is a decimal number from 0 to 1. Scores closer to 1 indicate higher
confidence in the label, while scores closer to 0 indicate lower confidence.
Vision can detect and recognize text in a
document.
Language classification identifies the language of a document, then OCR draws bounding boxes
around the printed or handwritten text it locates in an image and digitizes the text. For
example, if you have an image of a stop sign, Vision locates
the text in that image and extracts the text STOP. It provides bounding boxes
for the identified text.
Vision provides a confidence score for each text grouping.
The confidence score is a decimal number from 0 to 1. Scores closer to 1 indicate higher
confidence in the extracted text, while scores closer to 0 indicate lower confidence.
Text detection can be used with Document AI or Image Analysis models.
OCR support is limited to English. If you know the text in your images is in English, set the
language to Eng.
Vision provides pretrained models so that you can
extract insights from your images without needing data scientists.
You need the following before using a pretrained model:
A paid tenancy account in Oracle Cloud Infrastructure.
Familiarity with Oracle Cloud Infrastructure Object Storage.
You can call the pretrained Image Analysis models as a batch request using the REST APIs,
SDK, or CLI. You can call them as a single request using the Console, REST APIs, SDK, or CLI.
See the Limits section for information on what is allowed in batch
requests.
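To make the idea of a single request more concrete, the sketch below assembles a JSON-style request body that selects one or more of the three pretrained models for an inline image. Everything in it is an assumption for illustration: the function name, the field names (`compartmentId`, `features`, `featureType`, `image`, `source`, `data`), the feature type values, and the placeholder compartment OCID. Check the Vision API reference before relying on any of them.

```python
import base64
import json
from typing import Dict, List

# Assumed identifiers for the three pretrained models (not confirmed by this page).
FEATURE_TYPES = {
    "object_detection": "OBJECT_DETECTION",
    "image_classification": "IMAGE_CLASSIFICATION",
    "text_detection": "TEXT_DETECTION",
}


def build_analyze_request(image_bytes: bytes, models: List[str], compartment_id: str) -> Dict:
    """Build a hypothetical single-request body for an inline image."""
    features = [{"featureType": FEATURE_TYPES[model]} for model in models]
    return {
        "compartmentId": compartment_id,
        "features": features,
        "image": {
            "source": "INLINE",
            # Inline image data is sent as base64 text in this sketch.
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


# Placeholder bytes and OCID; replace with a real JPG or PNG and your own compartment.
body = build_analyze_request(
    b"<image bytes>",
    ["object_detection", "text_detection"],
    "ocid1.compartment.oc1..example",
)
print(json.dumps(body, indent=2))
```

The body only describes what to analyze. Sending it still requires an authenticated call through the REST APIs, SDK, or CLI.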