Key Phrase Extraction

Key phrase extraction is the automated process of identifying the most relevant words and expressions in the input text. It helps summarize the content and recognize the main topics.

The key phrase extraction model uses natural language processing (NLP) and machine learning (ML) to find insights related to the main points of the text. It processes unstructured input text and returns key words and key phrases (KPs).

The KPs consist of the subjects and objects discussed in the document. Any modifiers, such as adjectives associated with these subjects and objects, are also included in the output. Each key phrase is returned with a confidence score, a value from 0 to 1, that indicates how confident the model is in that key phrase.
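
Because every key phrase carries a confidence score between 0 and 1, a caller can keep only the phrases above a chosen cutoff. The following is a minimal sketch, assuming each key phrase is a dictionary with "text" and "score" fields, as in the response JSON shown in this topic. The helper name filter_key_phrases and the 0.99 cutoff are illustrative assumptions, not part of the service.

```python
from typing import Any, Dict, List


def filter_key_phrases(
    key_phrases: List[Dict[str, Any]], min_score: float = 0.95
) -> List[Dict[str, Any]]:
    """Keep key phrases whose confidence score is at least min_score."""
    # Scores are documented as values from 0 to 1, so reject other cutoffs.
    if not 0.0 <= min_score <= 1.0:
        raise ValueError("min_score must be between 0 and 1")
    return [kp for kp in key_phrases if kp["score"] >= min_score]


# Phrases and rounded scores copied from the first example in this topic.
sample = [
    {"text": "Red Bull Racing Honda", "score": 0.9997},
    {"text": "Oracle Cloud Infrastructure", "score": 0.9583},
    {"text": "infrastructure partner", "score": 0.9583},
    {"text": "oci", "score": 0.9979},
]

confident = filter_key_phrases(sample, min_score=0.99)
print([kp["text"] for kp in confident])
# prints ['Red Bull Racing Honda', 'oci']
```

A suitable cutoff depends on the use case; a lower value keeps more phrases at the cost of more noise.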

Use Cases

Some business use cases are:

  • Brand monitoring

  • Monitoring market research

  • Competitive market analysis

  • Customer support tickets

  • Employee feedback analysis

  • Customer reviews

  • Email analysis

Supported Features

  • Key phrases

  • Confidence scores

  • Single-record and multi-record batch requests
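
A single-record request and a multi-record batch can share the same body shape: a "documents" list with one entry per record. The sketch below assumes the "key" and "text" fields used in the sample request in this topic. The helper name build_key_phrase_request and the doc1, doc2 key scheme are hypothetical, and any service limit on batch size is not stated here and is not enforced.

```python
import json
from typing import Dict, List


def build_key_phrase_request(texts: List[str]) -> Dict[str, List[Dict[str, str]]]:
    """Wrap one or more texts in a documents list with keys doc1, doc2, ..."""
    if not texts:
        raise ValueError("at least one text is required")
    return {
        "documents": [
            {"key": f"doc{i}", "text": text}
            for i, text in enumerate(texts, start=1)
        ]
    }


# Single record: a batch of one.
single = build_key_phrase_request(["OCI recently added new services."])

# Multi-record batch: each record gets its own key.
batch = build_key_phrase_request([
    "Red Bull Racing Honda has chosen Oracle Cloud Infrastructure.",
    "OCI recently added new services.",
])
print(json.dumps(batch, indent=2))
```

Giving each record a distinct key lets the caller match the returned key phrases to the text they came from.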

Supported Languages for Input Text

  • English
  • Spanish

Examples

Example 1

Input Text:

Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner.

Key Phrases and Confidence Scores:

  • Red Bull Racing Honda: 0.9997
  • Oracle Cloud Infrastructure: 0.9583
  • infrastructure partner: 0.9583
  • oci: 0.9979

Example 2

Input Text:

OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new technical papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate.

Key Phrases and Confidence Scores:

  • OCI: 0.9999
  • new services: 0.9998
  • existing compliance program: 0.9998
  • including SOC: 0.9998
  • use cases: 0.9998
  • new technical papers: 0.9998
  • guidance documents: 0.9998
  • Object Storage: 0.9998
  • Australian Prudential Regulation Authority: 0.9998
  • Central Bank of Brazil: 0.9998
  • regulated customers: 0.9998
  • industry-specific compliance requirements: 0.9998
  • number of compliance offerings: 0.9998
  • regulatory alignments: 0.9998
  • faster rate: 0.9998
  • ISO: 0.9992
  • customers: 0.9992
  • apra: 0.9992
  • resources: 0.9992
  • services: 0.8186
  • HIPAA: 0.9979
  • regions: 0.9147

The JSON for the first example is:

Sample Request
POST https://<region-url>/20210101/actions/batchDetectLanguageKeyPhrases
API Request format:
{
  "documents": [
    {
      "key": "doc1",
      "text": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
    }
  ]
}
Response JSON:
{
    "documents": [
        {
            "key": "doc1",
            "keyPhrases": [
                {
                    "text": "red bull racing honda",
                    "score": 0.9997546563973576
                },
                {
                    "text": "oracle cloud infrastructure",
                    "score": 0.9997546563973576
                },
                {
                    "text": "infrastructure partner",
                    "score": 0.9997546563973576
                },
                {
                    "text": "oci",
                    "score": 0.9979336625058923
                }
            ],
            "languageCode": "en"
        }
    ],
    "errors": []
}
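
The response can be turned into a lookup from document key to key phrases. The following is a minimal sketch, assuming the response shape shown above ("documents", "key", "keyPhrases", "text", "score"). The function name index_key_phrases is hypothetical, and the abbreviated response string is a stand-in for a real service reply.

```python
import json
from typing import Dict, List, Tuple


def index_key_phrases(response_json: str) -> Dict[str, List[Tuple[str, float]]]:
    """Map each document key to its (text, score) pairs, highest score first."""
    payload = json.loads(response_json)
    result: Dict[str, List[Tuple[str, float]]] = {}
    for doc in payload.get("documents", []):
        pairs = [(kp["text"], kp["score"]) for kp in doc.get("keyPhrases", [])]
        # Sort by score so the most confident phrase comes first.
        result[doc["key"]] = sorted(pairs, key=lambda pair: pair[1], reverse=True)
    return result


# Abbreviated stand-in response that follows the shape shown above.
response_json = """
{
  "documents": [
    {
      "key": "doc1",
      "keyPhrases": [
        {"text": "oci", "score": 0.9979336625058923},
        {"text": "red bull racing honda", "score": 0.9997546563973576}
      ],
      "languageCode": "en"
    }
  ],
  "errors": []
}
"""

by_key = index_key_phrases(response_json)
print(by_key["doc1"][0])
# prints ('red bull racing honda', 0.9997546563973576)
```

The sketch ignores the "errors" list; a caller would likely want to inspect it before trusting the results for a batch.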

Limitations

  • The model identifies key phrases that are noun phrases with adjective modifiers, so words that don't meet this criterion might be ignored.

  • This model is case insensitive.

  • Text that contains multiple punctuation marks between words might be flagged as a key phrase.

  • URLs that are well formed (begin with http, https, or www) are identified.
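
Two of these limitations affect code that consumes the output. Because the model is case insensitive, the same phrase can appear in different casings (for example, "OCI" and "oci" in the examples), so comparisons are safer after case folding. The prefix check mirrors the rule stated above for well-formed URLs. Both helpers are illustrative assumptions about client-side handling, not a description of how the model works internally.

```python
from typing import Dict, Iterable, Tuple

# Prefixes named in the limitation above (http, https, or www).
URL_PREFIXES = ("http://", "https://", "www.")


def looks_like_well_formed_url(token: str) -> bool:
    """Rough client-side check for the URL shapes the model identifies."""
    return token.casefold().startswith(URL_PREFIXES)


def merge_case_variants(phrases: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Fold phrases that differ only by case, keeping the highest score."""
    merged: Dict[str, float] = {}
    for text, score in phrases:
        folded = text.casefold()
        merged[folded] = max(score, merged.get(folded, 0.0))
    return merged


print(merge_case_variants([("OCI", 0.9999), ("oci", 0.9979)]))
# prints {'oci': 0.9999}
print(looks_like_well_formed_url("https://example.com"))
# prints True
```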