Named Entity Recognition

Named Entity Recognition (NER) detects named entities in text.

The NER model uses natural language processing to find a variety of named entities. For each entity extracted, NER also returns the location of the entity extracted (offset and length), and a confidence score, which is a value 0 to 1.

Supported Languages for Input Text

  • English
  • Spanish

Use Cases

You could use the NER endpoint effectively in these scenarios:

Classifying content for news providers

It can be difficult to classify and categorize news article content. The NER model can automatically scan articles to identify the major people, organizations, and places in them. The extracted entities can be saved as tags with the related articles. Knowing the relevant tags for each article helps with automatically categorizing the articles in defined hierarchies, and content discovery.

Customer support

Recognizing relevant entities in customer complaints and feedback, product specifications, department details, or company branch details, helps to classify the feedback appropriately. The entities can then be forwarded to the person responsible for the identified product.

Similarly, there could be feedback tweets where you can categorize them all based on their locations, and the products mentioned.

Efficient search algorithms

You could use NER to extract entities that are then searched against the query, instead of searching for a query across the millions of articles and websites online. When run on articles, all the relevant entities associated with each article are extracted and stored separately. This separation could speed up the search process considerably. The search term is only matched with a small list of entities in each article, leading to quick and efficient searches.

It can be used for searching content from millions of research papers, Wikipedia articles, blogs, articles, and so on.

Content recommendations

Extracting entities from a particular article, and recommending the other articles that have the most similar entities mentioned in them is possible with NER. For example, it can be used effectively to develop content recommendations for a media industry client. It enables the extraction of the entities associated with historical content or previous activities. NER compares them with the label assigned to other unseen content to filter relevant entities.

Automatically summarizing job candidates

The NER model could facilitate the evaluation of job candidates, by simplifying the effort required to shortlist candidates with numerous applications. Recruiters could filter and categorize them based on identified entities like location, college degrees, employers, skills, designations, certifications, and patents.

Supported Entities

The following table describes the different entities that NER can extract. The entity type and subtype depends on the API that you call (detectDominantLanguageEntities or batchDetectDominantLanguageEntities).

Note

To maintain backward compatibility, the detectDominantLanguageEntities wasn't modified when we introduced the concept of subtype. We recommend that you use the batchDetectDominantLanguageEntities endpoint because the service uses types and subtypes. The isPii property was dropped to introduce the batching API so you can compute it with the supported entity types as in the following table.

Entity (Full Name) Entity Type (In Prediction) Entity Subtype (In prediction) Single Record API / Batch API (if blank, both APIs are consistent) Is PII Description
DATE DATE Single record

X

Absolute or relative dates, periods, and date range.

Examples:

"10th of June",

"third Friday in August"

"the first week of March"

DATETIME DATE Batch
EMAIL EMAIL
EVENT EVENT Χ Named hurricanes, sports events, and so on.
FACILITY FACILITY Single record Χ Buildings, airports, highways, bridges, and so on.
LOCATION FACILITY Batch
GEOPOLITICAL ENTITY GPE Single record Χ Countries, cities, and states.
LOCATION GPE Batch
IP ADDRESS IPADDRESS IP address according to IPv4 and IPv6 standards.
LANGUAGE LANGUAGE Χ Any named language.
LOCATION LOCATION Χ Non-GPE locations, mountain ranges, bodies of water.
CURRENCY MONEY Single record

X

Monetary values, including the unit.
QUANTITY CURRENCY Batch
NATIONALITIES, 
RELIGIOUS and 
POLITICAL GROUPS
NORP Χ Nationalities, religious or political groups.
ORGANIZATION ORG Χ Companies, agencies, institutions, and so on.
PERCENTAGE PERCENT Single record Χ Percentage.
QUANTITY PERCENTAGE Batch
PERSON PERSON People, including fictional characters.
PHONENUMBER PHONE_NUMBER

Supported phone numbers:

("GB") - United Kingdom
("AU") - Australia 
("NZ") - New Zealand 
("SG") - Singapore 
("IN") - India
("US")  - United States
PRODUCT PRODUCT Χ Vehicles, tools, foods, and so on (not services).
NUMBER QUANTITY Single record Χ Measurements, as weight or distance.
QUANTITY NUMBER Batch X
TIME TIME Single record

Χ

Anything less than 24 hours (time, duration, and so on).
DATETIME TIME Batch
URL URL URL.

Examples

Input Text Entities and Scores
Red Bull Racing Honda, the four-time Formula-1 World 
Champion team, has chosen Oracle Cloud Infrastructure 
(OCI) as their infrastructure partner.
Red Bull Racing Honda [ORG] 1.0000
four-time [QUANTITY/NUMBER] 1.0000
Formula-1 World [EVENT] 0.9705
Oracle Cloud Infrastructure (OCI [ORG] 0.9811
OCI recently added new services to the existing 
compliance program including SOC, HIPAA, and ISO, to enable our customers 
to solve their use cases. We also released new technical papers and 
guidance documents related to Object Storage, the Australian Prudential 
Regulation Authority (APRA), and the Central Bank of Brazil. These 
resources help regulated customers better understand how OCI 
supports their regional and industry-specific compliance requirements. 
Not only are we expanding our number of compliance offerings and 
regulatory alignments, we continue to add regions and services at 
a faster rate.
OCI [ORG] 1.0000
SOC [ORG] 1.0000
HIPAA [ORG] 1.0000
ISO [ORG] 1.0000
Australian Prudential Regulation Authority [ORG] 1.0000
Central Bank of Brazil [ORG] 0.9998
OCI [ORG] 1.0000

The JSON for the first example is:

Sample Request
POST https://<region-url>/20210101/actions/batchDetectLanguageEntities
API Request format:
"{
    "documents": [
       

{             "key": "doc1",             "text": " Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."         }
    ]
}"
Response JSON:
{
    "documents": [
        {
            "key": "1",
            "entities": [
                {
                    "offset": 0,
                    "length": 15,
                    "text": "Red Bull Racing",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.9914557933807373,
                    "metaInfo": null
                },
                {
                    "offset": 16,
                    "length": 5,
                    "text": "Honda",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.6515499353408813,
                    "metaInfo": null
                },
                {
                    "offset": 27,
                    "length": 9,
                    "text": "four-time",
                    "type": "QUANTITY",
                    "subType": null,
                    "score": 0.9998091459274292,
                    "metaInfo": [
                        {
                            "offset": 27,
                            "length": 9,
                            "text": "four-time",
                            "subType": "UNIT",
                            "score": 0.9998091459274292
                        }
                    ]
                },
                {
                    "offset": 47,
                    "length": 5,
                    "text": "World",
                    "type": "LOCATION",
                    "subType": "NON_GPE",
                    "score": 0.5825434327125549,
                    "metaInfo": null
                },
                {
                    "offset": 79,
                    "length": 27,
                    "text": "Oracle Cloud Infrastructure",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.998045802116394,
                    "metaInfo": null
                },
                {
                    "offset": 108,
                    "length": 3,
                    "text": "OCI",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.9986366033554077,
                    "metaInfo": null
                }
            ],
            "languageCode": "en"
        }
    ],
    "errors": []
}

Limitations

  • Sometimes, entities might not be separated or combined as you expect.

  • NER uses the context of the sentence to identify entities. If the context isn't present in the text processed, entities might not be extracted as you expect.

  • Malformed text (structure and semantics) might reduce the performance.

  • Age isn't a separate entity so age-related periods might be identified as a date entity.