Named Entity Recognition

Named Entity Recognition (NER) detects named entities in text.

The NER model uses natural language processing to find a variety of named entities. For each extracted entity, NER also returns the entity's location in the text (offset and length) and a confidence score, which is a value from 0 to 1.

Supported Languages for Input Text

English
Spanish

Use Cases

You could use the NER endpoint effectively in these scenarios:

Classifying content for news providers

It can be difficult to classify and categorize news article content. The NER model can automatically scan articles to identify the major people, organizations, and places in them. The extracted entities can be saved as tags with the related articles. Knowing the relevant tags for each article helps to automatically categorize the articles into defined hierarchies and supports content discovery.
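As a rough illustration of the tagging idea, the following sketch groups extracted entities into tag lists for one article. The entity dictionaries reuse the `text`, `type`, and `score` fields from the response shown under Examples. The `TAG_TYPES` set, the `min_score` threshold, and the `build_tags` function are assumptions made for illustration and are not part of the service.

```python
from collections import defaultdict

# Assumed set of entity types worth keeping as tags (illustrative choice).
TAG_TYPES = {"PERSON", "ORGANIZATION", "LOCATION"}


def build_tags(entities, min_score=0.8):
    """Group entity texts by type, skipping low-confidence and non-tag types."""
    tags = defaultdict(set)
    for entity in entities:
        if entity["type"] in TAG_TYPES and entity["score"] >= min_score:
            tags[entity["type"]].add(entity["text"])
    # Sort each tag list so the output is stable.
    return {etype: sorted(texts) for etype, texts in tags.items()}


# Hypothetical entities for one article, shaped like the NER response.
entities = [
    {"text": "Red Bull Racing", "type": "ORGANIZATION", "score": 0.99},
    {"text": "Honda", "type": "ORGANIZATION", "score": 0.65},
    {"text": "Oracle Cloud Infrastructure", "type": "ORGANIZATION", "score": 0.998},
    {"text": "four-time", "type": "QUANTITY", "score": 0.9998},
]

print(build_tags(entities))
```

The threshold drops the low-confidence "Honda" entity, and the type filter drops the quantity, so only the two high-confidence organizations remain as tags.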

Customer support

Recognizing relevant entities in customer complaints and feedback, such as product specifications, department details, or company branch details, helps to classify the feedback appropriately. The feedback can then be forwarded to the person responsible for the identified product.

Similarly, you can categorize feedback tweets based on the locations and the products that they mention.

Efficient search algorithms

Instead of searching for a query across the millions of articles and websites online, you could use NER to extract entities from each article and match the query against those entities. When NER is run on articles, all the relevant entities associated with each article are extracted and stored separately. This separation could speed up the search process considerably, because the search term is only matched against a small list of entities for each article.

This approach can be used to search content from millions of research papers, Wikipedia articles, blogs, and so on.
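One way to picture the entity-based search described here is an inverted index from entity text to article IDs. The sketch below is a minimal in-memory version. The article IDs, the sample entity lists, and the function names are hypothetical, and a real system would likely use a search engine or database rather than a Python dictionary.

```python
from collections import defaultdict


def build_entity_index(article_entities):
    """Map each lowercased entity text to the set of article IDs that mention it."""
    index = defaultdict(set)
    for article_id, entities in article_entities.items():
        for text in entities:
            index[text.lower()].add(article_id)
    return index


def search(index, query):
    """Return the sorted IDs of articles whose entity list contains the query term."""
    return sorted(index.get(query.strip().lower(), set()))


# Hypothetical entities already extracted from three articles.
article_entities = {
    "a1": ["Oracle Cloud Infrastructure", "Red Bull Racing"],
    "a2": ["Central Bank of Brazil", "OCI"],
    "a3": ["OCI", "HIPAA"],
}

index = build_entity_index(article_entities)
print(search(index, "oci"))
```

The query is compared only with the stored entity strings, not with the full article text, which is where the speedup in this use case would come from.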

Content recommendations

With NER, you can extract the entities from a particular article and recommend other articles that mention the most similar entities. For example, NER can be used effectively to develop content recommendations for a media industry client. It enables the extraction of the entities associated with historical content or previous activities. These entities are then compared with the labels assigned to other, unseen content to find the relevant items.
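The comparison step could be sketched as a set-overlap ranking. The following example uses Jaccard similarity, which is only one simple way to measure how many entities two items share; the text does not prescribe a similarity measure. The article IDs, entity sets, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(seen_entities, candidates, top_n=2):
    """Rank unseen articles by entity overlap with previously viewed content."""
    scored = [
        (jaccard(seen_entities, entities), article_id)
        for article_id, entities in candidates.items()
    ]
    # Keep only articles that share at least one entity.
    scored = [(score, article_id) for score, article_id in scored if score > 0]
    # Highest similarity first; ties are broken by article ID.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [article_id for _, article_id in scored[:top_n]]


# Hypothetical entities from content the user already read.
seen = {"OCI", "HIPAA", "ISO"}

# Hypothetical entities extracted from unseen articles.
candidates = {
    "b1": {"OCI", "ISO"},
    "b2": {"Honda"},
    "b3": {"OCI", "SOC", "APRA"},
}

print(recommend(seen, candidates))
```

Article `b1` shares two of three entities with the reading history, `b3` shares one of five, and `b2` shares none, so `b2` is not recommended.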

Automatically summarizing job candidates

The NER model could facilitate the evaluation of job candidates by reducing the effort required to shortlist candidates from numerous applications. Recruiters could filter and categorize the applications based on identified entities like location, college degrees, employers, skills, designations, certifications, and patents.

Supported Entities

The following table describes the different entities that NER can extract. The entity type and subtype depend on the API that you call (detectLanguageEntities or batchDetectLanguageEntities).

Note

To maintain backward compatibility, the detectLanguageEntities endpoint wasn't modified when we introduced the concept of subtype. We recommend that you use the batchDetectLanguageEntities endpoint because it returns both types and subtypes. The isPii property was dropped when the batch API was introduced, but you can derive it from the entity type by using the Is PII column in the following table.


Entity (Full Name)	Entity Type (In Prediction)	Entity Subtype (In Prediction)	Single Record API / Batch API (if blank, both APIs are consistent)	Is PII	Description
`DATE`	`DATE`		Single record	X	Absolute or relative dates, periods, and date ranges. Examples: "10th of June", "third Friday in August", "the first week of March".
`DATE`	`DATETIME`	`DATE`	Batch	X	Absolute or relative dates, periods, and date ranges. Examples: "10th of June", "third Friday in August", "the first week of March".
`EMAIL`	`EMAIL`			√	Email address.
`EVENT`	`EVENT`			X	Named hurricanes, sports events, and so on.
`FACILITY`	`FACILITY`		Single record	X	Buildings, airports, highways, bridges, and so on.
`FACILITY`	`LOCATION`	`FACILITY`	Batch	X	Buildings, airports, highways, bridges, and so on.
`GEOPOLITICAL ENTITY`	`GPE`		Single record	X	Countries, cities, and states.
`GEOPOLITICAL ENTITY`	`LOCATION`	`GPE`	Batch	X	Countries, cities, and states.
`IP ADDRESS`	`IPADDRESS`			√	IP address according to IPv4 and IPv6 standards.
`LANGUAGE`	`LANGUAGE`			X	Any named language.
`LOCATION`	`LOCATION`			X	Non-GPE locations, mountain ranges, bodies of water.
`CURRENCY`	`MONEY`		Single record	X	Monetary values, including the unit.
`CURRENCY`	`QUANTITY`	`CURRENCY`	Batch	X	Monetary values, including the unit.
`NATIONALITIES, RELIGIOUS and POLITICAL GROUPS`	`NORP`			X	Nationalities, religious or political groups.
`ORGANIZATION`	`ORG`			X	Companies, agencies, institutions, and so on.
`PERCENTAGE`	`PERCENT`		Single record	X	Percentage.
`PERCENTAGE`	`QUANTITY`	`PERCENTAGE`	Batch	X	Percentage.
`PERSON`	`PERSON`			√	People, including fictional characters.
`PHONENUMBER`	`PHONE_NUMBER`			√	Supported phone numbers: `("GB") - United Kingdom` `("AU") - Australia` `("NZ") - New Zealand` `("SG") - Singapore` `("IN") - India` `("US") - United States`
`PRODUCT`	`PRODUCT`			X	Vehicles, tools, foods, and so on (not services).
`NUMBER`	`QUANTITY`		Single record	X	Measurements, such as weight or distance.
`NUMBER`	`QUANTITY`	`NUMBER`	Batch	X	Measurements, such as weight or distance.
`TIME`	`TIME`		Single record	X	Anything less than 24 hours (time, duration, and so on).
`TIME`	`DATETIME`	`TIME`	Batch	X	Anything less than 24 hours (time, duration, and so on).
`URL`	`URL`			√	URL.
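Because the batch API no longer returns an isPii property, a client can look it up from the entity type. The following sketch encodes the Is PII column and the single-record to batch type mapping from the table above as plain Python data. The names `PII_TYPES`, `SINGLE_TO_BATCH`, `is_pii`, and `to_batch_type` are hypothetical helpers, not part of any SDK, and the type strings are the table values; check them against the values your API version actually returns.

```python
# Entity types marked as PII in the table above.
PII_TYPES = {"EMAIL", "IPADDRESS", "PERSON", "PHONE_NUMBER", "URL"}

# Single-record type -> (batch type, batch subtype), for the rows where the APIs differ.
SINGLE_TO_BATCH = {
    "DATE": ("DATETIME", "DATE"),
    "FACILITY": ("LOCATION", "FACILITY"),
    "GPE": ("LOCATION", "GPE"),
    "MONEY": ("QUANTITY", "CURRENCY"),
    "PERCENT": ("QUANTITY", "PERCENTAGE"),
    "QUANTITY": ("QUANTITY", "NUMBER"),
    "TIME": ("DATETIME", "TIME"),
}


def is_pii(entity_type):
    """Return True if the table marks this entity type as PII."""
    return entity_type in PII_TYPES


def to_batch_type(single_type):
    """Translate a single-record type to its batch (type, subtype) pair.

    Types that are the same in both APIs are returned with no subtype.
    """
    return SINGLE_TO_BATCH.get(single_type, (single_type, None))


print(is_pii("EMAIL"), is_pii("DATE"))
print(to_batch_type("MONEY"))
```

Keeping the mapping as data makes it easy to update if the list of supported entities changes.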

Examples

Input Text	Entities and Scores
`Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner.`	`Red Bull Racing Honda [ORG] 1.0000 four-time [QUANTITY/NUMBER] 1.0000 Formula-1 World [EVENT] 0.9705 Oracle Cloud Infrastructure (OCI [ORG] 0.9811`
OCI recently added new services to the existing compliance program including SOC, HIPAA, and ISO, to enable our customers to solve their use cases. We also released new technical papers and guidance documents related to Object Storage, the Australian Prudential Regulation Authority (APRA), and the Central Bank of Brazil. These resources help regulated customers better understand how OCI supports their regional and industry-specific compliance requirements. Not only are we expanding our number of compliance offerings and regulatory alignments, we continue to add regions and services at a faster rate.	`OCI [ORG] 1.0000 SOC [ORG] 1.0000 HIPAA [ORG] 1.0000 ISO [ORG] 1.0000 Australian Prudential Regulation Authority [ORG] 1.0000 Central Bank of Brazil [ORG] 0.9998 OCI [ORG] 1.0000`

The JSON for the first example is:

Sample Request

POST https://<region-url>/20210101/actions/batchDetectLanguageEntities

API Request format:

{
    "documents": [
        {
            "key": "doc1",
            "text": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner."
        }
    ]
}
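For clients that assemble this request body in code, the following sketch builds the same `documents` structure with the Python standard library. The `build_request_body` helper and the generated `doc1`, `doc2`, ... keys are assumptions for illustration. Sending the request, including authentication, is not shown here.

```python
import json


def build_request_body(texts):
    """Build a batch request body; keys are generated as doc1, doc2, and so on."""
    documents = [
        {"key": f"doc{i}", "text": text}
        for i, text in enumerate(texts, start=1)
    ]
    return {"documents": documents}


body = build_request_body([
    "Red Bull Racing Honda, the four-time Formula-1 World Champion team, "
    "has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner.",
])

# Serialize to the JSON string that would be sent as the POST body.
payload = json.dumps(body, indent=4)
print(payload)
```

Each document needs its own key so that the entities in the response can be matched back to the text they came from.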

Response JSON:

{
    "documents": [
        {
            "key": "doc1",
            "entities": [
                {
                    "offset": 0,
                    "length": 15,
                    "text": "Red Bull Racing",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.9914557933807373,
                    "metaInfo": null
                },
                {
                    "offset": 16,
                    "length": 5,
                    "text": "Honda",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.6515499353408813,
                    "metaInfo": null
                },
                {
                    "offset": 27,
                    "length": 9,
                    "text": "four-time",
                    "type": "QUANTITY",
                    "subType": null,
                    "score": 0.9998091459274292,
                    "metaInfo": [
                        {
                            "offset": 27,
                            "length": 9,
                            "text": "four-time",
                            "subType": "UNIT",
                            "score": 0.9998091459274292
                        }
                    ]
                },
                {
                    "offset": 47,
                    "length": 5,
                    "text": "World",
                    "type": "LOCATION",
                    "subType": "NON_GPE",
                    "score": 0.5825434327125549,
                    "metaInfo": null
                },
                {
                    "offset": 79,
                    "length": 27,
                    "text": "Oracle Cloud Infrastructure",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.998045802116394,
                    "metaInfo": null
                },
                {
                    "offset": 108,
                    "length": 3,
                    "text": "OCI",
                    "type": "ORGANIZATION",
                    "subType": null,
                    "score": 0.9986366033554077,
                    "metaInfo": null
                }
            ],
            "languageCode": "en"
        }
    ],
    "errors": []
}
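The `offset` and `length` fields let a client locate each entity in the original text. The following sketch walks a trimmed response with the same field layout, checks that each span matches the returned `text`, and keeps only the high-confidence entities. It assumes that offsets count characters, which holds for this ASCII example; the `extract_entities` helper and the `min_score` threshold are hypothetical.

```python
def extract_entities(response, source_texts, min_score=0.9):
    """Return (text, type, subType) for entities at or above min_score.

    source_texts maps each document key to the text that was submitted.
    """
    results = []
    for document in response["documents"]:
        text = source_texts[document["key"]]
        for entity in document["entities"]:
            start = entity["offset"]
            end = start + entity["length"]
            span = text[start:end]
            # Sanity check: the span at (offset, length) should equal the entity text.
            if span != entity["text"]:
                raise ValueError(f"span mismatch at offset {start}: {span!r}")
            if entity["score"] >= min_score:
                results.append((span, entity["type"], entity["subType"]))
    return results


source_texts = {
    "doc1": "Red Bull Racing Honda, the four-time Formula-1 World Champion team, "
            "has chosen Oracle Cloud Infrastructure (OCI) as their infrastructure partner.",
}

# Trimmed response: three of the entities from the sample, same field names.
response = {
    "documents": [
        {
            "key": "doc1",
            "entities": [
                {"offset": 0, "length": 15, "text": "Red Bull Racing",
                 "type": "ORGANIZATION", "subType": None, "score": 0.9914557933807373},
                {"offset": 16, "length": 5, "text": "Honda",
                 "type": "ORGANIZATION", "subType": None, "score": 0.6515499353408813},
                {"offset": 108, "length": 3, "text": "OCI",
                 "type": "ORGANIZATION", "subType": None, "score": 0.9986366033554077},
            ],
            "languageCode": "en",
        }
    ],
    "errors": [],
}

for item in extract_entities(response, source_texts):
    print(item)
```

With the default threshold of 0.9, "Honda" is filtered out because its score is about 0.65.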

Limitations

Sometimes, entities might not be separated or combined as you expect.
NER uses the context of the sentence to identify entities. If the context isn't present in the text processed, entities might not be extracted as you expect.
Malformed text (in structure or semantics) might reduce the model's performance.
Age isn't a separate entity, so age-related periods might be identified as date entities.
