Click Add Entity, enter the name, and then select the
type. The dialog's fields reflect the entity type. For example, for regular
expression entities, you can add the expression. For Value List entities, you
add the values and synonyms.
If your skill supports multiple languages through Digital Assistant's native language support, then you need to add the foreign-language
counterparts for the Value List entity's values and synonyms.
Because these values need to map to the corresponding value from the
primary language (the Primary Language Value), you need to select the primary
value before you add its secondary language counterpart. For example, if you've
added French as a secondary language to a skill whose primary language is
English, you first select small as the Primary Language Value and then add
petite.
As an optional step, enter a description. You might use the description to
spell out the entity, like the pizza toppings for a PizzaTopping entity.
This description is not retained when you add the entity to a composite
bag.
You can add the following functions, which are optional. They can be
overwritten if you add the entity to a composite bag.
If a value list entity has a long list of values, but you
want to show users only a few options at a time, you can set the
pagination for these values by entering a number in the
Enumeration Range Size field, or by defining an
Apache FreeMarker expression that evaluates to this number. For example, you
can define an expression that returns enum values based on
the channel.
When you set this property to 0, the skill
won't output a list at all, but will validate the user input against an entity
value.
If you set this number to one lower than the
total number of values defined for this entity, then the Resolve
Entities component displays a Show More button to accompany each full
set of values. If you use a Common Response component to resolve the
entity, then you can configure the Show More button yourself. You can change the Show More button text using the
showMoreLabel property that belongs to the Resolve
Entities and Common Response components.
Add an error message for invalid user input. Use an Apache
FreeMarker expression that includes the
system.entityToResolve.value.userInput property. For
example, '${system.entityToResolve.value.userInput!'This'}' is not a
valid pizza type.
To allow users to pick more than one value from a value
list entity, switch on Multiple Values. When you
switch this on, the values display as a numbered list. Switching this
option off displays the values as a list of options, which allows only a
single choice.
Switching on Fuzzy Match increases
the chances of the user input matching a value, particularly when your
values don't have a lot of synonyms. Fuzzy matching uses word stemming to identify matches from the user input. Switching
off fuzzy matching enforces strict matching, meaning that the user input
must be an exact match to the values and synonyms; "cars" won't match a
value called "car", nor will "manager" match a "development manager"
value.
For skills that are configured with a translation service,
entity matching is based on the translation of the input. If you switch on
Match Original Value, the original input is also
considered in entity matching, which could be useful for matching values
that are untranslatable.
To force a user to select a single value, switch on
Prompt for Disambiguation and add a
disambiguation prompt. By default, this message is Please select one
value of <item name>, but you can replace this with one made
up solely of text (You can only order one pizza at a time. Which pizza do
you want to order?) or a combination of text and FreeMarker
expressions. For
example:
"I found multiple dates: <#list system.entityToResolve.value.disambiguationValues.Date as date>${date.date?number_to_date}<#sep> and </#list>. Which date should I use as expense date?"
Define a validation rule using a FreeMarker expression.
Note
You can only add prompts,
disambiguation, and validation for built-in entities when they belong to
a composite bag.
Click Create.
Next steps:
Add the entity to an intent. This informs the skill of the values that it needs to extract from the user input during the language processing. See Add Entities to Intents.
In the dialog flow, declare a variable for the entity.
Click Validate and review the validation messages for errors related to entity
event handlers (if used), potential problems like multiple values in a
value list entity sharing the same synonym, and for guidance on applying
best practices such as adding multiple prompts to make the skill more
engaging.
Value List Entities for Multiple Languages
When you have a skill that is targeted to multiple languages and which uses Digital Assistant's native language support, you can set values for each language in the skill. For
each entity value in a skill's primary language, you should designate a corresponding
value in each additional language.
Tip:
To ensure that your skill
consistently outputs responses in the detected language, always include
useFullEntityMatches: true in Common Response, Resolve
Entities, and Match Entity states. As described in Add Natively-Supported Languages to a Skill, setting this
property to true (the default) returns the entity value as an
object whose properties differentiate the primary language from the detected
language. When referenced in Apache FreeMarker expressions, these properties ensure
that the appropriate language displays in the skill's message text and
labels.
Word Stemming Support in Fuzzy Match
Starting with Release 22.10, fuzzy matching for value list entities is based on word
stemming, where a value match is based on the lexical root of
the word. In previous versions, fuzzy matching was enabled
through partial matching and auto correct. While this approach
was tolerant of typos in the user input, including transposed
words, it could also result in matches to more than one value
within the value list entity. With stemming, this scatter is
eliminated: matches are based on the word order of the user
input, so either a single match is made, or none at all. For
example, "Lovers Veggie" would not result in any matches, but
"Veggie Lover" would match to the Veggie Lovers value of a pizza
type entity. (Note that "Lover" is stemmed.) Stop words, such as
articles and prepositions, are ignored in extracted values, as
are special characters. For example, both "Veggie the Lover" and
"Veggie////Lover" would match the Veggie Lovers value.
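The matching behavior described above can be approximated with a small sketch. This is a toy illustration only, not the platform's implementation: the stop word list, the one-rule stemmer (strip a trailing "s"), and the function names are all assumptions made for the example.

```python
import re

# Assumed stop words for this illustration; the platform's list is not documented here.
STOP_WORDS = {"a", "an", "the", "of", "for", "with", "in", "on"}

def stem(word):
    """Toy stemmer: strip a trailing 's'. Real stemmers are far more thorough."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def normalize(text):
    """Lowercase, treat special characters as separators, drop stop words, stem."""
    words = re.split(r"[^a-z0-9]+", text.lower())
    return [stem(w) for w in words if w and w not in STOP_WORDS]

def fuzzy_match(user_input, values):
    """Return the one value whose stemmed words equal the input's, in the same order."""
    target = normalize(user_input)
    for value in values:
        if normalize(value) == target:
            return value
    return None

pizza_types = ["Veggie Lovers", "Meat Lovers", "Margherita"]
for phrase in ["Veggie Lover", "Lovers Veggie", "Veggie the Lover", "Veggie////Lover"]:
    print(phrase, "->", fuzzy_match(phrase, pizza_types))
```

Because the comparison is made on ordered word lists, "Lovers Veggie" yields no match while the other three phrases resolve to Veggie Lovers.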
Create ML Entities
ML Entities are a model-driven approach to entity extraction. Like intents, you
create ML Entities from training utterances, likely the same training utterances that
you used to build your intents. For ML Entities, however, you annotate the words in the
training utterances that correspond to an entity.
To get started, you can annotate some of the training data yourself, but as
is the case for intents, you can develop a more varied (and therefore robust) training set by
crowd sourcing it. As noted in the training guidelines, robust entity detection requires anywhere from 500
to 5000 occurrences of each ML entity throughout the training set. Also, if the intent training
data is already expansive, then you may want to crowd source it rather than annotate each
utterance yourself. In either case, you should analyze your training data to find out if the
entities are evenly represented and if the entity values are sufficiently varied. With the
annotations complete, you then train the model, then test it. After reviewing the entities
detected in the test runs, you can continue to update the corpus and retrain to improve the
accuracy.
To create an ML Entity:
Click + Add Entity.
Complete the Create Entity dialog. Keep in mind that the Name and
Description appear in the crowd worker pages for Entity Annotation Jobs.
Enter a name that identifies the annotated content. A
unique name helps crowd workers.
Enter a description. Although this is an optional property,
crowd workers use it, along with the Name property, to differentiate
entities.
Choose ML Entity from the list.
Switch on Exclude System Entity Matches when the
training annotations contain names, locations, numbers, or other content that could
potentially clash with system entity values. Setting this option prevents the model from
extracting system entity values that are within the input that's resolved to this ML
entity. It enforces a boundary around this input so that the model recognizes it only as
an ML entity value and does not parse it further for system entity values. You can set
this option for composite bag entities that reference ML entities.
Click Create.
Click +Value List Entities to associate this
entity with up to five Value List Entities. This is optional, but associating an
ML Entity with a Value List Entity combines the contextual extraction of the ML
Entity and the context-agnostic extraction of the Value List Entity.
Click the DataSet tab. This page lists all
the utterances for each ML Entity in your skill, which include the utterances that you've
added yourself to bootstrap the entity, those submitted from crowd sourcing jobs, and those that have
been imported as JSON objects. From this page, you can add utterances manually or in bulk
by uploading a JSON file. You can also manage the utterances from this page by editing
them (including annotating or re-annotating them), or by deleting, importing, and
exporting them.
Add utterances manually:
Click Add Utterance. After you've added the
utterance, click Edit Annotations to open the Entity List.
Note
You can only add one
utterance at a time. If you want to add utterances in bulk, you can either add
them through an Entity Annotation job, or you can upload a JSON
file.
Highlight the text relevant to the ML Entity, then complete the
labeling by selecting the ML Entity from the Entity List. You can remove an
annotation by clicking x in the label.
Add utterances from a JSON file. This JSON file contains a list of utterance objects.
[
  {
    "Utterance": {
      "utterance": "I expensed $35.64 for group lunch at Joe's on 4/7/21",
      "languageTag": "en",
      "entities": [
        {
          "entityValue": "Joe's",
          "entityName": "VendorName",
          "beginOffset": 37,
          "endOffset": 42
        }
      ]
    }
  },
  {
    "Utterance": {
      "utterance": "Give me my $30 for Coffee Klatch on 7/20",
      "languageTag": "en",
      "entities": [
        {
          "entityName": "VendorName",
          "beginOffset": 19,
          "endOffset": 32
        }
      ]
    }
  }
]
You can upload it by clicking More > Import and then selecting it from your local system.
The entities object describes the ML entities that have been
identified within the utterance. Although the preceding example illustrates a single
entities object for each utterance, an utterance may contain
multiple ML entities which means multiple entities
objects:
[
  {
    "Utterance": {
      "utterance": "I want this and that",
      "languageTag": "en",
      "entities": [
        {
          "entityName": "ML_This",
          "beginOffset": 7,
          "endOffset": 11
        },
        {
          "entityName": "ML_That",
          "beginOffset": 16,
          "endOffset": 20
        }
      ]
    }
  },
  {
    "Utterance": {
      "utterance": "I want less of this and none of that",
      "languageTag": "en",
      "entities": [
        {
          "entityName": "ML_This",
          "beginOffset": 15,
          "endOffset": 19
        },
        {
          "entityName": "ML_That",
          "beginOffset": 32,
          "endOffset": 36
        }
      ]
    }
  }
]
entityName
identifies the ML Entity itself and entityValue identifies the text
labeled for the entity. entityValue is an optional key that you can
use to validate the labeled text against changes made to the utterance. The label
itself is identified by the beginOffset and
endOffset properties, which represent the offset for the
characters that begin and end the label. This offset is determined by character, not
by word, and is calculated from the first character of the utterance, which spans offsets 0-1.
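If you prefer not to count characters by hand, the offsets can be computed. The following is a minimal sketch, assuming (as the sample files suggest) a zero-based beginOffset and an endOffset that points one position past the last labeled character. The helper names annotate and check_offsets are hypothetical, and str.find locates only the first occurrence of the labeled text.

```python
import json

def annotate(utterance, labels, language_tag="en"):
    """Build one utterance object; labels maps an entity name to its labeled text."""
    entities = []
    for entity_name, text in labels.items():
        begin = utterance.find(text)  # first occurrence only
        if begin < 0:
            raise ValueError(f"{text!r} not found in utterance")
        entities.append({
            "entityValue": text,
            "entityName": entity_name,
            "beginOffset": begin,
            "endOffset": begin + len(text),
        })
    return {"Utterance": {"utterance": utterance,
                          "languageTag": language_tag,
                          "entities": entities}}

def check_offsets(item):
    """Confirm that each entityValue equals the text between its offsets."""
    body = item["Utterance"]
    return all(
        body["utterance"][e["beginOffset"]:e["endOffset"]] == e["entityValue"]
        for e in body["entities"] if "entityValue" in e
    )

item = annotate("I expensed $35.64 for group lunch at Joe's on 4/7/21",
                {"VendorName": "Joe's"})
print(json.dumps([item], indent=2))
```

Running check_offsets before an import is one way to catch labels that drifted after an utterance was edited.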
Note
You can't create the ML Entities
from this JSON. They must exist before you upload the file.
If you don't want to determine the offsets, you can leave the
entities object undefined and then apply the labels after you
upload the JSON
file.
[
  {
    "Utterance": {
      "utterance": "I expensed $35.64 for group lunch at Joe's on 4/7/21",
      "languageTag": "en",
      "entities": []
    }
  },
  {
    "Utterance": {
      "utterance": "Give me my $30 for Coffee Klatch on 7/20",
      "languageTag": "en",
      "entities": []
    }
  }
]
The
system checks for duplicates to prevent redundant entries. Only changes made to the
entities definition in the JSON file are applied. If an utterance
has been changed in the JSON file, then it's considered a new utterance.
Edit an annotated utterance:
Click Edit to remove the annotation.
Note
A modified utterance is
considered a new (unannotated) utterance.
Click Edit Annotations to open the Entity
List.
Highlight the text, then select an ML Entity from the Entity
List.
If you need to remove an annotation, click x
in the label.
When you've finished annotating the utterances, click
Train to update both Trainer Tm and the Entity
model.
Test the recognition by entering a test phrase in the Utterance
Tester, ideally one with a value not found in any training data. Check the results to find
out if the model detected the correct ML Entity and if the text has been labeled correctly
and completely.
Associate the ML Entity with an intent.
Exclude System Entity Matches
Switching on Exclude System Entity Matches prevents the model
from replacing previously extracted system entity values with competing values found
within the boundaries of an ML entity. With this option enabled, "Create a meeting on
Monday to discuss the Tuesday deliverable" keeps the DATE_TIME and ML entity values
separate by resolving the applicable DATE_TIME entity (Monday) and ignoring "Tuesday" in
the text that's recognized as the ML entity ("discuss the Tuesday deliverable").
When this option is disabled, the skill instead resolves two DATE_TIME
entity values, Monday and Tuesday. Clashing values like these diminish the user
experience by updating a previously slotted entity value with an unintended value or by
interjecting a disambiguation prompt that interrupts the flow of the conversation.
Note
You can set the Exclude
System Entity Matches option for composite bag entities that
reference an ML entity.
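The boundary behavior can be pictured as a filter over character spans. This is a conceptual sketch only, not how the model works internally: the span tuples and the helper names span_of and exclude_inside are assumptions made for the illustration.

```python
def span_of(utterance, text):
    """Return the (begin, end) character span of text within the utterance."""
    begin = utterance.index(text)
    return (begin, begin + len(text))

def exclude_inside(system_spans, ml_spans):
    """Drop system entity spans that fall entirely within any ML entity span."""
    return [
        (begin, end) for begin, end in system_spans
        if not any(ml_begin <= begin and end <= ml_end
                   for ml_begin, ml_end in ml_spans)
    ]

utterance = "Create a meeting on Monday to discuss the Tuesday deliverable"
ml_spans = [span_of(utterance, "discuss the Tuesday deliverable")]
date_spans = [span_of(utterance, "Monday"), span_of(utterance, "Tuesday")]

kept = exclude_inside(date_spans, ml_spans)
print([utterance[b:e] for b, e in kept])
```

With the filter applied, only "Monday" remains as a DATE_TIME candidate; without it, both dates would compete for the same slot.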
Import Value List Entities from a CSV File
Rather than creating your entities one at a time, you can create entire sets of them
when you import a CSV file containing the entity definitions.
This CSV file contains columns for the entity name
(entity), the entity value (value) and any synonyms
(synonyms). You can create this file from scratch, or you can reuse
or repurpose a CSV that has been created from an export.
Whether you're starting anew or using an exported file, you need to be
mindful of the version of the skill that you're importing to because of the format and
content changes for native language support that were introduced in Version 20.12.
Although you can import a CSV from a prior release into a 20.12 skill without incident
in most cases, there are still some compatibility issues that you may need to address.
But before that, let's take a look at the format of a pre-20.12 file. This file is
divided into the following columns: entity, value, and
synonyms. For
example:
For skills created with, or upgraded to, Version 20.12, the import files
have language tags appended to the value and synonyms
column headers. For example, if the skill's primary native language is English
(en), then the value and synonyms
columns are en:value and
en:synonyms:
CSVs
that support multiple native languages require additional sets of
value and synonyms columns for each secondary
language. If a native English language skill's secondary language is French
(fr), then the CSV has fr:value and
fr:synonyms columns as counterparts to the en
columns:
entity,en:value,en:synonyms,fr:value,fr:synonyms
PizzaSize,Large,lrg:lrge:big,grande,grde:g
PizzaSize,Medium,med,moyenne,moy
PizzaSize,Small,,petite,p
PizzaSize,Extra Large,XL,pizza extra large,
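To inspect a file like this before importing it, you could load it with a short script. The sketch below uses Python's standard csv module. It assumes that synonyms within a cell are separated by colons (inferred from the lrg:lrge:big cell in the sample), and the load_entities helper is a hypothetical name.

```python
import csv
import io

SAMPLE = """entity,en:value,en:synonyms,fr:value,fr:synonyms
PizzaSize,Large,lrg:lrge:big,grande,grde:g
PizzaSize,Medium,med,moyenne,moy
PizzaSize,Small,,petite,p
PizzaSize,Extra Large,XL,pizza extra large,
"""

def load_entities(text, primary="en"):
    """Group rows by entity and map each primary value to its per-language data."""
    entities = {}
    for row in csv.DictReader(io.StringIO(text)):
        languages = {}
        for header, cell in row.items():
            if ":" not in header:
                continue  # the plain "entity" column carries no language tag
            lang, field = header.split(":", 1)
            entry = languages.setdefault(lang, {"value": "", "synonyms": []})
            if field == "value":
                entry["value"] = cell
            elif field == "synonyms":
                # Assumption: a colon separates synonyms inside one cell.
                entry["synonyms"] = [s for s in cell.split(":") if s]
        values = entities.setdefault(row["entity"], {})
        values[languages[primary]["value"]] = languages
    return entities

sizes = load_entities(SAMPLE)["PizzaSize"]
print(sizes["Small"]["fr"])
```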
Here are some things to note if you plan to import CSVs across versions:
If you import a pre-20.12 CSV into a 20.12 skill (including those that support
native languages or use translation services), the values and synonyms are
imported as primary languages.
All entity values for both the primary and secondary languages must
be unique within an entity, so you can't import a CSV if the same value has been
defined more than once for a single entity. Duplicate values may occur in
pre-20.12 versions, where values can be considered unique because of variations
in letter casing. This is not true for 20.12, where casing is more strictly
enforced. For example, you can't import a CSV if it has both PizzaSize,
Small and PizzaSize, SMALL. If you plan to upgrade to
Version 20.12, you must first resolve all entity values that are the same, but
differentiated only by letter casing, before performing the upgrade.
Primary language support applies to skills created using Version 20.12 and
higher, so you must first remove language tags and any secondary language
entries before you can import a Version 20.12 CSV into a skill created with a
prior version.
When you import a 20.12 CSV into a 20.12 skill:
You can import a multi-lingual CSV into skills that do not use
native language support, including those that use translation services.
If you import a multi-lingual CSV into a skill that supports native
languages or uses translation services, then only rows that provide a valid
value for the primary language are imported. The rest are ignored.
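The casing caveat above can be checked before an upgrade or import. This is a minimal sketch, assuming that uniqueness is evaluated per entity and per value column. The find_case_duplicates helper is a hypothetical name, and the legacy sample rows are invented for the illustration.

```python
import csv
import io
from collections import defaultdict

def find_case_duplicates(text):
    """Return {(entity, column, folded value): [spellings]} for case-only clashes."""
    seen = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        for header, cell in row.items():
            # Matches both the untagged "value" column and tagged ones like "en:value".
            is_value_column = header == "value" or header.endswith(":value")
            if is_value_column and cell:
                key = (row["entity"], header, cell.strip().casefold())
                seen[key].append(cell.strip())
    return {key: found for key, found in seen.items() if len(found) > 1}

# Invented pre-20.12 style rows: Small and SMALL differ only by casing.
legacy = (
    "entity,value,synonyms\n"
    "PizzaSize,Small,sm\n"
    "PizzaSize,SMALL,\n"
    "PizzaSize,Large,lrg\n"
)
print(find_case_duplicates(legacy))
```

Any key reported here identifies a set of values to merge or rename before the file is imported.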
With these caveats in mind, here's how you create entities through an import:
Click Entities in the side navbar.
Click More, choose Import Value
list entities, and then select the .csv
file from your local system.
Add the entity or entities to an intent (or to an entity list
and then to an intent).
Export Value List Entities to a CSV File
You can export the values and synonyms in a CSV file for reuse in another skill. The
exported CSVs share the same format as the CSVs used for creating entities through
imports in that they contain entity, value, and
synonyms columns. However, these CSVs have release-specific requirements
that can affect their reuse.
The CSVs exported from skills created with, or upgraded to, Version
20.12 are equipped for native language support through the primary (and sometimes
secondary) language tags that are appended to the value and
synonyms columns. For example, the CSV in the following
snippet has a set of value and synonyms
columns for the skill's primary language, English (en) and
another set for its secondary language, French
(fr):
entity,en:value,en:synonyms,fr:value,fr:synonyms
The
primary language tags are included in all 20.12 CSVs regardless of native
language support. They are present in skills that are not intended to perform
any type of translation (native or through a translation service) and in skills
that use translation services.
The CSVs exported from skills running on versions prior to 20.12 have the
entity, value, and synonyms columns, but no language tags.
To export value list entities:
Click Entities in the side navbar.
Click More, choose Export Value
list entities and then save the file.
The exported .csv file is named for your skill.
If you're going to use this file as an import, then you may need to perform
some of the edits described in Import Value List Entities from a CSV File if you're moving it between
Version 20.12 skills and skills from prior versions.
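One such edit, removing the language tags and secondary-language columns so that a 20.12 export can be imported into a skill from a prior version, could be scripted as follows. This is a sketch under the assumption that the older format consists of exactly the entity, value, and synonyms columns. The strip_language_tags helper is a hypothetical name.

```python
import csv
import io

def strip_language_tags(text, primary="en"):
    """Keep the entity column plus the primary value and synonyms, without tags."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["entity", "value", "synonyms"])
    for row in reader:
        # Secondary-language columns (for example fr:value) are dropped.
        writer.writerow([row["entity"],
                         row[f"{primary}:value"],
                         row[f"{primary}:synonyms"]])
    return out.getvalue()

tagged = (
    "entity,en:value,en:synonyms,fr:value,fr:synonyms\n"
    "PizzaSize,Small,,petite,p\n"
)
print(strip_language_tags(tagged))
```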
Create Dynamic Entities
Dynamic entity values are managed through the endpoints of the Dynamic
Entities API that are described in the REST API for Oracle Digital Assistant. To add,
modify, and delete the entity values and synonyms, you must first create a dynamic
entity to generate the entityId that's used in the REST
calls.
To create the dynamic entity:
Click + Entity.
Choose Dynamic Entities from the Type
list.
If the backend service is unavailable or hasn't yet pushed any
values, or if you do not maintain the service, click +
Value to add mock values that you can use for testing purposes.
Typically, you would add these static values before the dynamic entity
infrastructure is in place. These values are lost when you clone, version, or
export a skill. After you provision the entity values through the API, you can
overwrite, or retain, these values (though in most cases you would overwrite
them).
Click Create.
Tip:
If the API
refreshes the entity values as you're testing the conversation, click
Reset to restart the conversation.
A couple of notes for service developers:
You can query for the dynamic entities configured for a skill using
the generated entityId with the botId. You
include these values in the calls to create the push requests and objects that
update the entity values.
An entity cannot have more than 150,000 values. To reduce the likelihood of exceeding this limit when you're dealing with large amounts of data, send PATCH requests with your deletions before you send PATCH requests with your additions.
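The ordering advice above can be expressed as a small planning step that runs before any request is sent. The sketch below deliberately makes no REST calls, because the request formats are defined in the REST API documentation rather than here. The plan_updates helper and its tuple layout are assumptions made for the illustration.

```python
MAX_VALUES = 150_000  # the per-entity limit stated above

def plan_updates(current_count, deletions, additions):
    """Order deletions before additions and track the running value count."""
    steps = []
    count = current_count
    if deletions:
        count -= len(deletions)
        steps.append(("delete", deletions, count))
    if additions:
        count += len(additions)
        if count > MAX_VALUES:
            raise ValueError(f"{count} values would exceed the {MAX_VALUES} limit")
        steps.append(("add", additions, count))
    return steps

# An entity that is one value below the limit: adding first would overshoot it.
steps = plan_updates(149_999, ["old-1", "old-2"], ["new-1", "new-2"])
for operation, values, count_after in steps:
    print(operation, len(values), "values ->", count_after, "total")
```

Each planned step would then correspond to one PATCH request, sent in the order returned.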
Note
Dynamic entities are only supported on instances of Oracle Digital Assistant that were provisioned on Oracle Cloud Infrastructure (sometimes referred to as
the Generation 2 cloud infrastructure). If your instance is provisioned on the
Oracle Cloud Platform (as are all version 19.4.1 instances), then you can't use
this feature.
Guidelines for Creating ML Entities
Here's a general approach to creating an ML Entity.
Create concise ML Entities. The ML Entity definition is at the base
of a useful training set, so clarity is key in terms of its name and
description, which help crowd workers annotate utterances.
Because crowd workers rely on the ML Entity descriptions and names, you
must ensure that your ML Entities are easily distinguishable from each
other, especially when there's potential overlap. If the differences are not
clear to you, it's likely that crowd workers will be confused. For example,
the Merchant and Account Type entities may be difficult to differentiate in
some cases. In "Transfer $100 from my savings account to Pacific Gas and
Electric," you can clearly label "savings" as Account Type and Pacific Gas
and Electric as Merchant. However, the boundary between the two can be
blurred in sentences like "Need to send money to John, transfer $100 from my
savings to his checking account." Is "checking account" an Account type, or
a Merchant name? In this case, you may decide that any recipient should
always be a merchant name rather than an account type.
In preparation for crowd sourcing the training utterances, consider the typical
user input for different entity extraction contexts. For example, can the value
be extracted in the user's initial message (initial utterance context), or is it
extracted from responses to the skill's prompts (slot utterance context)?
Context: Initial utterance context
Description: A message that's usually well-structured and includes ML Entity values. For an expense reporting skill, for example, the utterance would include a value that the model can detect for an ML Entity called Merchant.
Example utterances (detected ML Entity values in bold):
Create an expense for team dinner at John's Pasta Shop for $85 on May 3
Context: Slot utterance context
Description: A user message that provides the ML Entity in response to a prompt, either because of conversation design (the skill prompts with "Who is the merchant?") or to slot a value because it hasn't been provided by a previously submitted response.
In other circumstances, the ML Entity value may have already been provided, but may be included in other user messages in the same conversation. For example, the skill might prompt users to provide additional expense details or describe the image of an uploaded receipt.
Example utterances (detected ML Entity values in bold):
Merchant is John's Pasta Shop.
Team dinner. Amount $85. John's Pasta Shop.
Description is TurboTaxi from home to CMH airport.
If you don't have enough training data, or if you're
starting from scratch, launch an Intent Paraphrasing Job. To gather viable (and
abundant) utterances for training and testing, integrate the entity
context into the job by creating tasks for each intent. To gather
diverse phrases, consider breaking down each intent by conversation
context.
For the task's prompt, provide crowd workers context and
ask them, "How would you respond?" or "What would you say?" Use the
accompanying hints to provide examples and to illustrate different
contexts. For example:
Prompt: You're talking to an expense reporting bot, and you want to create an expense. What would be the first thing you would say?
Hint: Ensure that the merchant name is in the utterance. You might say something like, "Create an expense for team dinner at John's Pasta Shop for $85 on May 3."
This task asks for phrases that not only initiate the
conversation, but also include a merchant name. You might also want
utterances that reflect responses prompted by the skill when the user
doesn't provide a value. For example, "Merchant is John's Pasta Shop" in
response to the skill's "Who is the merchant?" prompt.
Prompt: You've submitted an expense to an expense reporting bot, but didn't provide a merchant name. How would you respond?
Hint: Identify the merchant. For example, "Merchant is John's Pasta Shop."
Prompt: You've uploaded an image of a receipt to an expense reporting bot. It's now asking you to describe the receipt. How would you respond?
Hint: Identify the merchant's name on the receipt. For example: "Grandiose Shack Hotel receipt for cloud symposium."
To test for false positives (words and phrases that the
model should not identify as ML Entities), you may also want to collect
"negative examples". These utterances do not include an ML Entity value.
Context: Initial utterance context
Example utterances:
Pay me back for Tuesday's dinner
Context: Slot utterance context
Example utterances:
Pos presentation dinner. Amount $50. 4 people.
Description xerox lunch for 5
Hotel receipt for interview stay
Gather a large training set by setting an appropriate
number of paraphrases per intent. For the model to generalize
successfully, your data set must contain somewhere between 500 and 5000
occurrences for each ML entity. Ideally, you should avoid the low end of
this range.
Once the crowd workers have completed the job (or have completed
enough utterances that you can cancel the job), you can either add the
utterances, or launch an Intent Validation job to verify them. You can also
download the results to your local system for additional review.
Reserve about 20% of the utterances for testing. To create CSVs for the Utterance Tester from the downloaded CSVs for Intent
Paraphrasing and Intent Validation jobs:
For Intent Paraphrasing jobs: transfer the contents in the
result column (the utterances provided by crowd
workers) to the utterance column in the Utterance
Tester CSV. Transfer the contents of the intentName
column to the expectedIntent column in the Utterance
Tester CSV.
For Intent Validation jobs: transfer the contents in the
prompt column (the utterances provided by crowd
workers) to the utterance column in the Utterance
Tester CSV. Transfer the contents of the intentName
column to the expectedIntent column in the Utterance
Tester CSV.
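For Intent Paraphrasing results, the reserve-and-convert step could be scripted along these lines. This is a minimal sketch: the sample rows are invented, only the result, intentName, utterance, and expectedIntent column names come from the description above, and the helper names and fixed random seed are assumptions.

```python
import csv
import io
import random

def split_for_testing(paraphrasing_csv, test_fraction=0.2, seed=0):
    """Shuffle the downloaded rows and reserve a fraction of them for testing."""
    rows = list(csv.DictReader(io.StringIO(paraphrasing_csv)))
    random.Random(seed).shuffle(rows)  # seeded so the split is repeatable
    cut = round(len(rows) * test_fraction)
    return rows[:cut], rows[cut:]

def to_tester_csv(rows):
    """Map result -> utterance and intentName -> expectedIntent."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["utterance", "expectedIntent"])
    for row in rows:
        writer.writerow([row["result"], row["intentName"]])
    return out.getvalue()

# Invented sample of a downloaded Intent Paraphrasing result file.
sample = (
    "intentName,result\n"
    "CreateExpense,Create an expense for team dinner at John's Pasta Shop\n"
    "CreateExpense,Expense $85 for dinner on May 3\n"
    "CreateExpense,Merchant is John's Pasta Shop\n"
    "CreateExpense,Pay me back for Tuesday's dinner\n"
    "CreateExpense,Hotel receipt for interview stay\n"
)
test_rows, train_rows = split_for_testing(sample)
print(to_tester_csv(test_rows))
```

The rows left in train_rows are the ones that would go on to the Entity Annotation Job described next.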
Add the remaining utterances to a CSV file with a single column,
utterance. Create an Entity Annotation Job by uploading
this CSV. Because workers are labeling the entity values, they will likely
classify negative utterances as "I'm not sure" or "None of the entities
apply."
After the Entity Annotation job is complete, you can add the
results, or you can launch an Entity Validation job to verify the labeling. Only
the utterances that workers deem correct in an Entity Validation job can be
added to the corpus.
Tip:
You can add, remove, or adjust the
annotation labels in the Dataset tab of the Entities page.
Train the entity by selecting Entity.
Run test cases to evaluate entity recognition using the
utterances that you reserved from the Intent Paraphrasing job. You can divide up
these utterances into different test suites to test different behaviors (unknown values, punctuation
that may not be present in the training data, false positives, and so on).
Because there may be a large number of these utterances, you can create test
suites by uploading a CSV into the Utterance Tester.
Note
The Utterance
Tester only displays entity labels for passing test cases. Use a Quick Test instead to view the labels for utterances
that resolve below the confidence threshold.
Use the results to refine the data set. Iteratively add, remove, or
edit the training utterances until test run results indicate the model is
effectively identifying ML Entities.
Note
To prevent inadvertent entity
matches that degrade the user experience, switch on Exclude
System Entity Matches if the training data contains names,
locations, or numbers.
ML Entity Training Guidelines
The model generalizes an entity using both the context around a word (or
words) and the lexical information about the word itself. For the model to generalize
effectively, we recommend that the number of annotations per entity range somewhere
between 500 and 5000. You may already have a training set that's both large enough and
has the variation of entity values that you'd expect from end users. If this is the
case, you can launch an Entity Annotation job and then incorporate the results into the
training data. However, if you don't have enough training data, or if the data that you
do have lacks sufficient coverage for all the ML entities, then you can collect
utterances from crowd-sourced Intent Paraphrasing jobs.
Whatever the source, the distribution of entity values should reflect your
general idea of the values that the model may encounter. To adequately train the
model:
Do not overuse the same entity values in your training data.
Repetitive entity values in your training data prevent the model from
generalizing on unknown values. For example, you expect the ML Entity to
recognize a variety of values, but the entity is represented by only 10-20
different values in your training set. In this case, the model will not
generalize, even if there are two or three thousand annotations.
Vary the number of words for each entity value. If you expect users
to input entity values that are three-to-five words long, but your training data
is annotated with one- or two-word entity values, then the model may fail to
identify the entity as the number of words increases. In some cases, it may only
partially identify the entity. The model assumes the entity boundary from the
utterances that you've provided. If you've trained the model on values with one
or two words, then it assumes the entity boundary is only one or two words long.
Adding entities with more words enables the model to recognize longer entity
boundaries.
Utterance length should reflect your use case and the anticipated
user input. You can train the model to detect entities for messages of varying
lengths by collecting both short and long utterances. The utterances can even
have multiple phrases. If you expect short utterances that reflect the
slot-filling context, then gather your sample data accordingly. Likewise, if
you're anticipating utterances for the initial context scenario, then the
training set should contain complete phrases.
Include punctuation. If entity names require special characters,
such as '-' and '/', include them in the entity values in the training
data.
Ensure that all ML Entities are equally represented in your
training data. An unbalanced training set has too many instances of one entity
and too few of another. The models produced from unbalanced training sets
sometimes fail to detect the entity with too few instances and over-predict for
the entities with disproportionately high instances. This leads to
false-positives.
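Several of these guidelines (annotation counts, value variety, value length, and balance) can be checked against an annotated data set in the JSON format shown earlier. The following is a rough sketch: the entity_stats helper and the keys of its report are assumptions made for the illustration, and the two sample utterances are the ones used in the import example.

```python
from collections import Counter, defaultdict

def entity_stats(dataset):
    """Summarize each ML entity: annotation count, distinct values, longest value."""
    counts = Counter()
    values = defaultdict(set)
    for item in dataset:
        body = item["Utterance"]
        for ent in body["entities"]:
            text = body["utterance"][ent["beginOffset"]:ent["endOffset"]]
            counts[ent["entityName"]] += 1
            values[ent["entityName"]].add(text.lower())
    return {
        name: {
            "annotations": counts[name],
            "distinct_values": len(values[name]),
            "max_words": max(len(v.split()) for v in values[name]),
            "in_recommended_range": 500 <= counts[name] <= 5000,
        }
        for name in counts
    }

dataset = [
    {"Utterance": {
        "utterance": "I expensed $35.64 for group lunch at Joe's on 4/7/21",
        "languageTag": "en",
        "entities": [{"entityName": "VendorName", "beginOffset": 37, "endOffset": 42}]}},
    {"Utterance": {
        "utterance": "Give me my $30 for Coffee Klatch on 7/20",
        "languageTag": "en",
        "entities": [{"entityName": "VendorName", "beginOffset": 19, "endOffset": 32}]}},
]
stats = entity_stats(dataset)
print(stats)
```

Comparing the annotations figure across entities gives a quick read on balance, while a low distinct_values figure relative to annotations points to overused values.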
ML Entity Testing Guidelines
Before you train your skill, you should reserve about 20% of unannotated
utterances to find out how the model generalizes when presented with utterances or
entity values that are not part of its training data. This set of utterances may not be
your only testing set, depending on the behaviors you want to evaluate. For example:
Use only slot context utterances to find out how well the model
predicts entities with less context.
Use utterances with "unknown" values to find out how well the model generalizes
with values that are not present in the training data.
Use utterances without ML Entities to find out if the model detects
any false positives.
Use utterances that contain ML Entity values with punctuation to
find out how well the model performs with unusual entity values.
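If you keep expected labels for some of your reserved utterances, you could sort them into these evaluation sets automatically. This sketch assumes the labels are stored in the same JSON shape used for importing annotations. The build_suites helper, the suite names, and the sample data are hypothetical.

```python
import string

def build_suites(test_items, training_values):
    """Bucket labeled test utterances by the behavior they help evaluate."""
    known = {value.lower() for value in training_values}
    suites = {"no_entity": [], "unknown_value": [], "punctuated_value": []}
    for item in test_items:
        body = item["Utterance"]
        text = body["utterance"]
        labeled = [text[e["beginOffset"]:e["endOffset"]] for e in body["entities"]]
        if not labeled:
            suites["no_entity"].append(text)  # candidates for false-positive checks
        if any(value.lower() not in known for value in labeled):
            suites["unknown_value"].append(text)
        if any(ch in string.punctuation for value in labeled for ch in value):
            suites["punctuated_value"].append(text)
    return suites

test_items = [
    {"Utterance": {"utterance": "Pay me back for Tuesday's dinner",
                   "languageTag": "en", "entities": []}},
    {"Utterance": {"utterance": "Merchant is John's Pasta Shop",
                   "languageTag": "en",
                   "entities": [{"entityName": "Merchant",
                                 "beginOffset": 12, "endOffset": 29}]}},
]
suites = build_suites(test_items, training_values=["Coffee Klatch"])
print(suites)
```

An utterance can land in more than one suite, which is intended: a value that is both unseen and punctuated exercises both behaviors.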