SQL Dialog Skills
SQL Dialogs are skills that can translate a user's natural language utterances into SQL queries, send the queries to a backend data source, and display the response. This version of SQL Dialogs supports integration with Oracle database services, such as Oracle Enterprise Database Service.
This version doesn't support multi-language SQL Dialog skills or skills where the primary language is not English. When you create a new skill (or version or clone of a skill), you use the Primary Language field in the Create dialog to specify the primary language. A skill is multi-language if the resource bundle, sample utterances, machine learning, and value lists, for example, have more than one language or if the dialog flow contains code to detect the language of the user input and translate it behind the scenes.
When writing skills that provide database information to end users, developers typically need to define the use cases, write custom components to retrieve the data, create intents for the use cases, map user utterances to intents, and write the dialog flow to handle every intent. With SQL Dialog skills, you don't need to do these steps. Instead, you map the users' mental models of the data to the physical data source, and the skill uses the map to dynamically generate SQL from natural language utterances.
For example, users might know that employees belong to departments that are in various locations, and they have employee IDs, job titles, hire dates, and sometimes commissions. Given their mental model, they can retrieve the data by asking the skill "What is James Smith's job?", "When was James Smith hired?", "How many employees are in New York?", and "Who has the highest commission?"
You build a SQL Dialog skill differently than regular skills. To enable the skill to understand and respond to natural language utterances, you create a logical model from the physical model and you teach that model by using natural language terms to describe the physical model.
How SQL Dialogs Work
To implement a SQL Dialog skill, you create a visual dialog skill and import information about the physical model (database schema) from the data service. Oracle Digital Assistant uses this information to create a query entity for each table you imported (the logical model). The query entities contain attributes that model the table columns.
If a table in the physical model has a foreign key, then the query entity has an attribute that links to the related query entity. For example, if an `Emp` table has a foreign key to the `Dept` table, then the `Emp` query entity has a `dept` attribute, which links to the `Dept` entity.
As soon as you create the query entities and identify their primary keys, you can train the skill and it's ready to use in a rudimentary way. That is, you can use free form utterances, but, for now, the query must use the exact entity and attribute primary names, which are initially derived from the physical model's names (the canonical names). This will change as you enhance the logical model to more closely reflect natural language.
To enable the end users to use natural language to ask about the data, you map end-user terminology to the physical model by changing the primary names and adding synonyms for both the query entities and their attributes. For example, you might change the primary name for the `Emp` table to "employee" and add the "staff member" synonym. Adding primary names and synonyms are two of the ways that you train the natural language parser (NLP) to resolve utterances into Oracle meaning representation query language (OMRQL) queries. OMRQL queries are like SQL queries but are based on the canonical names of the object models (the query entities). For example, if you change the primary name for `empno` to "employee number", then "what is Joe Smith's employee number" resolves to `SELECT empno FROM emp WHERE ename = 'Joe Smith'`.
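Conceptually, the primary names and synonyms act as a lookup from end-user terms to the canonical names of the physical model. The following is a minimal sketch of that idea, assuming the `emp`/`empno` names from the example; the `SYNONYMS` table and `canonicalize` function are illustrative only, not the product's actual parser.

```python
# Illustrative mapping from natural-language terms to canonical names.
# (The real NLP model is statistical; this sketch is just a lookup.)
SYNONYMS = {
    "employee number": "empno",   # primary name -> canonical column
    "employee": "emp",            # primary name -> canonical table
    "staff member": "emp",        # synonym -> canonical table
}

def canonicalize(utterance: str) -> str:
    """Replace natural-language terms with canonical names."""
    text = utterance.lower()
    # Try longer phrases first so "employee number" wins over "employee".
    for phrase in sorted(SYNONYMS, key=len, reverse=True):
        text = text.replace(phrase, SYNONYMS[phrase])
    return text

print(canonicalize("what is Joe Smith's employee number"))
```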
To further improve the natural language parser (NLP) resolution, you can also associate the attributes with value lists, which are automatically populated from the data service upon creation.
For example, let's say that you import `Emp` and `Dept` tables from a data service, which results in `Emp` and `Dept` query entities. Immediately after you import the tables and train the skill, you can query the query entities using utterances like the following:
Show all Emp in dept 10
After you change the primary names for the entities and attributes to more natural language terms, such as `employees` for the entity and `department` for the attribute, you can use utterances like this one:
Show all the employees in department 10
You can further add synonyms to model all the ways people typically refer to the entity or attribute. For example, you might add the synonyms `district` and `territory` for `department` so that the NLP recognizes this utterance:
Show all employees in district 10
With more complex utterances, you can teach the skill how to resolve the queries to OMRQL by adding custom training data that associates the utterances with specific OMRQL statements.
If your skill has intents to handle non-SQL use cases or is included in a DA, you'll want to add routing utterances to the query entities dataset to help the skill differentiate between SQL-related utterances and non-SQL-related utterances.
When the skill outputs a query result, it lets the user give a thumbs up or thumbs down to indicate whether the result is correct. The Insights page shows the thumbs up (correct queries) and thumbs down (incorrect queries) counts so you can see how well the skill is doing.
Supported Queries
The SQL Dialogs natural language processing model supports queries that translate to the basic SQL clauses: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY and LIMIT.
A query can involve up to 3 different entities. That is, it can have up to 2 joins. Queries that involve 3 joins might resolve to correct results depending on the use case. For 4 or more joins, you'll have to add custom training data to ensure the skill gives the correct results.
SQL Dialogs doesn't support the more complex queries that involve sub-queries and SET operators (INTERSECT, UNION, EXCEPT, and NONE). It also doesn't support non-English queries, queries that request a yes or no response, queries with pronouns, and follow-up queries. To learn about these and other SQL Dialogs query limitations, see Troubleshooting SQL Queries.
For some of the queries that aren't supported, you might be able to resolve the issue by using database views or creating virtual attributes as described in Add a Custom Attribute.
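For instance, a database view can expose a computed value as a plain column so that a simple filter replaces an unsupported sub-query. Here is a sketch using SQLite as a stand-in database; the `invoices` table, the `invoice_balances` view, and all column names are made up for illustration.

```python
# Sketch: flatten a would-be sub-query into a view with a computed column.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (invoice_num INTEGER, invoice_amount REAL, amount_paid REAL);
INSERT INTO invoices VALUES (1, 100.0, 100.0), (2, 250.0, 50.0);

-- Expose the computed balance as a plain column so a query can
-- filter on it directly, with no sub-query required.
CREATE VIEW invoice_balances AS
SELECT invoice_num, invoice_amount - amount_paid AS outstanding_balance
FROM invoices;
""")

rows = conn.execute(
    "SELECT invoice_num FROM invoice_balances WHERE outstanding_balance > 0"
).fetchall()
print(rows)
```

The skill would then model the view as a query entity, so "show invoices with an outstanding balance" becomes a simple filter on `outstanding_balance`.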
Below is a table that describes the types of queries that SQL Dialogs supports. It uses the following database values:
Employee Table
employee_id | name | role | salary | department_id
---|---|---|---|---
1 | Alex Smith | VP | 500000 | 1
2 | Samyra Kent | Sales Associate | 60000 | 1
3 | Laticia Fan | Sales Associate | 80000 | 1
Department Table
department_id | name | location
---|---|---
1 | Sales | Dallas
Here are the types of queries that SQL Dialogs supports:
Category | Description
---|---
Display | You can request an entity, attributes, and attribute aggregations. The supported aggregations are avg, sum, min, max, count(attribute), count(column), and count(distinct attribute). If the query doesn't name any attributes, then the skill displays those listed for Default Attributes in the entity's Presentation tab.
Filters | You can filter the rows of the table by conditions on specific attributes. Text attributes can be equal to a value, contain a value, or begin or end with a value. You can use different comparators to filter numeric and datetime columns (=, <, <=, >, >=). You use AND and OR to combine multiple conditions.
Filters with dates | When filtering by a date or datetime attribute, there are additional points to consider.
Ordering and limiting the number of rows | You can explicitly request that the skill sort the resulting rows by a particular attribute. You can also request a certain number of the highest or lowest values of an attribute or entity.
Group by | You can request different aggregates after grouping by either an attribute or an entity.
Group by and filter | You can filter attributes or entities based on aggregations.
Group by and order with optional limit | You can sort attributes or entities based on aggregations, and optionally request to see a number of the top or bottom rows.
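To make the categories concrete, the sketch below runs representative SQL for a few of them against the sample Employee and Department data (SQLite stands in for the data service here; the skill itself generates such SQL from natural language).

```python
# Representative SQL for the query categories, using the sample data above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (department_id INTEGER, name TEXT, location TEXT);
CREATE TABLE employee (employee_id INTEGER, name TEXT, role TEXT,
                       salary INTEGER, department_id INTEGER);
INSERT INTO department VALUES (1, 'Sales', 'Dallas');
INSERT INTO employee VALUES
  (1, 'Alex Smith', 'VP', 500000, 1),
  (2, 'Samyra Kent', 'Sales Associate', 60000, 1),
  (3, 'Laticia Fan', 'Sales Associate', 80000, 1);
""")

# Display with aggregation: "what is the average salary?"
avg_salary = conn.execute("SELECT AVG(salary) FROM employee").fetchone()[0]

# Filter: "show sales associates earning more than 70000"
rows = conn.execute(
    "SELECT name FROM employee "
    "WHERE role = 'Sales Associate' AND salary > 70000"
).fetchall()

# Group by, order, and limit: "which department pays the most in total?"
top = conn.execute(
    "SELECT d.name, SUM(e.salary) AS total "
    "FROM employee e JOIN department d ON e.department_id = d.department_id "
    "GROUP BY d.name ORDER BY total DESC LIMIT 1"
).fetchone()

print(avg_salary, rows, top)
```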
Tutorial: Getting Started with SQL Dialogs
You can get a hands-on look at SQL Dialogs by walking through this tutorial:
SQL Dialogs Workflow
How you build a SQL Dialog skill differs from regular skills. Here are the major steps to build a SQL Dialog skill and train it so that people can use natural language to query the data services.
The participants in the following steps are the skill developer, service administrator, database expert, and AI trainer.
- The skill developer gathers the skill requirements (user personas, use cases, and tasks) and training corpus (sample user utterances), and creates the skill. The developer also helps define how the results are displayed. This person is sometimes referred to as the conversation designer.
- The service administrator adds a connection to the data service.
- The database expert analyzes the skill requirements and training corpus to identify the tables and attributes that provide the answers. The expert then creates the base logical model by importing information from the physical model into the skill. The expert also assists the skill developer and AI trainer with tasks such as adding SQL-expression-based attributes, associating attributes with value lists uploaded from tables, associating attributes with regular expressions, and performing custom training.
- The AI trainer adds primary names and synonyms to teach the natural language parser (NLP) how to understand the natural language utterances. For utterances that the skill can't translate to OMRQL, the AI trainer adds custom training to teach the natural language parser how to understand these utterances. The trainer continually monitors and tests the skill to increase the accuracy of translating natural language into database queries.
To help illustrate the workflow, we'll use an example accounts payable data service with the following tables. For brevity, we just show the columns mentioned in this topic.
Table | Columns
---|---
invoices | invoice_num, invoice_date, invoice_amount, amount_remaining, pmt_status_flag
payment_schedules | due_date, payment_status_flag
suppliers | vendor_name, summary_flag
- Define the Requirements: The skill developer gathers the use cases and tasks that the SQL Dialog skill is expected to support. For example, an accounts payable department might have this use case:
  - Use Case: Pay all invoices with outstanding balances that are due within 30 days so that we can avoid penalties.
  - Task: Find all unapproved invoices that are due within 30 days so that we can approve them in time.
  - Task: Find all outstanding approved invoices due within 30 days so that we can schedule to pay them in time.

  As part of this requirements phase, the skill developer compiles a representative list of the different ways people ask for this information. This list serves as the set of example utterances that the AI trainer uses for the training corpus.
- Set Up the Skill: The service administrator, skill developer, and database expert work together to set up the basic skill.
  - Integrate with the Service: The service administrator creates a connection from Oracle Digital Assistant to the data service. See Connect to the Data Service.
  - Create the SQL Dialog Skill: The skill developer creates the SQL Dialog skill, ensuring that the dialog mode is set to Visual in the Create Skill dialog. See Create the SQL Dialog Skill.
  - Import the Schema: The database expert identifies the tables and fields that are necessary to support the use cases and then, from the skill's Entities page, imports them from the data service as described in Create Query Entities to Model the Data Service. This creates a base logical model that contains a query entity for each imported table.

    In our example, the database expert imports the `invoices`, `payment_schedules`, and `suppliers` tables.

    At this point, the skill is ready for use with limited functionality. For the base logical model, the entity and attribute names are derived from the physical model's table and field names. For example, if the table name is `payment_schedules`, then the primary name is `payment schedules`. The AI trainer can test queries from the Entities page or use the conversation tester (Preview) to try out the SQL functionality.

    In our example data service, they can use test utterances such as "show invoices with pmt status flag N", "show invoice num 17445", or "show payment schedules with due date before 2022-08-30".
- Train: Add training data through primary names, synonyms, value lists, regular expressions, and natural language queries mapped to OMRQL.
  - Add Natural Language Terminology: To help associate natural language phrases with the underlying data structure, the AI trainer teaches the skill the different ways that end users refer to the entities and attributes. That is, the names that people will use in their natural language utterances. The trainer starts by analyzing the phrases that the skill developer gathered to identify the utterances that the skill should handle (the training corpus). Additionally, they can consult a thesaurus for synonyms and crowdsource similar phrasing. Then the AI trainer records the equivalent terms by changing the primary names and adding synonyms. See Provide Training Data Through Names and Synonyms.

    In our example, one of the utterances gathered during the requirements phase is "Give me a list of invoices with an outstanding balance greater than zero." The attribute that contains the balance is `amount remaining`, so the AI trainer adds the synonym `outstanding balance` to that attribute.
  - Associate with Value Lists: To improve accuracy, the AI trainer can, where appropriate, create value lists that contain sample values from the data service. The skill automatically associates the lists with their respective attributes, which helps the natural language parser understand the kinds of values those attributes can hold. See Provide Training Data Through Value Lists.

    In our example, they associate the `vendor_name` attribute with a value list retrieved from the data service. If the value list includes "Seven Corporation" and a user asks "show summary flag for Seven Corporation", the NLP will deduce that Seven Corporation is a vendor name.
  - Associate with Regular Expressions: When an attribute's values must match a specific pattern, the AI trainer can create a regular expression entity and associate it with that attribute. See Provide Training Data Through Regular Expressions.

    For example, the AI trainer can associate an `ip_address` attribute with the regular expression `(\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5])`.
  - Map Complex Queries: In cases where the skill isn't able to translate a valid utterance into OMRQL, the AI trainer adds that utterance to the training data and maps it to OMRQL as described in Provide Training Data Through Utterances. For example, you can map "show unpaid invoices" to `SELECT * FROM payment_schedules WHERE payment_status_flag = 'N'`.
  - Provide Autocomplete Suggestions: To help users learn what the logical model is capable of answering, add example utterances as described in Provide Query Suggestions for SQL Dialog Users.
  - Provide Routing Data: If your SQL Dialog skill has intents or if it's in a DA, then you'll need to add utterances to help the skill distinguish database queries. See Route Utterances to the SQL Dialogs Conversation.
  - Train the NLP Model: To incorporate training data into the NLP model, the skill developer or AI trainer clicks the Train icon and clicks Submit.
- Configure How Information Is Displayed: The database expert and skill developer work together to fine-tune how each entity's results are displayed, as described in Configure Presentation of Entities and Attributes. For example, they can set an entity's default sort order, display the results as a form or a table, set the minimum attributes to include in the output, add buttons and links to results, and add attributes that display derived or calculated data.

  In our example, they might set both the invoice entity's default sort order and minimum attributes to `invoice_num`, and set the default attributes to `invoice_num`, `invoice_date`, `pmt_status_flag`, and `invoice_amount`. They might also add an `age` attribute that is calculated as the difference between today's date and the invoice date.
- Configure Query Rules: The database expert and AI trainer work together to set the query rules, such as when to use partial matching and which attribute to measure by when someone asks to compare rows without specifying an attribute to compare with. See Define Query Rules.

  In our example, they anticipate end users asking for the 10 most urgent payments to make, so they'll configure the `payment schedules` entity to use `due_date` for comparisons, and they'll invert comparisons for that attribute so that earlier dates rank higher than later dates.
- Test and Repair: The AI trainer uses the query tester from the Entities page to verify that the test utterances resolve to the desired OMRQL, and that the skill can translate the OMRQL to executable SQL. When the query tester can't translate the OMRQL to SQL, it requests training data. In many cases, you can resolve this by adding the utterance to the training data and associating it with an OMRQL statement. See Test and Repair.
- Monitor and Improve: After the skill enters the beta testing phase and beyond, the AI trainer, skill developer, project manager, and stakeholders can continually monitor batch tests and Insights data to see how well the skill is performing and to identify areas for improvement. See Monitor and Improve.
Connect to the Data Service
Before you can access a data service from any SQL Dialog skill, you need to add a data service integration that enables Oracle Digital Assistant to access the data service. You only need one integration per data service.
Integrations have been tested with Oracle Database Cloud Service Enterprise Edition 12c and 19c, Oracle Autonomous Transaction Processing, and MySQL HeatWave Database Service with MySQL version 8.
After you create the service, you can't change it. Should the password change, you'll need to delete and recreate the data service integration.
Oracle Data Service
To connect to an Oracle database, follow these steps:
- In Digital Assistant, click to open the side menu, click Settings, click Additional Services, and then click the Data tab.
- Click + Add Service.
- In the New Data Service dialog, provide this basic information:

  Field Name | Description
  ---|---
  Database Type | Select Oracle.
  Name | A unique name for the service.
  Data Service Description | An optional description of the data service integration, such as a description of the database or its purpose.
  User Name | Ask your database administrator for the user name and password that give access to the tables that the skill developers need to create the composite entities for their SQL Dialog skill, as described in Create Query Entities to Model the Data Service.
  Password | The user's password. Note that for Oracle Digital Assistant integration, a password must be at least 14 characters and no more than 30 characters, and it must contain at least one uppercase character, one lowercase character, and one number. It also can't start with a digit.
- Click Continue to configure end-user authentication if your data service is configured for role-based access. Here is a description of the fields on that page:

  Field Name | Description
  ---|---
  End-User Authentication is required | Select this option if your data service is configured for role-based access.
  Authentication Service | Select an authentication service that you configured in Settings > Authentication Services.
  End-User Identifier | Select the type of end-user identifier.
  Custom Expression | If the selected end-user identifier type is custom, enter a FreeMarker expression for a user profile variable that represents the end-user identifier. See Expressions for OIDC Profile Claims for more information and examples.
- Click Continue to add the connection details.
- On the Connection Details page, select Basic or Cloud Wallet Connection for the connection type.
  - For single-node databases and RAC-enabled databases, select Basic.
  - If you are connecting to an ATP database, you must select Cloud Wallet Connection.
- If the connection type is Basic, enter these values, which you can get from the database administrator:

  Field Name | Description
  ---|---
  Use TLS | Move this switch to the ON position if you want to use TLS (Transport Layer Security) to secure the connection. Note: If you are using a private CA certificate to connect to the database, this option must be switched on.
  Host Name | Enter the host for the data service. Leave out the `https://` prefix. For example: `example.com`.
  Port | The port that allows client connections to the database.
  Service Identifier | Either select SID and enter the Oracle system identifier of the database instance, or select Service Name and enter the service name for the database.
  Private Endpoint | This option only appears if you have private endpoints configured in your Digital Assistant instance. If you are connecting to a private endpoint to access the service, switch the Private endpoint toggle to the ON position and then select from the list of private endpoints that are associated with the instance. (Using a private endpoint enables you to access a service that is not accessible directly from the public Internet. See Private Endpoint for the details.)
- If the connection type is Cloud Wallet Connection, enter these values, which you can get from the database administrator:

  Field Name | Description
  ---|---
  Wallet File | Find and select the Cloud Wallet file that contains the client credentials, or drag and drop it into the field.
  Wallet Password | Enter the password that was provided when the wallet file was downloaded. Note that for Oracle Digital Assistant integration, a wallet password must be at least 15 characters and no more than 30 characters, and it must contain at least one uppercase character, one lowercase character, one special character, and one number. It also can't start with a digit.
  Service | Select the name of the database service. Be sure to select a service that has a sufficiently high service concurrency so that queries don't take longer than 30 seconds (at which point they time out). The service names with the suffixes `_tp` and `_tpurgent` are generally the most suitable choices here. You can read more about these considerations at Database Service Names for Autonomous Database and Service Concurrency.
  Private Endpoint | This option only appears if you have private endpoints configured in your Digital Assistant instance. If you are connecting to a private endpoint to access the service, switch the Private endpoint toggle to the ON position and then select from the list of private endpoints that are associated with the instance. (Using a private endpoint enables you to access a service that is not accessible directly from the public Internet. See Private Endpoint for the details.)
- On the Advanced Properties page, if you need a private CA certificate to connect to the database, switch the Use Private Trust toggle to the ON position and fill in the rest of the required fields.

  Field Name | Description
  ---|---
  Use Private Trust | If you are using a private CA certificate to connect to the database, switch this toggle to the ON position. This switch is only enabled if you have selected Use TLS on the Connection Details page.
  Certificates Resource | Select Self Managed.
  Choose PEM File and Paste PEM Text | Select one of these options for providing the certificate.
- Click Add Service.
You now can import the database schema into a skill to create query entities, which enable users to query the database using natural language.
Expressions for OIDC Profile Claims
If you have a connection to a data service with role-based access and you selected Custom as the user identifier type, you need to provide a FreeMarker expression for a user profile variable that represents the end-user identifier, either as a standard or custom OpenID Connect (OIDC) claim. Here are some examples:
- `${userProfile.MyAuthService1.value.sub}`
- `${userProfile.MyAuthService1.value["http://acme.com/custom_identifier"]}`
For more information on how profile claims in OIDC work and some example claims, see the following resources:
MySQL Data Service
To connect to a MySQL database, follow these steps:

- In Digital Assistant, click to open the side menu, click Settings, click Additional Services, and click the Data tab.
- Click + Add Service.
- In the New Data Service dialog, provide this basic information:

  Field Name | Description
  ---|---
  Database Type | Select MySQL.
  Name | A unique name for the service.
  Data Service Description | An optional description of the data service integration, such as a description of the database or its purpose.
  Authentication Type | Your database administrator will tell you whether to select Default, Kerberos, or OS.
  User Name | Ask your database administrator for the user name and password that give access to the tables that the skill developers need to create the composite entities for their SQL Dialog skill, as described in Create Query Entities to Model the Data Service.
  Password | The user's password. Note that for Oracle Digital Assistant integration, a password must be at least 14 characters and no more than 30 characters, and it must contain at least one uppercase character, one lowercase character, and one number. It also can't start with a digit.
- Click Continue to configure the connection details listed in this table:

  Field Name | Description
  ---|---
  Use TLS | Move this switch to the ON position if you want to use TLS (Transport Layer Security) to secure the connection. Note: If you are using a private CA certificate to connect to the database, this option must be switched on.
  Host Name | Enter the host for the data service. Leave out the `https://` prefix. For example: `example.com`.
  Port | The port that allows client connections to the database.
  Database | The name of the database.
  Private Endpoint | This option only appears if you have private endpoints configured in your Digital Assistant instance. If you are connecting to a private endpoint to access the service, switch the Private endpoint toggle to the ON position and then select from the list of private endpoints that are associated with the instance. (Using a private endpoint enables you to access a service that is not accessible directly from the public Internet. See Private Endpoint for the details.)
- On the Advanced Properties page, if you need a private CA certificate to connect to the database, switch the Use Private Trust toggle to the ON position and fill in the rest of the required fields.

  Field Name | Description
  ---|---
  Use Private Trust | If you are using a private CA certificate to connect to the database, switch this toggle to the ON position. This switch is only enabled if you have selected Use TLS on the Connection Details page.
  Certificates Resource | Select Self Managed.
  Choose PEM File and Paste PEM Text | Select one of these options for providing the certificate.
- Click Add Service.
You now can import the database schema into a skill to create query entities, which enable users to query the database using natural language.
Create the SQL Dialog Skill
To create a SQL Dialog skill, you simply create a skill with the Dialog mode set to Visual.
Create Query Entities to Model the Data Service
To enable data service queries in a SQL Dialog skill, you import information about a data service's physical model (the tables or views and their columns) to create a base logical model. During the import, the skill adds query entities to the logical model, where each query entity represents a physical table.
A skill can have no more than 50 query entities and attributes combined. For example, you can have 5 query entities that together have 45 attributes.
When you train your skill, it uses the information from the query entities to build a model for the natural language parser, which enables the skill to translate user utterances into OMRQL. OMRQL is a query language that's similar to SQL but is based on object models, which, in this case, are the query entities.
Before you begin, you need the following:
- A skill that was created using Visual mode.
- A data service integration for connecting to the data service as described in Connect to the Data Service.
To create query entities for the desired tables in your data service:
- From the Entities page, click More, and then select Import from Data Service. The Import Query Entities dialog appears.
- Select the data service, and then select the tables and attributes that you want to use in the skill.
- Click Import. The skill adds query entities for the selected tables. It sets the entity and attribute primary names based on the canonical names. For example, if the canonical name is "invoice_num", the primary name will be "invoice num".
- For each query entity that was added, select the entity, click the Configuration tab, and verify that the primary key is set in the Backend Mapping section.

At this point, you can test the queries using the primary names for the entities and attributes, such as "show invoices where invoice num is 12345". But first, you must train the skill by clicking Train. After training completes, you can click Test Queries to try out utterances, or click Preview to test in the conversation tester.
Because you are working with a minimal SQL dialog skill, you can train with either Trainer Ht or Trainer Tm. However, after you add autocomplete suggestions, routing data, and custom training data, Trainer Tm produces more accurate results.
Your next step is to teach the skill how the end users refer to the entities and attributes. See Train the Skill to Convert Natural Language Utterances into SQL.
Train the Skill to Convert Natural Language Utterances into SQL
As an AI trainer, your job is to enable the natural language parser to translate natural language utterances such as "how many invoices have a due date before 12/15/22" into an OMRQL query for retrieving the answer from the underlying data source (the physical model). You do this by building an intuitive logical model of the data that closely reflects natural language.
After the logical model is created by importing from the data source, you use primary names, synonyms, value lists, and utterances to help the skill's natural language parser associate natural language phrases with the physical model's tables and columns.
- To teach the skill about the different ways that people refer to the objects, you add primary names and synonyms as described in Provide Training Data Through Names and Synonyms. For example, you might want to teach the skill that people use "invoice number" to refer to the `invoice_num` column, and you might also want to add "invoice no" and "ref number" as synonyms.
- To help the skill identify attribute values in an utterance, you create sample value lists and associate them with attributes as described in Provide Training Data Through Value Lists. For example, you might create a value list that contains actual payment statuses and associate the list with the invoice's `payment status` attribute.
- To help the skill identify attribute values based on patterns, you create regular expression entities and associate them with attributes as described in Provide Training Data Through Regular Expressions. For example, you might create a regular expression entity with the expression `(\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5])` and associate it with the `ip_address` attribute.
- When the skill isn't able to correctly translate an utterance into OMRQL, you can add an utterance-to-OMRQL mapping to the query entity dataset as described in Provide Training Data Through Utterances and Test and Repair. You can also add utterances to help the skill know when to route an utterance to the flow that processes it as a SQL execution (that is, translates it to OMRQL and then sends a SQL query to the data source).
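As a sanity check, the octet pattern from the regular-expression example can be exercised with Python's `re` module. Note that the doubled backslashes shown in the skill configuration become single backslashes in a Python raw string; the `is_valid_octet` helper is added here purely for illustration.

```python
# The 0-255 octet pattern from the example, as a Python raw string.
import re

OCTET = re.compile(r"\d{1,2}|(0|1)\d{2}|2[0-4]\d|25[0-5]")

def is_valid_octet(text: str) -> bool:
    """True if the whole string matches one 0-255 octet."""
    return OCTET.fullmatch(text) is not None

print(is_valid_octet("255"), is_valid_octet("256"))
```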
Provide Training Data Through Names and Synonyms
To help a SQL Dialogs skill associate natural language phrases with the underlying data structure (physical model), start by taking the identified utterances that the skill should handle (the training corpus), and analyzing them to discover the different ways that end users refer to the entities and attributes.
For example, suppose that you have these utterances in your training corpus:
-
Show me the invoices with outstanding balances greater than zero.
-
What's the amount due for reference 12656?
Here, you see that people use "outstanding balance" and "amount due" to refer to the amount_remaining column. You also see that "reference" is one way people refer to invoice_num.
In addition to the training corpus, you also might want to crowdsource utterances from your target users to get more phrases and analyze those as well.
After you compile your list of the ways people refer to the entities and attributes, pick which term you want to use for the primary name of each entity and attribute. They should be names that are closest to the most common usages. When you choose the name, consider that the out-of-the-box NLP model won't likely understand domain-specific relationships. For example, it won't automatically understand that invoice number and reference refer to the same thing. Because invoice number is commonly used and also is closest to other commonly used terms such as invoice no and bill number, you would make it the primary name.
Treat the rest of the terms as synonyms. In the above example, you would add reference, invoice no, and bill number to the synonym list.
Note that the primary name is the default for the result column headers and labels but you can change that in the Presentation tab.
Using your list, you create the training data in the Entities page.
-
To set the entity's Primary Name and Synonyms, open the entity's Configuration tab and expand Natural Language.
-
To set the attribute's Primary Name and Synonyms, open the attribute's Natural Language tab.
When processing utterances, the natural language parser doesn't consider the physical model's canonical names, that is, it doesn't look at table and column names. It only uses the natural language mappings that you define using names and synonyms (the logical model).
Provide Training Data Through Value Lists
You can improve the natural language parser's accuracy by associating attributes with value lists or dynamic entities. This helps the parser identify an attribute based on its known values. You use Referenced Entity on the attribute's General Information tab to associate the attribute with the reference entity's values. For value list entities, you can automatically create the entity, import the data service's values, and associate it as a referenced entity all in one step.
When deciding whether to use a value list or a dynamic entity to store the values, consider whether the entity is open or closed.
-
An open list is one that is infinite or dynamic (or both). For open lists, consider creating and maintaining a dynamic entity instead of a value list. If you choose to use a value list, then you should curate the list to make sure that it at least contains the most commonly used values. For example, for a vendors list that most likely grows over time, you'll want the list to include your most frequently used vendors. This is because queries about a vendor without using the word "vendor", such as "show the summary flag for Seven Corporation", won't match if that value isn't in the value list. You thus increase the frequency of correct resolutions by at least including the most frequently used values.
-
A closed list is a static finite list. These are ideal for value list entities.
For both value lists and dynamic entities, add both singular and plural (when applicable) versions of synonyms for each entity value to indicate the ways that end users will refer to the value.
Synonyms are especially important when the list contains values that end users don't typically use. Take, for example, this list of valid payment statuses. Users will be much more likely to use words like paid, unpaid, and partially paid than to use Y, N, and P. Adding these words as synonyms helps ensure that the NLP recognizes that the users are referring to the payment status attribute.
Payment Status Values | Synonyms |
---|---|
Y | paid |
N | unpaid, not paid |
P | partial, partially paid, unpaid |
When a term describes more than one entity value, then add that term as a synonym to each one. For example, both N and P indicate that the invoice is unpaid. If you add "unpaid" as a synonym for both statuses, then "show unpaid invoices" will retrieve invoices with a payment_status value of N or P.
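To make the multi-value synonym behavior concrete, here is a minimal Python sketch of how a value-list lookup could resolve a user's word to canonical payment_status values. This is our own illustration under assumed data structures, not the SQL Dialogs matcher itself; the value list mirrors the table above.

```python
# Hypothetical value list: canonical database values mapped to their synonyms.
PAYMENT_STATUS_SYNONYMS = {
    "Y": ["paid"],
    "N": ["unpaid", "not paid"],
    "P": ["partial", "partially paid", "unpaid"],
}

def resolve_status(phrase):
    """Return every canonical value whose synonym list contains the phrase."""
    phrase = phrase.lower().strip()
    return sorted(
        value for value, synonyms in PAYMENT_STATUS_SYNONYMS.items()
        if phrase in synonyms
    )

# Because "unpaid" is a synonym for both N and P, "show unpaid invoices"
# would effectively query payment_status IN ('N', 'P').
```

This is why adding the same synonym to multiple values is safe: the query simply matches all of them.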
For dynamic entities, you create the entity and then use Referenced Entity on the attribute's General Information tab to associate the attribute with the list.
For value lists, you can create a value list from the data service and associate with an entity by following these steps:
-
On the Entities page, edit the attribute and go to the General Information tab.
-
Select Entity from the Type drop-down list.
-
Click If the desired entity doesn't exist, you can generate a value-list entity based on the backend mapping by clicking here. The value list is created and populated from the data service, and the Referenced Entity points to the new value list.
-
(Optional) To increase the chances of user input matching a value from the list, open the value list entity and switch Fuzzy Match to on. Otherwise, the skill uses strict matching, meaning that the user input must exactly match the values and synonyms. For example, "cars" won't match "car".
Fuzzy matching uses word stemming to identify attribute values in the query. For example, "pound" matches "pounds", "hold" matches "on hold", "needed approval" matches "needs approval", and "rent-lease" matches "rent lease".
Note that fuzzy matching doesn't work for "_" and "?". Also, partial matching doesn't work: for example, "Seven" doesn't match "Seven Corporation". If you need to enable matching for such strings, add them to the synonyms list.
-
Click Apply to save your changes.
If any value in the data service's physical table ends with a period, a question mark, or spaces, then those characters are not included in the value list because they are not allowed for canonical names.
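The contrast between strict and fuzzy matching can be sketched as follows. The crude suffix-stripping stemmer here is purely illustrative (the product's stemming is more sophisticated); the point is only that stemming lets "pounds" match "pound" while strict matching does not.

```python
def crude_stem(word):
    """Very naive stemmer for illustration only: strip a trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def strict_match(user_input, values):
    # Strict matching: the input must equal a value (or synonym) exactly.
    return user_input in values

def fuzzy_match(user_input, values):
    # Fuzzy matching: compare stemmed forms instead of raw strings.
    stems = {crude_stem(v.lower()) for v in values}
    return crude_stem(user_input.lower()) in stems

values = ["pound", "car"]
# strict_match("pounds", values) is False; fuzzy_match("pounds", values) is True.
```

As the note above says, even fuzzy matching is not partial matching: "Seven" still wouldn't match "Seven Corporation" under either scheme.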
Provide Training Data Through Regular Expressions
If an attribute's values must match a certain pattern, then you can help the natural language parser identify that attribute's values by associating the attribute with a regular expression entity.
This can be helpful when all values must follow a specific pattern and the set of valid values is too large for a value list or is infinite. For example, for an ip_address attribute, you could associate it with a regular expression entity named IpAddress, which has the regular expression (\\d{1,2}|(0|1)\\d{2}|2[0-4]\\d|25[0-5])
.
Note that, similar to value lists, if more than one attribute in the model can be associated with the same regular expression, users will need to provide sufficient context in their query to distinguish between the two (or more) attributes.
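As a quick sanity check of the octet pattern used in the ip_address example, here it is as a plain Python raw string (the entity editor shows it with escaped backslashes; the pattern itself matches a single value from 0 through 255):

```python
import re

# One IPv4 octet: 0-99, 000-199, 200-249, or 250-255.
OCTET = r"(\d{1,2}|(0|1)\d{2}|2[0-4]\d|25[0-5])"

def is_octet(text):
    """Return True only if the whole string matches the octet pattern."""
    return re.fullmatch(OCTET, text) is not None

# is_octet("255") is True; is_octet("256") is False.
```

Testing a candidate pattern like this before creating the entity helps you confirm it won't associate unrelated values with the attribute.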
To associate an attribute with a regular expression:
-
On the Entities page, click + Add Entity, select Regular Expression from the Type drop-down, enter the regular expression, and click Create.
Ensure that you craft the regular expression in a way that prevents the natural language parser from associating unrelated values with the attribute.
-
Select the entity with the attribute that you want to associate with the regular expression, edit the attribute and go to the General Information tab.
-
Select Entity from the Type drop-down.
-
From the Referenced Entity drop-down, select the regular expression entity.
-
Click Apply to save your changes.
Provide Training Data Through Utterances
As an AI trainer, you'll encounter natural language utterances that the skill can't translate to OMRQL. For example, the model may not be able to handle domain-specific synonyms that don't seem to be closely related to the primary name. Another example is when the model can't distinguish between two similar entities. When this happens, you can use training data to teach the skill how to correctly parse the utterance into OMRQL.
Adding to the training data is often referred to as custom training. You use custom training to teach the model to associate words and phrases with attributes, entities, and OMRQL keywords in the context of a full utterance by mapping the utterance to OMRQL.
For each scenario that you are fixing, start with 20 utterances and add more as needed. Because too many examples might cause the model to over predict attributes and operators, you should focus on a smaller set of diverse utterances rather than a large set of similar, lesser quality ones. Note that there is a limit of 120 utterances per skill.
All the values in the OMRQL statement must exactly match the database value and format. Take, for example, the utterance "who is the employee whose name is Jones". If the database values for the name attribute are all capital letters, then the name value must also be all capital letters. That is "SELECT * FROM Emp WHERE name = 'JONES'".
When the utterance that you are mapping uses a synonym for the actual database value, then that synonym must be defined for the value in a value list, and the OMRQL must use the actual database value. For example, if the utterance is "show the department whose location is the big apple", then "big apple" must be defined in the dept_loc value list as a synonym for the value "NEW YORK", and the OMRQL must be "SELECT * FROM Dept WHERE loc = 'NEW YORK'".
You can add utterances that contain absolute dates, such as "invoices due on 5 Jan 2022", but don't use utterances with relative dates or dates without the year. For example, if the utterance is "invoices due today", then today's date would be hard-coded into the OMRQL as SELECT * FROM invoices WHERE due_date = '2022-01-01'. Also, if you use a relative date such as "today", then you may get an error that relative dates aren't supported.
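The two value rules above (use the exact database value, and resolve synonyms through the value list first) can be sketched like this. The DEPT_LOC_VALUES structure and helper names are hypothetical illustrations, not part of the product:

```python
# Hypothetical dept_loc value list: canonical database values and synonyms.
DEPT_LOC_VALUES = {
    "NEW YORK": ["big apple", "nyc", "new york"],
}

def canonical_value(phrase):
    """Resolve an utterance phrase to the exact database value, if any."""
    phrase = phrase.lower()
    for value, synonyms in DEPT_LOC_VALUES.items():
        if phrase == value.lower() or phrase in synonyms:
            return value
    return None

def build_omrql(location_phrase):
    """Build the training OMRQL using the canonical value, never the synonym."""
    value = canonical_value(location_phrase)
    if value is None:
        raise ValueError(f"'{location_phrase}' is not in the dept_loc value list")
    return f"SELECT * FROM Dept WHERE loc = '{value}'"

# build_omrql("big apple") -> "SELECT * FROM Dept WHERE loc = 'NEW YORK'"
```

The key point: the utterance can say "big apple", but the stored OMRQL must say 'NEW YORK', exactly as it appears in the database.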
Here are some best practices for custom training utterances:
-
Balance the number of utterances: Some of the more complex scenarios may need more utterances than the simple ones, but try to balance the number of utterances per scenario.
-
Balance the training of similar attributes and entities: If you have two similar attributes, and you need to provide custom training data for one of them, then you also should provide the same amount of training data for the other. When the training data concentrates only on one of the similar attributes, then the model might over predict that attribute over its counterpart. The same is true for similar entities. For example, payment currency and invoice currency are similar attributes. If payment currency is over-represented in the training data, the model might predict payment currency even when the utterance asks for invoice currency.
When you need to teach the model how to distinguish between two similar or closely-related attributes, balance the weighting of importance by providing half the utterances for one attribute and half the utterances for the other.
Vary the utterances that refer to these similar attributes. For example, here are contrasting pairs of utterances to help the model distinguish between amount_remaining and amount_paid:
- tell me the amount remaining for approved invoices
- show us the amount paid for approved invoices
- view total amount due to be paid to vendor AAD
- calculate the total amount that was paid to vendor AAD
- what is the amount due on invoices to vendor AAD
- list the paid amount on invoices to vendor AAD
-
Balance the training of values that match primary names or synonyms: Say, for example, that your model has a manager attribute and "manager" is also a value for the employee job attribute. If you want to add "How many managers" to the training data, then you should balance this training data with utterances that use the manager attribute, such as "Who is the manager of employee Adam Smith", as well as utterances that use the manager job value, such as "Show all managers". That way, the model can learn to differentiate between the two usages. If you don't include examples for both types of usage, then the skill might over predict one usage over the other.
-
Diversify phrases: The best practices for diverse phrasing for custom data are similar to those for intent utterances:
-
Use full sentences.
-
Use different verbs. For example: view, list, show, tell, and see.
-
Use various synonyms and paraphrases in addition to the entity or attribute name.
-
Use different pronouns. For example: show me, can we see, tell us, I want.
-
Vary the sentence structure. For example, put the attribute value near the beginning, middle, and end of the sentences.
-
If you have utterances with an aggregation, such as AVG, then also add utterances with other operators as well.
-
If possible, use different clauses, such as group by and where clauses with AND and OR conditions.
-
-
Diversify Values: When you use more than one value in your scenario's utterances, the model is better able to recognize different values. Include values with different word lengths. Include some values with special characters such as '/' and "-". Include a few values with special keywords such as 'and'.
-
Include a mix of known and unknown values. For value-list attributes, use a representative set of attribute values (but not all) to train that value-list matches are important signals. Also, for value lists that aren't closed lists, include values that aren't in the value list to teach it to also associate particular phrasings with the attribute.
To add a mapped utterance to the training data:
-
If the Train button has a red badge, click Train and train using Trainer TM.
-
In the Entities page, go to the Dataset tab and click Query Entities.
-
Click the Training Data tab.
-
Click Add Utterance.
The Create Utterance dialog displays.
-
Enter the utterance and click Continue.
The dialog displays the OMRQL query for the utterance. If it can't translate the utterance into the query, the query will be blank.
Note that if the skill hasn't been trained, it can't translate the utterance into an OMRQL query.
-
Review the query and correct it if it's wrong.
For OMRQL keywords and examples, see OMRQL Reference.
-
Click Done to add the mapped utterance to the training data.
Provide Query Suggestions for SQL Dialog Users
You can help users learn about the database queries that they can make by providing autocomplete suggestions. These suggestions provide hints about what kinds of questions the logical model is capable of answering. The utterances also help the skill with routing.
To create autocomplete suggestions for a SQL Dialogs skill:
-
If the Train button has a red badge, click Train and train using Trainer TM.
-
In the Entities page, go to the Dataset tab and click Query Entities.
-
Click the Autocomplete Suggestions tab.
-
Click Add Utterance.
The Create Utterance dialog displays.
-
Type the utterance, click outside the text box, and then click Continue.
The dialog displays the OMRQL query for the utterance. If it can't translate the utterance into a query, the query will be blank.
Note that if the skill hasn't been trained, it can't translate the utterance into an OMRQL query.
-
Review the query and correct it if it's wrong.
For OMRQL keywords and examples, see OMRQL Reference.
-
Click Done to add the mapped utterance to the autocomplete suggestions.
Route Utterances to the SQL Dialogs Conversation
If your skill has intents, or it is in a DA, then, just like with intents, your skill needs utterances to help it route SQL queries to the SQL Dialogs conversation. The routing mechanism uses the autocomplete suggestions, training data, generated routing utterances, and handcrafted routing utterances to learn how to recognize SQL queries. You can see each type of utterance in the separate tabs on the Query Entities Dataset page.
From the Generated Routing Data tab, you can quickly generate 100 routing utterances that are based on the logical model as described in Generate SQL Dialogs Routing Data. You can then review them, edit if necessary, and approve or unapprove. Those that you approve are added to the Combined Routing Data tab and are marked as either synthetic or, if you edited them, refined.
The Combined Routing Data tab lists all the dataset types. In addition, that is where you can manually add handcrafted routing data as described in Handcraft SQL Dialogs Routing Data.
Note that the total number of autocomplete, training, generated, and handcrafted utterances can't exceed 10,000. If you exceed that limit, you'll see the message "The maximum number of corpus examples for this bot (10000) has been reached." There's also a limit of 120 training utterances.
To learn about autocomplete suggestions and training data, see Provide Query Suggestions for SQL Dialog Users and Provide Training Data Through Utterances.
Tip:
Each entity has a dataset tab where you can see the utterances that use attributes from that specific entity.
Generate SQL Dialogs Routing Data
If your skill has intents or is in a DA, then, just like with intents, your skill needs utterances to help it route SQL queries to the SQL Dialogs conversation. In addition to the autocomplete suggestions, training data, and handcrafted routing data, the routing mechanism uses generated routing utterances that you create from the Generated Routing Data tab on the Query Entities Dataset. The generated utterances represent a broad coverage of questions about all the query entities in the logical model.
To generate routing data:
-
If the Train button has a red badge, click Train and train using Trainer TM.
-
In the Entities page, go to the Dataset tab and click Query Entities.
-
Click the Generated Routing Data tab.
-
Click Generate.
The Generate routing data dialog displays.
-
In the Select entities field, select All. The first time you generate the routing data, you must generate data for all the entities. After you generate the initial set, you can come back and generate for specific entities if there is a need.
-
Click Generate.
The skill generates 100 utterances, which reflect questions that the logical model can answer.
-
Review the generated data and edit any that need refining.
Tip:
The utterance is not editable if it has been approved. If you want to change an approved utterance, unapprove it, edit it, and then approve it again.
-
Delete entries where needed and approve the rest.
The approved utterances are added to the combined routing data. If you edited an utterance, then its routing subtype in the Combined Routing Data tab is Refined. Otherwise it is Synthetic.
Handcraft SQL Dialogs Routing Data
If there are valid SQL queries that the DA or skill is not routing to the SQL conversation, then you need to add those utterances to the routing data from the Combined Routing Data tab in the Query Entities Dataset page.
To add handcrafted routing data:
-
If the Train button has a red badge, click Train and train using Trainer TM.
-
In the Entities page, go to the Dataset tab and click Query Entities.
-
Click the Combined Routing Data tab.
-
Click Add Utterance.
The Create Utterance dialog displays.
-
Type the utterance and then click outside the text box.
-
Click Continue.
-
Review the OMRQL query to verify that its results would answer the utterance. If they wouldn't, correct the query and then click Reinterpret. See OMRQL Reference for the OMRQL query keywords.
-
Click Done.
The utterance is added to the data with the routing subtype set to Handcrafted.
Configure Presentation of Entities and Attributes
Here are the things you can do to control when and how the entity rows and attributes are displayed in the results:
- Configure Whether to Display Form or Table
- Show One or Two Horizontal Sections in Form
- Set the Title for the Results
- Define an Entity's Default Sort Order
- Define Which Attributes to Include When Not Specified by the Utterance
- Define Which Attributes to Always Include in the Results
- Configure the Results Page Size
- Add Buttons and Links to Results
- Add a Custom Attribute
- Dynamically Configure Presentation Using Event Handlers
Typically, the database expert and the conversation designer work together on this task, as one has database schema expertise and the other has familiarity with user expectations.
You can test your changes by clicking Preview to open the conversation tester and entering an utterance to retrieve the appropriate data.
Tip:
Most of the changes that you make will require natural language parser (NLP) retraining. When you test your changes, if the Train icon has a red badge, you'll first have to click Train and complete the training process.
Configure Whether to Display Form or Table
The skill can display the entity's results as a table, a form, or a table form (where you can expand a row to see more details in form mode). You use the layout conversion fields on the entity's Presentation tab to configure when the results should be displayed in each mode.
By default, the skill displays each row in the response as a form unless the number of rows exceeds a threshold that you specify for Use form layout for this number of rows or less. Here are examples of a response in form mode and table mode:
In the case where the number of columns exceeds a threshold, the skill displays a table form. With a table form, only the specified number of columns are displayed and the user can expand the form to see the other attributes. Use Switch to table form layout when number of columns exceeds this number to specify the threshold. Here's an example of a table form layout for the column threshold of 2.
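The layout rules above can be summarized in a small decision sketch. The threshold parameter names are ours, not the product's; in the skill you set the actual thresholds on the entity's Presentation tab:

```python
def choose_layout(num_rows, num_cols,
                  form_row_threshold=1, table_form_col_threshold=2):
    """Illustrative layout decision: form for few rows, table form when
    there are too many columns to show at once, plain table otherwise."""
    if num_rows <= form_row_threshold:
        return "form"          # each row shown as a standalone form
    if num_cols > table_form_col_threshold:
        return "table form"    # truncated columns; rows expand to full form
    return "table"

# A single row renders as a form; many rows with many columns render as
# a table form with expandable rows.
```

With the column threshold of 2 from the example above, a five-column result would render as a table form showing two columns per row.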
Show One or Two Horizontal Sections in Form
By default, in form mode, the skill displays all the result attributes one below the other. To save room, you can set Number of Horizontal Sections in Form Layout to 2 to display two columns of attributes.
Set the Title for the Results
By default, the skill uses the query entity's name for the results title, but you can use the Display Name on the Presentation tab to set a different title.
Note that after you set the display name, you can't clear the field.
Define an Entity's Default Sort Order
You can specify a default sort order for the skill to use whenever the user's utterance doesn't specify one. To set the default, go to the entity's General tab, click Add Attribute Order, select an attribute and select its order (Ascending or Descending). You can continue clicking Add Attribute Order to add more attributes to the sort order.
Define Which Attributes to Include When Not Specified by the Utterance
If the utterance doesn't name any attributes, then you probably want the results to include some essential fields. You can use Default Attributes in the entity's Presentation tab to specify these fields. For example, for an invoices entity, you might want to display invoice_num, invoice_date, and invoice_amount when no attributes are named.
Note that you can't add attributes of type query entity to the default attributes list.
Define Which Attributes to Always Include in the Results
When an utterance identifies specific attributes, you might want the result to include not only the requested attributes, but also some context. For example, if someone enters "show invoice amounts", the data won't make sense if it only shows the invoice_amount values, and not some identifying context like invoice_num. Use Minimum Attributes on the entity's Presentation tab to identify the minimum attributes.
You cannot add attributes of type query entity to the minimum attributes list.
Configure the Results Page Size
Use the Maximum number of rows per page on the entity's Presentation tab to set how many rows to display at once.
The user can click buttons to page through the results.
Add Buttons and Links to Results
You can add buttons and links to a query entity's results at both the global level and the row level. A row action appears in each row, and a global action appears below the results.
For example, for an employee entity, you could add a global action that links to the company's employee search page. At the row level, you could add an action for a common follow-up query, such as a query about the employee's department.
You add actions from the entity's Presentation tab. If you have more than one action, you can indicate the sequence in which the actions appear. For QUERY action types, you'll need to provide an OMRQL query. For URL action types, you'll need to set the URL.
For row level follow-up actions, you can use ${row.attributeName} to reference each row's attribute values. For example, select * from Emp WHERE dept.loc = "${row.loc}". At run time, each row's button will have a different value for the query. This syntax is only available for row level actions.
You can optionally restrict when the action appears. For example, you might have a row action to show an employee's direct reports, which should appear only if the employee's job is manager. To do that, switch Visibility Expression to On and provide a FreeMarker expression, such as ${row.job == 'MANAGER'}.
Row actions appear as buttons or links in each row in a form or table form layout. However, they do not appear in table layouts.
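Conceptually, the ${row.attributeName} syntax is a per-row template substitution. This sketch assumes a simple string replacement (the skill performs the real substitution at run time):

```python
import re

def fill_row_action(template, row):
    """Replace each ${row.attr} placeholder with that row's attribute value."""
    return re.sub(
        r"\$\{row\.(\w+)\}",
        lambda m: str(row[m.group(1)]),
        template,
    )

template = 'select * from Emp WHERE dept.loc = "${row.loc}"'
row = {"loc": "NEW YORK"}
# fill_row_action(template, row)
#   -> 'select * from Emp WHERE dept.loc = "NEW YORK"'
```

Because the substitution happens per row, each row's button carries a query tailored to that row's values.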
Add a Custom Attribute
You can add your own custom attributes to display additional information, such as derived or calculated values.
-
From the Attributes tab on the entity page, click + Add Attribute, and provide a canonical name and type.
-
On the Natural Language tab, provide a primary name and optionally add synonyms.
-
On the Backend Mapping tab, select SQL Expression and add the expression.
If the expression references a column, use the column name from the physical model (database schema) and prepend ${alias}. For example, for an invoices entity, you might add an amount_to_pay attribute with the expression ${alias}invoice_amount + ${alias}discount_taken where:
- invoice_amount and discount_taken are existing physical column names in the invoices table.
- The new derived column amount_to_pay is the sum of values from the invoice_amount and discount_taken physical columns.
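At query-generation time, the ${alias} placeholder resolves to the table alias the skill assigns. The sketch below illustrates the idea; the alias value "t0." and the expansion mechanics are assumptions for illustration, not the product's actual behavior:

```python
def expand_expression(expression, alias):
    """Substitute the ${alias} placeholder with an assumed table alias."""
    return expression.replace("${alias}", alias)

expr = "${alias}invoice_amount + ${alias}discount_taken"
# With a hypothetical alias "t0." the SQL fragment becomes
# "t0.invoice_amount + t0.discount_taken".
```

Prepending ${alias} rather than hard-coding a table name keeps the expression valid no matter what alias the generated SQL uses.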
You can use this table to determine what type to use for the attribute:
Type | When to Use | Examples |
---|---|---|
Number | The values are only numeric and are not restricted to a set list. | Numeric employee ID, invoice amount |
Date | The value is a date without a time. | Hire date |
Date/time | The value can have both a date and a time. | Departure date and time |
Entity | The attribute is associated with a value list entity. Note that if the value list enumerates all the valid values (that is, a closed list) and the values are rarely used in natural language utterances, you should add synonyms for the values in the list. | status (closed), supplier names (open) |
String | Use for text that can contain numbers and characters where it doesn't make sense to associate with a value list. | Alpha-numeric invoice number, product description |
Query entity | Only use when you need to link to another query entity. | No examples |
Boolean | Do not use. | Not applicable |
Dynamically Configure Presentation Using Event Handlers
If you'd like the skill to dynamically change the way that it presents SQL query results, you can add data query event handlers to a custom component package, add the package to the skill as a custom component service, and then associate your entities with their specific handlers from the entity Presentation tabs. The skill triggers an entity's data query event when that query entity is the first named entity in the FROM clause (the root entity).
For example, you can dynamically add a row count to the header text, add a row to the table to show a sum, or determine when to show or hide an attribute.
To learn how to build data query event handlers, see Writing SQL Query Event Handlers.
Define Query Rules
Here's how you use an entity's settings on the Entities page to control the ways in which end-users can ask about the data and how to evaluate the results.
You can test your changes by clicking Preview to open the conversation tester and entering an utterance to retrieve the appropriate data.
Tip:
Some of the changes that you make will require natural language parser (NLP) retraining. When you test your changes, if the Train icon has a red badge, you'll first have to click Train and complete the training process.
-
Identify Which Attribute to Use for Measuring or Comparing: If the utterance asks to compare entity items to a number or asks to rank the entities using a superlative like greatest or least, which measurable attribute, if any, should the skill use to perform the comparison? Say, for example, that users ask about the greatest supplier; you might want the skill to use the rating attribute for comparisons. To specify which attribute to use for measuring or comparing, go to the entity's General tab and select the attribute from the Measure By drop-down. If the ranking is opposite of numerical order, such as 5 being better than 1, then you should also set the attribute's Invert Comparison to true on its General Information tab.
-
Specify How to Compare Measurable Attributes: By default, measurable attribute values are compared using numerical order, where 1 is less than 5. However, sometimes it is more appropriate to invert the comparison where 1 is better than 5. For example, when looking at race results, the 5 best times are the lowest values in the results. To invert comparisons for an attribute, set the attribute's Invert Comparison to true on its General Information tab. Note that this setting also affects the attribute's sort order.
-
Allow Partial Matching for Strings: If you expect that users will frequently leave out leading or trailing characters or values, such as "manager" instead of "department manager", then consider enabling partial matching. When partial matching is turned on, the generated SQL WHERE clause uses UPPER(<column-name>) LIKE UPPER('%<string>%') instead of <column-name> = '<string>'. You can enable partial matching on the attribute's General Information tab. Note that the partial-matching behavior for entity attributes is different from the fuzzy-matching behavior for value lists.
-
Specify how to resolve ambiguous dates and times: For attributes of type date or datetime, you can specify whether ambiguous values, such as "Wednesday", should resolve to the past, the future, or the nearest date or time. You can set this using the Temporal Preference on the attribute's General Information tab.
WARNING:
Keep in mind that setting the Temporal Preference to the nearest date or time only works for input of fixed dates and times, such as "Wednesday". If a user enters a duration value, such as "two days", the query will not resolve, since a duration value is the same for both past and future. Unless you are fairly certain that a user will never enter a duration value, you should only set the Temporal Preference to past or future.
Tip:
If an attribute can sometimes default to the past and sometimes the future depending on the context, then consider creating custom attributes with different temporal preferences. For example, for a due_date attribute, you could add a due attribute with a future preference and an overdue attribute with a past preference.
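The two WHERE-clause shapes from the partial-matching rule above can be sketched as follows (illustrative only; column and value names are placeholders, and the skill generates the real SQL):

```python
def where_clause(column, value, partial_matching=False):
    """Build the assumed WHERE-clause shape for a string attribute."""
    if partial_matching:
        # Case-insensitive substring match, as described for partial matching.
        return f"UPPER({column}) LIKE UPPER('%{value}%')"
    # Default: exact equality against the literal value.
    return f"{column} = '{value}'"

# where_clause("job", "manager", partial_matching=True)
#   -> "UPPER(job) LIKE UPPER('%manager%')"
```

With partial matching on, "manager" would match "department manager"; with it off, only the exact string matches.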
Enable Natural Language Queries for Denormalized Columns
If you have a denormalized attribute with a name that uses a pattern to identify the attributes that the column represents, such as PTD_LBR_CST, you can make the denormalized attribute understandable to the natural language model by mapping a normalized entity to it through the use of a column expansion backend mapping.
For example, say that you have a costToSales query entity with the attributes PTD_LBR_CST, QTD_LBR_CST, YTD_LBR_CST, PTD_SUB_CST, QTD_SUB_CST, YTD_SUB_CST.
To enable the skill to associate natural language queries with these attributes, you create a Cost query entity that contains the uniquely-identifying attributes, such as project_num, plus period, type, and cost. The period and type attributes are of type entity and reference the period (PTD, QTD, YTD) and type (LBR, SUB) value lists. The cost attribute's backend mapping is a column expansion with the expression "${period}_${type}_CST". The final step is to add the cost attribute to the costToSales entity, which references the Cost query entity to link the two entities.
When the query is "what are my YTD labor costs", the backend column expansion mapping tells the skill to retrieve the value from the YTD_LBR_CST attribute, which is in the costToSales entity (assuming that the necessary primary names and synonyms are set).
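The expansion step above can be sketched with Python's string.Template, which uses the same ${name} placeholder syntax as the expression "${period}_${type}_CST". This only illustrates how the placeholders resolve to a column name; it is not the skill's internal mechanism.

```python
from string import Template

# The cost attribute's column expansion expression from the example above.
expansion = Template("${period}_${type}_CST")

# For "what are my YTD labor costs", the period and type value lists
# resolve "YTD" and "labor" to the canonical values YTD and LBR.
column = expansion.substitute(period="YTD", type="LBR")
print(column)  # YTD_LBR_CST
```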
Test and Repair
As you define and add training data to your entities and attributes through names, synonyms, value lists, and the training data in the query entities dataset, you'll want to test how well the training data helps the natural language parser translate the end user's utterances into SQL queries.
Tip:
If the Train icon has a red badge, you'll have to click Train and complete the training process before you can test the utterances.
The Entities page has a Test Queries link, which opens the query tester for trying out your use-case utterances. In the tester, you can enter your test utterance and review the OMRQL query that the skill generated.
If the tester translates the utterance to a query, review the OMRQL query to verify that it will produce the desired results. If the OMRQL query isn't correct, you'll need to repair the skill by using the appropriate fix:
-
Add synonyms for an entity or attribute. See Provide Training Data Through Names and Synonyms.
-
Associate an attribute with a value list or add items to a value list. See Provide Training Data Through Value Lists.
-
Add the utterance and corrected OMRQL to the training data in the query entities dataset to teach the model to associate words and phrases with attributes, entities, and OMRQL keywords in the context of a full utterance. See Provide Training Data Through Utterances.
Tip:
Consider using Save as Test Case to save some of your valid queries to the batch tester, which you can use to ensure that changes you make don't negatively impact other areas. See Monitor with Query Entity Batch Testing.
Note that some utterances might not translate correctly because of limitations in the SQL Dialogs feature. In some cases, you can work around these limitations by adding custom training data. See Troubleshooting SQL Queries.
If the query tester reports that there's insufficient training data, you can click View JSON to get information about how it parsed the utterance. The translatable value indicates whether the model supports the query. The confusionSpanText value may give you a clue about what part of the query isn't supported.
For utterances that can't be translated, first check if you introduced a typo, your query is too vague, or your query is outside of the model's scope. These issues can't be resolved by training. Otherwise, you might be able to resolve insufficient training data by adding a synonym or adding the utterance to the custom training data in the query entities dataset. Here are some examples of the kinds of insufficient training data issues that you might be able to resolve by adding custom training data.
-
Attribute confusion: For example, does status refer to payment status or approval status?
-
Attribute-value confusion: For example, "how many managers are there" (is it referring to the manager attribute's value or the employee's job value?).
-
Search values that are also keywords or operators: For example, distinguishing the synonym "total" from the operator SUM.
If the OMRQL is valid, you can test how the skill translates the OMRQL to SQL by clicking Click to test this in the conversation tester. The Conversation Tester displays along with the results.
In the conversation tester, you can see the OMRQL and SQL statements on the SQL Dialogs tab. When it can't translate the query, it indicates that the query isn't translatable and shows what text caused the confusion.
Troubleshooting SQL Queries
When a query doesn't resolve as you expect, it might be because the model doesn't have enough information or the utterance is out of scope. It also might be because of SQL Dialogs limitations.
For cases where the model doesn't have sufficient information, see Test and Repair to learn how to resolve the issues. There are also SQL Dialogs limitations that can prevent the natural language parser from translating the utterance to OMRQL correctly. This section provides information on these limitations and ways to work around them, where possible.
General Limitations in SQL Dialogs
The table below outlines general limitations in SQL Dialogs that you should be aware of. These limitations don't have workarounds.
Category | Limitation | Examples of Unsupported Queries |
---|---|---|
Number of entities and attributes supported | The logical model can have a combined total of 50 attributes and entities. This limit includes any virtual attributes and entities that are created. | |
Non-English query | Any query in a language other than English. | numero total de empleadas |
Use of pronouns | Using pronouns such as "I", "me", and "my" in an utterance. |
|
Yes and no questions | Any question for which the answer is a yes or a no. SQL Dialogs only supports queries for which the answer is a set of results from a data table query. |
|
Negation | Utterances that contain negation words such as "not" and "no" or queries for values that indicate negation or null values. | |
Complex SQL operators | SQL Dialogs doesn't support the more complex queries that involve sub-queries, SET operators (INTERSECT, UNION, EXCEPT, and NONE), queries that require arithmetic operators, and the EXISTS and NOT keywords. While, on a rare occasion, you might find a complex query that does resolve correctly, to ensure consistent results, you should consider using database views or creating virtual attributes as described in Add a Custom Attribute. | |
Implicit SQL operators | SQL Dialogs doesn't support SQL clause functions that aren't explicitly requested, for example: Distinct, Aggregation, Order by | |
Limited support for follow-up questions | SQL Dialogs doesn't support follow-up questions out of the box. That is, users can't utter a new follow-up question to update the response. Tip: You can add quick actions to the results in the form of links or buttons that perform common follow-up queries. See Add Buttons and Links to Results. | Here are examples of follow-up queries for the original utterance "show all employees in Seattle" |
Troubleshooting Basic Query Issues
Category | Description of Issue | Examples of Unsupported Queries | Workaround |
---|---|---|---|
Select attribute | Selecting more than 3 attributes | Show the name, job title, salary and commission of all employees | Add custom training data. The training data can include examples covering different entities, and a few different sets of 4 (or more) attributes. |
Select entity | Requesting more than one entity | | Use custom training data to teach the model to output one attribute from the second entity. For example, for "show the supplier of each invoice", you can add training data that maps the query to the OMRQL: select invoiceid, vendor.vendor_name from invoices |
Where | Three or more conditions in the where clause | show employees who were hired after 2000 and work in the Finance department as an analyst | Add training data with examples that contain multiple conditions |
Order by | Ordering by more than a single attribute or entity | show employees sorted by their job title and name | Add training data with examples that contain ordering by 2 or more attributes |
Group by | Group by more than a single attribute or entity | show average salary per job and department location | Add training data with examples that contain grouping by 2 or more attributes or entities |
Group by + Having | More than one condition in the having clause | show jobs that have at least 20 employees and an average salary of above 30000 | Add training data with examples that contain more than one condition in the having clause |
Self joins | If an entity has a link to itself, then that computation may not be possible for the model to do, even with custom training data. | Here, the queries are requesting employee data that links to employee data. | There is no verified workaround. |
Troubleshooting Date and Time Issues
Category | Description of Issue | Examples of Unsupported Queries | Workaround |
---|---|---|---|
Implicit date and datetime values | When you filter by a date or datetime, you must explicitly provide context for the where clause. For example, instead of saying "which invoices are overdue", you must say something like "which invoices have a due date before today." | | Create a virtual attribute (for example, to indicate whether an event is upcoming) and then use custom training to teach the model the expected behavior. |
Implicit past or future reference | For date and datetime attributes, you use the attribute's Temporal Preference on the attribute's General Information tab to specify how to resolve ambiguous dates or datetimes. Utterances that imply that an ambiguous value must be resolved as a past or future date or datetime are ignored and the Temporal Preference is used. | employees that will be hired on Wednesday | You might be able to create 2 derived attributes to solve this issue for your scenario. |
"Past" context with < and
> operators
|
SQL Dialogs doesn't support the use
of< and > operators on past
dates or datetimes containing a duration.
|
|
No reliable workarounds. Trying to teach something like this with custom training may cause the model to start incorrectly predicting this in other cases. |
Time without a date | SQL Dialogs doesn't support queries that have times but not dates. | orders that are delivered at 3 pm | No known workaround. |
Recurring dates | SQL Dialogs doesn't support dates that specify a repeating value over a specific interval. | which meeting takes place on the first Monday of every month? | No known workaround |
Troubleshooting Attribute Selection Issues
Category | Description of Issue | Examples of Unsupported Queries | Workaround |
---|---|---|---|
Limited performance for domain-specific entity/attribute synonyms | For domain-specific or technical synonyms that aren't commonly used as synonyms, the model may struggle to map them to the correct attribute. | Attribute: ip_address Synonym: devices | Add custom training data. Include examples using the synonyms of the attribute, and another set of examples with the primary name to ensure the model doesn't forget existing functionality. |
Identification attribute for entities | Implying an attribute by referring only to the entity | Show invoices containing 1234
|
Add custom training data.
|
Disambiguation | In ambiguous cases with multiple possibilities, the model can't disambiguate | Show amount for all invoices
|
Add custom training data.
|
Implicit attribute reference for values not in the value list entity | If you refer to an attribute only by a value and that value is not present in the value list (either as a canonical value or a synonym) |
|
You can add custom training data, but it won't be reliable in all cases. For example, the model can learn that "invoices issued by VALUE" should be mapped to the vendor name attribute. But the model can't learn "invoices for VALUE" or "invoices by VALUE" because the words for, by, in, and so on are very general and can be used in a wide variety of contexts. |
Order of value and attribute name | The model is less robust when the value is mentioned before the attribute name in the utterance. (This is more of a problem when the values are not in the value list and for multi-word values.) | show approved invoices | Add custom training data. |
Troubleshooting Group By Issues
Category | Description of Issue | Examples of Unsupported Queries | Workaround |
---|---|---|---|
Group by across greater than 2 entities | Grouping across multiple entities with aggregations |
|
You can try adding custom training data. However, trying to workaround this issue is risky and would require a lot of custom training data. |
Group by + Order by + Min or Max | Sorting entities based on the minimum or maximum values of the attribute after grouping by that entity. |
|
Add custom training data. |
Group by + Order by + Min/Max + Limit 1 or Limit N | First group by the attribute or entity, sort by the minimum or maximum of a numeric attribute, then select the first row | which department has the highest minimum salary | Add custom training data. |
Select and Having clauses have different aggregations | Select and Having clauses have different aggregations | show the average salary for each department that has at least 10 employees | Add custom training data. |
Select and Order by clauses have different aggregations | Select and Order by clauses have different aggregations | show the name and average invoice of each vendor, and sort the vendors in alphabetical order of the vendor name | Add custom training data. |
Multiple aggregations in Select clause | SQL Dialogs supports Select clauses with a single aggregation, average plus sum, and min plus max. It doesn't support other combinations such as average plus min, average plus sum plus max, and count plus sum. | | Add custom training data. |
Having + Where clauses | Group by query with both a Having and a Where clause | which vendors of type LEGAL have more than 6 invoices? | Add custom training data. |
Troubleshooting Entity Issues
Category | Description of Issue | Examples of Unsupported Queries | Workaround |
---|---|---|---|
Typos |
|
No workaround. Typos in values will not work even with custom training. | |
Entities other than value lists and regular expressions | Associating any attributes with any entity type other than value list (e.g., custom ML entities) | | No workaround. |
Fuzzy match | For fuzzy matching, only stemming is supported. | Invoices from Amazon | Add synonyms in the value list. |
Fuzzy match | Fuzzy match will not work for the _ and ? characters. | Invoices paid using DBX EFT | Add synonyms in the value list. |
Numbers in non-numerical form | SQL Dialogs supports a limited list of numbers that can be represented in other forms (0-10, 20, and 50). All other numbers, if referenced in any format other than numerical, aren't supported. | | No workaround. |
Troubleshooting Other Issues
Category | Description of Issue | Examples of Unsupported Queries | Workaround |
---|---|---|---|
2 or more numerical filters | Two Where clauses with numbers (whether this is the same attribute or a different attribute) | | Add custom training data. |
Order by superlatives | Asking for the top or bottom N entities. Note: The model is more robust with top than bottom. | Show the best employee | Add custom training data. |
Attributes where aggregations are precomputed | If the schema has precomputed aggregations like total_amount that already stores the sum, or total_servers that stores the total count of servers, the model might get confused between needing to use the SUM/COUNT function or the attribute with the precomputed aggregation. | Show the total amount for invoice 1234 | Add custom training data. |
Default select | The model sometimes predicts the name or id of the entity instead of select *. This is a rare error, and the impact is not critical, as the user sees the minimum attribute instead of the default attributes of the entity. | Show the invoices that are approved. | Add custom training data, if this is indeed a problem. |
Monitor and Improve
As you build and test your skill, you can use Insights and batch testing to monitor how well the skill is meeting its goals and expose areas that need improvement. As you enter the testing phase and eventually release the skill to the public, you'll want to continue to perform periodic monitoring and testing.
Monitor Using Insights
The skill's Insights page provides several metrics you can use to measure how well your SQL Dialog skill is performing and to determine where to make improvements.
As a business analyst, you might be interested in these data query metrics on the Overview tab:
-
Performance: Conversations Trend by Status (Line) shows the number of conversations over a given time period and whether the traffic is tending up, down, or sideways.
-
The ratio between Correct Queries and Incorrect Queries indicates how satisfied the bot users are with the accuracy of translating utterances to SQL queries.
-
The ratio between Completed and Incomplete conversations shows the extent to which technical issues impact the users' experiences.
-
The ratio between Total Conversations and Unresolved (OOD/OOS) Queries helps measure the extent to which the skill meets the end users' expectations.
-
Both Conversations Trend by Type and the ratio between Total Conversations and Data Queries Conversations show the proportion of utterances that are SQL queries.
-
Data Query Entities show which entities are queried the most.
As an AI trainer, you can examine the user messages on the Conversations tab to discover areas for improvement. For example, you can review these user messages:
-
Type: Intent, Outcome: Incomplete user messages indicate problems with translating the utterance to an SQL query. Often, you can fix these issues by adding synonyms or, for more complex queries, adding mapped utterances to the query entities dataset. Note that you also can see these messages by selecting System Handled Errors from the Errors drop-down list.
-
Type: Intent, Intent: unresolvedIntent user messages indicate both out of scope utterances and utterances that the skill doesn't recognize as a data query utterance. For the utterances that are valid data queries but the skill doesn't recognize as such, you often can fix the problems by adding synonyms or mapping the utterances to OMRQL in the query dataset.
-
Type: Data Query, Entities shows the user messages by query entity.
-
Type: Data Query, Outcome: Incorrect shows the messages that the users thought returned incorrect results. You should verify that the results are incorrect, and, if so, add synonyms, value lists, and query dataset entries as appropriate.
Monitor with Query Entity Batch Testing
As an AI trainer, it is good practice to create batch tests to ensure that improving one area doesn't negatively impact another. You can also use batch tests to ensure that changes to the logical model don't have adverse effects on custom training or routing to SQL conversations.
Just as with batch testing for intent utterances, you might want to set aside about 20% of the real-world queries that you gathered to use for query batch testing. You can run the batch test periodically as well as after you make changes to the logical model, custom training, or routing data.
Each test case must belong to a test suite, so before you create a test case, you may want to first create a test suite that reflects some aspect of query testing, such as failure testing, in-domain testing, or out-of-domain testing. We provide a suite called Default Test Suite. You can assign test cases to this test suite if you haven't yet created any others.
You can add a test case to a batch test in two ways:
-
After you test an utterance from the Query Tester, you can select a test suite from the Save as Test Case drop-down to save it to that suite.
-
You can click + Test Case on the Test Suites tab in the batch tester.
To access the batch tester:
-
On the Entities page, click Test Queries.
The Query Tester opens.
-
Click Go to Test Cases.
On the Test Suites tab, you select a test suite and either run all test cases or select and run specific cases. The results are shown on the Test Results page. It takes some time for the tests to complete. You know the run has completed when In Progress shows 0.
OMRQL Reference
Here are the keywords that you can use when you define OMRQL queries for the utterances that you add to the query entities dataset. Note that you use the canonical names, not the primary names and synonyms.
Component | OMRQL Keywords | OMRQL Example | Constraints |
---|---|---|---|
Basic Components |
|
SELECT * FROM Emp | The OMRQL can't name attributes that aren't referenced in the utterance. |
Filtering | WHERE | Employee WHERE salary > 100000 | None. |
Linking Entities See Link Attributes for an explanation of how this works. |
. (period) | SELECT * FROM Employee WHERE Departments.location = 'NYC' | None. |
Ordering |
|
SELECT name FROM Employee ORDER BY salary DESC LIMIT 10 | The OMRQL can order data using ORDER BY <ATTR> [LIMIT N] only if the utterance includes the word order or its natural language synonyms such as sorted, ordered, highest, and smallest. |
Ordering without a Specific Attribute See Order by * for an explanation of how it works. |
ORDER BY * | SELECT name FROM Employee ORDER BY * DESC LIMIT 10 | ORDER BY * only works end to end when the "measure_by" value is set for the entity in the UI |
Aggregate Functions |
|
SELECT AVG(sal) from Employee | The OMRQL can contain DISTINCT only if the utterance contains that word or a natural language synonym such as different or unique. |
Grouping |
|
SELECT location, AVG(Employees.salary) FROM Department GROUP BY location | The FROM clause should contain the entity that the group by attribute belongs to |
Grouping by Entity | GROUP BY * | SELECT *, MAX(Employees.salary) FROM Department GROUP BY * Note: This groups by the entity in the from clause (the backend converts GROUP BY * to GROUP BY the primary key of the root entity) | |
Grouping and Filtering | HAVING | SELECT location FROM Department GROUP BY location HAVING AVG(Employees.salary) < 4000 | |
Comparison Operators |
|
SELECT * from Department WHERE name IN ('Sales', 'HR') | For the >, >=, <, and <= operators, the utterance must contain an equivalent natural language synonym such as greater than, at least, less than, and at most.
If the utterance doesn't contain an operator synonym, then the OMRQL must contain =. The OMRQL can contain LIKE only if the utterance contains that word or a natural language synonym such as includes, contains, or substring. The OMRQL can contain BETWEEN only if the utterance contains that word or a natural language synonym such as in the range of. |
Logical Operators |
|
SELECT name FROM Employee WHERE salary > 10000 AND role = 'VP' | None. |
All the values in the OMRQL statement must exactly match the database value and format. Take, for example, the utterance "who is the employee whose name is Jones". If the database values for the name attribute are all capital letters, then the name value must also be all capital letters. That is "SELECT * FROM Emp WHERE name = 'JONES'".
When the utterance that you are mapping uses a synonym for the actual database value, then that synonym must be defined for the value in the value list, and the OMRQL must use the actual database value. For example, if the utterance is "show the department whose location is the big apple", then "big apple" must be defined in the dept_loc value list as a synonym for the value "NEW YORK", and the OMRQL must be "SELECT * FROM Dept WHERE loc = 'NEW YORK'".
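The exact-match requirement can be demonstrated with a quick SQLite sketch using the "JONES" example above (the Emp table here is hypothetical). SQLite compares TEXT values case-sensitively by default, just as the documentation's Oracle example requires the value's exact capitalization:

```python
import sqlite3

# Hypothetical Emp table with an all-uppercase name value, as in the
# "JONES" example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT)")
conn.execute("INSERT INTO emp VALUES ('JONES')")

# 'Jones' misses the row because the TEXT comparison is case-sensitive...
miss = conn.execute("SELECT * FROM emp WHERE ename = 'Jones'").fetchall()
# ...while the exact database value 'JONES' finds it.
hit = conn.execute("SELECT * FROM emp WHERE ename = 'JONES'").fetchall()

print(miss)  # []
print(hit)   # [('JONES',)]
```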
Here are some examples of how to write OMRQL for your utterances:
Utterance | SQL | OMRQL | Comments |
---|---|---|---|
Show me all employees who work as a clerk | SELECT * FROM Emp WHERE job = 'CLERK' | SELECT * FROM Emp WHERE job = 'CLERK' | OMRQL is identical to SQL. |
Show me the number of employees who work in sales department | SELECT COUNT(*) FROM Emp AS T1 JOIN Dept AS T2 ON T1.deptno = T2.deptno WHERE T2.dname = 'SALES' | SELECT COUNT(*) FROM Emp WHERE dept.dname = 'SALES' | Instead of a JOIN, use "link_attribute.attribute_name" to refer to an attribute from another entity. |
Adams is a member of what department? | SELECT * FROM Dept AS T1 JOIN Emp AS T2 ON T1.deptno = T2.deptno WHERE T2.ename = 'Adams' | SELECT * FROM Dept WHERE emp.ename = 'ADAMS' | Instead of a JOIN, use "link_attribute.attribute_name" to refer to an attribute from another entity. |
What is the department location and job role of employee Adams | SELECT T1.LOC, T2.JOB FROM DEPT T1 JOIN EMP T2 ON T1.DEPTNO = T2.DEPTNO WHERE T2.ENAME = 'ADAMS' | SELECT loc, emp.job FROM Dept WHERE emp.ename = 'ADAMS' | Notice how the OMRQL is simpler to write than the SQL. |
How many employees are there for every job role? | SELECT COUNT(*), job FROM Emp GROUP BY job | SELECT COUNT(*), job FROM Emp GROUP BY job | OMRQL is identical to SQL. |
Which employee has the highest salary? | SELECT * FROM Emp ORDER BY salary DESC LIMIT 1 | SELECT * FROM Emp ORDER BY salary DESC LIMIT 1 | OMRQL is identical to SQL. |
Show employee name and department name ordered by the salary in ascending order | SELECT T1.ename, T2.dname FROM Emp AS T1 JOIN Dept AS T2 ON T1.deptno = T2.deptno ORDER BY T1.sal ASC | SELECT ename, dept.dname FROM Emp ORDER BY salary ASC | Notice how the OMRQL is simpler to write than the SQL. |
Number of employees in each location | SELECT COUNT(*), loc FROM Emp AS T1 JOIN Dept AS T2 ON T1.deptno = T2.deptno GROUP BY T2.loc | SELECT loc, COUNT(emp) from Dept GROUP BY loc | With GROUP BY, when we are counting a linked entity, we use a new count(LINK) syntax instead of COUNT(*). This makes OMRQL more readable than SQL. |
View the locations with average salary at least 40000 | SELECT T2.loc FROM Emp AS T1 JOIN Dept AS T2 ON T1.deptno = T2.deptno GROUP BY T2.loc HAVING AVG(T1.sal) >= 40000 | SELECT loc from Dept GROUP BY loc HAVING AVG(emp.sal) >= 40000 | An example of GROUP BY with a HAVING clause. |
Average salary of employees in each department | SELECT AVG(T1.sal), T2.dno FROM Emp AS T1 JOIN Dept AS T2 ON T1.deptno = T2.deptno GROUP BY T2.dno | SELECT *, AVG(sal) from Dept GROUP BY * | The goal here is to group by a unique attribute in the "department" entity. The primary key is the most suitable candidate, but displaying the primary key might not always be useful. In OMRQL, we instead group by *. The backend will group by the primary key and also display the minimum attributes of the entity to make the result more user-friendly. |
Show the location and minimum salary of employees for each department | SELECT T2.dno, T2.loc, MIN(T1.sal) FROM Emp AS T1 JOIN Dept AS T2 ON T1.deptno = T2.deptno GROUP BY T2.dno, T2.loc | SELECT *, loc, MIN(sal) from Dept GROUP BY *, loc | Here, we still want to group by the department entity, but the utterance is also specifically requesting to display the location of the departments. Note how both the SELECT and GROUP BY clause have a * and the requested display attributes. |
View the name and location of the department that has the highest average salary | SELECT T2.name, T2.loc FROM Emp AS T1 JOIN Dept AS T2 ON T1.deptno = T2.deptno GROUP BY T2.name ORDER BY AVG(T1.sal) LIMIT 1 | SELECT *, name, loc from Dept GROUP BY *, name, loc ORDER BY AVG(emp.sal) LIMIT 1 | Another example of grouping by an entity, and displaying the requested attributes, this time with ORDER BY LIMIT 1 |
Show the top 10 employees | SELECT * from Emp ORDER BY rating LIMIT 10 | SELECT * from Emp ORDER BY * LIMIT 10 | Assuming that "best employees" imply ordered by their rating, the rating will be set as the "measure_by" attribute for the Emp entity |
Link Attributes
With the exception of linking entities, the OMRQL components are similar to SQL. Instead of an SQL JOIN, you use a pair of link attributes to link one entity to another. Link attributes have primary names and synonyms that define the relationship between the entities. For example, an employee/department attribute link with a 1-1 relationship can have the primary name "department" and the synonyms "works in", "belongs to", and "team". A department/employees attribute link with a 1-many relationship can have the primary name "employees" and the synonyms "members" and "workers".
Besides the typical primary key/foreign key link attributes you also can have these types of link attributes:
- Multiple link attributes from one entity to another that define multiple semantic relationships.
- A link attribute from an entity to itself that implies a self join.
- A link attribute for an intersection table due to a many-to-many join.
Order by *
The asterisk (*) is used in conjunction with ORDER BY when a user asks to order something without specifying what to order by. For example, "show the top 10 employees" (it's unclear what attribute to order by). The backend then replaces the * with a default, pre-specified attribute (measure_by).