In Data Integration, data flow operators represent input sources, output targets, and transformations that can be used in a data flow.
From the Operators panel, drag operators onto the canvas to design a data flow. Then use the Details tab on the Properties panel to configure basic and required properties for each operator. Where applicable, use the Advanced options tab to specify other properties.
In general, a data flow operator can have one or more inbound ports, and one or more outbound ports for data to flow through. For example, you can connect the same source outbound port to inbound ports on a filter, a join, and a target operator. You can also connect another source operator to the same join inbound port.
A data flow must include at least one source operator and one target operator to be valid. While Data Integration supports multiple target operators in a data flow, a target operator can have only one inbound port.
To connect operators, hover over an operator until you see the connector (small circle) on the right side of the operator. Then drag the connector to the next operator you want to connect to. A connection is valid when a line connects the operators after you drop the connector.
A connection line symbolizes how data flows from one node to another. While you can drag a visible connector from one object to another, you can't have more than one inbound connection line to a filter, expression, aggregate, distinct, sort, or target operator.
Important
For complex data types, see Hierarchical
Data Types to understand what is supported. You might not be able to perform
some configurations on the Attributes tab and Data tab in the Properties panel.
Data Operators
Data Integration provides the source operator and
target operator for adding input and output data entities to serve as the input to data
flows, and the output for transformed data.
To configure the input or output data entity for a data operator in a data flow, you
begin by selecting a data asset, connection, and schema (or bucket).
You make the selections only in the order displayed on the
Details tab of the Properties panel, by clicking
Select when the option is enabled next to the resource
type.
For example, when you first add a source operator, only Data asset
is enabled for selection. The select option for the subsequent resource type,
Connection, is enabled only after you make a selection for
the preceding object.
When you click Select next to a resource, a panel appears for you to select the object you want. You can use the resource menu to select the object, or click View all to use another panel to browse or search, and then select the object.
Upon selecting a resource object, each subsequent selection is based on the parent-child
relation inherited from the preceding selection. A breadcrumb at the top of the
selection panel shows the selection hierarchy. For example, let's assume that you have
selected the data asset "Oracle Database data asset 1," and the connection "Default
connection." When you select the Schema, the breadcrumb displays
as "From Oracle Database data asset1 using connection Default connection."
After selecting a data asset, connection, and schema (or bucket), you select a data
entity from the list of available data entities.
In general, you can do the following in the Browse data entities
panel:
Browse the available data entities and select an entity by its name.
Filter the available list to search and then select. In the search field, enter a
partial or full entity name and press Enter to start the search. Search is
case-sensitive. For example, if the available data entities include
BANK_US and BANK_EU, enter
BANK and then select from the filtered list.
Use one or more parameters in the search string. For example:
CUSTOMERS_${COUNTRY}
Depending on the data asset type of a resource, after selecting a data entity, more configuration might be necessary on the Details tab or Advanced options tab in the Properties panel.
To keep a resource object such as a data asset or connection in a data flow from being
permanently bound to a specific resource, you assign a parameter to that object.
You can assign the parameter before or after you select the resource object.
In the data flow, add the source or target operator. After making a selection for
a resource type, click Assign parameter next to the
resource name to use another panel to select and assign a parameter for the
selected object. If a suitable parameter type is not available, you can add a
parameter and then assign it.
Note
Assign parameter is not available for a data entity
whose name includes a parameter (such as BANK_${REGION}). You
cannot bind a parameterized entity resource to another parameter.
Alternatively, add the source or target operator, then click Assign
parameter to use one panel to assign a parameter, and select the
resource for the object at the same time. In the Add
parameter panel, you select a resource based on the parent-child
relation inherited from the preceding selection. The breadcrumb at the top of
the panel shows the selection hierarchy.
You can include one or more data flow parameters in the name of the data entity resource
you specify for a data operator.
The syntax to use for a data flow parameter in a data entity name is
${PARAMETER_NAME}. For example:
CUSTOMERS_${COUNTRY}
A parameter name is case-sensitive, and each parameter must have a default value.
For example, CUSTOMERS_${COUNTRY} might return the database table
CUSTOMERS_USA, and BANK_${COUNTRY}/* might return
the Object Storage files in
BANK_EU.
To use parameters in data entity names when you configure a data operator, you can:
Add the parameter to the data flow before you add the data operator
Add the parameter at the time you're configuring the operator's data entity
How to add a parameter
In a data flow, select Parameters from the View menu on the canvas toolbar to open the Parameters panel.
Then, in the Add parameter panel, configure a parameter of the
appropriate data type, for example VARCHAR or NUMERIC, and add a default value.
During data entity configuration in a data flow, you can search available data entities by entering the parameter name in the Browse data entities panel. In the search field, start typing ${ followed by any character. If the front part of the parameter name matches existing parameters in the data flow, a list of suggested names displays. To select a parameter from the list, click the parameter name, complete the syntax by adding }, then press Enter.
How to add a parameter at the time of data entity configuration
In the Browse data entities panel, you can do the
following:
From the More actions menu, select Add data
flow parameter to use the Add data flow
parameter panel. Specify the data type, default value, and
other properties for the parameter you want to add and use.
In the search field, start typing ${ followed by any character. If the front part of the parameter name matches existing parameters in the data flow, a list of suggested names displays. To select a parameter from the list, click the parameter name, complete the syntax by adding }, then press Enter.
In the search field, enter the parameter name (for example, ${PARAMETER_NAME}). If the parameter name does not yet exist in the data flow and you press Enter, Data Integration displays the Add data flow parameter panel. Alternatively, after entering the parameter name, select Add data flow parameter from the More actions menu.
In the Add data flow parameter panel, specify the data
type, default value, and other properties for the parameter you want to add
and use.
Source Operator 🔗
Use the source operator to specify the data entities that serve as the input to the data flow.
You can add multiple source operators on a data flow.
Important
If you're using a hierarchical data entity for a source operator, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab or Data tab in the Properties panel.
From the Operators panel, drag a Source operator onto the canvas.
With the source operator in focus, on the Details tab of
the Properties panel, enter a name for the source operator in the
Identifier field, or leave the default name
as-is.
To configure the input source for this operator, select the data asset, connection, and schema (or bucket) by clicking Select as it becomes available next to the resource type. See the following descriptions for more information about the data asset type you're selecting for this source operator.
Data asset type
Description
Object Storage
Amazon S3
HDFS
Click Select next to Bucket to select the compartment, and then select the bucket.
After selecting the data asset, connection, and bucket, proceed to step 5 to select a data entity.
Generic REST data asset
After selecting a connection, the base URL that you entered while creating the data asset is displayed. Follow these steps to configure the source:
Click Select next to Resource to select an endpoint.
Click Select next to Operation to select an operation for the endpoint.
For more properties, see step 7.
Oracle Fusion Applications using Oracle BI Cloud Connector (BICC)
For Schema, select the BICC offering. The selected BICC offering provides the BICC view objects (VO) for data extraction.
After selecting a data asset, connection, and schema, proceed to step 4 to select a data entity.
Oracle Fusion Applications using Oracle BI Publisher (BIP)
For Data asset:
To use a BIP data asset in a data flow, you need an Object Storage bucket as the staging location. If you select a data asset that doesn't have a default staging location specified in that data asset, Data Integration displays a warning notification. When you see the notification, choose one of the following options:
Click Update data asset to add a default staging location to that data asset.
Click Select next to Staging location in the Properties panel to select the Object Storage data asset that has the bucket you want to use for staging.
For Schema:
Select the BIP folder that contains the report as the schema.
After selecting a data asset, connection, and schema, proceed to step 4 to select a data entity.
Other data assets that are not Object Storage, Amazon S3, or HDFS
After selecting a data asset, connection, and schema, proceed to step 4 to select a data entity.
To select a Data entity that's not from an Object Storage, Amazon S3, or HDFS data asset:
Note the following in the Select data entity panel. Where applicable, you can:
Use the Data entity menu to select a data entity by its name.
Click Browse data entities to use another panel to browse or search the available list, and then select a data entity.
See Selecting a Data Entity, if you need help with searching the list by using a full or partial name.
You can include one or more data flow parameters in the name of a selected data entity, by using the syntax ${PARAMETER_NAME}. For example: BANK_${REGION}
Depending on the data asset type for this source operator, select the data entity and complete more configuration, if applicable.
Data asset type
Description
Oracle Database
Oracle Database on Amazon RDS
Oracle Siebel
Oracle Peoplesoft
Oracle E-Business Suite
Oracle ATP and ADW
MySQL
MySQL Heatwave
MySQL on Amazon RDS
Amazon RDS for SQL Server
Microsoft SQL Server
Microsoft Azure SQL Database
IBM DB2
On the Details tab:
Select the data entity by name.
If applicable, click Enter custom SQL to enter a single SQL statement in the editor.
By providing a SQL query statement, you can define a database source entity and the shape of the entity at the same time. For example, you can define and filter the data in a SQL statement instead of adding a filter operator after you define the entity to use for the source operator.
After entering the SQL query, click Validate to check the statement for errors.
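For example, a minimal sketch of a custom SQL statement that defines the source entity and applies a filter in one step (the table and column names here are illustrative, not from your schema):
SELECT CUSTOMER_ID, FIRST_NAME, LAST_NAME, COUNTRY
FROM CUSTOMERS
WHERE COUNTRY = 'USA'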
If available and applicable, follow these steps to load only new or updated records from the source to the target:
Select the Incremental load checkbox to identify and load only the rows that have been created or modified since the last time the load process was run.
For Watermark column, select the column that's used to mark the rows that have been incrementally loaded. Only DATE, TIMESTAMP, and DATETIME columns can be used as a watermark column.
Oracle Fusion Applications using Oracle BI Cloud Connector (BICC)
On the Details tab: Select a BICC view object (VO) as the data entity from which to extract data. Then use the Extract strategy menu to choose how the source data is extracted.
Full: Extracts and loads all data from the optional Initial extract date that you specify. If you don't provide a date in Initial extract date, then all data is extracted.
Incremental: Extracts and loads only data that's created or modified after a specific date.
Select Managed to use the last date that's tracked and managed by Data Integration. You don't explicitly specify a date.
Important
The Managed option fetches an extra 24 hours of data into the past.
Data Integration adds the parameter SYS.LAST_LOAD_DATE to track the date of the last load or task run, and stores the last successful load date as a watermark across consecutive runs. When the task is run, you're allowed to change the value of the last date that's stored in the system.
Select Custom to provide the date yourself in the Last extract date field.
Next, select a different BICC external configuration storage for the extracted data, if you don't want to use the Data asset default. The Data asset default is the BICC external storage that's configured within the BICC Fusion Applications data asset when the asset was created or edited.
Optionally complete the following on the Advanced options tab:
Select Enable BI broker if you want the BICC extractor to read input rows directly from the Fusion Applications database instead of a regular extraction path.
Important
BI broker mode is a performance optimization option for advanced users.
BI broker mode is a BICC feature supported only for some data stores. Extraction fails for data stores that aren't supported.
With the BI broker option, extract performance might improve because queries are executed directly against the data stores in the underlying database. The performance improvement depends on the data store, current workload on the BI server, current workload on database, and other factors.
In some situations, enabling the BI broker mode changes the source column data types.
If you don't select Enable BI broker, the data is extracted through the BI server. For more information, see Broker Extract Mode in Configure Offerings to Extract of the Creating a Business Intelligence Cloud Extract guide. The BICC guide is part of the Applications Common book list in the Oracle Fusion Cloud Applications Suite documentation.
For BICC column properties, select the type of columns from the BICC source to include as attributes in the target.
All: All columns of the BICC PVO are included. However, columns with names that start with ExtnAttribute or columns that have the same name and label don't appear.
Default only: Only the columns enabled for extract by default in the BICC PVO are included. It displays all columns including the Flex columns.
Default and primary keys: Default and primary columns of the BICC PVO are included. It displays all columns including the Flex columns.
Primary keys only: Only the primary columns of the BICC PVO are included. It displays all columns including the Flex columns.
Oracle Fusion Applications using Oracle BI Publisher (BIP)
Specify the staging location, which is the Object Storage bucket for staging the extracted data. If the selected data asset has a specified default staging location, Data Integration automatically uses the Object Storage data asset and bucket that are specified as the default staging location in that data asset. If the selected data asset doesn't have a specified default staging location, you can select the Object Storage data asset that has the bucket you want to use for staging. Otherwise, you can update the data asset to add a default staging location, or you can select a different data asset.
Then complete the following on the Advanced options tab:
Select Enable row limit to specify the maximum number of rows to retrieve.
In the Row limit field, specify the row limit.
If you expect a large data volume, specify the size in the Chunk size field under Enable chunking. A chunk size is mandatory if the specified row limit is greater than 100,000.
The BIP properties section displays the BIP report parameters from the report that you selected for the data entity. Select a property each in the Number of rows returned and Starting offset fields. You cannot select the same property for both fields.
To load only new or updated records from the source to the target:
Select the Incremental load checkbox to identify and load only the data that's created or modified since the last time the load process was run.
For Watermark column, select the property that's used to mark data that's been incrementally loaded.
After selecting a data entity, the name of the selected data entity is displayed next to Data entity on the Details tab in the Properties panel.
If you provided a SQL statement, the label SQL_ENTITY<nnnnnnnnn> is displayed, for example: SQL_ENTITY123456789
If you used a parameter in the entity name, the label includes the parameter name, for example: BANK_${REGION}
Note
For a data entity that includes a parameter in the name, you cannot bind the entity resource to another parameter. The Assign parameter option is unavailable.
To select a Data entity that's from an Object Storage, Amazon S3, or HDFS data asset:
Use the Select data entity panel to select a single entity by name, or select one or more entities by using a pattern expression.
Click Browse by name to use another panel to browse or search the available list, and then select one data entity by name.
See Selecting a Data Entity, if you need help with searching the list by using a full or partial name.
You can include one or more data flow parameters in the name of a selected data entity, by using the syntax ${PARAMETER_NAME}. For example: BANK_${REGION}.csv
Click Browse by pattern to use a regular expression (including wildcards) to select one or more entities that match a specific directory and file pattern. For example, enter CUSTOMERS* in the field and press Enter to filter the list. See Testing and Using a Pattern.
You can also include parameters in the pattern expression, by using the syntax ${PARAMETER_NAME}. For example: CUSTOMERS_${REGION}/*
When you use a pattern expression, all existing files that match the pattern are assumed to have the same structure. The matched files are treated as a single entity in the data flow. Any future new files that match the pattern are also processed.
For a data entity that includes a parameter in the name, you cannot bind the entity resource to another parameter. The option to assign a parameter is unavailable.
After making the data entity selection by name or by pattern, complete the entity configuration in the Select data entity panel.
How you complete the configuration depends on the File type you choose for the selected data entity.
File type
Description
CSV
If you used Browse by name to select the data entity, you can click Preview raw data to display the first 10 lines of the file.
Select the Compression type (compression method) that's used. If you don't know the compression method that's used to compress the file, select Auto (Default).
The default Encoding is UTF-8, which cannot be changed.
Other settings you can configure are:
If the first row in the file is a header row, select Yes for Has header.
If the values in the data rows span multiple lines, select Yes for Multi-line.
Specify the Escape character that escapes other characters found in data values. For example: \
Select the Delimiter character that separates data fields. For example: COLON (:), COMMA (,), PIPE (|), SEMICOLON (;), or TAB (\t)
If a column delimiter is included at the end of a data row in the file, select Yes for Trailing delimiter.
Specify the Quote character that treats other characters as literal characters. For example: "
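For example, here's a small illustrative CSV excerpt (hypothetical data) that matches a configuration with Has header set to Yes, COMMA (,) as the Delimiter, the double quotation mark (") as the Quote character, and the backslash (\) as the Escape character:
CUSTOMER_ID,FULL_NAME,CITY
101,"Smith, Jane",Redwood Shores
102,"Pat \"PJ\" O'Brien",Austin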
JSON
Select the Compression type (compression method) that's used. If you don't know the compression method that's used to compress the file, select Auto (Default).
The default Encoding is UTF-8, which cannot be changed.
Select Use custom schema to paste or upload a custom sample schema that's used to infer the entity shape. When this checkbox is selected, schema drift is no longer applicable in the source entity.
If you select Upload, drop a custom schema file in the box provided, or click Select a file to select the schema file to upload.
If you select Paste in schema, copy the schema text file content and paste it in the box provided.
PARQUET
The default Compression type is Auto, which cannot be changed.
AVRO
The default Compression type is Auto, which cannot be changed.
EXCEL
Only XLSX files are supported.
The default Compression type is Auto, which cannot be changed.
By default, Data Integration treats the first row in a file as a header row. If the first row in a file is not a header row, select No for Has header.
For Select entity by, you can choose the criteria as Sheet name, Sheet index, or Table name. Then enter a Value for the worksheet name, worksheet index, or table name. Sheet index is zero-based.
For Sheet name or Sheet index, enter the area of the file to use as the Data range for selection. If you don't enter a data range value, the default is the data range A1, which corresponds to the entire sheet. If the file has a header row, enter a value that starts from the header row, for example, A1:K56.
After completing the data entity configuration, click Select.
You're brought back to the Details tab in the Properties panel. The name of the selected data entity or entity pattern expression is displayed next to Data entity. For example, CUSTOMERS_EU.csv, CUSTOMERS*, or CUSTOMERS_${REGION}/*
Note
For a data entity that includes a parameter in the name, you cannot bind the entity resource to another parameter. The Assign parameter option is unavailable.
If available and applicable, select the Incremental load checkbox to identify and load only the data that's created or modified since the last time the load process was run successfully.
For file data assets such as Object Storage, Data Integration performs incremental extraction based on the last modified date (timestamp) of the objects that are selected in a folder or selected by a file pattern.
For example, suppose on January 5 there are three files in the Object Storage bucket: EMPLOYEE_1.csv, EMPLOYEE_2.csv, EMPLOYEE_3.csv
If a load job is successfully run on January 5, all three files are processed.
On January 6, a new file, EMPLOYEE_4.csv, is added to the bucket; this file has a last modified date (timestamp) of January 6. If a load job is run on January 6 and no other files have changed, only EMPLOYEE_4.csv is picked up and processed for incremental data loading.
If, however, the existing file EMPLOYEE_2.csv is also modified on January 6, then a load job run on January 6 picks up both EMPLOYEE_2.csv and EMPLOYEE_4.csv for processing.
(Optional)
On the Details tab, select the refresh icon next to the selected entity name to ensure that you're using the entity's latest schemas during the current working session in the designer. Only the metadata of the entity is fetched, not the actual data changes.
(Optional)
On the Advanced options tab, depending on the data asset type for this source operator, you can select or clear applicable options.
Item
Description
Fetch file metadata as attributes
Select the checkbox to include file metadata (such as file name and file size) as attributes in the returned data.
Allow schema drift
By default, schema drift is enabled to let Data Integration dynamically handle schema definition changes during data flow design time and runtime. Schema drift refers to changes in the schema definitions of the specified data entity, such as attributes that are added or removed, and attribute data types or entity names that are changed.
If you clear the Allow schema drift checkbox, you disable schema drift to lock the schema definitions when the data flow is defined. When schema drift is disabled, Data Integration uses a fixed shape of the specified data entity even when the underlying shape has changed.
If a custom schema is selected, schema drift isn't applicable and this option can't be selected.
Treat missing file as empty
Only for an Object Storage data asset.
This checkbox is available only when Allow schema drift is not selected.
Allow pushdown
By default, pushdown is enabled to let Data Integration use the source database system to process transformation operations, if the system supports pushdown. Only Oracle databases, Oracle Autonomous Data Warehouse, Oracle Autonomous Transaction Processing, and MySQL support pushdown. Less data is pulled from the source when Data Integration uses the source database system to process operations.
By clearing the Allow pushdown checkbox, you disable pushdown. When pushdown is disabled, Data Integration pulls all data directly from the source system and then processes the transformations.
Parameter and Header settings
Only for a REST data asset.
Use the Parameter and Header settings to configure more source properties.
In the Attributes tab, view the attributes from the data
entity you selected. For a BICC Fusion Applications data asset, the attributes
are the BICC columns from the VO you selected.
You can select individual attributes or use the filter icons to filter the
list of attributes. Then apply bulk exclude rules to the selected or
filtered attributes.
Click the filter icon in the Name column to
filter attributes by a name pattern. In the filter field, enter a
simple regular expression. You can use wildcards ?
and * in the regex pattern.
Click the filter icon in the Type column to
filter attributes by a data type. Use the menu to select the type to
use as a filter.
Note
You can apply only one name pattern filter,
but multiple type filters at a time. For example, to filter by
the name pattern *_CODE and by the type numeric or varchar, you
apply one name pattern filter (*_CODE) and two type filters
(numeric, varchar).
Use the Actions menu to apply bulk exclude
rules to the selected attributes or filtered attributes. Select
Exclude by selection or
Exclude by applied filter.
Click View rules to open the Rules panel. You
can view and manage the rules that have been applied to the data
entity. By default, the first rule in the Rules panel includes
everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type,
value type). See Hierarchical Data
Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
For a BICC Fusion Applications data asset, only 10 rows of data are shown.
The Data Profile does not display for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
When configuring OCI Object Storage as a data source, you can use a regular expression to specify a file pattern for selecting one or more data entities.
A file pattern is a rule for finding files that match a directory and file name and how to handle the matched files when found.
Syntax to use
Data Integration supports the glob pattern syntax for specifying
a file pattern.
An asterisk, *, matches any number of characters (including none).
Two asterisks, **, work the same as * but cross directory boundaries to match complete paths.
A question mark, ?, matches exactly one character.
Braces specify a collection of sub patterns. For example:
{sun,moon,stars} matches "sun", "moon", or "stars".
{temp*,tmp*} matches all strings beginning with "temp" or "tmp".
Square brackets convey a set of single characters or, when the hyphen character (-) is used, a range of characters. For example:
[aeiou] matches any lowercase vowel.
[0-9] matches any digit.
[A-Z] matches any uppercase letter.
[a-z,A-Z] matches any uppercase or lowercase letter.
Within the square brackets, *, ?, and \ match themselves.
All other characters match themselves.
To match *, ?, or the other special characters, you can escape them by using the backslash character, \. For example: \\ matches a single backslash, and \? matches the question mark.
Examples
*.html
Matches all strings that end in .html
???
Matches all strings with exactly three characters
*[0-9]*
Matches all strings containing a numeric value
*.{htm,html,pdf}
Matches any string ending with .htm, .html, or .pdf
a?*.java
Matches any string beginning with a, followed by at least one more character, and ending with .java
{foo*,*[0-9]*}
Matches any string beginning with foo or any string containing a numeric value
directory1/20200209/part-*[0-9]*json
Matches all files in the folder whose file names start with part-, contain any number of digits (0-9), and end with json
directory3/**.csv
Matches all files with an extension of csv that are in the folder directory3 and its subfolders
directory3/*.csv
Matches all files with an extension of csv that are only in the main folder directory3. Files in subfolders are not included.
You can test the expression to ensure that the pattern you want to use retrieves the Object Storage files for one or more data
entities.
In the Select data entity panel, click Browse
by pattern.
In the Browse data entities by pattern panel, select
Test pattern from the More
actions menu.
In the Test pattern panel, in the Search pattern field, enter a pattern expression you want to test before using.
For example, enter department1/2020/*.json to find all files with the extension .json that are in the directory department1/2020. You can use parameter syntax ${} in the Search pattern field.
To test the search pattern, in the Test filenames block, provide one or more file names delimited by new lines. For example, for the pattern BANK_C*/*, the test file names might be names such as the following:
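BANK_CITY/part-00001.csv
BANK_COUNTRY/part-00002.csv
(These names are illustrative; provide file names that you expect the pattern to match in the bucket.)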
Verify that the test file names are returned in the Resulting file name block.
Click Use pattern to add the pattern expression to the Browse data entities by pattern panel.
You're returned to the Browse data entities by pattern panel. The files that match the pattern expression are displayed in the table.
Click Select pattern.
You're returned to the Select data entity panel. The pattern expression is displayed next to Data entity.
When you use a pattern expression, all existing files that match the pattern are
assumed to have the same structure. The matched files are treated as a single
entity in the data flow. Any future new files that match the pattern are also
processed.
Incremental load is loading only new or updated data from a source into a target. In Data Integration, when you configure BICC Oracle Fusion Applications as a data source, you can use the Managed incremental extract strategy to perform incremental loads.
When you choose to use the incremental extract strategy, only new or updated records from the source are extracted based on a last extract date. Data Integration provides two last extract date options:
Custom: You provide a last extract date for every task run.
Managed: Data Integration manages the date for you by tracking the task run timestamp and storing the last successful load date as a watermark across consecutive runs.
With the Data Integration managed last extract date option, you don't need to explicitly specify a date for a task run. You can, however, override the date at runtime.
Task runs initiated by a task schedule 🔗
Data Integration tracks a task run independently of a task run that's initiated by a task schedule. So if you use the Managed last extract date option and also set up a task schedule, Data Integration automatically keeps track of the last successful task run timestamp of task-scheduled runs separately from the last successful task run timestamp of task runs that are not initiated by a task schedule. This means the last date that's managed within a task schedule or the last date that's managed within a task is not modified by the other run operation.
Incremental load for different data entities in a data flow 🔗
Suppose you want to set up incremental load on different data entities in a data flow. You can achieve this by using parameters and creating a task schedule for each data entity. The general steps are:
In the data flow, assign parameters for the source schema (BICC offering) and data entity (BICC VO).
Create and publish an integration task for the parameterized data flow.
Create a task schedule for the integration task. On the Configure parameters page, specify the schema and data entity values for the task schedule.
Create another task schedule for the same integration task. On the Configure parameters page, set the schema and data entity values for this task schedule.
Incremental load multiple times in one day 🔗
To perform an incremental load more than one time in a day, add a filter operator immediately after the BICC source operator in the data flow. Then create a condition expression to filter out data that has already been processed. For example, if the BICC VO's last_update_date column is LastUpdateDateField, the expression can compare that column with the date and time of the previous run.
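A minimal sketch of such a filter condition, assuming the filter operator is named FILTER_1, the VO is MY_BICC_VO, and P_LAST_RUN is a data flow parameter that holds the timestamp of the previous run (all of these names are illustrative):
FILTER_1.MY_BICC_VO.LastUpdateDateField > $P_LAST_RUN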
Target Operator 🔗
You can add multiple target operators to a data flow. Each target can have only one inbound port.
Important
If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes, Map, and Data tabs in the Properties panel.
From the Operators panel, drag a Target operator onto the canvas.
With the target operator in focus, do the following on the Details tab of the Properties panel:
Item
Description
Identifier
Enter a name for the target operator, or leave the default name as-is.
Create new data entity
(Optional) Select the checkbox to create a new entity on the target, instead of updating an existing entity on the target. In a later step, you are prompted to enter a name for the data entity that's to be created on the target.
Do not select the checkbox if you want to use an existing data entity as the target. You are prompted to select the data entity after you select the target data asset, connection, and schema.
Data asset
Connection
Schema or Bucket
In the resource order as presented, click Select next to the resource to select the data asset, connection, and schema or bucket.
Take note of the following:
Not all data source types are available as target data assets. For example, PostgreSQL and Hive are not available as targets.
For the Merge Integration strategy, you cannot use an Object Storage data asset as the target.
If you're using an Object Storage data asset, select the compartment first and then select the bucket.
To use Autonomous Data Warehouse or Autonomous Transaction Processing as a target, you need an Object Storage bucket as the staging location. If you select a data asset that doesn't have a default staging location specified in that data asset, Data Integration displays a warning notification. When you see the notification, choose one of the following options:
Click Update data asset to add a default staging location to that data asset.
Click Select next to Staging location in the Properties panel to select the Object Storage data asset that has the bucket you want to use for staging.
The PAR_MANAGE request permission is required. For more information, see step 5.
Data entity
If you selected the Create new data entity checkbox, enter a name for the entity that's to be created as the target. For more information, see step 3.
If you did not select the Create new data entity checkbox, select an existing data entity as the target. For more information, see step 4.
(Only for data assets of type Autonomous Data Warehouse or Autonomous Transaction Processing)
Staging location
Data Integration uses the staging location to stage data before loading the data into the target.
For more information, see step 5.
Integration strategy
How data is moved into the target data entity depends on the integration strategy you choose. The options are:
Insert: Inserts new records, or appends the records when the data exists on the target.
If the Create new data entity checkbox is selected, the only strategy option that's available for selection is Insert.
Overwrite: Performs a truncate on the target before inserting new records.
If you select the Overwrite strategy, you cannot create a new data entity for the target entity.
Merge: Inserts new records and merges existing records into an existing data entity.
Take note of the following when using the Merge strategy:
The Merge strategy is available for database targets only. You cannot use an Object Storage data asset as the target if you want to use this integration strategy.
The Merge key identifies the attribute or attributes to determine which existing rows or new rows to update or insert. You can select a primary/unique key or a non-primary key as the Merge key.
Click Select or Edit. In the Merge key panel, do one of the following actions:
Select Primary/Unique key attributes and select a key.
Select Other attributes to use other (non-primary) attributes. Enter a name for the merge key and select the attributes from the list.
If a parameterized target data entity is configured to use the Merge strategy, you can change the Merge key selection:
In an integration task at design time and runtime
In an integration task in a pipeline at design time, and at runtime (pipeline task)
In a task schedule (integration task and pipeline task)
(For Object Storage only)
Create output as a single file
This step is optional for Object Storage targets.
Select the Create output as a single file checkbox to write the output to a single file, otherwise the operation creates multiple files. The single output file is overwritten every time the task is run.
Creating a single output file might affect the performance of Data Integration. Do not use the single output file option for large datasets.
You can choose to write the output to a single file only if one of the following conditions applies:
When the Integration strategy is Insert, and the Create new data entity checkbox is selected. Do not add the forward slash (/) at the end of the new entity name. For more information, see step 3.
When the Integration strategy is Overwrite, and you use an existing data entity (Create new data entity checkbox is not selected). Select an existing data entity that does not have the forward slash (/) at the end of the entity name. For more information, see step 4.
If you selected the Create new data entity checkbox: Enter a name for the entity that's to be created. A new data entity is created only if the name does not exist.
Depending on the data asset type, take note of the following when you enter a name for the data entity:
Target type
Description
For Oracle Database, Autonomous Data Warehouse, or Autonomous Transaction Processing
Enter the new data entity name.
When creating a database data entity as a target output, if the entity name you provide does not exist, a data entity is created and data is inserted into the output. If the entity name you provide exists, the outcome of the operation depends on the shape of the target on the data flow and the shape of the existing entity.
A shape is based on the number and name of attributes, data types, type characteristics such as length and precision, and whether data is present. For example, data type varchar2(4000) is not the same shape as varchar2(2000) even though the attribute name on the data flow target is the same as the name of the attribute on the existing data entity.
If the shape of the target entity on the data flow is the same as the shape of the existing entity, the database operation:
Inserts data, if no data is present
Updates or inserts data, if data is present
If the shape of the target entity on the data flow is not the same as the shape of the existing entity:
The operation adds the attribute to the existing entity when the attribute name does not exist.
For example, the target on the data flow has the attribute EMPLOYEE_ID NUMBER(10), but EMPLOYEE_ID NUMBER is not present in the existing entity.
When the attribute name is present in the existing entity, but the data type shape is not the same:
When attribute data is present, the database operation fails when the data shape is smaller. For example, when the target on the data flow has EMPLOYEE_COMMENTS VARCHAR2(4000) and the existing entity has EMPLOYEE_COMMENTS VARCHAR2(2000), the operation fails. When the target on the data flow has EMPLOYEE_COMMENTS VARCHAR2(2000) and the existing entity has EMPLOYEE_COMMENTS VARCHAR2(4000), the operation succeeds.
When attribute data is not present, the operation modifies the attribute and inserts data.
If the target is an autonomous database, you might encounter the database error ORA-22859: invalid modification of columns when an attempt is made to modify a column object, REF, VARRAY, nested table, or LOB column type. Create a column of the preferred type and copy the column data to the new type using the appropriate type constructor.
Note also that conversion between TEXT and CLOB is not supported in Data Integration. With the limit of 4000 bytes for a VARCHAR2 column in an autonomous database, you might have to use an existing target data entity that has a CLOB data type instead of selecting the Create new data entity checkbox in certain use cases.
For Object Storage
Enter the new data entity name, followed by a forward slash (/). For example, enter newdirectory/.
However, if you select the Create output as a single file checkbox because you want to create a single file output, enter the new entity name without the forward slash (/) at the end.
When writing output to a single file, you can include parameters in the file name using the syntax ${PARAMETER_NAME}. For example, the file name can be written as BANK_${REGION}.csv.
Also select the File type and Compression type. The compression types that are available for selection depend on the file type you choose. If you don't know the compression method that's used to compress the file, select None (Default).
For CSV and JSON file types, the default Encoding is UTF-8, which cannot be changed.
If you choose the file type CSV, other settings you can configure are:
If the first row in the file is a header row, select Yes for Has header.
If the values in the data rows span multiple lines, select Yes for Multi-line.
Specify the Escape character that escapes other characters found in data values. For example: \
Select the Delimiter character that separates data fields. For example: COLON (:), COMMA (,), PIPE (|), SEMICOLON (;), or TAB (\t)
If a column delimiter is included at the end of a data row in the file, select Yes for Trailing delimiter.
Specify the Quote character that treats other characters as literal characters. For example: "
If you did not select the Create new data entity checkbox: Select an existing data entity.
To ensure that you're using the entity's latest schemas during the current working session in the designer, select the refresh icon next to the entity name. Only the metadata of the entity is fetched, not the actual data changes.
Depending on the data asset type, take note of the following when configuring to use an existing data entity:
Target type
Description
For Object Storage
Select the Create output as a single file checkbox to use a single output file, otherwise the operation creates multiple files. The single output file is overwritten every time the task is run.
Creating a single output file might affect the performance of Data Integration. Do not use the single output file option for large datasets.
If the Create output as a single file checkbox is selected: When selecting the data entity, select an entity name that does not have the forward slash (/) at the end.
Select the File type. If CSV is the file type, you can click Preview raw data to display the first 10 lines of the file.
Select the Compression type. The compression types that are available for selection depend on the file type you choose. If you don't know the compression method that's used to compress the file, select None (Default).
For CSV and JSON file types, the default Encoding is UTF-8, which cannot be changed.
If the file type is CSV, other settings that you can configure are:
If the first row in the file is a header row, select Yes for Has header.
If the values in the data rows span multiple lines, select Yes for Multi-line.
Specify the Escape character that escapes other characters found in data values. For example: \
Select the Delimiter character that separates data fields. For example: COLON (:), COMMA (,), PIPE (|), SEMICOLON (;), or TAB (\t)
If a column delimiter is included at the end of a data row in the file, select Yes for Trailing delimiter.
Specify the Quote character that treats other characters as literal characters. For example: "
If the file type is JSON, select Use custom schema to paste or upload a custom sample schema that's used to infer the entity shape.
If you select Upload, drop a custom schema file in the box provided, or click Select a file to select the schema file to upload.
If you select Paste in schema, copy the schema text file content and paste it in the box provided.
For Autonomous Data Warehouse or Autonomous Transaction Processing, select the staging location if a default location is not defined for the asset, or edit the location to use a different staging location.
Note
If the target is Autonomous Data Warehouse or Autonomous Transaction Processing, the staging location option is enabled for you to select an Object Storage bucket. Data Integration uses the staging location to stage data before loading the data into the target.
If the target data asset is configured with a default staging location, Data Integration automatically uses that bucket for staging unless you explicitly edit the value to specify a different staging location.
To use Autonomous Data Warehouse or Autonomous Transaction Processing as the target, you must first enable the PAR_MANAGE request permission on the staging bucket, for example:
allow any-user to manage buckets in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.permission='PAR_MANAGE', request.principal.id='<workspace-ocid>'}
(For edit) In the Select staging location panel, do not select the Use default staging location settings checkbox if you want to select a different staging location.
Click the Data asset menu and select an Object Storage data asset.
Then select the Connection, Compartment, and the Bucket, and click Select.
(Optional)
On the Advanced options tab, depending on the data asset type for this target operator, you can enter values or use default values.
Item
Description
Reject limit
For Autonomous Data Warehouse or Autonomous Transaction Processing
Enter a value.
You can specify the maximum number of erroneous rows that can fail to load into the target before the integration task fails. For example, if the data source has 1,000 rows and you set the reject limit at 200, the task fails immediately after the 200th erroneous row is rejected.
If you don't specify a value, the default is zero, which means the task fails upon the first erroneous row being rejected.
If you encounter a task failure, check the logs for the table names where the rows had been rejected. Then query the affected tables and rows in the autonomous database.
Load order
Enter a value.
Only non-negative integers are allowed. The default is 0. The load order specifies the order of the target nodes that data is loaded into. The target with the value 0 is loaded first, then the target with the value 1, and so on. If multiple targets have the same value, they are loaded in no particular order.
Under the Attributes tab, you can select individual attributes or use the filter icons to filter the list of attributes by a name pattern or by an attribute type. Then apply bulk exclude rules to the selected or filtered attributes.
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on any rules you have applied in the Attributes tab. Transformations cannot be applied on a target operator as the data is read-only.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Map tab, specify how the incoming attributes map to the target data entity's attributes. Note that mapping attributes is not applicable if you selected the Create new data entity checkbox.
For target entities with hierarchical data types, only first-level fields can be mapped. See Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Data Integration checks and provides warnings in the following situations:
Data truncation might occur when data is loaded from the source attributes to the mapped target attributes.
The data type of a source attribute does not match the data type of the mapped attribute.
The Data tab displays the transformed data based on the operators applied in the data flow.
Important
If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported.
You can filter data in the target entity by a name pattern or data type. To filter data by a name pattern, enter a simple regex pattern or wildcards ? and * in the Filter by Pattern field. To filter data by a type, select the data type from the menu next to the pattern field.
Transformations cannot be applied to the target operator as the data is read-only.
The Map tab is only available for a target operator.
Important
If you're using a hierarchical data entity for a target operator, see Hierarchical Data Types to understand what is supported.
When you're creating a new target data entity, the Map tab is unavailable. The incoming attributes are used to create the table or file structure with a 1-to-1 mapping.
When using an existing target data entity, map the incoming attributes to the target data entity's attributes. The actions you can perform are:
Maps incoming attributes to target attributes based on simple, user-defined regex rules.
From the Actions menu, select Map by pattern. Enter a source pattern and a target pattern. Then click Preview mapping to test the source and target patterns.
To define a pattern, you can use asterisk (*) and question mark (?) symbols. Use an asterisk to indicate a wildcard of any number of characters in a string pattern. Use a question mark to indicate a single character wildcard. For example, *INPUT? maps any matching attribute starting with n number of characters containing the string INPUT followed by a single character, such as NEWINPUTS.
By default, pattern matching is case-insensitive. For example, the source pattern *Name matches the target name CUSTOMER_NAME and Customer_Name.
To indicate different capture groups, use $n. For example, let's say you want to map LAST_NAME, FIRST_NAME, and USERNAME from a source or upstream operator to TGT_LAST_NAME, TGT_FIRST_NAME, and TGT_USERNAME in the target data entity. You would enter *NAME in the Source pattern field and TGT_$1 in the Target pattern field. The asterisk (*) in *NAME means the character string before NAME is to be identical to the character string found in $1 of the target pattern. $1 refers to the first capture group in the source pattern, which in this case is the asterisk (*).
In cases where you need case-sensitive pattern matching, add the (?c) prefix to the source pattern. For example, let's say you want to map the source attribute CustomerName, which uses camel case lettering in its name, to the target attribute with the name Customer_Name. You would enter (?c)([A-Z][a-z]+)([A-Z][a-z]+) as the source pattern and $1_$2 as the target pattern. When the (?c) prefix is added to the beginning of a source pattern, case-sensitive pattern matching is switched on for the mapping. Data Integration detects that the "N" in CustomerName is the start of a new pattern and thus treats ([A-Z][a-z]+)([A-Z][a-z]+) as two different words (capture groups) when matching.
Drag an incoming attribute from the source list to an attribute in the target list to create a mapping.
Alternatively, you can select Manual map from the Actions menu. Then use the Map Attribute dialog to create a mapping by selecting a source attribute and a target attribute.
Select View rules. In the Rules panel, select one or more rules and click Remove. Alternatively, you can select Remove from the rule's Actions menu to clear that rule.
From the Actions menu, select Reset mappings. All manual and auto-map rules are removed.
Shaping Operators 🔗
Important
For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.
Filter Operator 🔗
Use the filter operator to select a subset of data from the inbound port to continue downstream to the outbound port based on a filter condition.
From the Operators panel, drag a Filter operator onto the canvas.
With the filter operator in focus, on the Details tab of
the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
Next to Filter condition, click
Create.
In the Create Filter Condition panel, Condition Builder section, you can double-click or drag attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
Note
In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
Click Create.
(Optional)
Click Assign parameter to use parameters so that the filter condition
is not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Use the Condition Builder to visually select elements to create a filter condition. You can also enter a filter condition manually in the editor.
Creating a filter condition lets you select a subset of data from an upstream operator based on the condition.
The elements that you can use in a filter condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.
Incoming displays the attributes from the upstream operator that are coming into this filter operator.
For example, to filter data by a city name, you can create the condition expression as:
FILTER_1.ADDRESSES.CITY='Redwood Shores'
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Suppose you want to use a parameter for the city name in the filter condition. You can create a VARCHAR parameter with the name P_VARCHAR_CITY, and set the default value to Redwood Shores. Then you can create the filter expression as:
FILTER_1.ADDRESSES.CITY=$P_VARCHAR_CITY
Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
For example, to filter data by a city name or by population, you can use the OR function to create the filter condition expression as:
FILTER_1.COUNTRIES.CITY=$P_VARCHAR_CITY OR FILTER_1.COUNTRIES.POPULATION>100000000
FILTER_1.COUNTRIES.CITY=$P_VARCHAR_CITY OR FILTER_1.COUNTRIES.POPULATION>$P_NUMERIC
Here's a list of functions that are available for you to add when you construct conditions:
CEIL(numeric)
Returns the smallest integer not less than the numeric value.
CEIL(-1,2)
FLOOR(numeric)
Returns the largest integer not greater than the numeric value.
FLOOR(-1,2)
MOD(numeric1, numeric2)
Returns the remainder after numeric1 is divided by numeric2.
MOD(8,2)
POWER(numeric1, numeric2)
Raises numeric1 to the power of numeric2.
POWER(2,3)
ROUND(numeric1, numeric2)
Returns numeric1 rounded to numeric2 decimal places.
ROUND(2.5,0)
TRUNC(numeric1, numeric2)
Returns numeric1 truncated to numeric2 decimal places.
TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])
Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. See supported language tags.
Supported format patterns:
0: A digit
#: A digit, zero shows as absent
.: Placeholder for decimal separator
,: Placeholder for grouping separator
E: Separates mantissa and exponent for exponential formats
-: Default negative prefix
¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator
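As an illustration of these format patterns (hypothetical calls, not taken from the product reference, with the comma treated as the grouping separator and the period as the decimal separator):
TO_NUMBER('1234.56') returns 1234.56
TO_NUMBER('1,234.56', '#,##0.00') returns 1234.56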
DAYOFWEEK(date)
Returns the day of the week for the date.
DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date)
Returns the date's day in the year.
DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date)
Returns the date's week in the year.
WEEKOFYEAR('2022-07-28') returns 30
WEEKOFYEAR('2022-07-28 13:24:30') returns 30
HOUR(datetime)
Returns the datetime's hour value.
HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date)
Returns the date's last day of the month.
LAST_DAY('2020-12-25') returns 31
MINUTE(datetime)
Returns the datetime's minute value.
MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date)
Returns the date's month value.
MONTH('2020-06-25') returns 6
QUARTER(date)
Returns the quarter of year the date is in.
QUARTER('2020-12-25') returns 4
SECOND(datetime)
Returns the datetime's second value.
SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr])
Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31
TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17
TO_TIMESTAMP(expr, format_string[, localeStr])
Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on Oct 11th, 2020
WEEK(date)
Returns the date's week value.
WEEK('2020-06-25') returns 4
YEAR(date)
Returns the date's year value.
YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months)
Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
MONTHS_BETWEEN(start_date_expr, end_date_expr)
Returns the number of months between start_date_expr and end_date_expr. Both start_date_expr and end_date_expr can be a date, a timestamp, or a string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.
Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.
Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.
If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss
INTERVAL 'year' YEAR[(year_precision)]
Returns a period of time in years.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years).
INTERVAL '1' YEAR returns an interval of 1 year
INTERVAL '200' YEAR(3) returns an interval of 200 years
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH
Returns a period of time in years and months. Use to store a period of time using year and month fields.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]
Returns a period of time in months.
month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).
INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of days, hours, minutes, and seconds.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of days, hours, and minutes.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]
Returns a period of time in terms of days and hours.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]
Returns a period of time in terms of days.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of hours, minutes, and seconds.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of hours and minutes.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]
Returns a period of time in terms of hours.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]
Returns a period of time in terms of minutes.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of minutes and seconds.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] )
Returns the value evaluated at the row that's the first row of the window frame.
FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])
Returns the value evaluated at the row that is the last row of the window frame.
LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause)
Returns the rank of the current row with gaps, counting from 1.
RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause)
Returns the unique number of the current row within its partition, counting from 1.
ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
INITCAP(string)
Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.
INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])
Returns the (1-based) index of the first occurrence of substring in string.
INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)
Returns the string with all letters changed to lowercase.
LOWER('ORACLE') returns oracle
LENGTH(string)
Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.
LENGTH('Oracle') returns 6
LTRIM(string)
Returns the string with leading spaces removed from the left.
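An illustrative call based on the description above (not taken from the product reference):
LTRIM('   Oracle') returns Oracle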
Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.
REPLACE(string, search[, replacement])
Replaces all occurrences of search with replacement.
If search is not found in string, then string is returned unchanged.
If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.
REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
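As an illustration of the behavior when replacement is omitted (a hypothetical call based on the description above):
REPLACE('ABCabc', 'abc') returns ABC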
RTRIM(string)
Returns the string with trailing spaces removed from the right.
RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])
Returns the substring of string starting at position. If substring_length is specified, it limits the length of the returned substring.
SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])
Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
Number example: TO_CHAR(123) returns 123
Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.
UPPER(string)
Returns a string with all letters changed to uppercase.
UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])
Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])
Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
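An illustrative call based on the description above (not taken from the product reference):
RPAD('ABC', 5, '*') returns 'ABC**'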
Parses a column containing a JSON string into one of the following types, with the specified schema.
Map, with String as the key type
Struct
Array
FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}
FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}
TO_JSON(column)
Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.
TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)
Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.
TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}
TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}
TO_STRUCT(string,column[,string,column]*)
Creates a new column of Struct type. The input columns must be grouped as key-value pairs.
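For example, the inner expression shown in the TO_JSON example above, TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value')), creates a Struct type column with an Array field s1 and a Map field s2.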
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
Aggregate
Expression
Filter
Join
Lookup
Split
Pivot
The available higher-order functions are listed below, each with a description and an example.
TRANSFORM(column, lambda_function)
Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.
For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.
For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys.
For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)
Only the Expression operator supports ARRAY_SORT.
Takes an array and sorts it according to the given function, which takes 2 arguments.
The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.
If the function is omitted, the array is sorted in ascending order.
array_sort(to_array(5, 6, 1),
(left, right) -> CASE WHEN left < right THEN -1
WHEN left > right THEN 1 ELSE 0 END)
The returned array is:
[1,5,6]
Join Operator 🔗
Use the join operator to link data from multiple inbound sources.
From the Operators panel, drag a Join operator onto the canvas.
With the join operator in focus, on the Details tab of
the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
Select the Join type. The types of join supported
are:
Inner: Selects the data from the two inbound sources where the join condition is met.
Left outer: Selects all the data from inbound source 1, and the data from inbound source 2 where the join condition is met.
Right outer: Selects the data from inbound source 1 where the join condition is met, and all the data from inbound source 2.
Full outer: Selects all data from the two inbound sources, matching rows where matches can be made and inserting nulls where no matches are found.
Next to Join condition, click
Create.
In the Create Join Condition panel, Condition Builder section, you can double-click or drag attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
Note
In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
Click Create.
(Optional)
Click Assign parameter to use parameters so that the join condition is
not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
Under the Attributes tab, select whether to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Use the Condition Builder to visually select elements to create a join condition. You can also enter a join condition manually in the editor.
Creating a join condition lets you select data from two inbound sources based on the condition.
The elements that you can use in a join condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.
Incoming displays the attributes from the upstream ports connected to the join operator as two separate JOIN folders. View the attributes from each port by expanding or collapsing the appropriate JOIN folder. For example, JOIN_1_1 and JOIN_1_2.
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Suppose you want to join two sources and retain only the rows where BANK_NAME='ABC Bank'. You can create a VARCHAR parameter with the name P_VARCHAR, and set the default value to ABC BANK. Then you can create the join expression as:
JOIN_1_1.ADDRESSES.BANK_ID = JOIN_1_2.BANK.BANK_ID AND JOIN_1_2.BANK.BANK_NAME = $P_VARCHAR
Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
Here's a list of functions that are available for you to add when you construct conditions:
CEIL(numeric)
Returns the smallest integer not less than the numeric value.
CEIL(-1,2)
FLOOR(numeric)
Returns the largest integer not greater than the numeric value.
FLOOR(-1,2)
MOD(numeric1, numeric2)
Returns the remainder after numeric1 is divided by numeric2.
MOD(8,2)
POWER(numeric1, numeric2)
Raises numeric1 to the power of numeric2.
POWER(2,3)
ROUND(numeric1, numeric2)
Returns numeric1 rounded to numeric2 decimal places.
ROUND(2.5,0)
TRUNC(numeric1, numeric2)
Returns numeric1 truncated to numeric2 decimal places.
TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])
Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. See supported language tags.
Supported format patterns:
0: A digit
#: A digit, zero shows as absent
.: Placeholder for decimal separator
,: Placeholder for grouping separator
E: Separates mantissa and exponent for exponential formats
-: Default negative prefix
¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator
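As an illustration of these format patterns (hypothetical calls, not taken from the product reference, with the comma treated as the grouping separator and the period as the decimal separator):
TO_NUMBER('1234.56') returns 1234.56
TO_NUMBER('1,234.56', '#,##0.00') returns 1234.56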
DAYOFWEEK(date)
Returns the day of the week for the date.
DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date)
Returns the date's day in the year.
DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date)
Returns the date's week in the year.
WEEKOFYEAR('2022-07-28') returns 30
WEEKOFYEAR('2022-07-28 13:24:30') returns 30
HOUR(datetime)
Returns the datetime's hour value.
HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date)
Returns the date's last day of the month.
LAST_DAY('2020-12-25') returns 31
MINUTE(datetime)
Returns the datetime's minute value.
MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date)
Returns the date's month value.
MONTH('2020-06-25') returns 6
QUARTER(date)
Returns the quarter of year the date is in.
QUARTER('2020-12-25') returns 4
SECOND(datetime)
Returns the datetime's second value.
SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr])
Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31
TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17
TO_TIMESTAMP(expr, format_string[, localeStr])
Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on Oct 11th, 2020
WEEK(date)
Returns the date's week value.
WEEK('2020-06-25') returns 4
YEAR(date)
Returns the date's year value.
YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months)
Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
MONTHS_BETWEEN(start_date_expr, end_date_expr)
Returns the number of months between start_date_expr and end_date_expr. Both start_date_expr and end_date_expr can be a date, a timestamp, or a string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.
Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.
Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.
If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss
INTERVAL 'year' YEAR[(year_precision)]
Returns a period of time in years.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years).
INTERVAL '1' YEAR returns an interval of 1 year
INTERVAL '200' YEAR(3) returns an interval of 200 years
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH
Returns a period of time in years and months. Use to store a period of time using year and month fields.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]
Returns a period of time in months.
month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).
INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of days, hours, minutes, and seconds.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of days, hours, and minutes.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]
Returns a period of time in terms of days and hours.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]
Returns a period of time in terms of days.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of hours, minutes, and seconds.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of hours and minutes.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]
Returns a period of time in terms of hours.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]
Returns a period of time in terms of minutes.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of minutes and seconds.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] )
Returns the value evaluated at the row that's the first row of the window frame.
FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])
Returns the value evaluated at the row that is the last row of the window frame.
LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause)
Returns the rank of the current row with gaps, counting from 1.
RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause)
Returns the unique number of the current row within its partition, counting from 1.
ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
INITCAP(string)
Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.
INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])
Returns the (1-based) index of the first occurrence of substring in string.
INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)
Returns the string with all letters changed to lowercase.
LOWER('ORACLE') returns oracle
LENGTH(string)
Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.
LENGTH('Oracle') returns 6
LTRIM(string)
Returns the string with leading spaces removed from the left.
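An illustrative call based on the description above (not taken from the product reference):
LTRIM('   Oracle') returns Oracle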
Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.
REPLACE(string, search[, replacement])
Replaces all occurrences of search with replacement.
If search is not found in string, then string is returned unchanged.
If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.
REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
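As an illustration of the behavior when replacement is omitted (a hypothetical call based on the description above):
REPLACE('ABCabc', 'abc') returns ABC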
RTRIM(string)
Returns the string with trailing spaces removed from the right.
RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])
Returns the substring of string starting at position. If substring_length is specified, it limits the length of the returned substring.
SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])
Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
Number example: TO_CHAR(123) returns 123
Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.
UPPER(string)
Returns a string with all letters changed to uppercase.
UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])
Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])
Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
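An illustrative call based on the description above (not taken from the product reference):
RPAD('ABC', 5, '*') returns 'ABC**'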
Parses a column containing a JSON string into one of the following types, with the specified schema.
Map, with String as the key type
Struct
Array
FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}
FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}
TO_JSON(column)
Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.
TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)
Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.
TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}
TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}
TO_STRUCT(string,column[,string,column]*)
Creates a new column of Struct type. The input columns must be grouped as key-value pairs.
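For example, the inner expression shown in the TO_JSON example above, TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value')), creates a Struct type column with an Array field s1 and a Map field s2.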
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
Aggregate
Expression
Filter
Join
Lookup
Split
Pivot
The available higher-order functions are listed below, each with a description and an example.
TRANSFORM(column, lambda_function)
Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.
For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.
For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys.
For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)
Only the Expression operator supports ARRAY_SORT.
Takes an array and sorts it according to the given function, which takes 2 arguments.
The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.
If the function is omitted, the array is sorted in ascending order.
array_sort(to_array(5, 6, 1),
(left, right) -> CASE WHEN left < right THEN -1
WHEN left > right THEN 1 ELSE 0 END)
The returned array is:
[1,5,6]
Expression Operator 🔗
Use the expression operator to perform one or more transformations on a single row of data to create new, derivative fields.
From the Operators panel, drag an Expression operator onto the canvas.
With the expression operator in focus, on the Details
tab of the Properties panel, enter a name in the
Identifier field, or leave the name as-is.
Under Expressions, click Add
expression.
In the Add Expression panel, enter a name for the expression in the Identifier field, or leave the name as-is.
(Optional)
To apply the expression to two or more attributes, select Allow bulk selection. Then select the attributes by name or by using a regular expression pattern. For example, suppose you have three name attributes in the source dataset and you want to apply the UPPER function to all three source attributes. You can use a pattern to select this group of source attributes.
For Source attributes, select Pattern and click Edit pattern. Then enter the regular expression that matches a group of source attributes. For example, enter *NAME to match attributes FIRST_NAME, LAST_NAME, and FULL_NAME.
Select a data type from the menu.
For Target attributes, enter the regular expression to define the target attribute names in Pattern. By default, $0 is entered, which refers to the original source attribute names. Update the pattern to name the target attributes according to requirements. For example, enter $0_UPPER to append the text UPPER to the original source attribute names.
By default, Use source attribute data types is selected. You can retain the selection. If you clear the Use source attribute data types checkbox, then select a Data type and complete the fields corresponding to the type selected.
(Optional)
If you did not select Allow bulk selection, under
Expression data type, you can select Infer
data type if you want the expression builder to detect the data
type from the expression that you enter. You can click Preview data
type and Validate to preview and refresh
the data type, and validate the expression.
Infer data type is useful when you're working with complex data types such as Map, Array, and Struct that might have levels of nested types.
If you did not select Allow bulk selection and Infer data type, select a value from the Data type menu, and then complete the fields corresponding to the type selected.
In the Expression builder section, visually construct an expression by double-clicking or dragging incoming attributes, parameters, or functions to add to the editor to build the expression. Or you can manually write the expression yourself. See Adding an expression. You can also validate the expression.
You can use parameters in expressions, including user-defined parameters that have been created in the data flow, and system parameters generated by Data Integration at runtime.
If you selected Allow bulk selection, then use %MACRO_INPUT% in the expression to indicate the attributes on which the function is to be applied. For example, suppose you used the pattern *NAME to match the source attributes FIRST_NAME, LAST_NAME, and FULL_NAME. Now you can specify a function as UPPER(%MACRO_INPUT%) to apply the function on all the attributes that match the pattern.
To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.
Note
To replace a placeholder in a function with another element, highlight the placeholder and then double-click another element from the list to add it to the function.
Click Add.
You can repeat the steps from step 3 to add more expressions, if needed.
Under the Attributes tab, select whether to view the incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes, and then apply bulk exclude rules to the selected or the filtered attributes.
Click the filter icon in the Name column to filter attributes by a name
pattern. In the filter field, enter a simple regular expression. You can
use wildcards such as ? and * in the
regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
To change the data type of a group of attributes, select Change data type. In the dialog that appears, select the new Data type for the selected or filtered attributes. Then select any other properties that are applicable to the new data type you have selected. When you click Apply, an expression is added to the Details tab instead of a rule.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
You can use the Expression operator to change the data type of attributes using a bulk transformation action.
Instead of using a CAST function to change an attribute data type one attribute at a time, you can add the Expression operator and apply a bulk transformation action on several attributes at the same time.
Connect the expression operator with the appropriate incoming object.
With the expression operator in focus on the canvas, under the Attributes tab of the Properties panel, select to view the incoming attributes (linked to the expression operator on the left side).
If you have a few attributes to transform, use the checkboxes to select the attributes you want to transform.
If you have many attributes to transform, apply a filter on the attributes:
Click the filter icon in the Name column. In the filter field that's displayed, enter a simple regex pattern using wildcards (? and *) to filter the attributes by a name pattern. For example, enter ADD* to match attributes ADDRESS1, ADDRESS2, and ADDRESS_KEY.
Click the filter icon in the Type column. In the menu that's displayed, select the current attribute type of the attributes you want to change. For example, suppose the type of ADDRESS1 and ADDRESS2 is VARCHAR, and the type of ADDRESS_KEY is NUMERIC. You can select VARCHAR to transform ADDRESS1 and ADDRESS2 only.
From the Actions menu, select Change data type.
In the Change data type dialog, select the new Data type for the selected attributes or the filter-applied attributes.
Then specify any other properties that are applicable to the new data type you have selected.
To keep the original attributes in the dataset, select Keep source attributes. If the checkbox is not selected, only the new attributes are available in the resulting data.
Click Apply.
Select the Details tab.
Review the expression that's added to the operator for you.
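As a rough sketch of what the generated expression might look like (hypothetical; the exact expression that Data Integration generates can differ), a bulk change of the filtered VARCHAR attributes to a numeric type could resemble:
CAST(%MACRO_INPUT% AS NUMERIC)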
Use the Expression Builder to visually select elements to build an expression in the editor. You can also manually write an expression yourself.
Take note of the following when you create expressions:
Enclose a string literal within single quotation marks. For example: CONCAT('We ', 'like') and UPPER('oracle').
Enclose an attribute name within double quotation marks. For example: UPPER("Sales") and CONCAT(CONCAT("EXPRESSION_1.EMP_UTF8_EN_COL_CSV.EN_NAME", ' '), "EXPRESSION_1.EMP_UTF8_EN_COL_CSV.NAME").
Using double quotation marks around attribute names is mandatory for multibyte characters, and names that have special characters in the fully qualified name.
The Add Expression panel has two sections: Expression information and
Expression builder. The Expression information fields
let you specify a name and data type for the expression. You can also create the
expression to apply to two or more attributes. When working with complex data types such
as Map, Array, and Struct that might have levels of nested types, you can choose to let
the builder detect the data type from the expression that you enter. When you let the
builder infer the data type, you can preview and refresh the data type, and validate the
expression.
The Expression builder section lists the elements for building an expression. The elements
that you can use in an expression include incoming attributes, parameters, and
functions. Double-click or drag an element from the list to add to the editor to build
the expression, or manually write the expression yourself. You can validate the
expression before creating it.
Incoming displays the attributes from the upstream operator that are coming into this expression operator. Below the attributes list is a checkbox that lets you apply Exclude rules. To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.
Parameters include user-defined parameters and system-generated
parameters.
User-defined parameters are the expression
parameters that have been added in the data flow using the Condition Builder
(filter, join, lookup, and split operators) or Expression Builder (expression and
aggregate operators). See Adding an Expression Parameter. The syntax
is $PARAMETER_NAME. For example:
EXPRESSION_1.ADDRESSES.POSTAL_CODE=$P_CODE
Data Integration generates system parameters such as
SYS.TASK_START_TIME. The values of system parameters can be used in
expressions to log system information. The syntax is
${SYSTEM_PARAMETER}. For example:
${SYS.TASK_RUN_NAME}
Functions are the functions available in Data Integration that you can use in an expression. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments. You can also add user defined functions that you have created in the workspace. For example: MYLIBRARY.MYFUNCTION
The list of functions available in Data Integration for
you to use is as follows:
CEIL(numeric)
Returns the smallest integer not less than the numeric value.
CEIL(-1,2)
FLOOR(numeric)
Returns the largest integer not greater than the numeric value.
FLOOR(-1,2)
MOD(numeric1, numeric2)
Returns the remainder after numeric1 is divided by numeric2.
MOD(8,2)
POWER(numeric1, numeric2)
Raises numeric1 to the power of numeric2.
POWER(2,3)
ROUND(numeric1, numeric2)
Returns numeric1 rounded to numeric2 decimal places.
ROUND(2.5,0)
TRUNC(numeric1, numeric2)
Returns numeric1 truncated to numeric2 decimal places.
TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])
Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. Supported language tags.
Supported format patterns:
0: A digit
#: A digit, zero shows as absent
.: Placeholder for decimal separator
,: Placeholder for grouping separator
E: Separates mantissa and exponent for exponential formats
-: Default negative prefix
¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator
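For example, a usage sketch assuming the default en-US locale (the input string and format pattern are illustrative): TO_NUMBER('1,234.56', '#,##0.00') returns 1234.56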
DAYOFWEEK(date)
Returns the date's day of the week.
DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date)
Returns the date's day in the year.
DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date)
Returns the date's week in the year.
WEEKOFYEAR('2022-07-28') returns 30
WEEKOFYEAR('2022-07-28 13:24:30') returns 30
HOUR(datetime)
Returns the datetime's hour value.
HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date)
Returns the date's last day of the month.
LAST_DAY('2020-12-25') returns 31
MINUTE(datetime)
Returns the datetime's minute value.
MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date)
Returns the date's month value.
MONTH('2020-06-25') returns 6
QUARTER(date)
Returns the quarter of year the date is in.
QUARTER('2020-12-25') returns 4
SECOND(datetime)
Returns the datetime's second value.
SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr])
Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. Supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31
TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17
TO_TIMESTAMP(expr, format_string[, localeStr])
Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on Oct 11th, 2020
WEEK(date)
Returns the date's week value.
WEEK('2020-06-25') returns 4
YEAR(date)
Returns the date's year value.
YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months)
Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
MONTHS_BETWEEN(start_date_expr, end_date_expr)
Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp, or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.
Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.
Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.
If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss
INTERVAL 'year' YEAR[(year_precision)]
Returns a period of time in years.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years).
INTERVAL '1' YEAR returns an interval of 1 year
INTERVAL '200' YEAR(3) returns an interval of 200 years
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH
Returns a period of time in years and months. Use to store a period of time using year and month fields.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years).
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]
Returns a period of time in months.
month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months).
INTERVAL '200' MONTH(3) returns an interval of 200 months. You must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of days, hours, minutes, and seconds.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of days, hours, and minutes.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]
Returns a period of time in terms of days and hours.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]
Returns a period of time in terms of days.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of hours, minutes, and seconds.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of hours and minutes.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]
Returns a period of time in terms of hours.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]
Returns a period of time in terms of minutes.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of minutes and seconds.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] )
Returns the value evaluated at the row that's the first row of the window frame.
FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])
Returns the value evaluated at the row that is the last row of the window frame.
LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause)
Returns the rank of the current row with gaps, counting from 1.
RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause)
Returns the unique number of the current row within its partition, counting from 1.
ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
INITCAP(string)
Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space.
INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])
Returns the (1-based) index of the first occurrence of substring in string.
INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)
Returns the string with all letters changed to lowercase.
LOWER('ORACLE') returns oracle
LENGTH(string)
Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.
LENGTH('Oracle') returns 6
LTRIM(string)
Returns the string with leading spaces removed from the left.
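For example (the leading spaces are illustrative): LTRIM('   Oracle') returns Oracle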
Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.
REPLACE(string, search[, replacement])
Replaces all occurrences of search with replacement.
If search is not found in string, then string is returned unchanged.
If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.
REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)
Returns the string with trailing spaces removed from the right.
RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])
Returns the substring starting at position.
SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])
Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
Number example: TO_CHAR(123) returns 123
Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.
UPPER(string)
Returns a string with all letters changed to uppercase.
UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])
Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])
Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
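For example, a sketch parallel to the LPAD example above (the values are illustrative): RPAD('ABC', 5, '*') returns 'ABC**'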
Parses a column containing a JSON string into one of the following types, with the specified schema.
Map, with String as the key type
Struct
Array
FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}
FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}
TO_JSON(column)
Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.
TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)
Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.
TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}
TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}
TO_STRUCT(string,column[,string,column]*)
Creates a new column of Struct type. The input columns must be grouped as key-value pairs.
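For example, a hypothetical usage sketch (the attribute names are illustrative): TO_STRUCT('id', EXPRESSION_1.MYSOURCE.id, 'name', EXPRESSION_1.MYSOURCE.name) returns a Struct type column such as {101, A}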
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
Aggregate
Expression
Filter
Join
Lookup
Split
Pivot
Function
Description
Example
TRANSFORM(column, lambda_function)
Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.
For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.
For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys.
For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)
Only the Expression operator supports ARRAY_SORT.
Takes an array and sorts according to the given function that takes 2 arguments.
The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.
If the function is omitted, the array is sorted in ascending order.
array_sort(to_array(5, 6, 1),
(left, right) -> CASE WHEN left < right THEN -1
WHEN left > right THEN 1 ELSE 0 END)
The returned array is:
[1,5,6]
You can change the order of expressions that have been added to an expression operator.
On the data flow canvas, select the expression operator.
Under the Details tab of the Properties panel, review the list of expressions.
For the expression you want to move, select Move up, Move down, Move to top, or Move to bottom from the Actions menu.
Only relevant move actions are available, depending on the current position of the expression. For example, Move up and Move to top are unavailable for the first expression in the list.
Aggregate Operator
Use the aggregate operator to perform calculations such as sum or count, on all rows or a group of rows to create new, derivative attributes.
From the Operators panel, drag an Aggregate onto the canvas.
With the aggregate operator in focus, on the Details tab
of the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
Under Group by, select one of the following:
Attributes: Use the menu to select one or more attributes to group by.
Pattern: Click Add pattern to add a pattern
that selects the attributes to group by. You use a regular expression
pattern to match two or more attributes.
Note
The expressions you add are performed on each selected attribute. The attributes that continue downstream include the attributes you group by plus the attributes derived from the expressions you add.
Under Expressions, click Add
expression.
In the Add Expression panel, enter a name for the aggregate expression in the Identifier field, or leave the name as-is.
(Optional)
To use regular expression patterns to select the source attributes that you want to apply the expression operation to, select Allow bulk selection and then select Pattern. For example, suppose you have three order attributes in the dataset and you want to apply the SUM function to all three of them. You can use a pattern to select this group of source attributes.
For Source attributes, enter the regular expression that matches a group of source attributes in Pattern. For example, enter *ORDER to match attributes FIRST_ORDER, LAST_ORDER, and FINAL_ORDER.
Then for Data type, select a value from the menu and complete the fields corresponding to the type selected.
For Target attributes, enter the regular expression to define the target attribute names in Pattern. By default, $0 is entered to refer to the original source attributes. Update the pattern to name the target attribute according to requirements. For example, enter $0_SUM to append text SUM to the original source attribute names.
By default, Use source attribute data types is selected. You can retain the selection. If you clear the Use source attribute data types checkbox, then select a Data type and complete the fields corresponding to the type selected.
(Optional) If you did not select Allow bulk selection, under Expression data type, you can select Infer data type to let the Expression Builder detect the data type from the expression that you enter. You can click Preview data type and Validate to preview and refresh the data type, and validate the expression.
Letting the builder infer the data type is useful when you're working with complex data types such as Map, Array, and Struct that might have levels of nested types.
If you did not select Allow bulk selection or Infer data type, select a value from the Data type menu, and then complete the fields corresponding to the type selected.
In the Expression builder section, visually construct an expression by double-clicking or dragging incoming attributes, parameters, or functions to add to the editor to build the expression. Or you can manually write the expression yourself. You can also validate the expression.
If you selected Allow bulk selection, then use %MACRO_INPUT% in the expression to indicate the attributes on which the function is to be applied. For example, consider you used the pattern *NAME to match source attributes FIRST_NAME, LAST_NAME, and FULL_NAME. Now you can specify a function as UPPER(%MACRO_INPUT%) to apply the function on all the attributes that match the pattern.
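For example, a sketch that combines the pattern examples above (the attribute names are illustrative): with the source attribute pattern *ORDER matching FIRST_ORDER, LAST_ORDER, and FINAL_ORDER, the target attribute pattern $0_SUM, and the expression SUM(%MACRO_INPUT%), the operator produces the aggregated attributes FIRST_ORDER_SUM, LAST_ORDER_SUM, and FINAL_ORDER_SUM.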
To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.
Note
To replace a placeholder in a function with another element, highlight the placeholder and then double-click another element from the list to add it to the function.
Click Add.
Repeat the steps, starting from Add expression, to add more expressions if needed.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab. You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Use the Expression Builder to visually select elements to build an expression in the editor. You can also manually write an expression yourself.
The Add Expression panel has two sections: Expression information and
Expression builder. The Expression information fields
let you specify a name and data type for the expression. You can also create the
expression to apply to two or more attributes. When working with complex data types such
as Map, Array, and Struct that might have levels of nested types, you can choose to let
the builder detect the data type from the expression that you enter. When you let the
builder infer the data type, you can preview and refresh the data type, and validate the
expression.
The Expression builder section lists the elements for building an expression. The elements
that you can use in an expression include incoming attributes, parameters, and
functions. Double-click or drag an element from the list to add to the editor to build
the expression, or manually write the expression yourself. You can validate the
expression before creating it.
Incoming displays the attributes from the upstream operator that are coming into this expression operator. Below the attributes list is a checkbox that lets you apply Exclude rules. To exclude one or more incoming attributes from the output of this expression, select the Exclude incoming attributes checkbox. Then use the menu to add the incoming attributes that you want to exclude from the output. An Exclude rule is applied to each attribute that you select to exclude. You can exclude attributes only when you're adding the expression the first time. When you edit the expression, the Exclude incoming attributes checkbox is unavailable.
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments. Here's a list of functions available to use to build the expressions:
Returns the number of rows for which one or more supplied expressions are all non-null.
COUNT(expr1)
COUNT(*)
Returns the total number of retrieved rows, including rows containing null.
COUNT(*)
MAX(value)
Returns the maximum value of the argument.
MAX(expr)
MIN(value)
Returns the minimum value of the argument.
MIN(expr)
SUM(numeric)
Returns the sum calculated from values of a group.
SUM(expr1)
AVG(numeric)
Returns the average of numeric values in an expression.
AVG(AGGREGATE_1.src1.attribute1)
LISTAGG(column[, delimiter]) WITHIN GROUP (order_by_clause)
Concatenates values of the input column with the specified delimiter, for each group based on the order clause.
column contains the values you want to concatenate together in the result.
The delimiter separates the column values in the result. If a delimiter is not provided, then an empty character is used.
order_by_clause determines the order that the concatenated values are returned.
This function can be used only as an aggregator, with or without grouping. Without grouping, the result is a single row. With grouping, the function returns a row for each group.
Consider a table with two columns, id, name. The table has three rows. The id column values are 101, 102, 102. The name column values are A, B, C.
+-----+--------+
| id | name |
+-----+--------+
| 101 | A |
+-----+--------+
| 102 | B |
+-----+--------+
| 102 | C |
+-----+--------+
Example 1: Without grouping
LISTAGG(name, '-') WITHIN GROUP (ORDER BY name) returns the name column with the value A-B-C
+--------+
| name |
+--------+
| A-B-C |
+--------+
Example 2: Group by the id
LISTAGG(name, '-') WITHIN GROUP (ORDER BY name), grouping by id, returns the name column with the values A and B-C in two groups.
+--------+
| name |
+--------+
| A |
+--------+
| B-C |
+--------+
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
Aggregate
Expression
Filter
Join
Lookup
Split
Pivot
Function
Description
Example
TRANSFORM(column, lambda_function)
Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.
For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.
For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys.
For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)
Only the Expression operator supports ARRAY_SORT.
Takes an array and sorts according to the given function that takes 2 arguments.
The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.
If the function is omitted, the array is sorted in ascending order.
array_sort(to_array(5, 6, 1),
(left, right) -> CASE WHEN left < right THEN -1
WHEN left > right THEN 1 ELSE 0 END)
The returned array is:
[1,5,6]
Distinct Operator
Use the distinct operator to return distinct rows with unique values.
From the Operators panel, drag a Distinct operator onto the canvas.
With the distinct operator in focus, on the Details tab
of the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
The Details tab configurations are set by default for the distinct
operator.
To view the output of the operator, click the Data tab.
Distinct rows with unique values are displayed. You can also apply transforms to the data, or select an attribute to view a data profile or an attribute profile. Learn more about Data Transformations.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes. The distinct operator operates only on the filtered attributes, and outputs distinct rows with unique values.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Sort Operator
Use the sort operator to perform sorting of data in ascending or descending order.
When using a sort operator, apply the sort operator after other shaping operators, and before the operator that requires the data to be sorted.
For example, to insert data into a target in a specific sort order, apply the sort operator immediately before the target operator.
From the Operators panel, drag a Sort operator onto the canvas.
With the sort operator in focus, on the Details tab of
the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
In the Sort conditions section, click
Add.
The Add Sort Condition panel displays all the attribute fields fetched from
the source table.
In the Add sort condition panel, select the attribute to sort the data by.
To filter attributes using name patterns, enter a name pattern, for example,
*CITY*.
For Sort order, select Ascending
or Descending, and then click
Add.
Each condition you add is displayed in the sort condition list.
Note
You can add multiple sort conditions one-by-one. Sort operates based on
the order of sort conditions in the list. For example, the sorting begins
based on the first condition in the list. Then the sorted data is resorted
based on the second condition, and so on.
For string data types, the sort
operation occurs based on the lexicographic order.
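For example, assuming a case-sensitive comparison based on character codes, the string values apple, Banana, and Orange sort in ascending order as Banana, Orange, apple, because uppercase letters precede lowercase letters.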
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
After you add a sort operator on the canvas, and connect it with another operator, you can add a sort condition.
From the Add Sort Condition panel, you can select attributes to sort from the fields listed, or filter attribute names using patterns. For string data types, the sort operation occurs based on the lexicographic order.
To add sort conditions:
Under the Details tab, go to the Sort
conditions section, and click Add.
The Add
Sort Condition panel displays all the attribute fields fetched from the source
table.
In the Add sort condition panel, select the attribute to sort
the data by.
To filter attributes using name patterns, enter a name pattern, for example, *CITY*.
For Sort order, select Ascending or
Descending, and then click
Add.
Each condition you added is displayed in the sort
condition list.
Note
You can add multiple sort conditions one-by-one. Sort
operates based on the order of sort conditions in the list. For example, the
sorting happens first based on the first condition in the list and then the
sorted data is re-sorted based on the second condition and so on.
Move the sort conditions up or down in the list to prioritize the sort.
Moving sort conditions up or down lets you sort first by a high priority sort condition, and then re-sort the sorted data by the next condition in the list. For example, to sort first by address and then by the postal code, move the sort condition with address to the top.
To prioritize sort conditions:
Under the Details tab, go to the Sort
conditions section.
Click the Actions menu corresponding to the sort condition you want to move up or down.
To apply a sort condition earlier, select Move up.
To apply a sort condition later, select Move down.
Sort is applied to the moved-up condition first, and then the moved-down
condition.
Union Operator
Use the union operator to combine rows from two or more inputs into a single output.
You can perform a union operation on up to ten source operators. You must configure at least two source inputs. You can choose to perform the union operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.
Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are combined during a union by name operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are combined during a union by position operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
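For instance, here's a minimal sketch with hypothetical inputs (the entities, attribute names, and values are illustrative). Assume Data entity 1 is the primary input:
+-----+------+
| ID  | NAME |
+-----+------+
| 101 | A    |
+-----+------+
| 102 | B    |
+-----+------+
Data entity 2:
+------+-----+
| NAME | ID  |
+------+-----+
| C    | 103 |
+------+-----+
A union by name matches the ID and NAME attributes by name regardless of their positions, so the result keeps the attribute names, order, and data types of Data entity 1 and contains the rows (101, A), (102, B), and (103, C). A union by position would instead pair ID with NAME and NAME with ID, which is valid only if the attribute data types at each position match across the inputs.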
To add a union operator to the data flow, drag the Union operator from the Operators panel onto the canvas. You can perform the union operation on up to ten data sources.
The Details tab for the union operator displays in the Properties panel.
On the Details tab, enter a name for this union
operation in the Identifier field, or leave the name
as-is.
Specify how you want the union operation to be performed using the Match input attribute by field. You can choose from the following two options:
Name: The attribute names from the sources are matched to perform the union operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the union by name operation works.
Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the union by position operation works.
For Union all, select the checkbox if you want the operation to return all rows from all the connected data entities without eliminating any duplicates.
For Primary input, select the source that should be
considered as the key source in the union operation. This primary input
determines the attribute names, attribute data types, and attribute order for
the resulting attributes from the union operation. By default, the first source
added to the data flow is set as the primary input.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Minus Operator
Use the minus operator to compare two data entities and return the rows that are present in one entity but not present in the other entity.
You can choose to keep or eliminate duplicate rows in the resulting data.
You can perform a minus operation on two source operators only. You can choose to perform the minus operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.
Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are subtracted during a minus by name operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are subtracted during a minus by position operation. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
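For instance, a minimal sketch with hypothetical data: if Data entity 1, the primary input, contains the rows (101, A), (102, B), and (103, C), and Data entity 2 contains the rows (102, B) and (104, D), a minus by name operation returns (101, A) and (103, C), the rows of the primary input that have no match in the other input.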
To add a minus operator to the data flow, drag the Minus operator from the Operators panel onto the canvas.
The Details tab for the minus operator displays in the Properties panel.
On the Details tab, enter a name for this minus
operation in the Identifier field, or leave the name
as-is.
Specify how you want the minus operation to be performed using the Match input attributes by field. You can choose from the following two options:
Name: The attribute names from the sources are matched to perform the minus operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the minus by name operation works.
Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the minus by position operation works.
For Minus all, select the checkbox if you want the operation to return all rows found in one data entity that are not present in the other data entity, without eliminating any duplicates.
For Primary input, select the source to be used as the
primary source for the minus operation. The primary input determines the
attribute names, attribute data types, and attribute order in the resulting
output. By default, the first source that is connected to the minus operator is
set as the primary input.
The primary input options are as follows:
Left minus: Selects all the remaining distinct rows from inbound source 1 after eliminating the rows that match inbound source 2.
Right minus: Selects all the remaining distinct rows from inbound source 2 after eliminating the rows that match inbound source 1.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Intersect Operator
Use the intersect operator to compare two or more data entities and return the rows that are present in the connected entities.
You can choose to keep or eliminate duplicate rows in the resulting data.
You can perform an intersect operation on two or more source operators. You can choose to perform the operation by matching attribute names across the source input attributes, or you can match the attributes by their attribute position.
Consider the following two data entity examples. Data entity 1 is set as the primary input. The resulting data entity shows how the data from the two inputs are intersected by attribute name. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
Consider the following two data entity examples. Data entity 2 is set as the primary input. The resulting data entity shows how the data from the two inputs are intersected by attribute position. The resulting data entity uses the attribute name, order, and data type from the primary input data entity.
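For instance, a minimal sketch with hypothetical data: if Data entity 1, the primary input, contains the rows (101, A) and (102, B), and Data entity 2 contains the rows (102, B) and (103, C), the intersect operation returns (102, B), the only row present in both inputs.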
To add an intersect operator to the data flow, drag the Intersect operator from the Operators panel onto the canvas.
The Details tab for the intersect operator displays in the Properties panel.
On the Details tab, enter a name for this intersect
operation in the Identifier field, or leave the name
as-is.
Specify how you want the intersect operation to be performed using the Match input attributes by field. You can choose from the following two options:
Name: The attribute names from the sources are matched to perform the intersect operation. The attribute names and data types, and the number of attributes across the inputs must be identical. The match is case-sensitive. Review the examples to understand how the intersect by name operation works.
Position: The attributes from the sources are matched based on their positions. The number of attributes and the attribute data types across the inputs must be identical. Review the examples to understand how the intersect by position operation works.
For Intersect all, select the checkbox if you want the operation to return all rows in the resulting output, without eliminating any duplicates.
For Primary input, select the source to be used as the primary source for the intersect operation. The primary input determines the attribute names, attribute data types, and attribute order in the resulting output. By default, the first source that's connected to the intersect operator is set as the primary input.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, if you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar), the attributes are filtered by the name pattern *_CODE and the type numeric or varchar.
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Split Operator 🔗
Use the split operator to divide one source of input data into two or more output ports based on split conditions that are evaluated in a sequence.
Each split condition has an output port. Data that satisfies a condition is directed to the corresponding output port.
By default, a split operator is configured with the Unmatched condition, which is always available in the sequence as the last condition. You cannot add your own condition to the Unmatched condition. You also cannot delete the Unmatched condition.
The operator evaluates the conditions one at a time. After all conditions in the sequence have been evaluated, data that does not meet a condition is directed to the Unmatched output port.
When you add a split operator on the canvas, by default the split operator icon is displayed as expanded, showing the Unmatched split condition. The Unmatched condition directs all data that does not meet the other conditions that you add to the sequence.
From the Operators panel, drag a Split operator onto the canvas.
With the split operator in focus, on the Details tab of
the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
Under Match, specify how you want to handle incoming data that meets a split condition. You can choose from the following two options:
First matching condition: Data that matches the first condition in the sequence is removed from further processing by subsequent conditions.
For example, consider a sequence with four conditions, including the last Unmatched condition: Condition1, Condition2, Condition3, and Unmatched.
For Condition1, the operation does not find matches. For Condition2, matches are found. The operation routes the matched data to the output port that's specified for Condition2, and then evaluates the unmatched data for Condition3. If no matches are found for Condition3, all the data is routed to the output port that's specified for the Unmatched condition. However, if matches are found for Condition3, the operation routes the matched data to the output port for Condition3, and then routes the remaining data to the port for the Unmatched condition.
All matching conditions: Data that matches any condition is evaluated for all conditions in the sequence.
For example, consider a sequence with Condition1 and Condition2. Matches are found after Condition1 is evaluated. The matched data is routed to the output port that's specified for Condition1. All data (matched and unmatched) is then evaluated by Condition2. Data that matches Condition2 is routed to the corresponding output port. Data that does not meet Condition1 and Condition2 is routed to the output port for the Unmatched split condition.
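As a hypothetical sketch (the entity, attribute, and values are illustrative only), suppose the split operator SPLIT_1 has two conditions before the Unmatched condition:
Condition1: SPLIT_1.BANK.BANK_NAME LIKE 'A%'
Condition2: SPLIT_1.BANK.BANK_NAME LIKE '%Bank'
A row with BANK_NAME = 'ABC Bank' satisfies both conditions. With First matching condition, the row is routed only to the Condition1 output port and is not evaluated by Condition2. With All matching conditions, the row is routed to both the Condition1 and Condition2 output ports. In either mode, a row with BANK_NAME = 'XYZ Credit Union' satisfies neither condition and is routed to the Unmatched output port.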
Under Split conditions, click Add condition to add a split condition.
In the panel to add a split condition, edit the Identifier value, or leave it as-is.
In the Condition builder section, you can double-click or drag attributes, parameters, and functions to add to the editor to build a condition. You can also enter a condition expression manually into the editor, and validate the expression.
Note
In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
Click Add.
The first condition you add is inserted before the Unmatched condition. You can also see the sequence of conditions in the expanded operator icon on the canvas.
Repeat the steps to add conditions to form a sequence.
The second condition is inserted into the sequence between the first and Unmatched conditions. Subsequent conditions are always inserted before the Unmatched condition. The sequence of split conditions is also reflected in the expanded split operator icon on the canvas.
To connect an output port on a split condition to another operator on the canvas:
Expand the split operator icon to display the conditions sequence.
If the split operator icon is not expanded, you can still connect the operator to the next operator but the link is created from the port of the Unmatched condition.
Hover over the end of a condition and then drag the small circle on the right side of the condition to the next shaping operator or a target.
Connect each condition to the appropriate operator or target.
Each condition output port can be linked to multiple shaping operators and target operators. If the split operator icon is not expanded on the canvas, you can hover over a link line to see the condition name of that link.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Conditional output port menu shows the same number of items as the
number of conditions configured on the operator, including the Unmatched
condition. By default, the output from the first condition port is shown. To
change the output view, select the port of the condition output from the
menu.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Use the Condition Builder to visually select elements to create and add a split condition. You can also enter a split condition manually in the editor.
You can add split conditions to an existing sequence of conditions. Split conditions are added to the end of the sequence, before the Unmatched condition. You cannot add your own condition to the Unmatched condition.
The elements that you can use in a split condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.
Incoming displays the attributes from the upstream port. For example:
SPLIT_1.BANK.BANK_NAME='ABC Bank'
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Suppose you create a VARCHAR parameter with the name P_VARCHAR_NAME, and set the default value to ABC BANK. Then you can use the parameter in a split condition as:
SPLIT_1.BANK.BANK_NAME=$P_VARCHAR_NAME
Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
Suppose you create a VARCHAR parameter with the name P_VARCHAR_LIKE, and set the default value to B%. Then you can use the parameter in a split condition as:
SPLIT_1.BANK.BANK_NAME LIKE $P_VARCHAR_LIKE
Here's a list of functions that are available for you to add when you construct conditions:
CEIL(numeric)
Returns the smallest integer not less than the numeric value.
CEIL(-1.2)
FLOOR(numeric)
Returns the largest integer not greater than the numeric value.
FLOOR(-1.2)
MOD(numeric1, numeric2)
Returns the remainder after numeric1 is divided by numeric2.
MOD(8,2)
POWER(numeric1, numeric2)
Raises numeric1 to the power of numeric2.
POWER(2,3)
ROUND(numeric1, numeric2)
Returns numeric1 rounded to numeric2 decimal places.
ROUND(2.5,0)
TRUNC(numeric1, numeric2)
Returns numeric1 truncated to numeric2 decimal places.
TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])
Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. See supported language tags.
Supported format patterns:
0: A digit
#: A digit, zero shows as absent
.: Placeholder for decimal separator
,: Placeholder for grouping separator
E: Separates mantissa and exponent for exponential formats
-: Default negative prefix
¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator
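For example, as an illustrative sketch (the value and pattern here are hypothetical): TO_NUMBER('1,234.56', '#,###.##') would return the number 1234.56, using , as the grouping separator and . as the decimal separator.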
DAYOFWEEK(date)
Returns the date's day of the week.
DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date)
Returns the date's day in the year.
DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date)
Returns the date's week in the year.
WEEKOFYEAR('2022-07-28') returns 30
WEEKOFYEAR('2022-07-28 13:24:30') returns 30
HOUR(datetime)
Returns the datetime's hour value.
HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date)
Returns the date's last day of the month.
LAST_DAY('2020-12-25') returns 31
MINUTE(datetime)
Returns the datetime's minute value.
MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date)
Returns the date's month value.
MONTH('2020-06-25') returns 6
QUARTER(date)
Returns the quarter of year the date is in.
QUARTER('2020-12-25') returns 4
SECOND(datetime)
Returns the datetime's second value.
SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr])
Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31
TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17
TO_TIMESTAMP(expr, format_string[, localeStr])
Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on October 11, 2020
WEEK(date)
Returns the date's week value.
WEEK('2020-06-25') returns 4
YEAR(date)
Returns the date's year value.
YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months)
Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
MONTHS_BETWEEN(start_date_expr, end_date_expr)
Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp, or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.
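For example, the number of months between 2021-01-15 and 2021-03-15 is a whole number (2) because both dates fall on the same day of the month, and the number of months between 2021-01-31 and 2021-02-28 is also a whole number (1) because both dates are the last day of their months. Other date pairs produce a fractional result calculated using 31 days per month.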
Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.
Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.
If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss
INTERVAL 'year' YEAR[(year_precision)]
Returns a period of time in years.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)
INTERVAL '1' YEAR returns an interval of 1 year
INTERVAL '200' YEAR(3) returns an interval of 200 years
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH
Returns a period of time in years and months. Use to store a period of time using year and month fields.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]
Returns a period of time in months.
month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months.)
INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of days, hours, minutes, and seconds.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of days, hours, and minutes.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]
Returns a period of time in terms of days and hours.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours
INTERVAL 'day' DAY[(day_precision)]
Returns a period of time in terms of days.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of hours, minutes, and seconds.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of hours and minutes.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]
Returns a period of time in terms of hours.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]
Returns a period of time in terms of minutes.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of minutes and seconds.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] )
Returns the value evaluated at the row that's the first row of the window frame.
FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])
Returns the value evaluated at the row that is the last row of the window frame.
LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause)
Returns the rank of the current row with gaps, counting from 1.
RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause)
Returns the unique number of the current row within its partition, counting from 1.
ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
INITCAP(string)
Returns the string with the first letter in each word capitalized and all other letters in lowercase, where words are delimited by white space.
INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])
Returns the (1-based) index of the first occurrence of substring in string.
INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)
Returns the string with all letters changed to lowercase.
LOWER('ORACLE') returns oracle
LENGTH(string)
Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.
LENGTH('Oracle') returns 6
LTRIM(string)
Returns the string with leading spaces removed from the left.
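For example (an illustrative sketch): LTRIM('   Oracle') would return 'Oracle'.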
Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.
REPLACE(string, search[, replacement])
Replaces all occurrences of search with replacement.
If search is not found in string, then string is returned unchanged.
If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.
REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)
Returns the string with trailing spaces removed from the right.
RTRIM('Oracle ')
SUBSTRING(string, position[, substring_length])
Returns the substring starting at position.
SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale])
Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
Number example: TO_CHAR(123) returns 123
Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.
UPPER(string)
Returns a string with all letters changed to uppercase.
UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])
Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])
Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
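For example (an illustrative sketch, parallel to the LPAD example): RPAD('ABC', 5, '*') would return 'ABC**'.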
Parses a column containing a JSON string into one of the following types, with the specified schema.
Map, with String as the key type
Struct
Array
FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}
FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}
TO_JSON(column)
Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.
TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)
Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.
TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}
TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}
TO_STRUCT(string,column[,string,column]*)
Creates a new column of Struct type. The input columns must be grouped as key-value pairs.
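An illustrative sketch, reusing the hypothetical attribute names from the TO_MAP example: TO_STRUCT('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) would return a Struct type column such as {1, 1}.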
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
Aggregate
Expression
Filter
Join
Lookup
Split
Pivot
TRANSFORM(column, lambda_function)
Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.
For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.
For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys.
For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)
Only the Expression operator supports ARRAY_SORT.
Takes an array and sorts according to the given function that takes 2 arguments.
The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.
If the function is omitted, the array is sorted in ascending order.
array_sort(to_array(5, 6, 1),
(left, right) -> CASE WHEN left < right THEN -1
WHEN left > right THEN 1 ELSE 0 END)
You can edit any split condition except the Unmatched condition.
On the designer canvas, select the split operator of the condition you want to change.
The Details tab for the split operator displays in the Properties panel. On the canvas, you can expand and collapse the operator icon to show and hide the existing split conditions.
In the Details tab, under Split conditions, select Edit from the Actions menu () next to the split condition you want to change.
In the Edit split condition panel, edit the
Identifier value, or leave it as-is.
In the Condition builder section, you can double-click to select, or drag incoming attributes or functions to the editor to build the condition. You can also edit the condition expression manually in the editor, and validate the condition before saving it.
You can move a split condition up or down in the sequence. Only the Unmatched condition cannot be moved.
On the designer canvas, select the split operator that has the sequence of conditions you want to change.
The Details tab for the split operator displays in the Properties panel. On the canvas, you can expand and collapse the operator icon to show and hide the existing split conditions.
In the Details tab, under Split
conditions, review the sequence of split conditions.
Next to a condition row, from the Actions menu (), select the action to move that condition up or down the sequence.
Notice that the condition position in the expanded split operator icon on the canvas changes accordingly when you move a condition.
You can delete any split condition except the Unmatched condition.
On the designer canvas, select the split operator of the condition you want to delete.
The Details tab for the split operator displays in the Properties
panel. On the canvas, you can expand and collapse the operator icon to show and
hide the split conditions.
In the Details tab, under Split conditions, select Delete from the Actions menu () next to the condition you want to delete.
To delete two or more conditions, use the column next to the Sequence column to select the conditions, and then click Delete.
If the output port on a deleted condition was connected to another operator or target, ensure that you review the data flow and make the necessary changes after deleting conditions.
Pivot Operator 🔗
The pivot operator lets you take unique row values from one attribute in an input source and pivot the values into multiple attributes in the output.
Using input from multiple rows, the pivot operation performs a transformation based on aggregate function expressions, and the values of an attribute that you specify as the pivot key. The result of a pivot operation is a pivoted or rearranged output of rows and attributes.
The number of rows in the output is based on the selection of attributes to group by.
If you specify one or more attributes to group by, the incoming rows that have the same group by attribute value are grouped into one row. For example, if you specify one group by attribute that has four unique values, the incoming data is transformed and grouped into four rows in the output.
If you don't specify any attributes to group by, all the incoming data is transformed into a single output row.
The number of attributes in the output is:
Based on the number of attributes you select to group by
A multiple of the number of values you select in the pivot key
A result of the number of attributes that the aggregate function expressions transform
For example, if you select one group by attribute and three pivot key values, and add an expression that transforms two attributes, the number of attributes in the output is:
1 + (3 * 2) = 7
The total number of attributes in the resulting pivoted output is calculated as follows:
Number of group by attributes + (Number of pivot key values * Number of attributes that are transformed from expressions)
The names of new attributes in the output are derived from a pattern you add for target attributes when you create the expressions.
The aggregate functions you use with a pivot operator determine the pivoted values in the output. If no data is found, null is inserted where a pivoted value is expected.
Consider the data entity PRODUCT_SALES, that has the attributes STORE, PRODUCT, and SALES. The data entity has five rows. You want to create a pivot on the attribute PRODUCT with an aggregate SUM function on SALES.
You specify to group pivoted rows by the attribute STORE. Each unique STORE value becomes a row in the resulting output. Input rows with the same group by value are grouped into the same row in the pivoted output. If you don't specify a group by attribute, all input rows are transformed into a single row in the resulting output.
You specify PRODUCT as the pivot key, and select all three values to pivot into new attributes in the resulting output.
The aggregate SUM function expression on SALES is SUM(PIVOT_1_1.PRODUCT_SALES.SALES).
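To illustrate with hypothetical rows (the actual sample tables aren't reproduced here), suppose the five PRODUCT_SALES rows are (AB, TELEVISION, 100), (AB, TELEPHONE, 50), (AB, TELEVISION, 200), (CD, TELEPHONE, 80), and (CD, RADIO, 30). With STORE as the group by attribute, PRODUCT as the pivot key with the values TELEVISION, TELEPHONE, and RADIO selected, and the SUM expression on SALES, the pivoted output has two rows, one for each STORE value. Using the default %PIVOT_KEY_VALUE% target attribute pattern, the output attributes are STORE, TELEVISION, TELEPHONE, and RADIO, and the resulting rows are (AB, 300, 50, null) and (CD, null, 80, 30). Null appears where no sales data exists for a pivot key value.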
The pivot operator performs a transformation using one or more aggregate function expressions on one or more values of an attribute that you specify as the pivot key.
You can choose to group the pivoted rows into a single row or select attributes to create multiple output rows based on the same group by value.
From the Operators panel, drag a Pivot operator onto the canvas.
With the pivot operator in focus, on the Details tab of
the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
(Optional) For Group by, specify the attributes whose distinct values are used to arrange the pivoted data in grouped rows. By default, all incoming rows are aggregated and grouped into one row unless you specify one or more attributes to group the rows by.
Select Attributes and then select one or more attributes to group pivoted rows by the same value. For example, if you select an attribute that has two unique values, the resulting output has two rows of aggregated data.
Select Pattern to use a regular expression pattern to select the attributes to group by. Only attributes that match the pattern are used to group pivoted rows by the same value.
For Pivot key, select the attribute whose values can be
used as new attributes in the pivoted output. Only attributes that are of type
VARCHAR, INTEGER, FLOAT, or BOOLEAN can be used as the pivot key.
Under Pivot key values, click Add key
values to select one or more values from the selected pivot key
that you want to become pivoted attributes in the output. Up to 50 values can be
selected.
You can use an alias for a selected pivot key value. An alias cannot start
with a number. Only the _ special character is allowed.
To use an alias for a selected pivot key value, select Edit from the Actions menu () for that key value. In the Edit Pivot Key Alias panel, enter the alias for the key value.
To delete a pivot key value from the selected list, select the checkbox next to it and click Delete. Alternatively, you can select Delete from the Actions menu ().
Under Expressions, click Add expression to add the aggregate function expressions. See Adding a pivot expression.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
One or more aggregate function expressions are required for a pivot operator.
With the pivot operator selected on the data flow canvas, under
Expressions in the properties panel, click
Add expression.
In the Add Expression panel, enter a name for the expression in the Identifier field, or leave the name as-is.
(Optional) To use a pattern to apply an expression to multiple source attributes, select Allow bulk selection.
For example, suppose you have two DISCOUNT_ attributes in the dataset (DISCOUNT_VALUE and DISCOUNT_RATE) and you want to apply the MAX function to both of them.
Under Source attributes, select Pattern and click
Add pattern.
In the Add Source Pattern panel, add a pattern for selecting a group of source attributes that
have names beginning with DISCOUNT_. For example, enter
DISCOUNT* and click Add.
Then select the Data type.
Under Target attributes, use a pattern for the resulting output attribute
names.
By default, the pattern %MACRO_INPUT%_%PIVOT_KEY_VALUE% is already inserted for you. %MACRO_INPUT% corresponds to the names of the source attributes as selected by the pattern you added. %PIVOT_KEY_VALUE% corresponds to the values as selected in the pivot key.
For example, if %PIVOT_KEY_VALUE% indicates TELEVISION, and %MACRO_INPUT% indicates DISCOUNT_VALUE and DISCOUNT_RATE, then in the output, the pivoted attributes are <pivot_name>.<expression_name>.DISCOUNT_VALUE_TELEVISION and <pivot_name>.<expression_name>.DISCOUNT_RATE_TELEVISION.
You can retain the Use source attribute data types selection. Otherwise, under
Expression data type, select the Data
type and complete the fields corresponding to the type
selected.
If you did not select Allow bulk selection, under Target
attributes, use a pattern for the resulting output attribute
names.
By default, the pattern %PIVOT_KEY_VALUE% is already inserted for you. %PIVOT_KEY_VALUE% corresponds to the values as selected in the pivot key.
For example, if %PIVOT_KEY_VALUE% indicates TELEVISION and TELEPHONE, then in the output, the pivoted attributes are <pivot_name>.<expression_name>.TELEVISION and <pivot_name>.<expression_name>.TELEPHONE.
Under Expression data type, select the Data type
and complete the fields corresponding to the type selected.
In the Expression builder section, double-click or drag incoming attributes, parameters, and aggregate functions to add to the editor to build the expression. You can also manually write the expression yourself, and validate the expression.
The following table shows the list of aggregate functions that are available for constructing pivot expressions.
COUNT(value[, value]*)
Returns the number of rows for which one or more supplied expressions are all non-null.
COUNT(expr1)
COUNT(*)
Returns the total number of retrieved rows, including rows containing null.
COUNT(*)
MAX(value)
Returns the maximum value of the argument.
MAX(expr)
MIN(value)
Returns the minimum value of the argument.
MIN(expr)
SUM(numeric)
Returns the sum calculated from values of a group.
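An illustrative example, following the pattern of the other aggregate functions, would be:
SUM(expr1)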
To construct a pivot expression, specify the attribute or attributes and the aggregate function.
If you selected Allow bulk selection, then use %MACRO_INPUT% in the expression to indicate the attributes on which the function is to be applied.
For example, if you used the pattern DISCOUNT* to match source attributes DISCOUNT_RATE and DISCOUNT_VALUE, then you can specify an aggregate function such as SUM(numeric) to apply the function on all the attributes that match the pattern. %MACRO_INPUT% replaces the numeric placeholder in the function:
SUM(%MACRO_INPUT%)
If you did not select Allow bulk selection, then specify the attribute in the function.
For example, the data entity is PRODUCT_SALES and you want to use an aggregate SUM function on the SALES attribute. You can specify the function such as SUM(numeric), replacing the numeric placeholder in the function with the attribute name:
SUM(PIVOT_1_1.PRODUCT_SALES.SALES)
You can use an expression parameter for the aggregate function name in the pivot expression. An expression parameter has a name, type, and default value.
For example, the expression parameter P_VARCHAR has type VARCHAR and MIN as the default value. You can specify the aggregate function as:
$P_VARCHAR(%MACRO_INPUT%)
$P_VARCHAR(PIVOT_1_1.PRODUCT_SALES.SALES)
In the Add Expression panel, click Add.
Lookup Operator 🔗
The lookup operator performs a query and transformation using a lookup condition and input from two sources, a primary input source and a lookup input source.
The query operation uses the condition and a value in the primary input to find rows in the lookup source. The transformation appends attributes from the lookup source to the primary source.
You can specify the actions to take when the lookup query returns multiple rows and when it returns no rows. For example, you can specify that the operation skips non-matching rows, and returns any one matching row when there are multiple matching rows.
The resulting output is a combination of both input sources based on the lookup condition, a value in the primary input, and the preferred actions to take. The primary input determines the order of attributes and rows in the output, with the attributes from the primary input placed before the attributes from the lookup input.
Consider two source data entities in a data flow. Data entity 1 (PAYMENTS) is set as the primary input. Data entity 2 (CUSTOMERS) is set as the lookup input. The lookup condition matches the CUSTOMER_ID attribute of the two inputs, for example:
LOOKUP_1_1.PAYMENTS.CUSTOMER_ID = LOOKUP_1_2.CUSTOMERS.CUSTOMER_ID
The resulting lookup output shows how the data from the two inputs are combined and transformed. Attributes from the lookup source are appended to the primary source attributes, with the following behaviors:
If the operation does not find a matching record for a value in the lookup source, the record is returned with null inserted for the lookup attributes. For example, no matched records are found for the CUSTOMER_ID values 103, 104, and 105. Therefore, in the resulting output, null is populated in the appended attributes CUSTOMER_ID and NAME.
If the operation finds multiple matching records for a value in the lookup source, any one matching record is returned.
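As a hypothetical sketch of this behavior (the rows and values are illustrative only): suppose PAYMENTS has the attributes PAYMENT_ID and CUSTOMER_ID with the rows (1, 101), (2, 102), (3, 103), (4, 104), and (5, 105), and CUSTOMERS has the attributes CUSTOMER_ID and NAME with the rows (101, 'Peter') and (102, 'Paula'). The lookup output keeps the PAYMENTS attributes first and appends CUSTOMER_ID and NAME from CUSTOMERS. The rows with CUSTOMER_ID 101 and 102 get the matching customer values, while the rows with CUSTOMER_ID 103, 104, and 105 get null in the appended CUSTOMER_ID and NAME attributes because no matching row exists in the lookup source.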
A lookup operator uses two input sources on a data flow.
The following procedure assumes you have added and configured two source operators.
From the Operators panel, drag a Lookup operator onto the canvas.
With the lookup operator in focus, on the Details tab of
the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
For Primary input, select the source that contains a
value for matching one or more rows in the lookup source. Matching is based on a
lookup condition, which you create in step 5.
The primary input also determines the order of attributes and rows in the
resulting lookup output.
For Lookup input, select the source that the primary
source references to match one or more rows.
Attributes from the lookup input are appended to the primary input attributes
in the resulting output.
For Multiple match, select the action to perform when multiple matching rows are found between the primary source and the lookup source. The action you select determines the rows to use in the resulting output.
Return any one matching row: Default action. Selects any one matching row.
Return the first matching row: Selects the first row from the rows that match.
Return the last matching row: Selects the last row from the rows that match.
Return error: Displays an error. Does not complete the operation.
Return all matching rows: Selects all matching rows.
For No match, select the action to perform when no matching row is found between the primary source and the lookup source. The action you select determines how to handle the primary source row that does not have a matching row in the lookup source.
Skip non-matching rows: Does not include non-matching rows in the resulting output.
Return non-matching rows: Default action. Includes non-matching rows in the resulting output with null values.
Next to Lookup condition, click
Create.
In the Create Lookup Condition panel, Condition Builder section, you can double-click or drag attributes, parameters, and functions to add to the editor to build a condition. You can also enter the condition expression manually in the editor, and validate the expression.
Note
In the editor, an added element such as a function might have placeholders. To replace a placeholder with another element, highlight the placeholder, and then double-click another element from the list.
Click Create.
(Optional)
Click Assign parameter to use parameters so that the lookup condition
is not bound to the compiled code when you publish the data flow. See Assigning a Parameter.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
Use the Condition Builder to visually select elements to create a lookup condition. You can also enter a condition manually in the editor.
A lookup condition lets you use a value in a primary input source to look up records in a lookup input source, returning any rows that match. Rows that do not have a match are returned with null values.
The elements that you can use in a lookup condition include incoming attributes, parameters, and functions. You can double-click or drag an element from the list to add to the editor to build a condition. You can validate the condition before creating it.
Incoming displays the attributes from the upstream input ports in two separate LOOKUP folders. View the attributes from each port by expanding or collapsing the appropriate LOOKUP folder. For example, if LOOKUP_1_1 is the primary input and LOOKUP_1_2 is the lookup input, a lookup condition based on a value in the primary input attribute ADDRESS_ID could be:
LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = LOOKUP_1_2.BANK_CUSTOMER.ADDRESS_ID
Parameters are the expression parameters that have been added to the data flow using the Condition Builder (filter, join, lookup, and split operators) or Expression Builder (expression and aggregate operators). An expression parameter has a name, type, and default value. See Adding an Expression Parameter.
Suppose you want to look up bank customers of a specific bank. You can create a VARCHAR parameter with the name P_LOOK_UP, and set the default value to 2001, which is the specific bank value. Then you can create the lookup condition as:
LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = LOOKUP_1_2.BANK_CUSTOMER.ADDRESS_ID AND LOOKUP_1_1.BANK_ADDRESS.ADDRESS_ID = $P_LOOK_UP
Functions are the functions available in Data Integration that you can use in a condition. Functions are operations performed on arguments passed to the function. Functions calculate, manipulate, or extract data values from arguments.
Here's a list of functions that are available for you to add when you construct conditions:
CEIL(numeric)
Returns the smallest integer not less than the numeric value.
CEIL(-1.2)
FLOOR(numeric)
Returns the largest integer not greater than the numeric value.
FLOOR(-1.2)
MOD(numeric1, numeric2)
Returns the remainder after numeric1 is divided by numeric2.
MOD(8,2)
POWER(numeric1, numeric2)
Raises numeric1 to the power of numeric2.
POWER(2,3)
ROUND(numeric1, numeric2)
Returns numeric1 rounded to numeric2 decimal places.
ROUND(2.5,0)
TRUNC(numeric1, numeric2)
Returns numeric1 truncated to numeric2 decimal places.
TRUNC(2.5,0)
TO_NUMBER(expr[, format, locale])
Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US. See supported language tags.
Supported format patterns:
0: A digit
#: A digit, zero shows as absent
.: Placeholder for decimal separator
,: Placeholder for grouping separator
E: Separates mantissa and exponent for exponential formats
-: Default negative prefix
¤: Currency sign; replaced by currency symbol; if doubled, replaced by international currency symbol; if present in a pattern, the monetary decimal separator is used instead of the decimal separator
DAYOFWEEK(date)
Returns the date's day of the week.
DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on.
DAYOFYEAR(date)
Returns the date's day in the year.
DAYOFYEAR('2020-12-25') returns 360
WEEKOFYEAR(date)
Returns the date's week in the year.
WEEKOFYEAR('2022-07-28') returns 30
WEEKOFYEAR('2022-07-28 13:24:30') returns 30
HOUR(datetime)
Returns the datetime's hour value.
HOUR('2020-12-25 15:10:30') returns 15
LAST_DAY(date)
Returns the date's last day of the month.
LAST_DAY('2020-12-25') returns 31
MINUTE(datetime)
Returns the datetime's minute value.
MINUTE('2020-12-25 15:10:30') returns 10
MONTH(date)
Returns the date's month value.
MONTH('2020-06-25') returns 6
QUARTER(date)
Returns the quarter of year the date is in.
QUARTER('2020-12-25') returns 4
SECOND(datetime)
Returns the datetime's second value.
SECOND('2020-12-25 15:10:30') returns 30
TO_DATE(string, format_string[, localeStr])
Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported case-sensitive format strings are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_DATE('31 December 2016', 'dd MMMM yyyy') returns a Date value of 2016-12-31
TO_DATE('2018/junio/17', 'yyyy/MMMM/dd', 'es-ES') returns a Date value of 2018-06-17
TO_TIMESTAMP(expr, format_string[, localeStr])
Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11:10:10 AM on October 11, 2020
WEEK(date)
Returns the date's week value.
WEEK('2020-06-25') returns 4
YEAR(date)
Returns the date's year value.
YEAR('2020-06-25') returns 2020
ADD_MONTHS(date_expr, number_months)
Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
MONTHS_BETWEEN(start_date_expr, end_date_expr)
Returns the number of months between start_date_expr and end_date_expr. start_date_expr and end_date_expr can be a date, timestamp, or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS.
A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month.
Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format.
Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC.
If fmt is omitted, the default format is yyyy-MM-dd HH:mm:ss
INTERVAL 'year' YEAR[(year_precision)]
Returns a period of time in years.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)
INTERVAL '1' YEAR returns an interval of 1 year
INTERVAL '200' YEAR(3) returns an interval of 200 years
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH
Returns a period of time in years and months. Use to store a period of time using year and month fields.
year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.)
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. You must specify the leading year precision of 3.
INTERVAL 'month' MONTH[(month_precision)]
Returns a period of time in months.
month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 months.)
INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3.
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of days, hours, minutes, and seconds.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of days, hours, and minutes.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)]
Returns a period of time in terms of days and hours.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days and 10 hours
INTERVAL 'day' DAY[(day_precision)]
Returns a period of time in terms of days.
day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2.
INTERVAL '999' DAY(3) returns an interval of 999 days
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of hours, minutes, and seconds.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)]
Returns a period of time in terms of hours and minutes.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes
INTERVAL 'hour' HOUR[(hour_precision)]
Returns a period of time in terms of hours.
hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2.
INTERVAL '40' HOUR returns an interval of 40 hours
INTERVAL 'minute' MINUTE[(minute_precision)]
Returns a period of time in terms of minutes.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
INTERVAL '15' MINUTE returns an interval of 15 minutes
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)]
Returns a period of time in terms of minutes and seconds.
minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2.
fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9.
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] )
Returns the value evaluated at the row that's the first row of the window frame.
FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME. If there is no such value, hello is returned.
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ])
Returns the value evaluated at the row that is the last row of the window frame.
LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME.
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause)
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL.
LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME. If there is no such value, hello is returned.
RANK() OVER([ partition_clause ] order_by_clause)
Returns the rank of the current row with gaps, counting from 1.
RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
ROW_NUMBER() OVER([ partition_clause ] order_by_clause)
Returns the unique number of the current row within its partition, counting from 1.
ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID, in ascending order of BANK_NAME.
INITCAP(string)
Returns the string with the first letter of each word capitalized and all other letters in lowercase, where words are delimited by white space.
INITCAP('oRACLE sql') returns Oracle Sql
INSTR(string, substring[, start_position])
Returns the (1-based) index of the first occurrence of substring in string.
INSTR('OracleSQL', 'SQL') returns 7
LOWER(string)
Returns the string with all letters changed to lowercase.
LOWER('ORACLE') returns oracle
LENGTH(string)
Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces.
LENGTH('Oracle') returns 6
LTRIM(string)
Returns the string with leading spaces removed from the left.
Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group.
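As a minimal sketch, assuming this entry describes a REGEXP_EXTRACT-style function (the function name and exact signature are not shown in this entry): REGEXP_EXTRACT('100-200', '(\d+)-(\d+)', 1) returns 100, the text matched by the first capturing group.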
REPLACE(string, search[, replacement])
Replaces all occurrences of search with replacement.
If search is not found in string, then string is returned unchanged.
If replacement is not specified or is an empty string, nothing replaces the search that is removed from string.
REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF
RTRIM(string)
Returns the string with trailing spaces removed from the right.
RTRIM('Oracle ') returns Oracle
SUBSTRING(string, position[, substring_length])
Returns the substring starting at position.
SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac
TO_CHAR(expr) for numbers, and TO_CHAR(expr, format[, locale]) for dates
Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US. See supported language tags.
In pipeline expressions, the format_string must use the strftime format codes. Otherwise, the supported date format patterns are:
yy: two digit year
yyyy: four digit year
M: Numeric month, such as 1 for January
MM: Numeric month, such as 01 for January
MMM: Abbreviated month, such as Jan
MMMM: Full month, such as January
d: Numeric day of the month, such as 1 for June 1
dd: Numeric day of the month, such as 01 for June 1
DDD: Numeric day of the year from 001 to 366, such as 002 for January 2
F: Numeric day of the week in a month, such as 3 for 3rd Monday in June.
EEE or E: Abbreviated named day of the week, such as Sun for Sunday
EEEE: Named day of the week, such as Sunday
HH: 24 hour format from 00 to 23
H: 24 hour format from 0 to 23
hh: 12 hour format from 01 to 12
h: 12 hour format from 1 to 12
mm: minutes from 00 to 59
ss: seconds from 00 to 59
SSS: milliseconds from 000 to 999
a: AM or PM
z: time zone such as PDT
Number example: TO_CHAR(123) returns 123
Date example: TO_CHAR(Date '2020-10-30', 'yyyy.MM.dd', 'en-US') returns the string 2020.10.30. The first argument is a Date object representing Oct 30th, 2020.
UPPER(string)
Returns a string with all letters changed to uppercase.
UPPER('oracle') returns ORACLE
LPAD(str, len[, pad])
Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
LPAD('ABC', 5, '*') returns '**ABC'
RPAD(str, len[, pad])
Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space.
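For example, following the description above (this example is not taken from the product documentation): RPAD('ABC', 5, '*') returns 'ABC**'.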
FROM_JSON(column, schema)
Parses a column containing a JSON string into one of the following types, with the specified schema.
Map, with String as the key type
Struct
Array
FROM_JSON('{\"Zipcode\":704,\"City\":\"ORACLE CITY\"}', 'STRUCT<Zipcode: BIGINT, City: STRING>') returns a Struct type column with the specified schema: {704, ORACLE CITY}
FROM_JSON('{\"a\":1, \"b\":0.8}', 'STRUCT<a: BIGINT, b: DOUBLE>') returns a Struct type column with the specified schema: {1, 0.8}
TO_JSON(column)
Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string.
TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}}
TO_MAP(string,column[,string,column]*)
Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type.
TO_MAP('Ename',Expression_1.attribute1) returns a Map type column: {"ENAME" -> 100}
TO_MAP('block', EXPRESSION_1.MYSOURCE.address.block, 'unit', EXPRESSION_1.MYSOURCE.address.unit) returns a Map type column: {"block" -> 1,"unit" -> 1}
TO_STRUCT(string,column[,string,column]*)
Creates a new column of Struct type. The input columns must be grouped as key-value pairs.
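As a minimal sketch, by analogy with the TO_MAP examples above (the attribute names are hypothetical): TO_STRUCT('name', EXPRESSION_1.MYSOURCE.fullname, 'age', EXPRESSION_1.MYSOURCE.age) returns a Struct type column such as {"John", 25}.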
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
Aggregate
Expression
Filter
Join
Lookup
Split
Pivot
Function
Description
Example
TRANSFORM(column, lambda_function)
Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array.
For an input array of integers [1, 2, 3], TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4].
TRANSFORM_KEYS(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values.
For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'}, TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'}.
TRANSFORM_VALUES(column, lambda_function)
Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys.
For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'}, TRANSFORM_VALUES(map, (k, v) -> k || '_' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'}.
ARRAY_SORT(array(...), lambda_function)
Only the Expression operator supports ARRAY_SORT.
Takes an array and sorts according to the given function that takes 2 arguments.
The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element.
If the function is omitted, the array is sorted in ascending order.
array_sort(to_array(5, 6, 1),
(left, right) -> CASE WHEN left < right THEN -1
WHEN left > right THEN 1 ELSE 0 END)
The returned array is:
[1,5,6]
Function Operator 🔗
Use the function operator to invoke Oracle Cloud Infrastructure Functions from within Data Integration.
Note
For complex data structures such as Map, Array and Composite (Struct), only the JSON serialization format is currently supported.
Before you use OCI Functions from within a data flow in Data Integration, ensure that you understand the dependencies, and have completed the prerequisite tasks.
In a production environment, you might want to restrict users to invoking functions in a specific application, or to invoking a specific function only.
For example, to allow functions in a compartment to be invoked only from a specific Data Integration workspace, enter policy statements in the following format:
allow any-user to use fn-invocation in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>'}
allow any-user to read fn-function in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>'}
To restrict invocation from the workspace to a specific function:
allow any-user to use fn-invocation in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>', target.function.id='<function-ocid>'}
allow any-user to read fn-function in compartment <compartment-name> where ALL {request.principal.type='disworkspace', request.principal.id='<workspace-ocid>', target.function.id='<function-ocid>'}
An application is a logical grouping of functions.
In an application, you specify one to three subnets in which to run functions. Functions running in one application are isolated from functions running in another application.
For the function in OCI Functions to work with the Data Integration function operator in a data flow, the function must not read or write any identifier with a space character.
When you deploy a function to OCI Functions using the Fn Project CLI, the function is built as a Docker image and pushed to a specified Docker registry.
A function operator lets you use a function deployed in OCI Functions to process data from an input source. Primitive and complex data types are supported.
Specify the input shape of the function, and the input and output attributes for the function operator to read and write. Then manually map the source attributes to the input attributes.
Currently, only the JSON serialization format is supported for complex data structures such as Map, Array, and Composite (Struct).
The following procedure assumes you have added and configured a source operator with the data entity you want to use with the function.
From the Operators panel, drag a Function operator onto the canvas.
With the function operator in focus, on the Details tab
of the Properties panel, enter a name in the Identifier
field, or leave the name as-is.
Select the Serialization format and define the function
input and output data.
Specify the Input attributes and Output
attributes for the operator, and the Function
configuration values for the function. See Adding a function operator property.
The Input attributes define the input shape for the
function operator, which is one or more incoming attributes for the function
to work on. Input attributes are mapped to the incoming attributes from the
data entity source.
The Function configuration values define the input
shape of the function, which is one or more fields that are defined in the
function. Define a function field with a name and a value. The value of a
field matches an input attribute. You can parameterize a function
configuration value.
The Output attributes define the output shape for the
function operator, which is one or more attributes for the function to write
output to. Add an output attribute for each function field that you want in
the output. Subsequent downstream operators can use the output
attributes.
Note
Do not use a space character in the identifier name for an input attribute, an output attribute, or a function field. Also, names such as "Element", "Key", and "Value" are not permitted.
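As a hypothetical illustration (the attribute and field names are invented, not taken from the product): a function that uppercases text might be configured as follows:
Input attributes: INPUT_TEXT (VARCHAR)
Function configuration: text = INPUT_TEXT
Output attributes: OUTPUT_TEXT (VARCHAR)
The source attribute that holds the text is then mapped to INPUT_TEXT on the Map tab, and downstream operators can use OUTPUT_TEXT.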
(Optional)
Under Function properties, expand Function configuration to review the function configuration values. You can change or parameterize a value.
BATCH_SIZE is a pre-defined function configuration that represents the number of rows that a function processes at a time. The default is 10,000 rows.
For a function configuration value, you can:
Select Edit from the Actions menu to change the value.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the
rules that have been applied to the data entity. By default, the first
rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Map tab, drag and drop an attribute from the data entity source to the corresponding input attribute on the function to create a manual mapping.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The function output attributes are appended to the data set.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
After selecting a deployed function, specify the properties that define the input data for the function to process, the fields in the function configuration, and the output data that the function returns.
Note
Do not use a space character in the identifier name for an input attribute, an output attribute, or a function field. Also, names such as "Element", "Key", and "Value" are not permitted.
Under Function properties, click Add
property.
In the panel, select the Type of property you want to specify.
Input attributes: Input shape for the operator. Specify one or more attributes to represent the incoming attributes for the function to process. Input attributes are mapped to the incoming attributes from the data entity source.
Function configuration: Specify one or more function fields that define the input shape of the function. A field has a name and a value. The value of a field specifies an input attribute.
Output attributes: Output shape for the operator. Specify zero or more attributes to represent the output from the function after the data is processed. Add an output attribute for each function field that you want in the output. Subsequent downstream operators can use the output attributes.
Enter a name for the property in the Identifier field. Do not use a space
character in the name for an input attribute, an output attribute, or a function
field.
Select the Data type of the property.
For a primitive data type, specify the following fields (and any other applicable fields), depending on the Type and Data type of the property you're specifying.
Length: For an input or output attribute, enter a length. For example, you specify a length for a numeric or varchar data type.
Value: For a function field, enter a value that matches an input attribute.
For a Map complex data type, specify the key-value pair for a map element. Click Add data type to select the key data type and value data type.
Key: Select from primitive data types only.
Value: Select from primitive and complex data types.
For an Array complex data type, click Add data type to specify the Element data type. You can select from primitive and complex data types.
For a Composite (Struct) data type, specify the schema child or schema children. To add a schema child, click the + symbol next to the parent schema name.
In the Add Field panel for a schema child, enter the Identifier, and
select a Data type. You can select from primitive and
complex data types. Click Add to add the schema
child.
Click the + symbol again if you want to add more than one schema child to the parent schema.
To edit or delete a schema child, use the Actions menu at the end of the row.
You cannot delete the parent schema element.
In the Add Property panel, click Add.
Repeat the steps to add the properties you need to define the function input, function fields, and function output.
Flatten Operator 🔗
Use the flatten operator to transform hierarchical data into a simplified format for use with other relational data. The flattening process is also known as denormalization or unnesting.
The hierarchical file formats that you can denormalize are:
JSON and multi-line JSON
Avro
Parquet
Currently, the supported hierarchical data type that you can unnest is Array. You can select only one Array node in a hierarchical dataset to flatten. The data structure is flattened from the root to the selected Array node, and presented in a simplified format. If the dataset has other Array nodes, you have the option to convert those into strings.
You can configure the attributes or fields to include in the flattened output by setting
Projection preferences.
Projection preferences are settings for the format, number, and type of projected
attributes to include in the output after flattening the incoming data.
Data Integration provides four projection preference settings that are selected by default, resulting in a projected output that has the following:
Index arrays
All attributes up to the attribute that is selected for flattening
Null values for the attributes that are missing from the subsequent arrays
Parent name lineage in the attribute names
To understand the projection preferences that you can configure, use the following topics
with the JSON data structure example.
JSON data structure example
id
vehicle[]
    make
    model
    insurance{}
        company
        policy_number
    maintenance[]
        date
        description[]
        dealerdetails[]
            servicestation1[]
                dealeraddress[]
                    city
                    state
                    contact
                    dealername
            servicestation2[]
                dealeraddress[]
                    city
                    state
                    contact
                    dealername
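For reference, a minimal JSON instance of this structure might look like the following (the sample values are hypothetical):
{
  "id": 1,
  "vehicle": [
    {
      "make": "Toyota",
      "model": "Corolla",
      "insurance": { "company": "ABC Insurance", "policy_number": "P100" },
      "maintenance": [
        {
          "date": "2021-01-15",
          "description": ["oil change"],
          "dealerdetails": [
            {
              "servicestation1": [
                { "dealeraddress": [ { "city": "Austin", "state": "TX", "contact": "555-0100", "dealername": "Dealer One" } ] }
              ],
              "servicestation2": [
                { "dealeraddress": [ { "city": "Dallas", "state": "TX", "contact": "555-0101", "dealername": "Dealer Two" } ] }
              ]
            }
          ]
        }
      ]
    }
  ]
}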
The projection preference setting Create and project array
indexes lets you control whether to include index attributes for arrays in
the flattened output.
The index attribute represents the index of an array. Suppose that the array is
["a","b","c"], then the index for "a" is 0, the index for "b" is 1,
and the index for "c" is 2.
By default, Data Integration creates an attribute with
the suffix _INDEX for the array node that you select to flatten data
to. The data type of an array index attribute is Integer.
Index attributes are also created for the parent array nodes of the selected array node.
The flattening operation impacts all parent array nodes in the hierarchical structure
from the root to the selected node. If an impacted parent array node has sibling arrays,
index attributes are not created for those sibling array nodes.
In the JSON data structure example, if you select the array dealeraddress that's under servicestation2 for flattening, Data Integration creates five array index attributes: one for the selected array node, and four for the parent nodes impacted by the flattening operation.
After flattening, the attributes in the simplified structure are as follows:
id
vehicle_INDEX
vehicle_make
vehicle_model
vehicle_insurance
company
policy_number
vehicle_maintenance_INDEX
vehicle_maintenance_date
vehicle_maintenance_description
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation1
dealeraddress
city
state
contact
dealername
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
vehicle_maintenance_servicestation2_dealeraddress_dealername
An array index attribute is not created for servicestation1 because the
array node is a sibling of the impacted parent node servicestation2.
Similarly, an index attribute is not created for description because
the array node is a sibling of the impacted parent node
dealerdetails.
If you do not select the projection preference Create and project array
indexes, the attributes in the simplified structure are as follows:
id
vehicle_make
vehicle_model
vehicle_insurance
company
policy_number
vehicle_maintenance_date
vehicle_maintenance_description
vehicle_maintenance_servicestation1
dealeraddress
city
state
contact
dealername
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
vehicle_maintenance_servicestation2_dealeraddress_dealername
The projection preference setting Retain all attributes up to the
flattened array lets you control whether to include in the flattened output
those attributes that are not part of the flattened arrays.
By default, Data Integration shows all attributes from
the root to the selected array, including sibling arrays not impacted by flattening.
In the JSON data structure example, if you do not select the projection preference
Retain all attributes up to the flattened array, the
attributes in the simplified structure are as follows:
id
vehicle_INDEX
vehicle_maintenance_INDEX
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
The projection preference setting Produce a single row with null values
for missing parent array lets you control whether to skip rows that do not
have the attributes impacted by flattening.
The effect of the Produce a single row with null values for missing parent
array setting is seen only in the Data tab. By default, Data Integration shows null values for the attributes
that are missing from the subsequent arrays.
For example, if a record is missing one of the parent arrays in the flattening path, the projected output still contains a single row for that record, with null values in the attributes that come from the missing array.
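As a minimal sketch, using hypothetical values based on the JSON data structure example, a record whose maintenance array is missing still produces one output row:
id=1, vehicle_INDEX=0, vehicle_make=Toyota, vehicle_model=Corolla, vehicle_maintenance_INDEX=null, vehicle_maintenance_date=null, vehicle_maintenance_servicestation2_dealeraddress_city=null
If the setting is cleared, such rows are skipped in the projected output.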
The projection preference setting Retain parent name lineage in attribute
names lets you control whether to include the parent names in the child
attribute names.
By default, Data Integration creates child node names
with their parent node names.
In the JSON data structure example, the attributes with parent names in the simplified
structure are as follows (assuming sibling arrays not impacted by flattening are
excluded in the output):
id
vehicle_INDEX
vehicle_maintenance_INDEX
vehicle_maintenance_dealerdetails_INDEX
vehicle_maintenance_servicestation2_INDEX
vehicle_maintenance_servicestation2_dealeraddress_INDEX
vehicle_maintenance_servicestation2_dealeraddress_city
vehicle_maintenance_servicestation2_dealeraddress_state
vehicle_maintenance_servicestation2_dealeraddress_contact
If you do not select the projection preference Retain parent name lineage in
attribute names, the attributes are as follows:
id
vehicle_INDEX
maintenance_INDEX
dealerdetails_INDEX
servicestation2_INDEX
dealeraddress_INDEX
city
state
contact
The following procedure assumes you have added a source operator and configured the
operator to an Object Storage source with complex data types, such as a JSON
file.
From the Operators panel, drag the Flatten operator onto the canvas, and connect the operator to the source operator.
The Details tab for the flatten operator displays in the Properties panel.
The attributes table shows the tree data structure of the incoming data. By default, all nodes are expanded. Each node has a data type (such as VARCHAR) and a path (such as parent1.parent2.node1). For example: vehicle.insurance.company
In the search field, you can search for top-level nodes only.
With the flatten operator in focus, use the Details tab
to do the following:
Enter a name for the operator in the Identifier
field, or leave the name as-is.
Under Projection preferences, review the settings. By default, all settings are selected. Clear the checkbox of the settings that you do not want configured for this operation.
In the attributes tables, select the checkbox next to the array node that you want to flatten the data structure to. Then select Flatten data to selected attribute.
You can select one checkbox only.
After flattening, the path of the complex attribute you selected and flattened is displayed above the table, next to Flatten by. For example: FLATTEN_1.MYFILE_JSON.vehicle.maintenance.dealerdetails.servicestation2.dealeraddress
(Optional)
Where applicable in the flattened structure, select the actions icon
next to an array node in the table, and select Convert array into string. See
also Array
Conversions.
(Optional)
To revert the flattening you selected in step c, select
Clear that is next to the Flatten
by path. See Removing a Flattening
Path.
Under the Attributes tab, select to view incoming
attributes or output attributes. Incoming attributes link to the operator on the
left side. Output attributes link to the operator on the right side and go into
the next operator.
You can select individual attributes or use the filter icons to filter the
list of attributes. Then apply bulk exclude rules to the selected or
filtered attributes.
Click the filter icon in the Name column to
filter attributes by a name pattern. In the filter field, enter a simple
regular expression. You can use wildcards ? and * in the regex
pattern.
Click the filter icon in the Type column to
filter attributes by a data type. Use the menu to select the type to use
as a filter.
Note
You can apply only one name pattern filter, but
multiple type filters at a time. For example, if you apply one name
pattern filter (*_CODE) and two type filters (numeric, varchar), the
attributes are filtered by the name pattern *_CODE and the type
numeric or varchar.
Use the Actions menu to apply bulk exclude rules
to the selected attributes or filtered attributes. Select
Exclude by selection or Exclude by
applied filter.
Click View rules to open the Rules panel. You can
view and manage the rules that have been applied to the data entity. By
default, the first rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on
the configuration of the operator and any rules you have applied in the
Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations
to individual attributes or do a bulk transformation on a group of
attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause the data flow to fail.
After you have selected one complex attribute for flattening, there might be other
Array nodes in the flattened structure.
With the other Array nodes in a flattened structure, you can convert an Array of
primitive data types or an Array of Structs into a String. Currently, the supported
format of a conversion String is JSON.
Depending on the data, the data type of a conversion String is VARCHAR or BLOB. You
can change a conversion String of VARCHAR to CLOB, and conversely from CLOB to
VARCHAR.
The following procedure assumes you have added a flatten operator and selected a
complex attribute to create a flattened data structure.
With the flatten operator in focus on the data flow canvas, go to the
Details tab in the Properties panel.
In the attributes table, verify that the data structure is flattened.
Next to Flatten by is the path of the complex
attribute you have selected for flattening. For example:
FLATTEN_1.MYFILE_JSON.vehicle.maintenance.dealerdetails.servicestation2.dealeraddress
The actions icon (three dots) is displayed for Array nodes that are available
in the flattened structure for converting.
Where applicable, select the actions icon next to an Array node and then select
Convert array into string.
The following procedure assumes you have selected an Array node to be converted into
a String, if applicable.
With the flatten operator in focus on the data flow canvas, go to the
Details tab in the Properties panel.
Select View array conversions.
In the Array conversions panel that displays, review the
complex attributes that have been converted into a String.
To change the data type of a conversion String, select an update action from
the actions icon. Depending on the data type, select Update data type
to CLOB or Update data type to
VARCHAR.
Do any of the following to remove conversions:
To remove a single conversion, select the checkbox next to a conversion, then select Remove. Alternatively, you can select Remove from the actions icon for a conversion.
To remove all conversions, select the checkbox at the top of the list (next to the header Array conversions), then select Remove.
A selected conversion is removed and the previously converted attribute is
reverted to its original hierarchy structure and data type.
When configuring a flatten operator on a data flow, you can revert a flattening by
clearing the Flatten by path.
With the flatten operator in focus on the data flow canvas, go to the
Details tab in the Properties panel.
Above the attributes table, next to the Flatten by path,
select Clear.
In the Clear flattening dialog that appears, confirm the complex attribute name that you want to unflatten, then select Clear to proceed.
By removing flattening, the original data structure before flattening is restored:
If created, any attributes with the suffix _INDEX are removed.
If parent names were removed from child node names, the parent-child name lineage hierarchy is restored.
If some arrays were converted into strings, the array conversions are removed and the previously converted attributes are reverted to their original hierarchy structure and data type.
Table Function Operator 🔗
Use the table function operator to add table functions that return data in the form of tables.
Currently, only built-in table functions can be selected from a list. The table functions that are provided by the table function operator include cube, cross tabulation analysis, frequent items, rollup, and Spark SQL. Each built-in table function has predefined fields that you configure to return the desired collection of data. Complex logic can be consolidated into a single function that returns specific sets of data. Parameterization of the fields allows for flexibility in adjusting the logic.
A table function operator can be connected with any other operator within a data flow. Thus, a table function operator can be used as a data source, a mid-stream operator, and a target.
From the Operators panel, drag and drop a Table function operator onto the canvas.
With the operator in focus, on the Details tab of the Properties panel, enter a name in the Identifier field, or leave the name as-is.
For Transformation type, select the table function that you want to use for this operator.
Under Configure parameters, perform the following steps:
Configure the table function's core logic by specifying the function details. Only specific function input keys and their values can be configured, as shown on the Details tab for the selected table function.
After completing the input fields, click Calculate output shape.
Data Integration generates the output fields using the core logic details, and materializes the input/output shapes. Use the Attributes tab to view the input/output shapes.
Under the Attributes tab, select to view incoming attributes or output attributes. Incoming attributes link to the operator on the left side. Output attributes link to the operator on the right side and go into the next operator.
You can select individual attributes or use the filter icons to filter the list of attributes. Then apply bulk exclude rules to the selected or filtered attributes.
Click the filter icon in the Name column to filter attributes by a name pattern. In the filter field, enter a simple regular expression. You can use wildcards ? and * in the regex pattern.
Click the filter icon in the Type column to filter attributes by a data type. Use the menu to select the type to use as a filter.
Note
You can apply only one name pattern filter, but multiple type filters at a time. For example, to filter by the name pattern *_CODE and by the type numeric or varchar, you apply one name pattern filter (*_CODE) and two type filters (numeric, varchar).
Use the Actions menu to apply bulk exclude rules to the selected attributes or filtered attributes. Select Exclude by selection or Exclude by applied filter.
Click View rules to open the Rules panel. You can view and manage the rules that have been applied to the data entity. By default, the first rule in the Rules panel includes everything.
A complex type is shown as ARRAY (data type), COMPOSITE, or MAP (key type, value type). See Hierarchical Data Types to understand what is supported for complex types.
In the Data tab, view a sampling of the data, based on the configuration of the operator and any rules you have applied in the Attributes tab.
The Data Profile does not appear for complex data attributes. For complex data attributes, to view the hierarchy of the data structure in a simplified structure, click the complex data shown. For example, [...] or {…}.
You can apply transformations to individual attributes or do a bulk transformation on a group of attributes.
For entities with hierarchical data types, see Hierarchical Data Types to understand what is supported.
In the Validation tab, check for warnings or errors that might cause your data flow to fail.
Persists and caches a dataset according to the given memory storage level.
Returns a new dataset that is backed by the cached resilient distributed dataset (RDD).
Parameter
Description
level
The memory storage to use:
MEMORY_ONLY: Store as deserialized Java objects in the JVM. If memory is insufficient, some RDD partitions are not cached, but are recomputed as and when needed.
MEMORY_AND_DISK: Store as deserialized Java objects in the JVM. If memory is insufficient, some RDD partitions are stored on disk, and read from disk as and when needed.
MEMORY_ONLY_SER: Store as serialized Java objects.
MEMORY_AND_DISK_SER: Store as serialized Java objects. Partitions that don't fit in memory are stored on disk.
DISK_ONLY: Store the RDD partitions only on disk.
MEMORY_ONLY_2, MEMORY_AND_DISK_2, and so on: Similar to the corresponding levels above, but each partition is replicated on two cluster nodes.
Computes a pair-wise frequency table or contingency table from distinct values of the given two columns.
The number of distinct values for each column must be less than 1e4.
Returns a dataframe that contains the contingency table. In the contingency table:
The name of the first column is col1_col2, where col1 is the first column and col2 is the second column before transformation. The values in the first column are the distinct values of col1.
The other column names are the distinct values of col2.
The counts are returned as the long type.
Pairs without occurrences are given zero as the count.
The maximum number of non-zero pairs is 1e6.
Null elements are replaced by null.
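As a minimal sketch with hypothetical data: if col1 is department and col2 is region, and the input rows are (Sales, East), (Sales, West), (Sales, East), and (HR, East), the returned contingency table is:
department_region  East  West
Sales              2     1
HR                 1     0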
Parameter
Description
col1
Name of the first column. See the note after this table for limitations.
Distinct items from col1 make the first item of each row in the returned dataframe.
col2
Name of the second column. See the note after this table for limitations.
Distinct items from col2 make the column names in the returned dataframe.
Note
Only the following characters are allowed in a column name:
Generates a multi-dimensional cube of all possible combinations using the given columns list and the calculations in the given aggregate function expressions.
The following aggregate functions are supported in the expressions:
AVG
COUNT
MEAN
MAX
MIN
SUM
Returns a dataframe after the cube transformation.
Parameter
Description
cubeColumns
The list of columns, separated by commas, by which to generate the multi-dimensional cube.
aggExpressions
The aggregate function expressions to perform on the columns.
If cubeColumns is department, region and aggExpressions is salary -> avg, age -> max (compute the average salary and maximum age), the cube after transformation contains the aggregate values for every combination of department and region, including subtotal rows where one or both columns are null, as shown in the sketch below.
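As a minimal sketch with hypothetical data, suppose the input rows are (Sales, East, salary 100, age 30), (Sales, West, salary 200, age 40), and (HR, East, salary 300, age 50). The cube then contains:
department  region  avg(salary)  max(age)
Sales       East    100          30
Sales       West    200          40
Sales       null    150          40
HR          East    300          50
HR          null    300          50
null        East    200          50
null        West    200          40
null        null    200          50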
Generates a multi-dimensional rollup of possible combinations using the given columns list and the calculations in the given aggregate function expressions.
The following aggregate functions are supported in the expressions:
AVG
COUNT
MEAN
MAX
MIN
SUM
Returns a dataframe after the rollup transformation.
Parameter
Description
rollupColumns
The list of columns, separated by commas, by which to generate the multi-dimensional rollup.
aggExpressions
The aggregate function expressions to perform on the columns.
If rollupColumns is department, region and aggExpressions is salary -> avg, age -> max (compute the average salary and maximum age), the rollup after transformation contains the aggregate values for the hierarchical combinations of department and region (department and region, department alone, and the grand total), as shown in the sketch below.
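As a minimal sketch with the same hypothetical data used in the cube example above, the rollup contains subtotals that follow the column order, but unlike the cube it does not contain rows that group by region alone:
department  region  avg(salary)  max(age)
Sales       East    100          30
Sales       West    200          40
Sales       null    150          40
HR          East    300          50
HR          null    300          50
null        null    200          50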