Using Data Flow Operators

In Data Integration, data flow operators represent input sources, output targets, and transformations that can be used in a data flow.

From the Operators panel, drag operators onto the canvas to design a data flow. Then use the Details tab on the Properties panel to configure basic and required properties for each operator. Where applicable, use the Advanced options tab to specify other properties.

In general, a data flow operator can have one or more inbound ports, and one or more outbound ports for data to flow through. For example, you can connect the same source outbound port to inbound ports on a filter, a join, and a target operator. You can also connect another source operator to the same join inbound port.

A data flow must include at least one source operator and one target operator to be valid. While Data Integration supports multiple target operators in a data flow, a target operator can have only one inbound port.

To connect operators, hover over an operator until you see the connector (small circle) on the right side of the operator. Then drag the connector to the next operator you want to connect to. A connection is valid when a line connects the operators after you drop the connector.

A connection line symbolizes how data flows from one operator to the next. Although you can drag a visible connector from one object to another, a filter, expression, aggregate, distinct, sort, or target operator can't have more than one inbound connection line.

Important

For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.

Data Operators

Data Integration provides the source operator and the target operator for adding input and output data entities. Source entities serve as the input to a data flow, and target entities store the transformed output.

Source Operator

Use the source operator to specify the data entities that serve as the input to the data flow.

Target Operator

Use the target operator to specify the data entities that serve as the output for storing transformed data.

Shaping Operators

Important

For complex data types, see Hierarchical Data Types to understand what is supported. You might not be able to perform some configurations on the Attributes tab and Data tab in the Properties panel.

Filter Operator

Use the filter operator to select a subset of data from the inbound port to continue downstream to the outbound port based on a filter condition.
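
For illustration only, the following is a minimal PySpark sketch of the equivalent logic. In Data Integration you configure the filter condition in the operator's properties rather than writing code; the dataset, attribute names, and condition here are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("filter_sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, "OPEN", 120.0), (2, "CLOSED", 80.0), (3, "OPEN", 45.0)],
    ["order_id", "status", "amount"],
)

# Filter condition: status is OPEN and amount is greater than 100
filtered = orders.filter((F.col("status") == "OPEN") & (F.col("amount") > 100))
filtered.show()  # only order_id 1 continues to the outbound port
```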

Join Operator

Use the join operator to link data from multiple inbound sources.
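
Conceptually, this works like a relational join on a join condition. A minimal PySpark sketch with hypothetical datasets and a hypothetical key column follows; in Data Integration you define the join type and condition in the operator's properties.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("join_sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, 101, 120.0), (2, 102, 80.0)], ["order_id", "customer_id", "amount"]
)
customers = spark.createDataFrame(
    [(101, "Alice"), (103, "Bob")], ["customer_id", "name"]
)

# Join condition: orders.customer_id == customers.customer_id.
# "inner" keeps only matching rows; outer join types keep unmatched rows
# from one or both sides.
joined = orders.join(customers, on="customer_id", how="inner")
joined.show()
```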

Expression Operator

Use the expression operator to perform one or more transformations on a single row of data to create new, derivative fields.

To use the Expression operator to change the data type of a group of attributes, see Changing the data type of a group of attributes.
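
For example, an expression can concatenate two attributes into a new field or cast an attribute to a different data type. The following PySpark sketch illustrates the idea with hypothetical attribute names; in Data Integration you build the expression in the operator's properties, not in code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("expression_sketch").getOrCreate()

people = spark.createDataFrame(
    [("Ada", "Lovelace", "1815")], ["first_name", "last_name", "birth_year"]
)

derived = (
    people
    # New derivative field built from existing attributes
    .withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))
    # Change the data type of an existing attribute
    .withColumn("birth_year", F.col("birth_year").cast("int"))
)
derived.show()
```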

Aggregate Operator

Use the aggregate operator to perform calculations, such as sum or count, on all rows or a group of rows to create new, derivative attributes.
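
For example, grouping by a region attribute and computing a sum and a count produces one row per region with two new attributes. A minimal PySpark sketch of that logic, with hypothetical data and attribute names:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("aggregate_sketch").getOrCreate()

sales = spark.createDataFrame(
    [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 75.0)], ["region", "amount"]
)

# Group rows by region, then derive new attributes with sum and count
totals = sales.groupBy("region").agg(
    F.sum("amount").alias("total_amount"),
    F.count("*").alias("row_count"),
)
totals.show()
```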

Distinct Operator

Use the distinct operator to remove duplicate rows and return only distinct rows.
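
A minimal PySpark sketch of the same idea, with hypothetical data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distinct_sketch").getOrCreate()

emails = spark.createDataFrame(
    [("a@example.com",), ("b@example.com",), ("a@example.com",)], ["email"]
)

# Duplicate rows are removed; only unique rows continue downstream
unique_emails = emails.distinct()
unique_emails.show()  # two rows remain
```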

Sort Operator

Use the sort operator to sort data in ascending or descending order.

When using a sort operator, we recommend that you apply it after all other operators. This keeps the sort operator immediately before the target operator, so that data is inserted into the target in the sorted order.
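
The following PySpark sketch illustrates that ordering: other transformations run first, the sort runs last, and the sorted rows are then written to the target. The filter, attribute names, and output location are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sort_sketch").getOrCreate()

events = spark.createDataFrame(
    [("login", 3), ("purchase", 1), ("logout", 2)], ["event", "sequence"]
)

# Apply other operators first (a filter here), then sort last so the
# ordering is preserved when rows are inserted into the target.
prepared = events.filter(F.col("event") != "logout")
ordered = prepared.orderBy(F.col("sequence").asc())
ordered.write.mode("overwrite").csv("/tmp/ordered_events")  # hypothetical target
```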

Union Operator

Use the union operator to perform a union operation between two or more datasets.
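
Conceptually, rows from the connected inputs are appended into a single output. A minimal PySpark sketch with hypothetical datasets:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("union_sketch").getOrCreate()

q1 = spark.createDataFrame([(1, "US")], ["order_id", "country"])
q2 = spark.createDataFrame([(2, "DE")], ["order_id", "country"])

# Combine rows from both datasets; attributes are matched by name
combined = q1.unionByName(q2)
combined.show()
```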

Minus Operator

Use the minus operator to compare two data entities and return the rows that are present in one entity but not in the other.

You can choose to keep or eliminate duplicate rows in the resulting data.
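
The two options map to the usual set-difference variants: keeping duplicates returns every occurrence of a non-matching row, while eliminating duplicates returns each non-matching row once. A minimal PySpark sketch with hypothetical data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("minus_sketch").getOrCreate()

current = spark.createDataFrame([(1,), (2,), (2,), (3,)], ["id"])
previous = spark.createDataFrame([(3,)], ["id"])

# Rows in `current` that are not in `previous`
with_duplicates = current.exceptAll(previous)    # keeps duplicates: 1, 2, 2
without_duplicates = current.subtract(previous)  # eliminates duplicates: 1, 2
with_duplicates.show()
without_duplicates.show()
```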

Intersect Operator

Use the intersect operator to compare two or more data entities and return the rows that are present in the connected entities.

You can choose to keep or eliminate duplicate rows in the resulting data.
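
As with the minus operator, keeping duplicates returns every common occurrence, while eliminating duplicates returns each common row once. A minimal PySpark sketch with hypothetical data:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("intersect_sketch").getOrCreate()

store_a = spark.createDataFrame([(1,), (2,), (2,), (3,)], ["sku"])
store_b = spark.createDataFrame([(2,), (2,), (3,)], ["sku"])

# Rows present in both datasets
with_duplicates = store_a.intersectAll(store_b)  # keeps duplicates: 2, 2, 3
without_duplicates = store_a.intersect(store_b)  # eliminates duplicates: 2, 3
with_duplicates.show()
without_duplicates.show()
```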

Split Operator

Use the split operator to divide one source of input data into two or more output ports based on split conditions that are evaluated in a sequence.

Each split condition has an output port. Data that satisfies a condition is directed to the corresponding output port.

By default, a split operator is configured with the Unmatched condition, which is always the last condition in the sequence. You cannot add your own condition after the Unmatched condition, and you cannot delete the Unmatched condition.

The operator evaluates the conditions one at a time. After all conditions in the sequence have been evaluated, data that does not satisfy any condition is directed to the Unmatched output port.
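
The following PySpark sketch approximates a split with two conditions plus the Unmatched port, assuming each row is routed to the first condition in the sequence that it satisfies. The conditions and data are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("split_sketch").getOrCreate()

payments = spark.createDataFrame(
    [(1, 500.0), (2, 90.0), (3, -10.0)], ["payment_id", "amount"]
)

cond_high = F.col("amount") >= 100            # condition 1 in the sequence
cond_low = F.col("amount").between(0, 99.99)  # condition 2 in the sequence

port_high = payments.filter(cond_high)
# Later conditions only see rows not already matched by earlier ones
port_low = payments.filter(~cond_high & cond_low)
# Anything that matches no condition goes to the Unmatched port
port_unmatched = payments.filter(~cond_high & ~cond_low)
```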

Pivot Operator

The pivot operator lets you take unique row values from one attribute in an input source and pivot the values into multiple attributes in the output.

Using input from multiple rows, the pivot operation performs a transformation based on aggregate function expressions, and the values of an attribute that you specify as the pivot key. The result of a pivot operation is a pivoted or rearranged output of rows and attributes.

The number of rows in the output is based on the selection of attributes to group by.

  • If you specify one or more attributes to group by, the incoming rows that have the same group by attribute value are grouped into one row. For example, if you specify one group by attribute that has four unique values, the incoming data is transformed and grouped into four rows in the output.
  • If you don't specify any attributes to group by, all the incoming data is transformed into a single output row.

The number of attributes in the output is:

  • Based on the number of attributes you select to group by
  • A multiple of the number of values you select in the pivot key
  • A result of the number of attributes that the aggregate function expressions transform

For example, if you select one group by attribute and three pivot key values, and add an expression that transforms two attributes, the number of attributes in the output is:

1 + (3 * 2) = 7

The total number of attributes in the resulting pivoted output is calculated as follows:

Number of group by attributes + (Number of pivot key values * Number of attributes that are transformed from expressions)

The names of new attributes in the output are derived from a pattern you add for target attributes when you create the expressions.

The aggregate functions you use with a pivot operator determine the pivoted values in the output. If no data is found, null is inserted where a pivoted value is expected.
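
The following PySpark sketch reproduces the worked example above: one group by attribute, a pivot key with three values, and two aggregate expressions, giving 1 + (3 * 2) = 7 output attributes. The data and attribute names are hypothetical, and in Data Integration you configure the pivot in the operator's properties rather than in code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pivot_sketch").getOrCreate()

sales = spark.createDataFrame(
    [("EMEA", "Q1", 100.0), ("EMEA", "Q2", 50.0), ("APAC", "Q1", 75.0)],
    ["region", "quarter", "amount"],
)

# Group by: region (1 attribute)
# Pivot key: quarter, with 3 selected values
# Aggregate expressions: sum and avg (2 transformed attributes)
pivoted = (
    sales.groupBy("region")
    .pivot("quarter", ["Q1", "Q2", "Q3"])
    .agg(F.sum("amount").alias("total"), F.avg("amount").alias("average"))
)
pivoted.show()  # missing combinations, such as APAC in Q2, appear as null
print(len(pivoted.columns))  # 1 + (3 * 2) = 7 output attributes
```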

Lookup Operator

The lookup operator performs a query and transformation using a lookup condition and input from two sources, a primary input source and a lookup input source.

The query operation uses the condition and a value in the primary input to find rows in the lookup source. The transformation appends attributes from the lookup source to the primary source.

You can specify the action to take when the lookup query returns no rows or multiple rows. For example, you can specify that non-matching rows are skipped, and that any one matching row is returned when there are multiple matching rows.

The resulting output is a combination of both input sources based on the lookup condition, a value in the primary input, and the preferred actions to take. The primary input determines the order of attributes and rows in the output, with the attributes from the primary input placed before the attributes from the lookup input.
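
The following PySpark sketch approximates one common configuration: skip non-matching primary rows, and return any one matching lookup row when there are several. The datasets and the lookup key are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lookup_sketch").getOrCreate()

orders = spark.createDataFrame(
    [(1, "A10"), (2, "B20"), (3, "Z99")], ["order_id", "product_code"]
)
product_lookup = spark.createDataFrame(
    [("A10", "Widget"), ("A10", "Widget v2"), ("B20", "Gadget")],
    ["product_code", "product_name"],
)

# Return any one matching row when there are multiple matches
single_match = product_lookup.dropDuplicates(["product_code"])

# Skip non-matching primary rows (order_id 3 has no lookup match)
enriched = orders.join(single_match, on="product_code", how="inner")

# Primary attributes come first, lookup attributes are appended
enriched.select("order_id", "product_code", "product_name").show()
```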

Function Operator

Use the function operator to invoke Oracle Cloud Infrastructure Functions from within Data Integration.

Note

For complex data structures such as Map, Array, and Composite (Struct), only the JSON serialization format is currently supported.

Before you use OCI Functions from within a data flow in Data Integration, ensure that you understand the dependencies and have completed the prerequisite tasks.

Flatten Operator

Use the flatten operator to transform hierarchical data into a simplified format for use with other relational data. The flattening process is also known as denormalization or unnesting.

The hierarchical file formats that you can denormalize are:

  • JSON and multi-line JSON
  • Avro
  • Parquet

Currently, the supported hierarchical data type that you can unnest is Array. You can select only one Array node in a hierarchical dataset to flatten. The data structure is flattened from the root to the selected Array node, and presented in a simplified format. If the dataset has other Array nodes, you have the option to convert those into strings.

You can configure the attributes or fields to include in the flattened output by setting Projection preferences.
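
Conceptually, flattening an Array node produces one output row per array element, with the parent attributes repeated on each row. A minimal PySpark sketch over a hypothetical nested structure:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("flatten_sketch").getOrCreate()

# Hierarchical input: each order carries an Array of line items
orders = spark.createDataFrame(
    [(1, [("A10", 2), ("B20", 1)])],
    "order_id INT, items ARRAY<STRUCT<sku: STRING, qty: INT>>",
)

# Unnest the Array node: one output row per array element,
# with the parent attributes repeated on each row
flattened = orders.select(
    "order_id", F.explode("items").alias("item")
).select("order_id", "item.sku", "item.qty")
flattened.show()
```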

Table Function Operator

Use the table function operator to add table functions that return data in the form of tables.

Currently, only built-in table functions can be selected from a list. The table functions that are provided by the table function operator include cube, cross tabulation analysis, frequent items, rollup, and Spark SQL. Each built-in table function has predefined fields that you configure to return the desired collection of data. Complex logic can be consolidated into a single function that returns specific sets of data. Parameterization of the fields allows for flexibility in adjusting the logic.

A table function operator can be connected with any other operator within a data flow. Thus, a table function operator can be used as a data source, a mid-stream operator, or a target.
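
For example, the rollup and cross tabulation analysis functions summarize data across grouping combinations. The following PySpark sketch shows comparable logic with hypothetical data; in Data Integration you configure the built-in function's predefined fields instead of writing this code.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("table_function_sketch").getOrCreate()

sales = spark.createDataFrame(
    [("EMEA", "Widget", 100.0), ("EMEA", "Gadget", 50.0), ("APAC", "Widget", 75.0)],
    ["region", "product", "amount"],
)

# Rollup: totals per (region, product), per region, and a grand total row
rolled_up = sales.rollup("region", "product").agg(F.sum("amount").alias("total"))
rolled_up.orderBy("region", "product").show()

# Cross tabulation analysis: contingency table of region versus product
sales.stat.crosstab("region", "product").show()
```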