Functions (Data Flow) Reference
The following functions are used with operators in a data flow, to enable you to build expressions.
Function | Description | Example |
---|---|---|
COUNT(value[, value]*) | Returns the number of rows for which one or more supplied expressions are all non-null. | COUNT(expr1) |
COUNT(*) | Returns the total number of retrieved rows, including rows containing null. | COUNT(*) |
MAX(value) | Returns the maximum value of the argument. | MAX(expr) |
MIN(value) | Returns the minimum value of the argument. | MIN(expr) |
SUM(numeric) | Returns the sum calculated from values of a group. | SUM(expr1) |
AVG(numeric) | Returns the average of numeric values in an expression. | AVG(AGGREGATE_1.src1.attribute1) |
LISTAGG(column[, delimiter]) WITHIN GROUP (order_by_clause) | Concatenates values of the input column with the specified delimiter, for each group based on the order clause. column contains the values you want to concatenate together in the result. The delimiter separates the column values in the result. If a delimiter is not provided, then an empty character is used. order_by_clause determines the order that the concatenated values are returned. This function can only be used as an aggregator, and can be used with grouping or without grouping. If you use without grouping, the result is a single row. If you use with a grouping, the function returns a row for each group. | Consider a table with two columns,
Example 1: Without grouping
Example 2: Group by the
|
Function | Description | Example |
---|---|---|
FIRST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ] ) |
Returns the value evaluated at the row that's the first row of the window frame. | FIRST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the first value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME . |
LAG(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) |
Returns the value evaluated at the row at a given offset before the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. | LAG(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME DESC) returns the value of BANK_ID from the second row before the current row, partitioned by BANK_ID and in descending order of BANK_NAME . If there is no such value, hello is returned. |
LAST_VALUE(value) OVER ([ partition_clause ] order_by_clause [ windowFrame_clause ]) |
Returns the value evaluated at the row that is the last row of the window frame. | LAST_VALUE(BANK_ID) OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) returns the last value of BANK_ID in a window over which the rows are computed as the current row and 1 row after that row, partitioned by BANK_ID and in ascending order of BANK_NAME . |
LEAD(value[, offset[, default]]) OVER ([ partition_clause ] order_by_clause) |
Returns the value evaluated at the row at a given offset after the current row within the partition. If there is no such row, the default value is returned. Both offset and default are evaluated with respect to the current row. If omitted, offset is defaulted to 1 and default to NULL. | LEAD(BANK_ID, 2, 'hello') OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the value of BANK_ID from the second row after the current row, partitioned by BANK_ID and in ascending order of BANK_NAME . If there is no such value, hello is returned. |
RANK() OVER([ partition_clause ] order_by_clause) |
Returns the rank of the current row with gaps, counting from 1. | RANK() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the rank of each row within the partition group of BANK_ID , in ascending order of BANK_NAME . |
ROW_NUMBER() OVER([ partition_clause ] order_by_clause) |
Returns the unique number of the current row within its partition, counting from 1. | ROW_NUMBER() OVER (PARTITION BY BANK_ID ORDER BY BANK_NAME) returns the unique row number of each row within the partition group of BANK_ID , in ascending order of BANK_NAME . |
Function | Description | Example |
---|---|---|
ABS(numeric) | Returns the absolute power of the numeric value. | ABS(-1) |
CEIL(numeric) | Returns the smallest integer not greater than the numeric value | CEIL(-1,2) |
FLOOR(numeric) | Returns the largest integer not greater than the numeric value. | FLOOR(-1,2) |
MOD(numeric1, numeric2) | Returns the remainder after numeric1 is divided by numeric2 . | MOD(8,2) |
POWER(numeric1, numeric2) | Raises numeric1 to the power of numeric2 . | POWER(2,3) |
ROUND(numeric1, numeric2) | Returns numeric1 rounded to numeric2 decimal places. | ROUND(2.5,0) |
TRUNC(numeric1, numeric2) | Returns numeric1 truncated to numeric2 decimal places. | TRUNC(2.5,0) |
TO_NUMBER(expr[, format, locale]) | Converts an expr to a number, based on the format and optional locale provided. Default locale is en-US . Supported language tags.Supported format patterns:
|
|
Only the Expression operator supports array functions.
Function | Description | Example |
---|---|---|
ARRAY_POSITION(array(...), element) | Returns the position of the first occurrence of the given element in the given array. The position is not zero based, instead it starts with 1. | ARRAY_POSITION(array(3, 2, 1, 4, 1), 1) returns 3 |
REVERSE(array(...)) |
Returns the given array of elements in a reverse order. | REVERSE(array(2, 1, 4, 3)) returns [3,4,1,2] |
ELEMENT_AT(array(...), index) |
Returns the element of the given array at the given index position. The index is not zero based, instead it starts with 1. If |
ELEMENT_AT(array(1, 2, 3), 2) returns 2 |
Function | Description | Example |
---|---|---|
COALESCE(value, value [, value]*) | Returns the first non-null argument, if it exists, otherwise returns null. | COALESCE(NULL, 1, NULL) returns 1 |
NULLIF(value, value) | Returns null if the two values equal each other, otherwise returns the first value. | NULLIF('ABC','XYZ') returns ABC |
Function | Description | Example |
---|---|---|
CURRENT_DATE |
Returns the current date. | CURRENT_DATE returns today's date such as 2023-05-26 |
CURRENT_TIMESTAMP |
Returns the current date and time for the session time zone. | CURRENT_TIMESTAMP returns today's date and current time such as 2023-05-26 12:34:56
|
DATE_ADD(date, number_of_days) |
Returns the date that's the specified number of days after the specified date . |
DATE_ADD('2017-07-30', 1) returns 2017-07-31 |
DATE_FORMAT(expr, format[, locale]) |
Formats an Supported date format patterns:
|
|
DAYOFMONTH(date) |
Returns the date's day in the month. | DAYOFMONTH('2020-12-25') returns 25 |
DAYOFWEEK(date) |
Returns the date's day in the week. | DAYOFWEEK('2020-12-25') returns 6 for Friday. In the United States, Sunday is considered to be 1, Monday is 2, and so on. |
DAYOFYEAR(date) |
Returns the date's day in the year. | DAYOFYEAR('2020-12-25') returns 360 |
WEEKOFYEAR(date) |
Returns the date's week in the year. |
|
HOUR(datetime) |
Returns the datetime's hour value. | HOUR('2020-12-25 15:10:30') returns 15 |
LAST_DAY(date) |
Returns the date's last day of the month. | LAST_DAY('2020-12-25') returns 31 |
MINUTE(datetime) |
Returns the datetime's minute value. | HOUR('2020-12-25 15:10:30') returns 10 |
MONTH(date) |
Returns the date's month value. | MONTH('2020-06-25') returns 6 |
QUARTER(date) |
Returns the quarter of year the date is in. | QUARTER('2020-12-25') returns 4 |
SECOND(datetime) |
Returns the datetime's second value. | SECOND('2020-12-25 15:10:30') returns 30 |
TO_DATE(string, format_string[, localeStr]) |
Parses the string expression with the format_string expression to a date. Locale is optional. Default is en-US . Supported language tags.In pipeline expressions, the
|
|
TO_TIMESTAMP(expr, format_string[, localeStr]) |
Converts an expr of VARCHAR to a value of TIMESTAMP, based on the format_string and the optional localeStr provided.In pipeline expressions, the
|
TO_TIMESTAMP('2020-10-11 11:10:10', 'yyyy-MM-dd HH:mm:ss') returns a TIMESTAMP object representing 11am 10:10 Oct 11th, 2020 |
WEEK(date) |
Returns the date's week value. |
WEEK('2020-06-25') returns 4 |
YEAR(date) |
Returns the date's year value. | YEAR('2020-06-25') returns 2020 |
ADD_MONTHS(date_expr, number_months) |
Returns the date after adding the specified number of months to the specified date, timestamp or string with a format such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSS . |
|
MONTHS_BETWEEN(start_date_expr, end_date_expr) |
Returns the number of months between A whole number is returned if both dates are the same day of the month, or both are the last day in their respective months. Otherwise, the difference is calculated based on 31 days per month. |
|
FROM_UTC_TIMESTAMP(time_stamp, time_zone) |
Interprets a date, timestamp or string as a UTC time and converts that time to a timestamp in the specified time zone. For string, use a format such as: Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul', or a time zone offset (for example, UTC+02). |
FROM_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 03:40:00.0 |
TO_UTC_TIMESTAMP(time_stamp, time_zone) |
Converts a date, timestamp or string in the specified time zone to a UTC timestamp. For string, use a format such as: Time zone format is either a region-based zone ID (for example, 'area/city' such as 'Asia/Seoul'), or a time zone offset (for example, UTC+02). |
TO_UTC_TIMESTAMP('2017-07-14 02:40:00.0', 'GMT+1') returns 2017-07-14 01:40:00.0
|
FROM_UNIXTIME(unix_time[, fmt]) |
Converts the specified Unix time or epoch to a string that represents the timestamp of that moment in the current system time zone and in the specified format. Note: Unix time is the number of seconds that have elapsed since January 1, 1970 00:00:00 UTC. If |
Default time zone is PST in the examples |
UNIX_TIMESTAMP([time_expr[, fmt]]) |
Converts the current or specified time to a Unix timestamp in seconds.
If If |
Default time zone is PST in this example |
INTERVAL 'year' YEAR[(year_precision)] |
Returns a period of time in years. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.) |
|
INTERVAL 'year month' YEAR[(year_precision)] TO MONTH |
Returns a period of time in years and months. Use to store a period of time using year and month fields. year_precision is the number of digits in the year field; it ranges from 0 to 9. If year_precision is omitted, the default is 2 (must be less than 100 years.) |
INTERVAL '100-5' YEAR(3) TO MONTH returns an interval of 100 years, 5 months. Must must specify the leading year precision of 3. |
INTERVAL 'month' MONTH[(month_precision)] |
Returns a period of time in months. month_precision is the number of digits in the month field; it ranges from 0 to 9. If month_precision is omitted, the default is 2 (must be less than 100 years.) |
INTERVAL '200' MONTH(3) returns an interval of 200 months. Must specify the month precision of 3. |
INTERVAL 'day time' DAY[(day_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of days, hours, minutes, and seconds. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '11 10:09:08.555' DAY TO SECOND(3) returns an interval of 11 days, 10 hours, 09 minutes, 08 seconds, and 555 thousandths of a second |
INTERVAL 'day time' DAY[(day_precision)] TO MINUTE[(minute_precision)] |
Returns a period of time in terms of days, hours, and minutes. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '11 10:09' DAY TO MINUTE returns an interval of 11 days, 10 hours, and 09 minutes |
INTERVAL 'day time' DAY[(day_precision)] TO HOUR[(hour_precision)] |
Returns a period of time in terms of days and hours. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. |
INTERVAL '100 10' DAY(3) TO HOUR returns an interval of 100 days 10 hours |
INTERVAL 'day' DAY[(day_precision)] |
Returns a period of time in terms of days. day_precision is the number of digits in the day field; it ranges from 0 to 9. The default is 2. |
INTERVAL '999' DAY(3) returns an interval of 999 days |
INTERVAL 'time' HOUR[(hour_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of hours, minutes, and seconds. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '09:08:07.6666666' HOUR TO SECOND(7) returns an interval of 9 hours, 08 minutes, and 7.6666666 seconds |
INTERVAL 'time' HOUR[(hour_precision)] TO MINUTE[(minute_precision)] |
Returns a period of time in terms of hours and minutes. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '09:30' HOUR TO MINUTE returns an interval of 9 hours and 30 minutes |
INTERVAL 'hour' HOUR[(hour_precision)] |
Returns a period of time in terms of hours. hour_precision is the number of digits in the hour field; it ranges from 0 to 2. The default is 2. |
INTERVAL '40' HOUR returns an interval of 40 hours |
INTERVAL 'minute' MINUTE[(minute_precision)] |
Returns a period of time in terms of minutes. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. |
INTERVAL '15' MINUTE returns an interval of 15 minutes |
INTERVAL 'time' MINUTE[(minute_precision)] TO SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of minutes and seconds. minute_precision is the number of digits in the minute field; it ranges from 0 to 2. The default is 2. fractional_seconds_precision is the number of digits in the fractional part of the second value in the time field; it ranges from 0 to 9. |
INTERVAL '15:30' MINUTE TO SECOND returns an interval of 15 minutes and 30 seconds |
INTERVAL 'second' SECOND[(fractional_seconds_precision)] |
Returns a period of time in terms of seconds. fractional_seconds_precision is the number of digits in the fractional part of the second field; it ranges from 0 to 9. The default is 3. |
INTERVAL '15.678' SECOND returns an interval of 15.678 seconds |
Function | Description | Example |
---|---|---|
MD5(all data types) | Calculates an MD5 checksum of the data type, and returns a string value. | MD5(column_name) |
SHA1(all data types) | Calculates a SHA-1 hash value of the data type, and returns a string value. | SHA1(column_name) |
SHA2(all data types, bitLength) | Calculates a SHA-2 hash value of the data type, and returns a string value. bitLength is an integer. | SHA2 (column_name, bitLength can be set to 0 (equivalent to 256), 256, 384, or 512) . |
ORA_HASH(expr, [max_bucket], [seed_value]) |
Computes a hash value for
Oracle applies the hash function to the combination of |
|
Function | Description | Example |
---|---|---|
SCHEMA_OF_JSON(string) | Parses a JSON string and infers the schema in DDL format. |
|
FROM_JSON(column, string) | Parses a column containing a JSON string into one of the following types, with the specified schema.
|
|
TO_JSON(column) | Converts a column containing a type of Struct or Array of Structs, or a Map or Array of Map, into a JSON string. | TO_JSON(TO_STRUCT('s1', TO_ARRAY(1,2,3), 's2', TO_MAP('key', 'value'))) returns a JSON string {"s1":[1,2,3],"s2":{"key":"value"}} |
TO_MAP(string,column[,string,column]*) | Creates a new column of Map type. The input columns must be grouped as key-value pairs. The input key columns cannot be null, and must all have the same data type. The input value columns must all have the same data type. |
|
TO_STRUCT(string,column[,string,column]*) | Creates a new column of Struct type. The input columns must be grouped as key-value pairs. |
|
TO_ARRAY(column[,column]*) | Creates a new column as Array type. The input columns must all have the same data type. |
|
Data flow operators that support creating expressions and hierarchical data types can use higher-order functions.
The supported operators are:
-
Aggregate
-
Expression
-
Filter
-
Join
-
Lookup
-
Split
-
Pivot
Function | Description | Example |
---|---|---|
TRANSFORM(column, lambda_function) | Takes an array and an anonymous function, and sets up a new array by applying the function to each element, and assigning the result to the output array. | For an input array of integers [1, 2, 3] , TRANSFORM(array, x -> x + 1) returns a new array of [2, 3, 4] . |
TRANSFORM_KEYS(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the keys have the type of the result of the lambda function, and the values have the type of the column map values. | For an input map with integer keys and string values of {1 -> 'value1', 2 -> 'value2', 3 -> 'value3'} , TRANSFORM_KEYS(map, (k, v) -> k * 2 + 1) returns a new map of {3 -> 'value1', 5 -> 'value2', 7 -> 'value3'} . |
TRANSFORM_VALUES(column, lambda_function) | Takes a map and a function with 2 arguments (key and value), and returns a map in which the values have the type of the result of the lambda functions, and the keys have the type of the column map keys. | For an input map with string keys and string values of {'a' -> 'value1', 'b' -> 'value2', 'c' -> 'value3'} , TRANSFORM_VALUES(map, (k, v) -> k || __ '' || v) returns a new map of {'a' -> 'a_value1', 'b' -> 'b_value2', 'c' -> 'c_value3'} . |
ARRAY_SORT(array(...), lambda_function) |
Only the Expression operator supports Takes an array and sorts according to the given function that takes 2 arguments. The function must return -1, 0, or 1 depending on whether the first element is less than, equal to, or greater than the second element. If the function is omitted, the array is sorted in ascending order. |
The returned array is: [1,5,6] |
Function | Description | Example |
---|---|---|
CASE WHEN condition1 THEN result1 ELSE result2 END | Returns the value for which a condition is met. | CASE WHEN 1 > 0 THEN 'ABC' ELSE 'XYZ' END returns ABC if 1> 0 , otherwise returns XYZ |
AND | The logical AND operator. Returns true if both operands are true, otherwise returns false. | (x = 10 AND y = 20) returns "true" if x is equal to 10 and y is equal to 20. If either one is not true, then it returns "false" |
OR | The logical OR operator. Returns true if either operand is true or both are true, otherwise returns false. | (x = 10 OR y = 20) returns "false" if x is not equal to 10 and also y is not equal to 20. If either one is true, then it returns "true" |
NOT | The logical NOT operator. | |
LIKE | Performs string pattern matching, whether string1 matches the pattern in string2. | |
= | Tests for equality. Returns true if expr1 equals expr2, otherwise returns false. | x = 10 returns "true" when value of x is 10, else it returns "false" |
!= | Tests for inequality. Returns true if expr1 does not equal to expr2, otherwise returns false. | x != 10 returns "false" if value of x is 10, else it returns "true" |
> | Tests for an expression greater than. Returns true if expr1 is greater than expr2. | x > 10 returns "true" if value of x is greater than 10, else it returns "false" |
>= | Tests for an expression greater than or equal to. Returns true if expr1 is greater than or equal to expr2. | x > =10 returns "true" if value of x is greater than or equal to 10, else it returns "false" |
< | Tests for an expression less than. Returns true if expr1 is less than expr2. | x < 10 returns "true" if value of x is less than 10, else it returns "false" |
<= | Tests for an expression less than or equal to. Returns true if expr1 is less than or equal to expr2. | x <= 10 returns "true" if value of x is less than 10, else it returns "false" |
|| | Concatenates two strings. | 'XYZ' || 'hello' returns 'XYZhello' |
BETWEEN | Evaluates a range. | FILTER_1.BANK.BANK_ID BETWEEN 1003 AND 1007 |
IN | Tests whether an expression matches a list of values. | FILTER_2.ORDERS.ORDER_ID IN (1003, 1007) |
Functions | Description | Example |
---|---|---|
CAST(value AS type) | Returns the specified value in the specified type. | CAST("10" AS INT) returns 10 |
CONCAT(string, string) | Returns the combined values of strings or columns. | CONCAT('Oracle','SQL') returns OracleSQL |
CONCAT_WS(separator, expression1, expression2, expression3,...) |
Returns the combined values of strings or columns using the specified separator in between the strings or columns. A separator is required and it must be a string. At least one expression must be provided after the separator. For example: |
CONCAT_WS('-', 'Hello', 'Oracle') returns Hello-Oracle
If a child of the function is an array, then the array is flattened:
|
INITCAP(string) | Returns the string with the first letter in each word capitalized, while all other letters are in lowercase, and each word is delimited by a white space. | INITCAP('oRACLE sql') returns Oracle Sql |
INSTR(string, substring[start_position]) | Returns the (1-based) index of the first occurrence of substring in string . | INSTR('OracleSQL', 'SQL') returns 7 |
LOWER(string) | Returns the string with all letters changed to lowercase. | LOWER('ORACLE') returns oracle |
LENGTH(string) | Returns the character length of string or the number of bytes of binary data. The length of the string includes trailing spaces. | LENGTH('Oracle') returns 6 |
LTRIM(string) | Returns the string with leading spaces removed from the left. | LTRIM(' Oracle') |
NVL(expr1, epxr2) | Returns the argument that is not null. | NVL(EXPRESSION_3.CUSTOMERS_JSON.CONTINENT_ID, ROWID()) |
REGEXP_SUBSTR(string, regexp[, RegexGroupIdx]) | Searches and extracts the string that matches a regular expression pattern from the input string. If the optional capturing group index is provided, the function extracts the specific group. |
REGEXP_SUBSTR('22;33;44', '([0-9.]*);([0-9.]*);([0-9.]*)', 1) returns 22 |
REPLACE(string, search, replacement) | Replaces all occurrences of search with replacement .If If | REPLACE('ABCabc', 'abc', 'DEF') returns ABCDEF |
RTRIM(string) | Returns the string with leading spaces removed from the right. | RTRIM('Oracle ') |
SUBSTRING(string, position[, substring_length]) | Returns the substring starting at position. | SUBSTRING('Oracle SQL' FROM 2 FOR 3) returns rac |
For numbers, TO_CHAR(expr) and for dates TO_CHAR(expr, format[, locale]) | Converts numbers and dates into strings. For numbers, no format is required. For dates, use the same format as DATE_FORMAT described in Date and Time Functions. Default locale is en-US . See supported language tags.In pipeline expressions, the
| Number example: Date example: |
UPPER(string) | Returns a string with all letters changed to uppercase. | UPPER('oracle') returns ORACLE |
LPAD(str, len[, pad]) | Returns a string that is left-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | LPAD('ABC', 5, '*') returns '**ABC' |
RPAD(str, len[, pad]) | Returns a string that is right-padded with specified characters to a certain length. If the pad character is omitted, the default is a space. | RPAD('XYZ', 6, '+' ) returns 'XYZ+++' |
Function | Description | Example |
---|---|---|
NUMERIC_ID() | Generates a universally unique identifier that is a 64-bit number for each row. | NUMERIC_ID() returns for example, 3458761969522180096 and 3458762008176885761 |
ROWID() | Generates monotonically increasing 64-bit numbers. | ROWID() returns for example, 0 , 1 , 2 , and so on |
UUID() | Generates a universally unique identifier that is a 128-bit String for each row. | UUID() returns for example, 20d45c2f-0d56-4356-8910-162f4f40fb6d |
MONOTONICALLY_INCREASING_ID() |
Generates unique, monotonically increasing 64-bit integers that are not consecutive numbers. | MONOTONICALLY_INCREASING_ID() returns for example,
8589934592 and 25769803776 |