DBMS_CLOUD Package Avro, ORC, and Parquet Complex Types
Describes the mapping of Avro, ORC, and Parquet complex data types to Oracle data types.
Autonomous Database supports complex data types, including the following complex types:
-
struct
-
list
-
map
-
union
-
array
When you specify a source file type of Avro, ORC, or Parquet and the source file includes complex columns, Autonomous Database queries return JSON for the complex columns. This simplifies processing of query results; you can use Oracle's powerful JSON parsing features consistently across the file types and data types. The following table shows the format for the complex types in Autonomous Database:
The complex fields map to
VARCHAR2
columns and
VARCHAR2
size limits apply.
Type | Parquet | ORC | Avro | Oracle |
---|---|---|---|---|
List: sequence of values | List | List | Array | VARCHAR2 (JSON format)
|
Map: list of objects with single key | Map | Map | Map | VARCHAR2 (JSON format)
|
Union: values of different type | Not Available | Union | Union | VARCHAR2 (JSON format)
|
Object: zero or more key-value pairs | Struct | Struct | Record | VARCHAR2 (JSON format)
|
If your ORC, Parquet, or Avro source files contain complex types, then
you can query the JSON output for these common complex types. For example, the
following shows an ORC file, movie-info.orc
, with a complex
type (the same complex type handling applies for Parquet and Avro source files).
Consider the movie-info.orc
file with the following
schema:
id int
original_title string
overview string
poster_path string
release_date string
vote_count int
runtime int
popularity double
genres array<struct<id:int,name:string>
Notice that each movie is categorized by multiple genres
using an array of genres
. The genres
array is an
array
of structs
and each item has an
id
(int
) and a name
(string
). The genres
array is considered a
complex type. You can create a table over this ORC file using DBMS_CLOUD.CREATE_EXTERNAL_TABLE
as follows:
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE
(
table_name =>'movie_info',
credential_name =>'OBJ_STORE_CRED',
file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/mytenancy/b/movies/o/movie-info.orc',
format => '{"type":"orc", "schema": "first"}');
END;
/
When you create the external table the database automatically generates
the columns based on the schema in the ORC file (if you are using Avro or Parquet,
the same applies). For this example, the DBMS_CLOUD.CREATE_EXTERNAL_TABLE
creates a table in
your database as follows:
CREATE TABLE "ADMIN"."MOVIE_INFO"
( "ID"
NUMBER(10,0),
"ORIGINAL_TITLE" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
"OVERVIEW" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
"POSTER_PATH" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
"RELEASE_DATE" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP",
"VOTE_COUNT" NUMBER(10,0),
"RUNTIME" NUMBER(10,0),
"POPULARITY" BINARY_DOUBLE,
"GENRES" VARCHAR2(4000 BYTE) COLLATE "USING_NLS_COMP"
) DEFAULT COLLATION "USING_NLS_COMP"
ORGANIZATION EXTERNAL
( TYPE ORACLE_BIGDATA
DEFAULT DIRECTORY "DATA_PUMP_DIR"
ACCESS PARAMETERS
( com.oracle.bigdata.credential.name=OBJ_STORE_CRED
com.oracle.bigdata.fileformat=ORC
)
LOCATION
(
'https://objectstorage.us-phoenix-1.oraclecloud.com/n/mytenancy/b/movies/o/movie-info.orc'
)
)
REJECT LIMIT UNLIMITED
PARALLEL;
)
Now you can query the movie data:
SELECT original_title, release_date, genres
FROM movie_info
WHERE release_date > '2000'
ORDER BY original_title;
This produces the following output:
original_title release_date genres
(500) Days of Summer 2009 [{"id":3,"name":"Drama"},{"id":6,"name":"Comedy"},{"id":17,"name":"Horror"},{"id":19,"name":"Western"},{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
10,000 BC 2008 [{"id":6,"name":"Comedy"}]
11:14 2003 [{"id":9,"name":"Thriller"},{"id":14,"name":"Family"}]
127 Hours 2010 [{"id":6,"name":"Comedy"},{"id":3,"name":"Drama"}]
13 Going on 30 2004 [{"id":6,"name":"Comedy"},{"id":3,"name":"Drama"},{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
1408 2007 [{"id":45,"name":"Sci-Fi"},{"id":6,"name":"Comedy"},{"id":17,"name":"Horror"},{"id":6,"name":"Comedy"},{"id":18,"name":"War"}]
Notice that the complex type genres
is returned as a
JSON array.
To make the JSON data more useful, you can transform the column using
Oracle's JSON functions. For example, you can use the JSON "." notation as well as
the more powerful transform functions such as JSON_TABLE
.
See Simple Dot-Notation Access to JSON Data for information on "." notation.
See SQL/JSON Function JSON_TABLE
for information on JSON_TABLE
.
SELECT original_title, release_date, m.genre_name, genres
FROM movie_info mi,
JSON_TABLE(mi.genres, '$.name[*]'
COLUMNS (genre_name VARCHAR2(25) PATH
'$')
) AS m
WHERE rownum < 10;
The JSON_TABLE
creates a row for each value of the
array, think outer join, and the struct is parsed to extract the name of the genre.
This produces the following output:
original_title release_date genre_name genres
(500) Days of Summer 2009 Drama [{"id":3,"name":"Drama"},{"id":6,"name":"Comedy"},
{"id":17,"name":"Horror"},{"id":19,"name":"Western"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
(500) Days of Summer 2009 Comedy [{"id":3,"name":"Drama"},{"id":6,"name":"Comedy"},
{"id":17,"name":"Horror"},{"id":19,"name":"Western"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
(500) Days of Summer 2009 Horror [{"id":3,"name":"Drama"},{"id":6,"name":"Comedy"},
{"id":17,"name":"Horror"},{"id":19,"name":"Western"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
(500) Days of Summer 2009 Western [{"id":3,"name":"Drama"},{"id":6,"name":"Comedy"},
{"id":17,"name":"Horror"},{"id":19,"name":"Western"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
(500) Days of Summer 2009 War [{"id":3,"name":"Drama"},{"id":6,"name":"Comedy"},
{"id":17,"name":"Horror"},{"id":19,"name":"Western"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
(500) Days of Summer 2009 Romance [{"id":3,"name":"Drama"},{"id":6,"name":"Comedy"},
{"id":17,"name":"Horror"},{"id":19,"name":"Western"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
10,000 BC 2008 Comedy [{"id":6,"name":"Comedy"}]
11:14 2003 Family [{"id":9,"name":"Thriller"},{"id":14,"name":"Family"}]
11:14 2003 Thriller [{"id":9,"name":"Thriller"},{"id":14,"name":"Family"}]
127 Hours 2010 Comedy [{"id":6,"name":"Comedy"},{"id":3,"name":"Drama"}]
127 Hours 2010 Drama [{"id":6,"name":"Comedy"},{"id":3,"name":"Drama"}]
13 Going on 30 2004 Romance [{"id":6,"name":"Comedy"},{"id":3,"name":"Drama"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
13 Going on 30 2004 Comedy [{"id":6,"name":"Comedy"},{"id":3,"name":"Drama"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
13 Going on 30 2004 War [{"id":6,"name":"Comedy"},{"id":3,"name":"Drama"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
13 Going on 30 2004 Drama [{"id":6,"name":"Comedy"},{"id":3,"name":"Drama"},
{"id":18,"name":"War"},{"id":15,"name":"Romance"}]
Parent topic: DBMS_CLOUD Avro, ORC, and Parquet Support