On Autonomous Database you can
create implicit partitioned external tables from Hive style
partitioned data or from simple folder partitioned data stored on
your Cloud Object Store.
About External Tables with Implicit Partitioning
On Autonomous Database use the
DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create implicit partitioned
external tables from Hive style partitioned data or from simple folder partitioned data stored
on your Cloud Object Store.
Use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to create
implicit partitioned external tables. By passing the appropriate options to this procedure,
the partitions are derived from the source data. The partitioned external tables support
runtime discovery of partition columns and their values. Runtime discovery of changes in the
underlying object store structure, such as adding or removing objects, simplifies the
maintenance process by eliminating the need for additional synchronization procedures required
by DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE. This ensures that the data is up to
date at query runtime.
With implicit partitioning, Autonomous Database
automatically determines the columns a table is "partitioned on" based on the object storage
source hierarchical file structure. There is no need to explicitly declare a partitioning
scheme. Implicit partitioning provides partition-table-like performance benefits without the
need to explicitly define a partitioned external table using the
DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE procedure.
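For example, once an implicit partitioned external table exists, newly added objects are visible to queries immediately, with no synchronization step. The following sketch assumes the mysales table created in the examples later in this topic; the commented call shows the kind of refresh that a table created with DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE would otherwise typically need.
-- New objects uploaded to the bucket are discovered at query time,
-- so this query reflects the current contents of object storage.
SELECT COUNT(*) FROM mysales WHERE year = '2024';

-- With DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE you would typically refresh
-- the partition metadata instead, for example:
-- BEGIN
--   DBMS_CLOUD.SYNC_EXTERNAL_PART_TABLE(table_name => 'MYSALES');
-- END;
-- /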
Implicit partitioned external tables support the following naming styles for
partitioned objects in object store:
Hive Naming Format: Data in object storage that has been generated from Hive has the following format, where each folder name includes both the partition column and its value:
OBJBUCKET/<table>/<column1>=<value1>/<column2>=<value2>/file.parquet
Non-Hive "Pathtail" Naming Format: A second format that is often found in data lakes is similar to the Hive format, but the folder names contain only the partition values and do not include the partition columns:
OBJBUCKET/<table>/<value1>/<value2>/file.parquet
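For illustration only, the following hypothetical object names show the same partition (country USA, year 2024, month 02) in both naming styles:
Hive: OBJBUCKET/sales/country=USA/year=2024/month=02/file1.parquet
Non-Hive "Pathtail": OBJBUCKET/sales/USA/2024/02/file1.parquet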
Using the DBMS_CLOUD.CREATE_EXTERNAL_TABLE
procedure, implicit partitioned external tables can be created in the following ways:
Set the partition type to hive with a given list of partition columns
In this case, implicit_partition_type is set to
hive and implicit_partition_columns provides a list
of partition columns.
For example:
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
    table_name => 'mysales',
    credential_name => 'mycredential',
    file_uri_list => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
    column_list => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
    format => '{"type":"parquet",
                "implicit_partition_type":"hive",
                "implicit_partition_columns":["country","year","month"]}');
END;
/
Set the partition type to hive without providing a list of partition
columns
In this case, implicit_partition_type is set to
hive and implicit_partition_columns is not provided.
The partition columns are automatically detected by searching for '=' in the path
specified by file_uri_list. The column name is the text to the
left of each '=' and the value is the text to the right. If no partition column
can be detected in the path, an error is raised.
For example:
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
    table_name => 'mysales',
    credential_name => 'mycredential',
    file_uri_list => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
    column_list => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
    format => '{"type":"parquet", "implicit_partition_type":"hive"}');
END;
/
Provide a list of partition columns without specifying the type
In this case, implicit_partition_type is not set and
implicit_partition_columns provides a list of columns.
For example:
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
    table_name => 'mysales',
    credential_name => 'mycredential',
    file_uri_list => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
    column_list => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
    format => '{"type":"parquet",
                "implicit_partition_columns":["country","year","month"]}');
END;
/
Query External Implicit Partitioned Data with Hive Format Source File Organization
Use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure to
create an implicit partitioned external table from data in Object Store that was
generated from Hive data.
The sample source files in the following example use this naming format:
OBJBUCKET/<table>/<column1>=<value1>/<column2>=<value2>/file.parquet
To create an implicit partitioned external table with data stored in this Hive naming format, do the following:
Store Object Store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL.
This operation stores the credentials in the
database in an encrypted format. You can use any
name for the credential name. Note that this step is
required only once unless your object store
credentials change. Once you store the credentials
you can then use the same credential name for
creating external tables.
See CREATE_CREDENTIAL Procedure
for information about
the username and password parameters
for different object storage services.
Create an implicit partitioned external table on top of your
source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
In this example, implicit partitioning is enabled by setting the format option
implicit_partition_type to hive. Because the format option
implicit_partition_columns is not provided, the partition columns are
detected automatically. If a list of columns were given, those columns would be
used as the partition columns and Autonomous Database would not try to discover
them.
To detect the partition columns, Autonomous Database searches the path specified
by file_uri_list from the beginning for '='. When an '=' is found, the
text between the preceding '/' and the '=' is taken as the column name (for
example, "country"), and the text between the '=' and the following '/' is taken
as the value (for example, "USA"). The search then continues after that '/' for
the next '=', detecting the second partition column, and so on.
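For illustration, with a hypothetical object name such as
OBJBUCKET/sales/country=USA/year=2024/month=02/sales1.parquet
the detection yields country with value USA, then year with value 2024, and finally month with value 02.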
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
    table_name => 'mysales',
    credential_name => 'DEF_CRED_NAME',
    file_uri_list => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
    column_list => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
    format => '{"type":"parquet", "implicit_partition_type":"hive"}');
END;
/
The parameters are:
table_name: is
the external table name.
credential_name:
is the name of the credential created in the
previous step.
file_uri_list: is
a comma-delimited list of source file URIs. There
are two options for this list:
Specify a comma-delimited list of
individual file URIs without wildcarding.
Specify a single file URI with
wildcards, where the wildcards can only be after
the last slash "/". The character "*" can be used
as the wildcard for multiple characters, the
character "?" can be used as the wildcard for a
single character.
column_list: is a
comma delimited list of column names and data
types for the external table. The list includes
the columns inside the data file and those derived
from the object name (from names in the file path
specified by file_uri_list).
The column_list
is not required when the data files are structured
files (Parquet, Avro, or ORC).
format: defines
the options you can specify to describe the format
of the source file. The
implicit_partition_type option
specifies hive as the partitioning style of the source data.
If the data in your source file is
encrypted, decrypt the data by specifying the
encryption format option. See
Decrypt Data While Importing from Object Storage for more information on decrypting
data.
In this example, namespace-string is the Oracle
Cloud Infrastructure object storage namespace and
bucketname is the bucket
name. See Understanding
Object Storage Namespaces for more
information.
You can now run queries on the external partitioned table you
created in the previous step.
Your Autonomous Database takes advantage of
the partitioning information of your external
partitioned table, ensuring that the query only
accesses the relevant data files in Object
Store.
For example:
SELECT product, units FROM mysales WHERE year='2024' and month='02'
This SQL statement only queries data in the
partition for month 02 of year 2024.
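To confirm that only the relevant files are accessed, you can optionally inspect the execution plan. This is a generic sketch using the standard EXPLAIN PLAN statement and the DBMS_XPLAN package; it is not specific to implicit partitioning:
EXPLAIN PLAN FOR
SELECT product, units FROM mysales WHERE year='2024' and month='02';

-- Display the plan; the filter predicates on the partition-derived
-- columns limit which object store files are read.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);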
Query External Implicit Partitioned Non-Hive Style Data
Use the DBMS_CLOUD.CREATE_EXTERNAL_TABLE procedure
to create an implicit partitioned external table from data in object storage that
has been generated from non-Hive data.
The sample source files in the following example use this naming format:
OBJBUCKET/<table>/<value1>/<value2>/file.parquet
Consider the following sample source files:
OBJBUCKET/sales/USA/2024/01/sales-2024-01.parquet
OBJBUCKET/sales/USA/2024/02/sales-2024-02.parquet
OBJBUCKET/sales/USA/2024/03/sales-2024-03.parquet
To create an implicit partitioned external table with data
stored in this sample non-Hive format, do the following:
Store Object Store credentials using the procedure DBMS_CLOUD.CREATE_CREDENTIAL.
This operation stores the credentials in the
database in an encrypted format. You can use any
name for the credential name. Note that this step is
required only once unless your object store
credentials change. Once you store the credentials
you can then use the same credential name for
creating external tables.
See CREATE_CREDENTIAL Procedure
for information about
the username and password parameters
for different object storage services.
Create an implicit partitioned external table on top of your
source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
In this example, implicit partitioning is enabled
by providing the partition columns with the format option
implicit_partition_columns. Since
the folder name does not include the partition columns, the
partition values in the path, specified by
file_uri_list, can only be
discovered if an explicit list of columns is given with the
format option implicit_partition_columns.
To detect the respective column values in the path, the order of the columns
listed in implicit_partition_columns must match the order of the
values in the path. The partition column values are then detected automatically
at query time.
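For illustration, with implicit_partition_columns set to ["country","year","month"] and a hypothetical object name such as
OBJBUCKET/sales/USA/2024/01/sales-2024-01.parquet
the trailing folders are mapped to the columns by position: country is USA, year is 2024, and month is 01.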
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE (
    table_name => 'mysales',
    credential_name => 'DEF_CRED_NAME',
    file_uri_list => 'https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/*.parquet',
    column_list => 'product varchar2(100), units number, country varchar2(100), year number, month varchar2(2)',
    format => '{"type":"parquet", "implicit_partition_columns":["country","year","month"]}');
END;
/
The parameters are:
table_name: is
the external table name.
credential_name:
is the name of the credential created in the
previous step.
file_uri_list: is
a comma-delimited list of source file URIs. There
are two options for this list:
Specify a comma-delimited list of
individual file URIs without wildcarding.
Specify a single file URI with
wildcards, where the wildcards can only be after
the last slash "/". The character "*" can be used
as the wildcard for multiple characters, the
character "?" can be used as the wildcard for a
single character.
column_list: is a
comma delimited list of column names and data
types for the external table. The list includes
the columns inside the data file and those derived
from the object name (from names in the file path
specified by file_uri_list).
The column_list
is not required when the data files are structured
files (Parquet, Avro, or ORC).
format: defines
the options you can specify to describe the format
of the source file. The
implicit_partition_type option is
unset. Since
implicit_partition_columns is
given, the type is automatically detected as
non-hive.
If the data in your source file is
encrypted, decrypt the data by specifying the
encryption format option. See
Decrypt Data While Importing from Object Storage for more information on decrypting
data.
In this example, namespace-string is the Oracle
Cloud Infrastructure object storage namespace and
bucketname is the bucket
name. See Understanding
Object Storage Namespaces for more
information.
See DBMS_CLOUD URI Formats for more information on the
supported cloud object storage services.
You can now run queries on the external partitioned table you
created in the previous step.
Your Autonomous Database takes advantage of
the partitioning information of your external
partitioned table, ensuring that the query only
accesses the relevant data files in Object
Store.
For example:
SELECT product, units FROM mysales WHERE year='2024'
This SQL statement only queries data in the
partition for year 2024.
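If a query returns unexpected results, you can optionally validate the table against its source files. This sketch uses the DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE procedure, which checks that the source files can be read with the declared columns and format options:
BEGIN
  -- Reads the source files and reports rows that cannot be converted
  -- to the declared column types.
  DBMS_CLOUD.VALIDATE_EXTERNAL_TABLE(
    table_name => 'mysales');
END;
/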