The procedure DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE
supports external
partitioned files in the supported cloud object storage services. The credential is a
table level property; therefore, the external files must all be on the same cloud object
store.
For example:
BEGIN DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE
(
table_name => 'MYSALES',
credential_name => 'DEF_CRED_NAME',
file_uri_list => 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/sales/*.parquet',
format =>
json_object('type' value 'parquet', 'schema' value 'first',
'partition_columns' value
json_array(
json_object('name' value 'country', 'type' value 'varchar2(100)'),
json_object('name' value 'year', 'type' value 'number'),
json_object('name' value 'month', 'type' value 'varchar2(2)')
)
)
);
END;
/
The DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE
parameters for structured data files, such as for a Parquet data file does not require
the column_list
or the field_list
parameters. The
column names and data types are derived for the columns from the first parquet file that
the procedure scans (and therefore all files must have the same shape). The generated
column list includes the columns derived from the object name and these column have the
data types specified with the partition_columns
format
parameter.
The parameters are:
-
table_name
: is the external table name.
-
credential_name
: is the name of the credential
created in the previous step.
-
file_uri_list
: is a comma-delimited list of source
file URIs. There are two options for this list:
-
Specify a comma-delimited list of individual file URIs without wildcarding.
-
Specify a single file URI with wildcards, where the wildcards can only be after
the last slash "/". The character "*" can be used as the wildcard for multiple
characters, the character "?" can be used as the wildcard for a single
character.
-
column_list
: is a comma delimited list of column names
and data types for the external table. The list includes the columns that are inside
the file as well as those derived from the object name.
The column_list
is not required when the data files
are structured files (Parquet, Avro, or ORC).
-
field_list
: Identifies the fields in the source files
and their data types. The default value is NULL
meaning the fields
and their data types are determined by the column_list
parameter.
The field_list
is not required when the data files are
structured files (Parquet, Avro, or ORC).
-
format
: defines the options you can specify to
describe the format of the source file. The partition_columns
format
parameter specifies the names of the partition columns. See
DBMS_CLOUD Package Format Options for more information.
If the data in your source file is encrypted, decrypt the data by specifying the encryption
format option. See Decrypt Data While Importing from Object Storage for more information on decrypting data.
In this example, namespace-string
is the Oracle
Cloud Infrastructure object storage namespace and
bucketname
is the bucket
name. See Understanding
Object Storage Namespaces for more
information.
See CREATE_EXTERNAL_PART_TABLE Procedure for detailed information about the parameters.
See DBMS_CLOUD URI Formats for more information on the supported cloud object storage
services.
If there are any rows in the source files that do not match the format
options you specified, the query reports an error. You can use DBMS_CLOUD
parameters, like
rejectlimit
to suppress these errors. As an alternative, you can also
validate the external partitioned table you created to see the error messages and the
rejected rows so that you can change your format options accordingly. See Validate External Data and Validate External Partitioned Data for more information.