To
query data in files in the Cloud, you need to first store your object storage credentials in
your Autonomous Database, and then create an
external table using the PL/SQL procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
You can also use the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE to query external data in attached
file systems or in the local file system.
The procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE supports external files in the
supported cloud object storage services, including:
Oracle Cloud
Infrastructure Object Storage
Azure Blob Storage or Azure Data Lake
Storage
Amazon S3
Amazon S3-Compatible, including: Oracle Cloud
Infrastructure Object Storage, Google Cloud Storage, and Wasabi Hot Cloud Storage.
GitHub Repository
The source file in this example, channels.txt, has the
following data:
This operation stores the credentials in the database in an encrypted format. You can use any name for the credential name. Note that this step is required only once unless your object store credentials change. Once you store the credentials you can then use the same credential name for creating external tables.
See CREATE_CREDENTIAL Procedure for information about
the username and password parameters for different
object storage services.
Create an external table on top of your source files using the procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE.
The procedure DBMS_CLOUD.CREATE_EXTERNAL_TABLE supports external files in
the supported cloud object storage services. The credential is a table level property;
therefore, the external files must be on the same object store.
For example:
BEGIN
DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
table_name =>'CHANNELS_EXT',
credential_name =>'DEF_CRED_NAME',
file_uri_list =>'https://objectstorage.us-phoenix-1.oraclecloud.com/n/namespace-string/b/bucketname/o/channels.txt',
format => json_object('delimiter' value ','),
column_list => 'CHANNEL_ID VARCHAR2(2), CHANNEL_DESC VARCHAR2(20), CHANNEL_CLASS VARCHAR2(20)' );
END;
/
The parameters are:
table_name: is the external table name.
credential_name: is the name of the credential created in the
previous step. The credential_name parameter must conform to Oracle
object naming conventions. See Database Object Naming Rules for more
information.
file_uri_list: is a comma delimited list of the source files you want to query.
format: defines the options you can specify to describe the format of the source file.
If the data in your source files
is encrypted, decrypt the data by specifying the
format parameter with the
encryption option. See Decrypt Data
While Importing from Object Storage for more
information on decrypting data.
column_list: is a comma delimited list of the column definitions in the source files.
In this example, namespace-string is the Oracle
Cloud Infrastructure object storage namespace and
bucketname is the bucket
name. See Understanding
Object Storage Namespaces for more
information.
Note
Autonomous Database supports a variety of source file formats, including compressed data
formats. See DBMS_CLOUD Package Format Options and the DBMS_CLOUDcompression format option to see the supported compression
types.
You can now run queries on the external table you created in the previous step. For example:
SELECT count(*) FROM channels_ext;
By default the database expects all rows in the external data file to be valid and
match both the target data type definitions as well as the format definition of the
files. If there are any rows in the source files that do not match the format options
you specified, the query reports an error. You can use DBMS_CLOUD parameters, like rejectlimit, to suppress
these errors. As an alternative, you can also validate the external table you created to
see the error messages and the rejected rows so that you can change your format options
accordingly. See Validate External Data for more information.
The external table metadata helps you determine where data is coming
from when you perform a query.
The external tables you create with DBMS_CLOUD.CREATE_EXTERNAL_TABLE, DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE, or DBMS_CLOUD.CREATE_HYBRID_PART_TABLE
include two invisible columns file$path and file$name.
These columns help identify which file a record is coming from.
file$path: Specifies the file path text up to the
beginning of the object name.
file$name: Specifies the object name, including all
the text that follows the final "/".
For example:
SELECT genre_id, name, file$name, file$path FROM ext_genre
WHERE rownum <= 2;
genre_id name file$name file$path
-------- --------- ----------- ----------------------
1 Action genre.csv https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/moviestream_gold/o/genre
2 Adventure genre.csv https://objectstorage.us-ashburn-1.oraclecloud.com/n/namespace-string/b/moviestream_gold/o/genre