Use Full-Text Search on Files in Object Storage
The PL/SQL package DBMS_CLOUD enables you to build a text index on the object store files, which allows you to search the text and use wildcards with your search.
- About Full-Text Search on Files in Object Storage
You can create a text index on files in object storage. A text index allows you to perform a word-based search across very large data sets in object store. - Create a Text Index on Object Storage Files
UseDBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
to create a text index on files in object storage. - Drop an Index on the Cloud Storage Files
Use theDBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX
procedure to drop a text index on object storage files. - Text Index Reference Table
A local table is created within your database with a standard suffixINDEX_NAME$TXTIDX
. This table is created internally when you runDBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
. - Monitor Text Index Creation
When you runDBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
the text index creation operation is logged in theALL_SCHEDULER_JOB_RUN_DETAILS
view.
Parent topic: Full-Text Search Queries
About Full-Text Search on Files in Object Storage
DBMS_CLOUD
provides
fast and efficient ways to manage data in object store. The DBMS_CLOUD
APIs let you create,
copy, download, delete, and traverse files present in object store. When you define
external tables you can run SQL queries on data stored in your object store (or with
hybrid partitioned external tables, across data in your database and in object store).
When you use DBMS_CLOUD
to define
a text index, this allows you to search your data for text and use wildcards.
Autonomous Database support for word-based search works for commonly used data formats, for example CSV or JSON and with formatted documents (binary), for example PDF and DOC (MS Word) formats. You can configure a refresh rate that indicates the frequency in minutes at which the index is refreshed for any new uploads or deletes.
A local table with the standard suffix INDEX_NAME$TXTIDX
is
created when you create an index on the object storage, and you can utilize the table to
perform a search using the CONTAINS
keyword.
See Indexing with Oracle Text for more information.
Parent topic: Use Full-Text Search on Files in Object Storage
Create a Text Index on Object Storage Files
Use DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
to create a
text index on files in object storage.
Formatted documents (binary) are supported when you specify the
binary_files
format
option with DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
.
You can include a stop word list when you specify the
stop_words
format
option with DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
.
See Indexing with Oracle Text for more information on Oracle Text stop words and working with binary files.
Parent topic: Use Full-Text Search on Files in Object Storage
Drop an Index on the Cloud Storage Files
Use the DBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX
procedure
to drop a text index on object storage files.
Run the DBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX
procedure to drop a text index on files in object storage.
BEGIN
DBMS_CLOUD.DROP_EXTERNAL_TEXT_INDEX
(
index_name => 'EMP',
);
END;
/
This example drops the EMP
text index.
See DROP_EXTERNAL_TEXT_INDEX Procedure for more information.
Parent topic: Use Full-Text Search on Files in Object Storage
Text Index Reference Table
A local table is created within your database with a standard suffix
INDEX_NAME$TXTIDX
. This table is created internally when you
run DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
.
You can query the INDEX_NAME$TXTIDX
table to search for a
string using the CONTAINS
keyword. For example, when you call DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
procedure the INDEX_NAME
value as EMP
, this
creates the EMP$TXTIDX
the text reference table.
-
object_name
: is the file name on the object storage that contains the searched text string. -
object_path
: is the object storage bucket or folder URI that contains the object storage file. -
mtime
: is the last modified timestamp of the object storage file. This is the time when the file was last accessed byDBMS_CLOUD
.
SELECT object_path, object_name FROM EMP$TXTIDX WHERE CONTAINS(OBJECT_NAME, 'king') > 0;
OBJECT_PATH OBJECT_NAME
------------------------------------------------------------------------------------------ ------------------------------------
https://objectstorage.us-phoenix-1.oraclecloud.com/n/example1/b/adbs_data_share/o/ts_data/ data_2_20221026T195313585601Z.json
This query returns the file names and location URI on the object storage which
contains the text string king
, in either upper or lowercase.
SELECT object_name, mtime FROM EMP$TXTIDX;
OBJECT_NAME MTIME
----------------------------- -------------------------------------
data_1_20220531T165402Z.json 31-MAY-22 04.54.02.979000 PM +00:00
data_1_20220531T165427Z.json 31-MAY-22 04.54.27.997000 PM +00:00
This query returns file name and last modified timestamp of the object files on which the index EMP
is created.
Parent topic: Use Full-Text Search on Files in Object Storage
Monitor Text Index Creation
When you run DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
the text index creation
operation is logged in the ALL_SCHEDULER_JOB_RUN_DETAILS
view.
You can query the ALL_SCHEDULER_JOB_RUN_DETAILS
view to obtain the
status and any error reported by the index creation job.
The name of the DBMS_SCHEDULER
job is derived from the
INDEX_NAME
parameter specified when you call DBMS_CLOUD.CREATE_EXTERNAL_TEXT_INDEX
.
To query the ALL_SCHEDULER_JOB_RUN_DETAILS
view, you must be logged
in as the ADMIN
user or have READ
privilege on the
ALL_SCHEDULER_JOB_RUN_DETAILS
view.
For example, the following SELECT
statement with a
WHERE
clause on job_name
shows the run details for
the job:
SELECT status, additional_info
FROM all_scheduler_job_run_details WHERE LOWER(job_name) = LOWER('index_name$JOB');
You can also query for the existence of an index creation scheduler job.
SELECT status
FROM all_scheduler_jobs where LOWER(job_name) = LOWER('index_name$JOB');
See CREATE_EXTERNAL_TEXT_INDEX Procedure for more information.
Parent topic: Use Full-Text Search on Files in Object Storage