RAG Tool Object Storage Guidelines for Generative AI Agents
Review the following guidelines to prepare Object Storage data for RAG tools in Generative AI Agents.
General Guidelines
To prepare data for data sources in OCI Generative AI Agents, follow these guidelines:
- Data Sources: Data for Generative AI Agents must be uploaded as files to an Object Storage bucket.
- Number of Buckets: Only one bucket is allowed per data source.
- Supported File Types: Only PDF and txt files are supported.
- File Size Limit: Each file must be no larger than 100 MB.
- PDF Contents: PDF files can include images, charts, and reference tables, but these must not exceed 8 MB.
- Chart Preparation: No special preparation is needed for charts, as long as they're two-dimensional with labeled axes. The model can answer questions about the charts without explicit explanations.
- Table Preparation: Use reference tables with several rows and columns. For example, the agent can read the table on the limits page.
- URLs: All hyperlinks in the PDF documents are extracted and displayed as hyperlinks in the chat response.
- Data Not Ready: If your data isn't available yet, create an empty folder for the data source and populate it later. You can then ingest data into the data source after the folder is populated.
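Before uploading, you can screen files locally against the type and size rules in the preceding list. The following is a minimal Python sketch, not an OCI tool: the function name check_file is hypothetical, it takes a file name and size as inputs instead of reading Object Storage, and it assumes "100 MB" means 100 MiB.

```python
from pathlib import Path

# Limits from the guidelines above.
SUPPORTED_SUFFIXES = {".pdf", ".txt"}
# Assumption: "100 MB" is read as 100 MiB; use 100_000_000 for decimal megabytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def check_file(name: str, size_bytes: int) -> list[str]:
    """Return the problems found for one file; an empty list means it passes."""
    problems = []
    suffix = Path(name).suffix.lower()  # case-insensitive: "A.TXT" counts as .txt
    if suffix not in SUPPORTED_SUFFIXES:
        problems.append(f"{name}: unsupported file type {suffix or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"{name}: {size_bytes} bytes exceeds the 100 MB limit")
    # Not checked here: the 8 MB limit on images, charts, and tables inside a PDF.
    return problems


if __name__ == "__main__":
    for name, size in [("guide.pdf", 2_000_000), ("notes.docx", 1_000)]:
        print(name, check_file(name, size) or "OK")
```

The 8 MB limit on embedded PDF content is left out because checking it would require parsing the PDF.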
Set up the following Object Storage permissions before you proceed.
- User access to Object Storage files
- Data ingestion job access to Object Storage files for long-running jobs
See Getting Access for the permissions.
The metadata filtering feature aims to improve response quality by using filter conditions that you define, helping the model generate answers relevant to the content scope.
Review the following options and select one or more methods that work best for you.
Method | Location | Usage |
---|---|---|
Include metadata for all the files in a bucket without mentioning the file names. | Create a _common.metadata.json file at the Object Storage root level. | Use this file for metadata that's common to all files in the bucket. This method helps you avoid entering duplicate metadata across objects. |
In one file, create a metadata entry for each file in a bucket and include the file names. | Create an _all.metadata.json file at the Object Storage root level. | Use this method if you have many files and creating one file that includes all the file names is more convenient for you than creating one metadata file per file. |
Create a metadata file for each file in a bucket. | Create a <file-name>.metadata.json file for each file, at the file level. | Use this method when metadata differs for each file and there aren't many files to create a metadata file for, or if you're automating the creation of the metadata files. |
Add Object Storage metadata headers to each file. | Add a metadata header through each file's Object Storage metadata property. | Use this method if you have few metadata properties to include. We recommend the other methods, which use JSON files, because files are easier to update and manage, while metadata headers are difficult to update. |
For all methods in the preceding table, you must define a metadata schema file called _metadata_schema.json at the Object Storage root level. Here's an example hierarchy of where you save the metadata files.
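As a rough illustration of where each file goes, the following Python sketch lists the object names a bucket would contain for a chosen combination of methods. The function names are hypothetical, and the per-file naming assumes that <file-name> is the full object name including its extension and folder prefix (for example, reports/q1.pdf becomes reports/q1.pdf.metadata.json); confirm this against the example hierarchy before relying on it.

```python
# File names from the table above; all live at the Object Storage root level.
SCHEMA_FILE = "_metadata_schema.json"  # required for every method
COMMON_FILE = "_common.metadata.json"  # metadata shared by all files
ALL_FILE = "_all.metadata.json"        # one entry per file, keyed by file name


def per_file_metadata_name(object_name: str) -> str:
    """Name of the per-file metadata object, stored next to the data file.

    Assumption: <file-name> includes the extension and any folder prefix.
    """
    return f"{object_name}.metadata.json"


def metadata_layout(data_objects: list[str], use_common: bool = False,
                    use_all: bool = False, per_file: bool = False) -> list[str]:
    """List the metadata object names needed for the selected methods.

    The header-based method adds no files, so it doesn't appear here.
    """
    names = [SCHEMA_FILE]
    if use_common:
        names.append(COMMON_FILE)
    if use_all:
        names.append(ALL_FILE)
    if per_file:
        names.extend(per_file_metadata_name(o) for o in data_objects)
    return names


if __name__ == "__main__":
    print(metadata_layout(["reports/q1.pdf", "faq.txt"], use_common=True, per_file=True))
    # ['_metadata_schema.json', '_common.metadata.json',
    #  'reports/q1.pdf.metadata.json', 'faq.txt.metadata.json']
```

The schema file is always included because the preceding paragraph requires it for every method, including the header-based one.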
The following steps show how to format the metadata JSON files using examples.
Description | Limit |
---|---|
Maximum number of entries in _all.metadata.json | 10,000 |
Maximum number of metadata fields that can be specified for each file | 20 |
Maximum number of items in a list_of_string type | 10 |
Maximum length of an individual item in a list_of_string type | 50 |
Maximum length of a metadata key in characters | 25 |
Maximum length of a metadata value in characters | 50 |
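To catch limit violations before ingestion, you could check metadata locally against the preceding table. This Python sketch is hypothetical: it works on an in-memory dictionary (file name mapped to that file's fields), not on the actual JSON format of the metadata files, and it assumes that a Python list stands in for a list_of_string value and that list-item length is measured in characters.

```python
# Limits from the table above.
MAX_ALL_ENTRIES = 10_000
MAX_FIELDS_PER_FILE = 20
MAX_LIST_ITEMS = 10
MAX_LIST_ITEM_LEN = 50
MAX_KEY_LEN = 25
MAX_VALUE_LEN = 50


def check_metadata(fields: dict) -> list[str]:
    """Check one file's metadata fields; an empty list means all limits are met."""
    problems = []
    if len(fields) > MAX_FIELDS_PER_FILE:
        problems.append(f"{len(fields)} fields exceeds {MAX_FIELDS_PER_FILE}")
    for key, value in fields.items():
        if len(key) > MAX_KEY_LEN:
            problems.append(f"key {key!r} is longer than {MAX_KEY_LEN} characters")
        if isinstance(value, list):  # treated as a list_of_string value
            if len(value) > MAX_LIST_ITEMS:
                problems.append(f"{key}: {len(value)} items exceeds {MAX_LIST_ITEMS}")
            for item in value:
                if len(str(item)) > MAX_LIST_ITEM_LEN:
                    problems.append(f"{key}: an item is longer than {MAX_LIST_ITEM_LEN} characters")
        elif len(str(value)) > MAX_VALUE_LEN:
            problems.append(f"{key}: value is longer than {MAX_VALUE_LEN} characters")
    return problems


def check_all_metadata(per_file: dict) -> list[str]:
    """Check every file's fields plus the entry-count limit for _all.metadata.json."""
    problems = []
    if len(per_file) > MAX_ALL_ENTRIES:
        problems.append(f"{len(per_file)} entries exceeds {MAX_ALL_ENTRIES}")
    for name, fields in per_file.items():
        problems.extend(f"{name}: {p}" for p in check_metadata(fields))
    return problems
```

Returning a list of messages instead of raising on the first error lets you fix every violation in one pass.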
Beta Customers:
If you created a knowledge base in the Beta phase, you might need to delete and re-create the data source for the URL handling feature to work.
metadata object for that file. This topic shows how to add or update the metadata object through the OCI CLI.
- The metadata object that overrides the default citation must have the name customized_url_source.
- You can have only one metadata object with the name customized_url_source.
- Each customized_url_source can have only one URL.
- The commands in step 5 work for both adding and updating the metadata object, because they replace the current metadata object's value.
- Ensure that you pass the value for the --metadata parameter in the format shown in the commands in step 5.
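If you generate the --metadata value in a script, a small helper can enforce the rules in the preceding list: one object named customized_url_source holding a single URL. The following Python sketch is hypothetical. The requirement for an absolute http or https URL is an extra guard that these guidelines don't state, and the exact command and quoting in step 5 take precedence over this output.

```python
import json
from urllib.parse import urlparse

URL_METADATA_NAME = "customized_url_source"  # required name for the citation override


def build_url_metadata(url: str) -> str:
    """Return a JSON string with exactly one customized_url_source entry.

    Rejects lists and whitespace-separated values, since only one URL is allowed.
    Assumption (not from the guidelines): the URL must be absolute http(s).
    """
    if not isinstance(url, str) or any(ch.isspace() for ch in url):
        raise ValueError("pass exactly one URL string with no whitespace")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # A single-key object satisfies the "only one metadata object" rule.
    return json.dumps({URL_METADATA_NAME: url})


if __name__ == "__main__":
    print(build_url_metadata("https://example.com/docs/guide.html"))
    # {"customized_url_source": "https://example.com/docs/guide.html"}
```

Because the step 5 commands replace the current metadata object's value, generating the complete JSON each time, rather than patching it, matches how both adding and updating behave.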