Harvest from Oracle Object Storage

Harvesting is a process that extracts technical metadata from your data assets into your data catalog. A data asset represents a data source, such as a database, an object store, a file or document store, a message queue, or an application.

In this tutorial, you:

  1. Allow Data Catalog to access any object in your Oracle Object Storage, in any bucket, in any compartment within the tenancy where the policy is created.
  2. Create an Oracle Object Storage data asset.
  3. Add one default connection for the data asset.
  4. Harvest the data asset by running the harvest job immediately.
Important

You can harvest Object Storage files as logical data entities.

Before You Begin

To successfully perform this tutorial, you must have the following:

1. Create an Access Policy

You create a policy to allow Data Catalog to access your Object Storage resources.

At a minimum, you must have READ permission for each of the individual resource types objectstorage-namespaces, buckets, and objects, or for the Object Storage aggregate resource type object-family.

To create an access policy to grant READ permission to the Object Storage aggregate resource type object-family, perform the following steps:

  1. Open the navigation menu and click Identity & Security. Under Identity, click Policies.
  2. In the Policies page, click Create Policy.
  3. In the Create Policy panel, enter the following details:
    • Name: Enter a unique name for the policy. The name must be unique across all policies in your tenancy. You can't change the name later. For example, data-catalog-dynamic-group.
    • Description: Enter a description, such as Grant access to object storage resources in any compartment in the tenancy.
    • Compartment: Select a compartment in which you want to create the policy.
    • Policy Builder: In this section, move the slider to Show manual editor, and enter the policy rule. For example, for the data-catalog-dynamic-group dynamic group, enter the following policy rule:
      allow dynamic-group data-catalog-dynamic-group to read object-family in tenancy
      Note

      This policy allows access to any object, in any bucket, in any compartment within the tenancy where the policy is created. For more examples, see policy examples.
  4. Click Create.
You have successfully created the policy to allow Data Catalog to access all your Oracle Object Storage resources.
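
The policy rule in step 3 follows a fixed pattern, so it can be assembled programmatically if you manage several dynamic groups. The following is a minimal Python sketch under stated assumptions: the helper name build_read_statements and its parameters are hypothetical, and the code only builds statement text from the resource types named in this section. It does not call any Oracle Cloud Infrastructure API or create a policy.

```python
# Hypothetical helper: assembles policy statement text only. It does not
# create or modify anything in Oracle Cloud Infrastructure.
INDIVIDUAL_RESOURCE_TYPES = ("objectstorage-namespaces", "buckets", "objects")
AGGREGATE_RESOURCE_TYPE = "object-family"


def build_read_statements(dynamic_group, use_aggregate=True, scope="tenancy"):
    """Return READ policy statements for a dynamic group.

    With use_aggregate=True, a single statement covers object-family.
    Otherwise, one statement is produced per individual resource type.
    """
    if not dynamic_group.strip():
        raise ValueError("dynamic_group must not be empty")
    resource_types = (
        (AGGREGATE_RESOURCE_TYPE,) if use_aggregate else INDIVIDUAL_RESOURCE_TYPES
    )
    return [
        f"allow dynamic-group {dynamic_group} to read {resource_type} in {scope}"
        for resource_type in resource_types
    ]


if __name__ == "__main__":
    for statement in build_read_statements("data-catalog-dynamic-group"):
        print(statement)
```

Calling the helper with use_aggregate=False yields one statement for each of objectstorage-namespaces, buckets, and objects, which is the alternative to the aggregate resource type described above.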

2. Create a Data Asset

You're now ready to register the Oracle Object Storage data source with Data Catalog as a data asset.

To create an Oracle Object Storage data asset, perform the following steps:

  1. Open the navigation menu and click Analytics & AI. Under Data Lake, click Data Catalog.
  2. Click the data catalog instance where you want to create your data asset.
  3. On your data catalog instance Home page, click Create Data Asset from the Quick Actions tile.
    Note

    After creating a data catalog instance, when you access the Home tab for the first time, you get the Create Data Asset button on the Data Assets tile.
  4. In the Create Data Asset panel, enter the following details:
    • Name: Enter a name to uniquely identify your data asset. You can edit the name later. You can't use the following special characters in the name: & < > " ' / \ = ;
      Name is a searchable field in Data Catalog.
    • Description: Specify the need or purpose for creating this data asset.
    • Type: Select Oracle Object Storage.
    • URL: Enter the Swift URI for your Oracle Object Storage resource in the following format: https://swiftobjectstorage.<region-identifier>.oraclecloud.com
      For example: https://swiftobjectstorage.us-phoenix-1.oraclecloud.com/
    • Namespace: Enter the Object Storage namespace for the specified Oracle Cloud Infrastructure Object Storage resource.
      To view your Object Storage namespace string in the Console, from the Profile menu click Tenancy: <your_tenancy_name>. The namespace is listed under Object Storage Settings.

  5. Click Create.
You have successfully created an Oracle Object Storage data asset.
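
The name restriction and the URL format from step 4 are easy to check before you open the Console. The following Python sketch is illustrative only: the function names forbidden_chars_in and swift_url are hypothetical, and the code relies solely on the character list and URL format given in this section.

```python
# Characters this tutorial lists as not allowed in a data asset name.
FORBIDDEN_NAME_CHARS = set("&<>\"'/\\=;")


def forbidden_chars_in(name):
    """Return the forbidden characters found in name, in sorted order."""
    return sorted(set(name) & FORBIDDEN_NAME_CHARS)


def swift_url(region_identifier):
    """Build the Swift URI from a region identifier such as us-phoenix-1."""
    region = region_identifier.strip()
    if not region:
        raise ValueError("region_identifier must not be empty")
    return f"https://swiftobjectstorage.{region}.oraclecloud.com"


if __name__ == "__main__":
    print(swift_url("us-phoenix-1"))
    print(forbidden_chars_in("sales&marketing;2024"))
```

An empty list from forbidden_chars_in means the name contains none of the listed characters. It does not check any other naming rule that Data Catalog might enforce.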

3. Add a Connection

After creating the Oracle Object Storage data asset, you create a connection for the data asset.

To add a connection for the Oracle Object Storage data asset, follow these steps:

  1. On the Home tab, click Data Assets.
  2. In the Data Assets list, select the Oracle Object Storage data asset that you created.
  3. In the Summary tab on the data asset details page, under Connection Information, click Add Connection.
  4. In the Add Connection panel, enter the following details:
    • Name: Enter a unique name for your connection.
    • Description: Enter a short description for your connection.
    • Type: Select one of the following:
      • Resource Principal: Resource Principal is the recommended connection type. Before you create a Resource Principal connection for your Oracle Object Storage data asset, you must create a policy to allow Data Catalog to access the Object Storage resource.
      • Pre-Authenticated Request: Select this connection type to harvest a public or private object storage bucket that you can access through a pre-authenticated request. When you select this connection type, the Pre-Authenticated Request URL field appears. Enter the pre-authenticated request URL to access the object storage bucket. For more information about using this type of connection, see Using Pre-Authenticated Requests.
    • OCI Region: Enter the region identifier for your Object Storage resource.
      To view the region identifier for your region in the Console, from the Profile menu click Tenancy: <your_tenancy_name>. From the Manage Regions info banner, click Manage Regions. The region names and identifiers are listed.
    • Compartment: Select the compartment for your Object Storage resource.
      To view the compartment, in the Console, open the navigation menu and click Identity & Security. Under Identity, click Compartments. Click the compartment link for your Object Storage resource. In the Compartment details page, copy the OCID under the Compartment Information tab.
    • Make this the default connection for the data asset: Select this check box to make this connection the default connection for the data asset.
    • Test Connection: Click the button to test your connection.
  5. Click Add.
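
The dependency between the connection type and the Pre-Authenticated Request URL field in step 4 can be expressed as a small validation routine. The following Python sketch is a hypothetical model of the Add Connection fields, not a Data Catalog API: the class name ConnectionSettings, its attribute names, and the rule that an empty name is rejected are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Connection type labels as they appear in the Add Connection panel.
RESOURCE_PRINCIPAL = "Resource Principal"
PRE_AUTHENTICATED_REQUEST = "Pre-Authenticated Request"


@dataclass
class ConnectionSettings:
    """Hypothetical container mirroring some Add Connection fields."""

    name: str
    connection_type: str
    description: str = ""
    par_url: Optional[str] = None
    is_default: bool = False

    def problems(self):
        """Return a list of problems; an empty list means none were found."""
        found = []
        if not self.name.strip():
            # Assumption: a blank name is treated as invalid.
            found.append("Name is required.")
        if self.connection_type not in (RESOURCE_PRINCIPAL, PRE_AUTHENTICATED_REQUEST):
            found.append(f"Unknown connection type: {self.connection_type!r}")
        if self.connection_type == PRE_AUTHENTICATED_REQUEST and not self.par_url:
            found.append("Pre-Authenticated Request URL is required for this type.")
        return found


if __name__ == "__main__":
    settings = ConnectionSettings("default-connection", RESOURCE_PRINCIPAL, is_default=True)
    print(settings.problems())
```

A check like this only catches missing input. The Test Connection button remains the way to confirm that Data Catalog can actually reach the Object Storage resource.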

4. Harvest the Data Asset

You are now ready to harvest your Oracle Object Storage data asset.

To harvest your Oracle Object Storage data asset, perform the following steps:

  1. On the data asset details page, click Harvest.
    The Select Connection page appears with the default connection selected.
  2. Click Next.
    The Select Data Entities page appears.
  3. From the Available Bucket section, add the data entities that you want to harvest. To add a data entity, click the add icon next to it. To harvest all the data entities, click Add All.
    The other operations that you can perform on this page are as follows:
    • To find a data entity from the available data entities, use the Filter Bucket / data entities box.
    • Use the page navigation icons to browse all the data entities.
    • To remove a selected data entity from the harvest job, click the remove icon next to the data entity.
    • To remove all the selected data entities, click Remove All.
  4. Click Next.
    The Create Job page appears.
  5. On this page, do the following:
    1. Job Name: Enter a unique name to identify the harvest job.
    2. Job Description: Enter a description.
    3. Incremental Harvest: Select this check box if you want subsequent runs of this harvest job to harvest only the data entities that have changed since the first run of the job.
    4. Include Unrecognized Files: Select this check box to harvest a logical data entity that's composed of only archived files, or any other file that's not supported in Data Catalog. For example, .log, .txt, .sh, .jar, and .pdf.
    5. Include matched files only: Select this check box if you want Data Catalog to harvest only the files that match the assigned filename patterns. When you select this check box, the files that don't match the assigned filename patterns are ignored during the harvest. They're added to the skipped count.
    6. Time of Execution: In this section, select one of the following options:
      • Run job now: Creates a harvest job and runs it immediately.
      • Schedule job run: Displays more fields to schedule the harvest job. Enter a name and description for the schedule. Specify how often you want the job to run. Your choices are hourly, daily, weekly, and monthly. Finally, select the start and end time for the job.
      • Save job configurations for later: Creates a job to harvest the data asset, but the job isn't run. You can run or schedule the job on the Jobs page later.
  6. Click Create Job.
    The job to harvest the Oracle Object Storage data asset is created successfully. The job is listed in the Jobs page.
The data asset is harvested according to the time of execution you selected. You can review the harvest job details by clicking the job name in the Jobs page.
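
To make the effect of the Include matched files only option concrete, the following Python sketch splits a list of file names into harvested files and a skipped count. It is an illustration under stated assumptions, not Data Catalog's implementation: the function name partition_by_patterns is hypothetical, and the patterns are plain Python regular expressions, because this tutorial doesn't describe the filename pattern syntax that Data Catalog uses.

```python
import re


def partition_by_patterns(file_names, patterns):
    """Split file names into (matched_names, skipped_count).

    A file counts as matched when at least one pattern matches its
    entire name. Everything else is counted as skipped.
    """
    compiled = [re.compile(pattern) for pattern in patterns]
    matched = [
        name
        for name in file_names
        if any(regex.fullmatch(name) for regex in compiled)
    ]
    return matched, len(file_names) - len(matched)


if __name__ == "__main__":
    files = ["sales_2023.csv", "sales_2024.csv", "run.log", "readme.txt"]
    harvested, skipped = partition_by_patterns(files, [r"sales_\d{4}\.csv"])
    print(harvested)
    print(skipped)
```

With the sample input, the two sales files match the pattern and the remaining two files are counted as skipped, which mirrors how unmatched files are added to the skipped count during a harvest.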