Harvest On-Premises Data Sources

Harvesting is a process that extracts technical metadata from your data source into your data catalog. This tutorial provides the steps to harvest from an on-premises Oracle Database data source that is connected to Oracle Cloud Infrastructure using a Virtual Cloud Network (VCN).

In this tutorial, you:

  1. Create the policies needed to harvest on-premises data sources.
  2. Obtain the on-premises database access details.
  3. Create a private endpoint in Data Catalog.
  4. Attach the private endpoint to your data catalog.
  5. Create a data asset.
  6. Add a connection for the data asset.
  7. Harvest the data asset.

For more information, see Configuring a Private Network.

Before You Begin

To successfully perform this tutorial, you must have the following:

Before you can harvest an on-premises data source, you must connect your on-premises data source to Oracle Cloud Infrastructure.

Connecting an On-Premises Data Source to Oracle Cloud Infrastructure

1. Create Access Policies

To configure Data Catalog to access the private network of a data source, you need access to networking and data catalog resources.

If you already have access to perform all Data Catalog and Networking operations in your required compartments, you may skip this step.

To create the policy needed to configure a private network in data catalog, perform the following steps:

  1. Open the navigation menu  and select Identity & Security. Under Identity, select Policies.
  2. In the Policies page, click Create Policy.
  3. In the Create Policy panel, enter the following details:
    • Name: Enter a unique name for the policy. The name must be unique across all policies in your tenancy. You can't change the name later. For example, data-catalog-private-endpoint-policy.
    • Description: Enter a description, such as Grant permissions to create a private network.
    • Compartment: Select a compartment in which you want to create the policy.
    • Policy Builder: In this section, move the slider to Show manual editor, and enter the policy rule. For example, for the data-catalog-users group, enter the following policy rule:
      allow group data-catalog-users to manage data-catalog-private-endpoints in tenancy
      Note

      This policy allows users in the data-catalog-users group to perform all data catalog private endpoint operations in any compartment in the tenancy.
    • Select Create Another Policy and enter the following policy rule:
      allow group data-catalog-users to manage virtual-network-family in tenancy
      Note

      This policy allows users in the data-catalog-users group to perform all network-related operations in any compartment in the tenancy.
  4. Click Create.
You have successfully created the policies to access the required resources for configuring a private network in Data Catalog.

2. Obtain Data Source Details

You need the private network and database connection information for the on-premises Oracle Database you want to harvest.

Obtain the following details for the on-premises Oracle Database from your administrator:

  • For configuring the private network, you need the VCN and subnet name and the URL of the Oracle Database.
  • For creating the data asset, you need the Oracle Database host, port, and database service name or SID.
  • For adding a connection, you need the database login credentials.

3. Create a Private Endpoint

You create a Data Catalog private endpoint to configure the network access details for the on-premises Oracle Database data source you want to harvest.

To create a private endpoint in Data Catalog, perform the following steps:

  1. Open the navigation menu  and select Analytics & AI. Under Data Lake, select Data Catalog.
  2. Click Private Endpoints.
  3. On the Private Endpoints page, click Create Private Endpoint.
  4. In the Create Private Endpoint panel, ensure you have permission to work in the selected compartment, and enter a name for the private endpoint. For example, XYZ Private Endpoint.
  5. Select the VCN and subnet that is used to connect your on-premises Oracle Database to Oracle Cloud Infrastructure.
  6. Enter the DNS zone for the Oracle Database. Use a comma to enter more than one data source DNS zone.
  7. Click Create.
The private endpoint is being created. The create process can take a couple of minutes. When the private endpoint is created successfully, the private endpoint is in ACTIVE status.

If the private endpoint status changes to FAILED, ensure you have the created the access policies and set up your private network correctly.

4. Attach a Private Endpoint

You attach a private endpoint to a data catalog to allow data assets to be created for data sources available in the private network.

To attach a private endpoint to a data catalog, perform the following steps:

  1. Click Data Catalogs.
  2. Click the Actions menu for the data catalog where you want to attach the private endpoint and select Attach Private Endpoint.
  3. Select the private endpoint you created in the previous step and click Attach.
The data catalog status changes to Updating, and the private endpoint is being attached. After the private endpoint is attached successfully, the status of the data catalog changes to Active.

5. Create an Oracle Database Data Asset

You are now ready to register your on-premises Oracle Database data source with Data Catalog as a data asset .

To create an Oracle Database data asset, perform the following steps:

  1. Click the data catalog instance where you attached the private endpoint in the previous step.
  2. From your data catalog Home tab, click Create Data Asset from the Quick Actions tile.
  3. In the Create Data Asset panel, enter a name to uniquely identify your data asset. Optionally, enter a description.
  4. From the Type list, select Oracle Database.
  5. In the Host field, enter the database hostname.
  6. In the Port field, enter the database port.
  7. In the Database field, enter the database service name or SID.
  8. Select the Use private endpoint check box.
  9. Click Create.
You have successfully created your Oracle Database data asset.

6. Add a Connection

After creating the Oracle Database data asset, you add a connection for the data asset.

For Oracle database, you can use secrets in Oracle Cloud Infrastructure Vault to store the password that you need to connect to the source using a connection. By using OCI Vault, you provide the OCID of the secret when specifying the connection details, so you don't have to enter the actual password when you create the data asset.

A vault is a container for keys and secrets. Secrets store credentials such as required passwords for connecting to data sources. You use an encryption key in a vault to encrypt and import secret contents to the vault. Secret contents are based64-encoded. Data Catalog uses the same key to retrieve and decrypt secrets while connecting a data asset to the data source. For more information about vault, key, and secret, see Overview of Vault. For information about copying the secret OCID, see View Secret Details.

To add a connection for the Oracle Database data asset, follow these steps:

  1. On the Home tab, click Data Assets.
  2. In the Data Assets list, select the Oracle Database data asset that you created.
  3. On the data asset details page, under Summary, in the Connections section, click Add Connection.
  4. In the Add Connection panel, enter the details as described in the following table:
    Field Description
    Name Enter a unique name for your connection.
    Description Enter a short description for your connection.
    Type Select JDBC.
    User Name Enter your Oracle Database user name
    Use Password Select this option to enter the password associated with your Oracle Database user name. When you select this option, the following field appears:
    • Password - Enter the password associated with your Oracle Database user name.
    Use Vault Secret OCID Select this option to enter the OCID of the secret that is created in OCI Vault for password associated with your Oracle Database username. When you select this option, the following field appears:
    • Vault Secret OCID for Password - Enter the OCID of the secret that's created in OCI Vault for the password associated with your Oracle Database username. For information about copying the secret OCID, see View Secret Details
    Enable TLS Select this check box to enable TLS for this connection.
    Make this the default connection for the data asset. Select this check box to make this connection the default connection for the data asset.
    Test Connection Click the button to test your connection.
  5. Click Add.

7. Harvest the Data Asset

You are now ready to harvest your Oracle Database data asset.

To harvest your Oracle Database data asset, perform the following steps:

  1. Click Harvest on the data asset details page for the data asset.
  2. The Select Connection page displays and the default connection is selected. .
  3. Click Next
  4. The Select Data Entities page displays. View and add all the data entities you want to harvest from the Available Oracle Schema section.
    1. Click the add icon for each data entity you want to include in the harvest job.
    2. Click Add All to select all the entities for harvesting.
    3. Use the Filter Oracle Schema box to find a data entity from the available data entities.
    4. Use the page navigation icons to browse all the data entities.
    5. Click the remove icon for any selected data entity that you want to remove from the harvest job.
    6. If you need to start over, click Remove All and then start over.
    After you have reviewed the data entities you want to harvest from the Selected Oracle Schema / Data Entities section, click Next.
  5. In the Create Job page, in the Job Name field, enter a unique name to identify the harvest job.
  6. Optionally, enter a description.
  7. Select Run job now and then click Create Job.
  8. The job to harvest your Oracle Database data asset is created successfully and the Jobs tab appears. To view job details, click the job name.
Your data asset is harvested successfully and you can review the harvest job details.