Harvesting a Data Asset

Harvest a data asset to extract the data structure information into the Data Catalog and view its data entities and attributes.

To harvest a data asset, you must have created at least one connection to the data asset.

    1. On the Home tab of the instance for which you want to harvest a data asset, click Data Assets.
    2. On the Data Assets page, click the data asset that you want to harvest.
    3. On the data asset details page, click Harvest.
    4. In the Select a Connection section, select the connection that you want to use and click Next.
      Note

      For an Object Storage data asset, you can use the Assign Filename Patterns button to assign filename patterns to the selected data asset. For more information, see Assigning Filename Patterns to Data Assets.
    5. In the Select Data Entities section, view and add all the data entities you want to harvest from the data entities table. Click the add icon for each data entity you want to include in the harvest job. Expand the data entity folders to view the nested data entities and include them. Click Add All to select all the entities for harvesting. To find a data entity from the available data entities, use the Filter folders / data entities box.
      Note

      Only folders and data entities that you have select or read access to are listed. If you don't find the data entities you are looking for, then ensure that you have the access privileges to those data entities in your data source.
    6. Review the data entities you want to harvest in the data entities table.
    7. (Optional) Click the remove icon for any selected data entity that you want to remove from the harvest job. If you need to start over, click Remove All.
      Note

      For data assets of type Oracle Database or Autonomous Database, if the database version is Oracle Database 12c or later, the Data Catalog harvester doesn't harvest Oracle-maintained schemas and other common user schemas.
    8. Click Next.
    9. In the Create Job tab, in the Job Name field, enter a unique name to identify the harvest job.
    10. (Optional) Enter a Description.
    11. Select the Incremental Harvest check box if you want subsequent runs of this harvesting job to only harvest data entities that have changed since the first run of the harvesting job.
      Note

      Incremental Harvest does not apply to MySQL, PostgreSQL, Hive, and Kafka data assets.
    12. Select the Include Unrecognized Files check box if you want Data Catalog to also harvest files that aren't supported, such as .log, .txt, .sh, .jar, and .pdf files.
      Note

      Select the Include Unrecognized Files option to harvest a logical data entity that's composed of only archived files.
    13. If you are harvesting an Oracle Object Storage data asset, select the Include matched files only check box to harvest only the files that match the assigned filename patterns. Files that don't match any assigned pattern are ignored and are added to the skipped count.
    14. Select one of the following options to specify the time of execution for the harvest job:
      • Run job now: Creates a harvest job and runs it immediately.
      • Schedule job run: Displays more fields to schedule the harvest job. Enter a name and description for the schedule. Specify how often you want the job to run. Your choices are hourly, daily, weekly, and monthly. Select the start and end time for the job.
      • Save job configurations for later: Creates a job to harvest the data asset, but the job doesn't run.
    15. Click Create Job.

    On the Jobs tab, you can track the status of your job and view job details.
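
The notes in the steps above restrict which options make sense for which data asset types. As a minimal sketch, those restrictions can be written as a small pre-check. The function name and the lowercase asset type labels (`mysql`, `object_storage`, and so on) are hypothetical and exist only for this illustration; Data Catalog itself is not called.

```python
# Data asset types for which the steps above say Incremental Harvest does not apply.
# The lowercase labels are hypothetical, chosen for this sketch only.
NO_INCREMENTAL = {"mysql", "postgresql", "hive", "kafka"}


def check_harvest_options(asset_type, incremental=False, matched_files_only=False):
    """Return warnings for option combinations that the steps above rule out."""
    warnings = []
    if incremental and asset_type in NO_INCREMENTAL:
        warnings.append(
            f"Incremental Harvest does not apply to {asset_type} data assets"
        )
    if matched_files_only and asset_type != "object_storage":
        warnings.append(
            "Include matched files only is described for Object Storage data assets"
        )
    return warnings


# An empty list means no documented restriction was hit.
print(check_harvest_options("kafka", incremental=True))
print(check_harvest_options("object_storage", incremental=True, matched_files_only=True))
```

A check like this is only a convenience before creating the job; the Console remains the source of truth for which options are available.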

  • Use the create command and required parameters to harvest data entities from a data asset:

    oci data-catalog job-definition create [OPTIONS]

    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
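
To make `[OPTIONS]` more concrete, the following sketch assembles one possible command line in Python without running it. The flag names (`--catalog-id`, `--display-name`, `--job-type`, `--data-asset-key`, `--connection-key`, `--is-incremental`, `--description`) are recalled from the CLI reference rather than taken from this page, so treat them as assumptions and confirm them with `oci data-catalog job-definition create --help`. The helper name and the placeholder values are hypothetical.

```python
import shlex


def harvest_job_definition_cmd(catalog_id, data_asset_key, connection_key,
                               display_name, incremental=False, description=None):
    """Build (but do not run) an argument list for a harvest job definition."""
    args = [
        "oci", "data-catalog", "job-definition", "create",
        "--catalog-id", catalog_id,
        "--display-name", display_name,
        "--job-type", "HARVEST",              # assumed job type value for harvesting
        "--data-asset-key", data_asset_key,
        "--connection-key", connection_key,
        "--is-incremental", "true" if incremental else "false",
    ]
    if description:
        args += ["--description", description]
    return args


# Placeholder identifiers; replace them with values from your own catalog.
cmd = harvest_job_definition_cmd(
    catalog_id="<catalog-ocid>",
    data_asset_key="<data-asset-key>",
    connection_key="<connection-key>",
    display_name="nightly-harvest",
    incremental=True,
)
print(shlex.join(cmd))  # shell-quoted command line, ready to review before running
```

Building the argument list as a Python list keeps each value separate, so names containing spaces are quoted correctly when the command is printed or later passed to `subprocess.run`.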

  • Run the CreateJobDefinition operation to harvest a data asset.
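
As a rough illustration of what a CreateJobDefinition request might carry, the sketch below builds a JSON body with plain Python. The field names (`displayName`, `jobType`, `dataAssetKey`, `connectionKey`, `isIncremental`, `description`) are recalled from the API reference, not stated on this page, so verify them there before use. The helper name and placeholder values are hypothetical, and the request is neither signed nor sent.

```python
import json


def create_job_definition_body(display_name, data_asset_key, connection_key,
                               incremental=False, description=None):
    """Assemble an assumed CreateJobDefinition request body for a harvest job."""
    body = {
        "displayName": display_name,
        "jobType": "HARVEST",            # assumed job type value for harvesting
        "dataAssetKey": data_asset_key,
        "connectionKey": connection_key,
        "isIncremental": incremental,
    }
    if description:
        body["description"] = description
    return body


# Placeholder keys; replace them with values from your own catalog.
body = create_job_definition_body(
    display_name="nightly-harvest",
    data_asset_key="<data-asset-key>",
    connection_key="<connection-key>",
    incremental=True,
)
print(json.dumps(body, indent=2))
```

The `isIncremental` field here corresponds to the Incremental Harvest check box in the Console steps, under the same data asset type restrictions.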