Synchronize Query Server with Hive

You can synchronize the Oracle Cloud SQL Query Server with the Hive databases that you specified by using Apache Ambari or Cloudera Manager, or the dbms_bdsqs.sync_hive_databases PL/SQL API.

Use one of the following methods to synchronize Query Server with the Hive databases in the metastore:

  • Execute the Restart this Cloud SQL Query Server command in Apache Ambari or Cloudera Manager.
  • Execute the Synchronize Hive Databases command in Apache Ambari or Cloudera Manager.
  • Invoke the dbms_bdsqs.sync_hive_databases PL/SQL API locally on the edge node.
Note

The dbms_bdsqs.sync_hive_databases PL/SQL API only refreshes the Hive table definitions for the Hive databases that have already been synchronized through the other two methods.

Synchronizing Query Server using the PL/SQL API

You can synchronize Oracle Cloud SQL Query Server with the Hive databases that you specified by using the PL/SQL API.

To do so, invoke the dbms_bdsqs.sync_hive_databases PL/SQL API locally on the edge node where Query Server is installed.

The procedure contains no parameters. It synchronizes all of the Hive databases that are already in Query Server. The API refreshes the Query Server with only the Hive databases listed in the sync_hive_db_list configuration parameter. Each successive synchronization (also known as a refresh) processes changes since the last Query Server metadata refresh.

A synchronization captures any tables that were added or dropped in the Hive metastore since the last refresh. This also includes any tables whose schemas might have changed.

Enabling Query Server Full Synchronization

You can specify whether Oracle Cloud SQL Query Server performs a delta (default) or full synchronization.

When Query Server is started for the first time, a full synchronization of all of the databases is performed. An Oracle Database schema is created for each hive database and an external table is created for each Hive table in those schemas. After this initial start, you can use Apache Ambari or Cloudera Manager to specify the list of Hive databases to synchronize.

You can control whether Query Server performs a full or a delta synchronization by using the Enable full synchronization configuration parameter in Apache Ambari or Cloudera Manager. This configuration parameter is deselected by default. To enable Query Server to perform a full synchronization, select this check box in Apache Ambari or Cloudera Manager.

Using Apache Ambari

You can use Apache Ambari to enable Query Server to perform a full synchronization during a restart or a manual synchronization as follows.

  1. Log in to Apache Ambari using your login credentials.

  2. In Apache Ambari, click Cloud SQL under Services from the side toolbar.

  3. Click the Configs tab.

  4. Expand the Advanced metadata-sync tab.

  5. Select the Enable full synchronization checkbox.

  6. Click Save.

Using Cloudera Manager

You can use Cloudera Manager to enable Query Server to perform a full synchronization during a restart or a manual synchronization as follows.

  1. Log in to Cloudera Manager using your login credentials.

  2. In Cloudera Manager, use the Search field to search for the Enable full synchronization configuration parameter. Enter / Enable full synchronization (or enter part of the name until it's displayed in the list) in the Search field, and then press Enter.

  3. Click Cloud SQL: Enable full synchronization. The check box is deselected by default. This indicates that Query Sever will perform a delta synchronization.

  4. To enable full synchronization, select the check box, and then click Save Changes.

A full synchronization drops all of the existing schemas and external tables from Query Server, and then recreates new schemas and new external tables based on the Hive databases list that you specified in the configuration parameters in Apache Ambari or Cloudera Manager.

By default, Query Server performs a delta synchronization between the Hive databases in the metastore that you specify and Query Server. Any changes in the Hive databases, such as dropping or adding tables, will be reflected in Query Server. However, when you start Query Server for the very first time, it creates Oracle schemas based on the Hive databases that you specify in the configuration parameter in Apache Ambari or Cloudera Manager.

The first time the Query Server synchronizes with Hive, the process will be slower than usual. That's because the Query Server is importing all of the tables for the specified databases (configured in Apache Ambari or Cloudera Manager) in the Hadoop cluster. Subsequent refreshes should be much faster, as the Query Server would only refresh the changes that were made to the Hive metadata, such as the additions of new tables. During a delta import, the Query Server also gathers new statistics for tables that have been added or modified.