Using Big Data Connectors

You can use Oracle Big Data Connectors and Copy to Hadoop (a feature of Oracle Big Data SQL) to load data from a Big Data Service cluster into an Oracle Cloud database instance, and to copy data from an Oracle Cloud database instance to a Big Data Service cluster. The database can be an Oracle Autonomous Database or a co-managed Oracle Database service, as follows:

  • Oracle Autonomous Database: You can use the following with Oracle Shell for Hadoop Loaders (OHSH) to copy data between a Big Data Service cluster and an Autonomous Database instance:
    • Oracle Loader for Hadoop (OLH)
    • The Copy to Hadoop (CP2HADOOP) feature of Oracle Big Data SQL
  • Co-managed Oracle Database: You can use the following with Oracle Shell for Hadoop Loaders to copy data between a Big Data Service cluster and a co-managed Oracle Database instance:
    • Oracle SQL Connector for HDFS (OSCH)
    • Oracle Loader for Hadoop
    • The Copy to Hadoop feature of Oracle Big Data SQL

Features

Oracle Big Data Connectors and the Copy to Hadoop feature of Oracle Big Data SQL are pre-installed on Big Data Service clusters.

The following features are pre-installed on every node of your cluster:

  • Oracle Shell for Hadoop Loaders

    Oracle Shell for Hadoop Loaders (OHSH) is a helper shell that provides a simple-to-use command line interface to Oracle Loader for Hadoop, Oracle SQL Connector for HDFS, and Copy to Hadoop. A short example session is sketched at the end of this list.

  • Copy to Hadoop

    Copy to Hadoop (CP2HADOOP) is a feature of Oracle Big Data SQL, for copying data from an Oracle database to HDFS.

  • Oracle Loader for Hadoop

    Oracle Loader for Hadoop (OLH) is a high-performance loader for loading data from a Hadoop cluster into a table in an Oracle database.

  • Oracle SQL Connector for Hadoop Distributed File System (HDFS)

    Oracle SQL Connector for HDFS (OSCH) enables an Oracle external table to access data stored in HDFS files or in a table in Apache Hive. Use this connector only for loading data into a co-managed Oracle Database service.

    Note

    Oracle SQL Connector for HDFS is supported only for connecting to a co-managed Oracle Database service. It is not supported for connecting to Oracle Autonomous Database.
  • Oracle Instant Client for Linux

    Oracle Instant Client enables development and deployment of applications that connect to Oracle Database.
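
For example, Oracle Shell for Hadoop Loaders ties these components together: you declare database and Hadoop resources, and then issue short load commands against them. The following is a minimal sketch rather than one of the installed examples; the resource name jdbc0, the connect identifier, the table name, and the HDFS path are illustrative placeholders, and hadoop0 is the default Hadoop resource that OHSH creates automatically.

    # Illustrative OHSH session (names, paths, and the connect identifier are placeholders)
    ohsh> create oracle jdbc resource jdbc0 connectid="<tnsentry_name>"
    ohsh> load oracle table jdbc0:OHSH_FIVDTI from path hadoop0:/user/<hdfs_user>/ohshdata/fivdti using jdbc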

Set TNS Settings for Connecting to a Database

Configure TNS for Autonomous Database
  1. Download the client credentials (wallet) zip file from the Autonomous Database console and unzip it into the /opt/oracle/bdc-test/dbwallet/client directory.

  2. Change to the directory where you unzipped the file.

    cd /opt/oracle/bdc-test/dbwallet/client
  3. Edit sqlnet.ora and change the WALLET_LOCATION parameter to the path /opt/oracle/bdc-test/dbwallet/client.

    For example:

    WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="/opt/oracle/bdc-test/dbwallet/client"))) 
  4. Create a file called connection.properties in this directory and include the following properties:

    javax.net.ssl.trustStore=/opt/oracle/bdc-test/dbwallet/client/cwallet.sso
    javax.net.ssl.trustStoreType=SSO
    javax.net.ssl.keyStore=/opt/oracle/bdc-test/dbwallet/client/cwallet.sso 
    javax.net.ssl.keyStoreType=SSO
    
  5. Verify database connectivity using your Autonomous Database wallet configuration, as follows:

    1. Get the TNS entry name from tnsnames.ora. For example:

      myuseradw_high = ( -- configuration )
    2. Run the following commands and enter the admin password when prompted:

      sqlplus admin@<tnsentry_name>

      For example:

      export TNS_ADMIN=/opt/oracle/bdc-test/dbwallet/client/
      sqlplus admin@myuseradw_high
Configure TNS for a Co-Managed Oracle Database Service
  1. Download the tnsnames.ora file for a co-managed Oracle Database service and copy it to the /opt/oracle/bdc-test/dbwallet/client directory.

  2. If SSL is enabled on the co-managed database, copy the sqlnet.ora and cwallet.sso files to the /opt/oracle/bdc-test/dbwallet/client directory.

  3. Edit sqlnet.ora and change the WALLET_LOCATION parameter to the path /opt/oracle/bdc-test/dbwallet/client.

  4. Create a file called connection.properties in this directory and include the following properties:

    javax.net.ssl.trustStore=/opt/oracle/bdc-test/dbwallet/client/cwallet.sso
    javax.net.ssl.trustStoreType=SSO
    javax.net.ssl.keyStore=/opt/oracle/bdc-test/dbwallet/client/cwallet.sso 
    javax.net.ssl.keyStoreType=SSO
    
  5. Verify database connectivity as follows:

    1. Get the TNS entry name for the co-managed database from tnsnames.ora. For example:

      PDB1 = ( -- configuration )
    2. Run the following commands and enter the sys password when prompted:

      sqlplus sys@<tnsentry_name> as sysdba

      For example:

      export TNS_ADMIN=/opt/oracle/bdc-test/dbwallet/client/
      sqlplus sys@PDB1 as sysdba

Configuring Big Data Service to Run Connector Examples

Before running the examples included with your Oracle Shell for Hadoop Loaders installation, perform the following configuration steps as the oracle operating system user.

Step 1: Unzip the Examples

The examples are included in the Oracle Shell for Hadoop Loaders installation on your cluster. Unzip the file examples.zip into the /opt/oracle/bdc-test directory.

For example:

cd /opt/oracle/bdc-test
unzip /opt/oracle/ohsh-<version>/examples.zip -d /opt/oracle/bdc-test
Step 2: Create the OHSH_EXAMPLES User
  1. Change to the /opt/oracle/bdc-test/examples directory. See examples/README.txt for additional information.

  2. The environment file bdc_env.sh is pre-configured with the environment variables required to run OHSH. Run the following command to configure the environment on your cluster node.

    source /opt/oracle/bdc-test/bdc_env.sh
  3. Log on to SQL*Plus as sysdba, create a database user which will be used to run the Oracle Loader for Hadoop examples, and grant the required privileges, as shown in the following examples:

    Example for Co-Managed Database Service:

    --<ServiceName> points to the co-managed Database service TNS entry name you configured in Set TNS Settings for Connecting to a Database.

    sqlplus sys@<ServiceName> as sysdba 
    sqlplus sys@PDB1 as sysdba 
    create user ohsh_examples identified by <example-password> 
       quota unlimited on users; 
    alter user ohsh_examples default tablespace users;
    grant create session, alter session, create table, create view, 
        create procedure, create type, create sequence 
    to ohsh_examples; 
    

    Example for Autonomous Database:

    --<ServiceName> points to your ADB TNS entry name configured in Set TNS Settings for Connecting to a Database.

    sqlplus admin@<ServiceName> 
    sqlplus admin@myuseradw_high 
    create user ohsh_examples identified by <example-password> 
    quota unlimited on data; 
    alter user ohsh_examples default tablespace data; 
    grant create session, alter session, create table, create view, 
       create procedure, create type, create sequence 
    to ohsh_examples; 
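
    Optionally, confirm that the new account can connect and has the expected privileges before continuing. This is a quick sanity check rather than part of the shipped examples; use whichever TNS entry you configured:

    sqlplus ohsh_examples@<ServiceName>
    select * from session_privs;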
    
Step 3: Configure the Database Node to Run the Examples with Oracle Loader for Hadoop and Copy to Hadoop

Follow these steps only if you are using a co-managed Oracle Database service.

  1. Connect to the co-managed Database service node using SSH and copy the /opt/oracle/bdc-test/examples/sysdba_configure.sql script from the cluster node to the database node.

  2. Sign in as the oracle operating system user.

  3. Connect as the sysdba user for the pluggable database (PDB) and run sysdba_configure.sql <USER_NAME>, where <USER_NAME> is the database user you created in Step 2: Create the OHSH_EXAMPLES User, above. Note that the user name must be all uppercase.

    Example:

    @sysdba_configure.sql OHSH_EXAMPLES

    The sysdba_configure.sql script creates two database directory objects that are required to run some of the examples:
    • OLHP_DEFAULT_DIR points to /tmp/olhp_default
    • OLHP_STAGE_DIR points to /tmp/olhp_stage
  4. Before running the examples, verify that the above directories exist at the OS level. Both directories must be owned by the oracle OS user, with read and write permissions. If these directories don't exist, you can create them using the following commands:

    sudo su oracle 
    mkdir /tmp/olhp_default 
    mkdir /tmp/olhp_stage 
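
    If the directories already exist, you can confirm the ownership and permissions instead, for example:

    ls -ld /tmp/olhp_default /tmp/olhp_stage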
Step 4: Configure the Co-Managed Database Service Node to Run the Examples with Oracle SQL Connector for HDFS

You must configure the co-managed Database service node in order to run the examples, as shown below. See Oracle Big Data Connectors User's Guide, section Installing and Configuring a Hadoop Client on the Oracle Database System for more details.

  1. Generate the Oracle SQL Connector for HDFS zip file on the cluster node and copy it to the database node. Example:
    cd /opt/oracle
    zip -r /tmp/orahdfs-<version>.zip orahdfs-<version>/*
  2. Unzip the Oracle SQL Connector for HDFS zip file on the database node. Example:
    mkdir -p /u01/misc_products/bdc
    unzip orahdfs-<version>.zip -d /u01/misc_products/bdc
  3. Install the Hadoop client on the database node in the /u01/misc_products/ directory.
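    For example, one minimal approach (a sketch, not an Oracle-prescribed procedure) is to extract a Hadoop client distribution whose version matches your cluster into that directory; the tarball name and version are illustrative, and you must provide the archive yourself (for example, by copying it from a cluster node):
    cd /u01/misc_products
    tar -xzf hadoop-<version>.tar.gz
    #The client tools are then available under /u01/misc_products/hadoop-<version>/bin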
  4. Connect as the sysdba user for the PDB and verify that both OSCH_BIN_PATH and OSCH_DEF_DIR database directories exist and point to valid operating system directories. For example,

    create or replace directory OSCH_BIN_PATH as 
       '/u01/misc_products/bdc/orahdfs-<version>/bin'; 
    grant read,execute on directory OSCH_BIN_PATH to OHSH_EXAMPLES; 

    where OHSH_EXAMPLES is the user created in Step 2: Create the OHSH_EXAMPLES User, above.

    create or replace directory OSCH_DEF_DIR as 
       '/u01/misc_products/bdc/xtab_dirs'; 
    grant read,write on directory OSCH_DEF_DIR to OHSH_EXAMPLES; 

    Note: create the xtab_dirs operating system directory if it doesn't exist.
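
    For example, as the oracle OS user (a minimal sketch):

    sudo su -l oracle
    mkdir -p /u01/misc_products/bdc/xtab_dirs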

  5. Change to your OSCH (Oracle SQL Connector for HDFS) installation directory, and edit the configuration file hdfs_stream. For example,

    sudo su -l oracle 
    cd /u01/misc_products/bdc/orahdfs-<version> 
    vi bin/hdfs_stream

    Check that the following variables are configured correctly. Read the instructions included in the hdfs_stream file for more details.

    #Include Hadoop client bin directory to the PATH variable 
    export PATH=/u01/misc_products/hadoop-<version>/bin:/usr/bin:/bin 
    export JAVA_HOME=/usr/java/jdk<version> 
    #See explanation below 
    export HADOOP_CONF_DIR=/u01/misc_products/hadoop-conf
    #Activate the Kerberos configuration for secure clusters
    export HADOOP_CLIENT_OPTS="-Djava.security.krb5.conf=/u01/misc_products/krb5.conf"
  6. Configure the Hadoop configuration directory (HADOOP_CONF_DIR).

    If it's not already configured, use Apache Ambari to download the Hadoop Client configuration archive file, as follows:

    1. Log in to Apache Ambari.
      https://<ambari-host>:7183/#/main/services/BIGDATASQL/summary
    2. Click the HDFS service, and select the action Download Client Configuration.

    3. Extract the files under the HADOOP_CONF_DIR (/u01/misc_products/hadoop-conf) directory. Ensure that the hostnames and ports configured in HADOOP_CONF_DIR/core-site.xml are accessible from your co-managed Database service node (see the steps below). For example,

      <property> 
      <name>fs.defaultFS</name> 
      <value>hdfs://bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com:8020</value> 
      </property>

      In this example host bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com and port 8020 must be accessible from your co-managed Database service node.

    4. For secure clusters:
      1. Copy the Kerberos configuration file from the cluster node to the database node. Example:
        cp krb5.conf /u01/misc_products/
      2. Copy the Kerberos keytab file from the cluster node to the database node. Example:
        cp <kerberos-user-keytab> /u01/misc_products/
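
        Optionally, verify from the database node that the keytab can obtain a Kerberos ticket. This is a sketch; the principal name is a placeholder, and it assumes the Kerberos client tools are installed on the database node:

        export KRB5_CONFIG=/u01/misc_products/krb5.conf
        kinit -kt /u01/misc_products/<kerberos-user-keytab> <kerberos-principal>
        klist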
  7. Run the following commands to verify that HDFS access is working.

    #Change to the Hadoop client bin directory 
    cd /u01/misc_products/hadoop-<version>/bin 
    #--config points to your HADOOP_CONF_DIR directory. 
    ./hadoop --config /u01/misc_products/hadoop-conf fs -ls 

    This command should list the HDFS contents. If you get a timeout or "no route to host" or "unknown host" errors, you will need to update your /etc/hosts file and verify your Big Data Service Console network configuration, as follows:

    1. Sign in to the Cloud Console, click Big Data, then Clusters, then <your_cluster>, then Cluster Details.

    2. Under the List of cluster nodes section, note the fully qualified names and IP addresses of all your cluster nodes.

    3. Edit your co-managed Database service configuration file /etc/hosts, for example:

      #BDS hostnames 
      xxx.xxx.xxx.xxx bdsmynodemn0.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodemn0 
      xxx.xxx.xxx.xxx bdsmynodewn0.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodewn0 
      xxx.xxx.xxx.xxx bdsmynodewn2.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodewn2 
      xxx.xxx.xxx.xxx bdsmynodewn1.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodewn1 
      xxx.xxx.xxx.xxx bdsmynodeun0.bmbdcsad1.bmbdcs.oraclevcn.com bdsmynodeun0 
    4. Verify that your Network Ingress Rules are configured correctly. Click Big Data, then Clusters, then <your_cluster>, then Network Information, then <your_subnet>, then <your_security_list>.

      Verify that the following ingress rules are configured.

      • Configure an ingress rule to allow TCP network traffic using the HDFS port which was configured in the fs.defaultFS property (see step 3, above).
      • Configure an ingress rule to allow TCP network traffic for the datanode using the port number of the property dfs.datanode.address.

        To confirm the port number, in Apache Ambari click Services, then HDFS, and then Configs, and in the Properties filter box search for dfs.datanode.address.

        For example, the configured ingress rules might look like the following:

        Source IP        Protocol  Source Port Range  Destination Port Range 
        ---------------------------------------------------------------------- 
        111.111.0.0/16   TCP       All                8020 
        111.111.0.0/24   TCP       All                50010

        If no ingress rules are configured, click Add Ingress Rule to create a new rule.
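
    After updating /etc/hosts and the ingress rules, you can optionally check basic reachability from the co-managed Database service node before rerunning the hadoop fs -ls test. This is a sketch; it assumes the nc (netcat) utility is installed, and the hosts and ports are the ones from your own fs.defaultFS and dfs.datanode.address settings:

    nc -zv <namenode-fqdn> 8020
    nc -zv <datanode-fqdn> 50010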

Step 5: Configure setresources.ohsh to Run the Examples

On the Big Data Service utility node, edit /opt/oracle/bdc-test/examples/setresources.ohsh and complete the following configuration.

  1. Configure local directories.
    • For a co-managed Database service:

      set defaultdirectory OLHP_DEFAULT_DIR 
      set locationdirectory OLHP_STAGE_DIR
    • For Autonomous Database, comment out the previous lines (that is, prefix them with #), because local directories are not available:

      #set defaultdirectory OLHP_DEFAULT_DIR 
      #set locationdirectory OLHP_STAGE_DIR 
  2. Configure OHSH resources.

    Configure the connectid and connectiondir parameters according to the database service (co-managed Database service or Autonomous Database) that you configured in Set TNS Settings for Connecting to a Database.

    <TNS_ADMIN> points to the wallet location path.

    Co-managed Database service example:

    # SQL*Plus resource 
    create sqlplus resource sql0 connectid="PDB1" 
    # JDBC resource 
    create oracle jdbc resource jdbc0 connectid="PDB1" 

    Autonomous Database example:

    # SQL*Plus resource 
    create sqlplus resource sql0 connectid="myuseradw_high" 
    
    # JDBC resource 
    create oracle jdbc resource jdbc0 connectiondir=<TNS_ADMIN> \ 
    connectid="myuseradw_high" 
  3. Configure Oracle Shell for Hadoop Loaders resources to use a Hadoop credential store.

    The resources created in the previous examples prompt for the database user password each time you run the examples. You can create a Hadoop credential store to protect your database password and avoid being prompted.

    To enable this feature, run the following commands:

    • Create a Hadoop credential store.

      hadoop credential create pw_alias -provider \ 
      "jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" 

      Enter the database user password when prompted. This command stores the password under the alias pw_alias in a JCEKS file in the OHSH examples directory.
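
      You can confirm that the alias was stored by listing the contents of the credential store:

      hadoop credential list -provider \ 
      "jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks"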

    • Configure OHSH resources using the credential store.

      Autonomous Database:

      # SQL*Plus resource 
      create sqlplus resource sql0 \ 
      user=OHSH_EXAMPLES passwordalias="pw_alias" \ 
      provider="jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" \ 
      connectid="myuseradw_high" 
      
      # JDBC resource 
      create oracle jdbc resource jdbc0 \ 
      user=OHSH_EXAMPLES passwordalias="pw_alias" \ 
      provider="jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" \ 
      connectiondir=<TNS_ADMIN> connectid="myuseradw_high"

      Co-managed Database service:

      # SQL*Plus resource 
      create sqlplus resource sql0 \ 
      user=OHSH_EXAMPLES passwordalias="pw_alias" \ 
      provider="jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" \ 
      connectid="PDB1" 
      
      # JDBC resource 
      create oracle jdbc resource jdbc0 \ 
      user=OHSH_EXAMPLES passwordalias="pw_alias" \ 
      provider="jceks://file/opt/oracle/bdc-test/examples/ohsh_hadoop_cred_store.jceks" \ 
      connectid="PDB1"
  4. Run the ohsh -h (help) command to make sure everything is configured correctly. You should see output similar to the following example.

    $ cd /opt/oracle/bdc-test/
    $ source bdc_env.sh
    $ cd examples/
    $ ../../ohsh-<version>/bin/ohsh -h
    
    Oracle Shell for Hadoop Loaders Release 5.1.2 - Production (Build:20211008105544)
    Copyright (c) 2016, 2021, Oracle and/or its affiliates. All rights reserved.
    Oracle Loader for Hadoop (5.1.2), Oracle SQL Connector for HDFS (5.1.2), The Copy to Hadoop feature of Oracle Big Data SQL (5.1.2) enabled.
    usage: ohsh [-i <FILE>] [-f <FILE>] [-c] [-h]
      -i <FILE> initialization script
      -f <FILE> main script
      -c hadoop configuration
      -h help 
    
  5. Run the setresources script.

    ohsh>@setresources
  6. Run the following command to confirm the available resources.

    ohsh>show resources 
    
    jdbc0 : Oracle JDBC Resource 
    user=OHSH_EXAMPLES 
    connectId=PDB1 
    sql0 : Oracle SQL*Plus Command Line Resource 
    user=OHSH_EXAMPLES 
    connectId=PDB1 
    sysdba=false 
    hadoop0 : Hadoop Command Line Resource 
    HADOOP_CONF_DIR = /etc/hadoop/conf 
    hive0 : Hive Command Line Resource 
    HIVE_CONF_DIR = /etc/hive/conf 
    Connect to jdbc:hive2://<hiveserver2-host>:<port>/default;ssl=true; 
    sslTrustStore=<trust-store-path>; 
    trustStorePassword=<trust-store-password> 
    hadoop resource id = hadoop0 
    bash0 : Bash Command Line Resource 
    path = /bin/bash
Step 6: Create Oracle Tables to Run OHSH Examples
  1. Before running the examples, you must set up the required environment for OHSH. The bdc_env.sh file is pre-configured with the environment variables required by OHSH.
    $ cd /opt/oracle/bdc-test/; source bdc_env.sh
  2. Launch OHSH from the examples directory for ease of running example scripts.
    $ cd /opt/oracle/bdc-test/examples/
    $ ../../ohsh-<version>/bin/ohsh
  3. Run OHSH to create Oracle tables that will be loaded from content in HDFS.

    ohsh>@create_oracle_tables 
  4. Run the following command to show the Oracle tables.
    ohsh>%sql0 select table_name from tabs; 
    TABLE_NAME 
    -------------------------- 
    OHSH_FIVDTI 
    OHSH_FIVDTI_REVERSE 
    OHSH_FIVDTI_PART 
    OHSH_CP2HADOOP_FIVDTI 
  5. Run OHSH to populate HDFS with delimited files that will serve as content to load into Oracle.
    ohsh>@create_hdfs_data_files
    • If you run this example several times, you'll need to do some cleanup to avoid duplicate HDFS directory errors. Edit create_hdfs_data_files and add the following lines:

      # Environment variable HADOOP_USER is set from OHSH 
      #Add this command to remove the ohshdata 
      %hadoop0 fs -rm -r -f /user/oracle/ohshdata
    • If you get permission errors for writing files on HDFS, add the following lines:

      # Add this command before the %hadoop0 fs -put command. 
      %hadoop0 fs -chmod 777 /user/${HADOOP_USER}/ohshdata/hive_fivdti 
      %hadoop0 fs -put data/fivdti_part0.dat /user/${HADOOP_USER}/ohshdata/fivdti 

Running the Connector Examples

Run these examples to see different ways to load data.

Load Data from HDFS Delimited Text into Oracle Tables (Basic Scripts)

For a co-managed Oracle Database service, you can run the following examples:

ohsh>@load_jdbc 
ohsh>@load_directpath 
ohsh>@load_exttab

For Oracle Autonomous Database, you can run the following examples:

ohsh>@load_jdbc 
ohsh>@load_directpath 

The following error may indicate a problem with how the environment variable HADOOP_CLASSPATH was set:

Error: oracle.hadoop.smartloader.api.SmartLoaderException: OLH error occured: oracle.hadoop.loader.OraLoaderException: Class oracle.hadoop.loader.OraLoader was expected to come from /opt/oracle/oraloader-<version>/jlib/oraloader.jar, not /opt/oracle/orahdfs-<version>/jlib/oraloader.jar.

To disable this check, set the configuration property oracle.hadoop.loader.enforceClasspath to false. Add the following property in OHSH_HOME/conf/smartloader-conf.xml:

<property> 
   <name>oracle.hadoop.loader.enforceClasspath</name> 
   <value>false</value> 
</property> 
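
Before disabling the check, you can also confirm which copies of oraloader.jar appear on the classpath. The following one-liner is a sketch; it assumes HADOOP_CLASSPATH is set in your current shell (for example, after sourcing bdc_env.sh):

echo $HADOOP_CLASSPATH | tr ':' '\n' | grep oraloader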
Load with Loader Maps

For a co-managed Oracle Database service, you can run the following examples:

ohsh>@load_jdbcimapreverse.ohsh 
ohsh>@load_exttabimapreverse.ohsh

For Oracle Autonomous Database, you can run the following example:

ohsh>@load_jdbcimapreverse.ohsh 
Load Oracle Tables from Hive

For a co-managed Oracle Database service, you can run the following examples:

ohsh>@load_hivejdbc.ohsh 
--partitioned tables 
ohsh>@load_directpath_hive2parts.ohsh 
ohsh>@load_directpath_hiveallparts.ohsh 
ohsh>@load_hiveexttab.ohsh 
ohsh>@load_exttab_hive2parts.ohsh 
ohsh>@load_exttab_hiveallparts.ohsh 

For Oracle Autonomous Database, you can run the following examples:

ohsh>@load_hivejdbc.ohsh 
--partitioned tables 
ohsh>@load_directpath_hive2parts.ohsh 
ohsh>@load_directpath_hiveallparts.ohsh
Create a Hive Table from an Oracle Table by Using Copy to Hadoop

For a co-managed Oracle Database service, you can run the following examples:

ohsh>@create_hivetable_empdpserde.ohsh 
ohsh>@createreplace_directcopy.ohsh 
ohsh>@load_directcopy.ohsh 

For Oracle Autonomous Database, you can run the following example:

ohsh> @create_hivetable_empdpserde.ohsh

If you run the create_hivetable_empdpserde.ohsh example several times, you will need to do some cleanup to avoid duplicate file errors, as follows:

Edit create_hivetable_empdpserde.ohsh, and add the following line:

#Add this command before copying the emp.dmp file 
%hadoop0 fs -rm -r -f /user/${HADOOP_USER}/ohshdata/hive_ohsh_emp_dpserde/emp.dmp 
%hadoop0 fs -put data/emp.dmp /user/${HADOOP_USER}/ohshdata/hive_ohsh_emp_dpserde