You can use Oracle Big Data Connectors and Oracle Copy to Hadoop (a feature of Oracle Big Data
SQL) to load data from a Big Data Service cluster into an Oracle Cloud database instance,
and to copy data from an Oracle Cloud database instance to a Big Data Service cluster.
The database can be an Oracle Autonomous Database or a co-managed Oracle Database service.
The features supported for copying data depend on the type of database:
Oracle Autonomous Database
You can use the following with Oracle Shell for Hadoop Loaders (OHSH) to copy data between
a Big Data Service cluster and an Autonomous Database instance:
Oracle Loader for Hadoop (OLH)
The Copy to Hadoop (CP2HADOOP) feature of Oracle Big Data SQL
Co-managed Oracle Database
You can use the following with Oracle Shell for Hadoop Loaders to copy data between
a Big Data Service cluster and a co-managed Oracle Database instance:
Oracle SQL Connector for HDFS (OSCH)
Oracle Loader for Hadoop
The Copy to Hadoop feature of Oracle Big Data SQL
Features
Oracle Big Data Connectors and the Copy to Hadoop feature of Oracle Big Data SQL are
pre-installed on Big Data Service clusters.
The following features are pre-installed on every node of your cluster:
Oracle Shell for Hadoop Loaders
Oracle Shell for Hadoop Loaders (OHSH) is a helper shell that provides a simple-to-use
command line interface to Oracle Loader for Hadoop, Oracle SQL Connector for HDFS, and
Copy to Hadoop.
Copy to Hadoop
Copy to Hadoop (CP2HADOOP) is a feature of Oracle Big Data SQL, for copying data from an
Oracle database to HDFS.
Oracle Loader for Hadoop
Oracle Loader for Hadoop (OLH) is a high-performance loader for loading data from a
Hadoop cluster into a table in an Oracle database.
Oracle SQL Connector for Hadoop Distributed File System (HDFS)
Oracle SQL Connector for HDFS (OSCH) enables an Oracle external table to access data
stored in HDFS files or in a table in Apache Hive. Use this connector only for loading
data into a co-managed Oracle Database service.
Note
Oracle SQL Connector for HDFS is supported only for connecting to a co-managed Oracle
Database service. It is not supported for connecting to Oracle Autonomous Database.
Oracle Instant Client for Linux
Oracle Instant Client enables development and deployment of applications that connect to
Oracle Database.
Configuring Big Data Service to Run Connector Examples
Before running the examples included with your Oracle Shell for Hadoop Loaders
installation, perform the following configuration steps. Use the oracle
operating system user.
The examples are included in the Oracle Shell for Hadoop Loaders installation on your
cluster. Unzip the examples.zip file into the
/opt/oracle/bdc-test directory.
For example:
cd /opt/oracle/bdc-test
unzip /opt/oracle/ohsh-version/examples.zip -d /opt/oracle/bdc-test
Change to the /opt/oracle/bdc-test/examples directory and perform
the following steps. See examples/README.txt for additional
information.
The environment file bdc_env.sh is pre-configured with the environment
variables required to run OHSH. Run the following command to configure the environment
on your cluster node.
source /opt/oracle/bdc-test/bdc_env.sh
Log on to SQL*Plus, create a database user that will be used to run the Oracle Loader for
Hadoop examples, and grant the required privileges, as shown in the following examples.
For a co-managed Oracle Database service, connect to the pluggable database (PDB) as sysdba:
sqlplus sys@<ServiceName> as sysdba
sqlplus sys@PDB1 as sysdba
create user ohsh_examples identified by <example-password>
quota unlimited on users;
alter user ohsh_examples default tablespace users;
grant create session, alter session, create table, create view,
create procedure, create type, create sequence
to ohsh_examples;
For Oracle Autonomous Database, connect as the admin user:
sqlplus admin@<ServiceName>
sqlplus admin@myuseradw_high
create user ohsh_examples identified by <example-password>
quota unlimited on data;
alter user ohsh_examples default tablespace data;
grant create session, alter session, create table, create view,
create procedure, create type, create sequence
to ohsh_examples;
Follow these steps only if you are using a co-managed Oracle Database service.
Connect to the co-managed Database service node using SSH and copy the
/opt/oracle/bdc-test/examples/sysdba_configure.sql script from the cluster to that node.
Sign in as the oracle operating system user.
Connect as the sysdba user for the pluggable database (PDB) and run
sysdba_configure.sql <USER_NAME>, where
<USER_NAME> is the database user you
created in Step 2: Create the OHSH_EXAMPLES User, above.
Note that the user name must be all uppercase.
Example:
@sysdba_configure.sql OHSH_EXAMPLES
The sysdba_configure.sql script creates two database directory objects
required to run some of the examples:
OLHP_DEFAULT_DIR points to
/tmp/olhp_default
OLHP_STAGE_DIR points to
/tmp/olhp_stage
Before running the examples, verify the above directories exist at the OS level. Both directories must be owned by the oracle OS user, with read and write permissions. If these directories don't exist, you can create them using the following commands:
sudo su oracle
mkdir /tmp/olhp_default
mkdir /tmp/olhp_stage
You must configure the co-managed Database service node in order to run the examples, as
shown below. See Oracle Big Data Connectors User's Guide, section Installing and Configuring
a Hadoop Client on the Oracle Database System for more details.
Generate the Oracle SQL Connector for HDFS zip file on the cluster node and copy it to the
database node. Example:
cd /opt/oracle
zip -r /tmp/orahdfs-<version>.zip orahdfs-<version>/*
Unzip the Oracle SQL Connector for HDFS zip file on the database node.
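A minimal sketch, assuming the zip file was copied to /tmp on the database node and is
extracted under /u01/misc_products/bdc (the location referenced by the OSCH_BIN_PATH
directory object later in this procedure):
mkdir -p /u01/misc_products/bdc
unzip /tmp/orahdfs-<version>.zip -d /u01/misc_products/bdc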
Install the Hadoop client on the database node in the
/u01/misc_products/ directory.
Connect as the sysdba user for the PDB and verify that both
OSCH_BIN_PATH and OSCH_DEF_DIR database directories
exist and point to valid operating system directories. For example,
create or replace directory OSCH_BIN_PATH as
'/u01/misc_products/bdc/orahdfs-<version>/bin';
grant read,execute on directory OSCH_BIN_PATH to OHSH_EXAMPLES;
In Apache Ambari, click the HDFS service, and then select the Download Client
Configuration action.
Extract the files into the HADOOP_CONF_DIR directory (/u01/misc_products/hadoop-conf). Ensure that the hostnames and ports configured in HADOOP_CONF_DIR/core-site.xml are accessible from your co-managed Database service node (see the steps below).
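A minimal sketch of the relevant core-site.xml property, assuming the fs.defaultFS value
used in this example:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com:8020</value>
</property>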
In this example, host bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com and
port 8020 must be accessible from your co-managed Database service
node.
For secure clusters:
Copy the Kerberos configuration file from the cluster node to the database node. Example:
cp krb5.conf /u01/misc_products/
Copy the Kerberos keytab file from the cluster node to the database node. Example:
cp <kerberos-user-keytab> /u01/misc_products/
Run the following commands to verify that HDFS access is working.
# Change to the Hadoop client bin directory.
cd /u01/misc_products/hadoop-<version>/bin
# --config points to your HADOOP_CONF_DIR directory.
./hadoop --config /u01/misc_products/hadoop-conf fs -ls
This command should list the HDFS contents. If you get a timeout, "no route to host",
or "unknown host" error, update your /etc/hosts file
and verify your network configuration in the Big Data Service Console, as follows:
Sign in to the Cloud Console, click Big Data, then
Clusters, then <your_cluster>,
then Cluster Details.
Under the List of cluster nodes section, get the fully
qualified names of all your cluster nodes and their IP addresses.
Edit the /etc/hosts file on your co-managed Database service node, for example:
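A sketch of an /etc/hosts entry, using the cluster host name from this example and a
placeholder IP address (replace it with the actual node IP address). Add one entry for
each cluster node:
111.111.0.5   bdsmyhostmn0.bmbdcsxxx.bmbdcs.myvcn.com   bdsmyhostmn0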
Verify that your Network Ingress Rules are configured correctly. Click
Big Data, then Clusters, then
<your_cluster>, then Network
Information, then
<your_subnet>, then
<your_security_list>.
Verify that the following ingress rules are configured.
Configure an ingress rule to allow TCP network traffic using the HDFS port which
was configured in the fs.defaultFS property (see step 3,
above).
Configure an ingress rule to allow TCP network traffic for the datanode using
the port number of the property dfs.datanode.address.
To confirm the port number, in Apache Ambari click
Services, then HDFS, and then
Configs.
In the Properties filter box, search for
dfs.datanode.address.
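For reference, the property typically appears as follows (0.0.0.0:50010 is a common
default and matches the destination port used in the example rules below; confirm the
actual value shown in Ambari for your cluster):
dfs.datanode.address = 0.0.0.0:50010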
For example, the configured ingress rules might look like the following:
Source IP         Protocol   Source Port Range   Destination Port Range
-----------------------------------------------------------------------
111.111.0.0/16    TCP        All                 8020
111.111.0.0/24    TCP        All                 50010
If no ingress rules are configured, click Add Ingress
Rule to create a new rule.
Configure the connect_id and connectiondir parameters
according to your database service (co-managed Database service or Autonomous Database),
as described in Set TNS Settings for Connecting to a Database.
Configure Oracle Shell for Hadoop Loaders resources to use the Hadoop Credential Store
The previous examples prompt for the database user password each time they run.
You can create a Hadoop credential store to protect your database password and
avoid being prompted each time.
To enable this feature, run the following commands:
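The exact commands are not reproduced here; the following is a minimal sketch using the
standard hadoop credential CLI, with an illustrative alias name and keystore path
(ohsh_examples_password and jceks://hdfs/user/oracle/ohsh_cred.jceks are assumptions, not
values from this procedure):
# Create an encrypted credential entry for the database password;
# you are prompted to enter and confirm the password.
hadoop credential create ohsh_examples_password -provider jceks://hdfs/user/oracle/ohsh_cred.jceks
# List the aliases stored in the keystore to verify the entry.
hadoop credential list -provider jceks://hdfs/user/oracle/ohsh_cred.jceks
The OHSH resources are then configured to reference this alias instead of a clear-text
password; see the Oracle Shell for Hadoop Loaders documentation for the resource syntax.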
Run the ohsh -h command to make sure everything is configured correctly. You should
see output like the following example.
$ cd /opt/oracle/bdc-test/
$ source bdc_env.sh
$ cd examples/
$ ../../ohsh-<version>/bin/ohsh -h
Oracle Shell for Hadoop Loaders Release 5.1.2 - Production (Build:20211008105544)
Copyright (c) 2016, 2021, Oracle and/or its affiliates. All rights reserved.
Oracle Loader for Hadoop (5.1.2), Oracle SQL Connector for HDFS (5.1.2), The Copy to Hadoop feature of Oracle Big Data SQL (5.1.2) enabled.
usage: ohsh [-i <FILE>] [-f <FILE>] [-c] [-h]
-i <FILE> initialization script
-f <FILE> main script
-c hadoop configuration
-h help
Run the setresources script.
ohsh>@setresources
Run the following command to confirm the available resources.
ohsh>show resources
jdbc0 : Oracle JDBC Resource
user=OHSH_EXAMPLES
connectId=PDB1
sql0 : Oracle SQL*Plus Command Line Resource
user=OHSH_EXAMPLES
connectId=PDB1
sysdba=false
hadoop0 : Hadoop Command Line Resource
HADOOP_CONF_DIR = /etc/hadoop/conf
hive0 : Hive Command Line Resource
HIVE_CONF_DIR = /etc/hive/conf
Connect to jdbc:hive2://<hiveserver2-host>:<port>/default;ssl=true;
sslTrustStore=<trust-store-path>;
trustStorePassword=<trust-store-password>
hadoop resource id = hadoop0
bash0 : Bash Command Line Resource
path = /bin/bash
Before running the examples, you must set up the required environment for OHSH. The
bdc_env.sh file is pre-configured with the environment variables
required by OHSH.
$ cd /opt/oracle/bdc-test/; source bdc_env.sh
Launch OHSH from the examples directory for ease of running example scripts.
$ cd /opt/oracle/bdc-test/examples/
$ ../../ohsh-<version>/bin/ohsh
Run OHSH to create Oracle tables that will be loaded from content in HDFS.
ohsh>@create_oracle_tables
Run the following command to show the Oracle tables.
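For example, a sketch using the sql0 resource shown earlier (the exact command in the
example scripts may differ):
ohsh>%sql0 select table_name from user_tables;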
Run OHSH to populate HDFS with delimited files that will serve as content to load into Oracle.
ohsh>@create_hdfs_data_files
If you run this example several times, you'll need to do some cleanup to avoid
duplicate HDFS directory errors. Edit create_hdfs_data_files and
add the following lines:
# The environment variable HADOOP_USER is set by OHSH.
# Add this command to remove the existing ohshdata directory:
%hadoop0 fs -rm -r -f /user/oracle/ohshdata
If you get permission errors for writing files on HDFS, add the following lines:
# Add this command before the %hadoop0 fs -put command.
%hadoop0 fs -chmod 777 /user/${HADOOP_USER}/ohshdata/hive_fivdti
%hadoop0 fs -put data/fivdti_part0.dat /user/${HADOOP_USER}/ohshdata/fivdti
Running the Connector Examples
Run these examples to see different ways to load data.
For Oracle Autonomous Database, you can run the following examples:
ohsh>@load_jdbc
ohsh>@load_directpath
The following error may indicate a problem with how the environment variable
HADOOP_CLASSPATH was set:
Error: oracle.hadoop.smartloader.api.SmartLoaderException: OLH error occured:
oracle.hadoop.loader.OraLoaderException: Class oracle.hadoop.loader.OraLoader was expected
to come from /opt/oracle/oraloader-<version>/jlib/oraloader.jar,
not /opt/oracle/orahdfs-<version>/jlib/oraloader.jar.
To disable this check, set the configuration property
oracle.hadoop.loader.enforceClasspath to false. Add the
following property to OHSH_HOME/conf/smartloader-conf.xml:
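The entry uses the standard Hadoop configuration XML format, with the property name and
value taken from the message above:
<property>
  <name>oracle.hadoop.loader.enforceClasspath</name>
  <value>false</value>
</property>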