Spark Oracle Datasource

Spark Oracle Datasource is an extension of the JDBC datasource provided by Spark. In addition to all the options provided by Spark's JDBC datasource, it simplifies connecting to Oracle databases from Spark by providing:

Automatic download of the wallet from Autonomous Database Serverless, so you don't need to download the wallet and keep it in Object Storage or Vault.
Automatic distribution of the wallet bundle from Object Storage to the driver and executors, without any custom code from users.
Bundled JDBC driver JAR files, so you don't need to download them and include them in your archive.zip file. The JDBC driver is version 21.3.0.0.

Use a Spark Oracle Datasource

You can use this data source in Data Flow in two ways.

In the Advanced Options section when creating, editing, or running an application, include the key:
```
spark.oracle.datasource.enabled
```
with the value: true. For more information, see the Create Applications section.

Use the Oracle Spark datasource format. For example in Scala:

```
val df = spark.read
  .format("oracle")
  .option("adbId", "autonomous_database_ocid")
  .option("dbtable", "schema.tablename")
  .option("user", "username")
  .option("password", "password")
  .load()
```

More examples in other languages are available in the Spark Oracle Datasource Examples section.

The following three properties are available with Oracle datasource in addition to the properties provided by Spark's JDBC datasource:

Oracle Datasource Properties
| Property Name | Default Setting | Description | Scope |
| --- | --- | --- | --- |
| `walletUri` | | An Object Storage or HDFS-compatible URL that points to the ZIP file of the Oracle Wallet needed for mTLS connections to an Oracle database. For more information on using the Oracle Wallet, see View TNS Names and Connection Strings for an Autonomous Database Serverless. | Read/write |
| `connectionId` | Optional with `adbId`, where it defaults to `<database_name>_medium` from tnsnames.ora. Required with the `walletUri` option. | The connection identifier alias from the tnsnames.ora file, which is part of the Oracle Wallet. For more information, see the Overview of Local Naming Parameters and the Glossary in the Oracle Database Net Services Reference. | Read/write |
| `adbId` | | The Oracle Autonomous Database OCID. For more information, see the Overview of Autonomous Database Serverless. | Read/write |
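
As a rough illustration of the `walletUri` and `connectionId` properties, the sketch below reads a table through a wallet and appends the result to another table. It is a minimal sketch, not an official example: the helper names `walletOptions`, `readTable`, and `writeTable` are hypothetical, every value passed in is a placeholder, and the use of `SaveMode.Append` assumes the write path behaves like Spark's JDBC datasource.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object WalletUriSketch {
  // Hypothetical helper: collects the options for a wallet-based connection.
  // All values are placeholders supplied by the caller.
  def walletOptions(walletUri: String, connectionId: String,
                    user: String, password: String): Map[String, String] =
    Map(
      "walletUri"    -> walletUri,
      "connectionId" -> connectionId, // required whenever walletUri is set
      "user"         -> user,
      "password"     -> password
    )

  // Reads one table using the "oracle" format and the shared options.
  def readTable(spark: SparkSession, opts: Map[String, String], table: String): DataFrame =
    spark.read.format("oracle").options(opts).option("dbtable", table).load()

  // Appends a DataFrame to a table (assumes JDBC-style save modes apply).
  def writeTable(df: DataFrame, opts: Map[String, String], table: String): Unit =
    df.write.format("oracle").options(opts).option("dbtable", table)
      .mode(SaveMode.Append).save()
}
```

A caller would build the options once, for example with a wallet location and the `<database_name>_medium` alias from tnsnames.ora, and reuse them for both the read and the write.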

Note

The following limitations apply to the options:

adbId and walletUri can't be used together.
connectionId must be provided with walletUri, but is optional with adbId.
adbId isn't supported for databases with SCAN.
adbId is supported only for Autonomous Database Serverless.
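
The first two limitations can be checked before a job is submitted. The following is a minimal sketch in plain Scala; `OracleOptionCheck` and `violations` are hypothetical names that are not part of Spark Oracle Datasource, and the check covers only the option combinations, not the database type.

```scala
object OracleOptionCheck {
  // Hypothetical helper: returns the rule violations found in a set of
  // datasource options. An empty list means no violation was detected.
  def violations(options: Map[String, String]): List[String] = {
    val hasAdbId  = options.contains("adbId")
    val hasWallet = options.contains("walletUri")
    val hasConnId = options.contains("connectionId")
    List(
      // adbId and walletUri are mutually exclusive.
      if (hasAdbId && hasWallet) Some("adbId and walletUri can't be used together") else None,
      // connectionId is mandatory with walletUri, optional with adbId.
      if (hasWallet && !hasConnId) Some("connectionId must be provided with walletUri") else None
    ).flatten
  }
}
```

For instance, a map containing only `walletUri` yields one violation, while a map containing only `adbId` yields none.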

You can use Spark Oracle Datasource in Data Flow with Spark 3.0.2 and later versions.

To use Spark Oracle Datasource with Spark Submit, set the following option:

--conf spark.oracle.datasource.enabled=true

Only the following database is supported with adbId:

Autonomous Database Serverless

Note

If you have this database in a VCN private subnet, use a Private Network to allowlist the FQDN of the autonomous database's private endpoint.

The following databases can be used with the walletUri option:

Autonomous Database Serverless
Autonomous Dedicated Infrastructure Database, including Exadata infrastructure.
Autonomous Transaction Processing Dedicated Infrastructure
On-premises Oracle database that can be accessed from Data Flow's network, through either FastConnect or Site-to-Site VPN.
