Migrating Spark Applications to Oracle Cloud Infrastructure Data Flow

This tutorial shows you how to migrate your existing Spark applications to Oracle Cloud Infrastructure Data Flow.

Before You Begin

To successfully perform this tutorial, you must have completed Set Up Your Tenancy and be able to access Data Flow.

Allowed Spark Variables

Data Flow automatically configures many Spark variables based on factors including the infrastructure you choose when running jobs. To ensure proper operation, some Spark variables can't be set or overridden when running jobs. For more information, see Supported Spark Properties in Data Flow.
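Because managed properties can't be overridden, a useful pattern is to read the values the platform configured rather than trying to set them. A minimal sketch (spark.app.name is just an example property to inspect; see the supported session-access patterns below):

    from pyspark.sql import SparkSession

    # Retrieve the existing session rather than creating a new one.
    spark = SparkSession.builder.appName("My App").getOrCreate()

    # Inspect what the platform configured instead of overriding managed values.
    print(spark.sparkContext.getConf().get("spark.app.name", "unknown"))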

Compatibility Limitations

You cannot set environment variables in Data Flow jobs. Instead, pass the values as command-line arguments and add them to the environment from within the application, as in the sketch below.
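The following is a minimal PySpark sketch of this workaround, assuming an illustrative KEY=VALUE argument convention; SERVICE_ENDPOINT is a made-up variable name, not a Data Flow setting.

    import os
    import sys

    from pyspark.sql import SparkSession


    def main():
        # Assumed convention: trailing arguments of the form KEY=VALUE stand in
        # for the environment variables the application would otherwise expect.
        for arg in sys.argv[1:]:
            key, sep, value = arg.partition("=")
            if sep:
                # Affects only this driver process, not the executors.
                os.environ[key] = value

        spark = SparkSession.builder.appName("My App").getOrCreate()
        # Code that expected an environment variable can now read it as usual.
        print(os.environ.get("SERVICE_ENDPOINT", "not set"))


    if __name__ == "__main__":
        main()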

1. Supported Ways to Access the Spark Session

Data Flow creates the Spark session before your Spark application runs, ensuring that the application takes full advantage of all the hardware you configured for the run.
Important

Do not attempt to create a Spark session within your application. A session you create yourself doesn't use the hardware you provisioned for the run, and other unpredictable behavior might result.

The following are supported ways of accessing your Spark session within applications:

Java
  1. Basic session creation.
    Builder builder = SparkSession.builder().appName("My App");
    SparkSession session = builder.getOrCreate();
  2. Set configurations while getting the session.
    Builder builder = SparkSession.builder().appName("My App");
    builder.config("spark.sql.orc.impl", "hive");
    SparkSession session = builder.getOrCreate();
  3. The spark.sql settings can be changed after the session is retrieved.
    Builder builder = SparkSession.builder().appName("My App");
    SparkSession session = builder.getOrCreate();
    session.conf().set("spark.sql.crossJoin.enabled", "true");
Python
  1. Basic session creation.
    spark_builder = SparkSession.builder.appName("My App")
    spark_session = spark_builder.getOrCreate()
  2. Set configurations while getting the session.
    spark_builder = SparkSession.builder.appName("My App")
    spark_builder.config("spark.sql.orc.impl", "hive")
    spark_session = spark_builder.getOrCreate()
  3. The spark.sql settings can be changed after the session is retrieved.
    spark_builder = SparkSession.builder.appName("My App")
    spark_session = spark_builder.getOrCreate()
    spark_session.conf.set("spark.sql.crossJoin.enabled", "true")
Scala
  1. Basic session creation.
    val builder = SparkSession.builder.appName("My App")
    val session = builder.getOrCreate()
  2. Set configurations while getting the session.
    val builder = SparkSession.builder.appName("My App")
    builder.config("spark.sql.orc.impl", "hive")
    val session = builder.getOrCreate()
  3. The spark.sql settings can be changed after the session is retrieved.
    val builder = SparkSession.builder.appName("My App")
    val session = builder.getOrCreate()
    session.conf.set("spark.sql.crossJoin.enabled", "true")
SQL

No action is required. When you run SQL applications, Data Flow manages the Spark session for you automatically.
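Putting these pieces together, the following is a minimal PySpark skeleton for a Data Flow run. It is a sketch, not a prescribed structure: the application name, the spark.sql setting, and the placeholder workload are illustrative. The key point is that the application only retrieves the session Data Flow already created.

    from pyspark.sql import SparkSession


    def main():
        # getOrCreate() returns the session Data Flow provisioned for this run;
        # don't construct a new session yourself.
        spark = SparkSession.builder.appName("My App").getOrCreate()

        # Application-level spark.sql settings can be changed after retrieval.
        spark.conf.set("spark.sql.crossJoin.enabled", "true")

        # Placeholder workload so the sketch runs end to end.
        print(spark.range(100).count())


    if __name__ == "__main__":
        main()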

2. Managing Java Dependencies for Apache Spark Applications in Data Flow

In Data Flow, when you run Java or Scala applications that rely on JARs not included with Spark, you must create uber (fat) JARs that bundle the code dependencies you need. The Data Flow runtime includes several popular open source libraries that your Java or Scala applications might also use, so the bundled versions can conflict with the runtime versions. To avoid these runtime conflicts, use a process called shading, which relocates the conflicting packages inside your JAR. You might need to recompile your Java or Scala applications with the following shading rules for them to run correctly in Data Flow.
Note

Shading is not needed if you are using Spark 3.2.1.
Shading with Maven

To shade your application using Maven, use the following template pom.xml to build Data Flow applications on your development machine, and as the basis for your own application libraries.

  1. Create a project directory.
  2. Create a file in the directory called pom.xml.
  3. Download the template pom.xml file and copy its contents into your pom.xml file.
  4. Follow the README instructions in the pom.xml file to download Spark and build for Data Flow.
  5. Add your application dependencies immediately below the line that reads README: Application dependencies should be added below.
    Note

    Do not add them anywhere else; doing so changes the class loading order, so the classpath no longer accurately reflects the Data Flow Spark runtime environment.
  6. To ensure that Spark is built and working correctly, build and run the Data Flow sample project.
  7. Integrate your code into the project.
  8. To package your application without bundling the Oracle Cloud Infrastructure SDK and third-party dependencies, build an archive.zip file.
    1. Follow the library chapter in the service guide.
    The template pom.xml file and the Spark distribution in the target directory, which are included in both the compile and runtime classpaths, contain every library that Data Flow uses, with the same versions and the same load order. For more information on shading with Maven, see the Apache Maven Shade Plugin.
Example Pom.xml File for Spark 3.2.1
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<artifactId>project</artifactId>
<groupId>org.example</groupId>
<version>1.0-SNAPSHOT</version>
<!-- README -->
<!--
This template pom builds, on your local machine, a Spark distribution with libraries that match Data Flow.
 
Using a web browser:
Download Spark to your project base directory - https://archive.apache.org/dist/spark/
Use this exact version to match Data Flow - spark-3.2.1-bin-hadoop3.2.tgz
 
Alternatively:
$ wget https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop3.2.tgz
 
 
To test and debug on a local dev machine
$ mvn -P dev clean install
OR
Select dev profile under maven->profiles
 
To build for Data Flow
$ mvn clean install
 
You may run your Spark application in two ways, either by running your application main() method
directly in your IDE, e.g. main() -> right-click -> Run, or from the command-line
using spark-submit. Remember, if you change the pom artifactId, also change the jar name in the
command below. You may also need to explicitly select the "dev" Maven profile in your favorite
IDE to avoid ClassNotFoundException when you set up the example project.
 
$ ./target/spark-3.2.1-bin-hadoop3.2/bin/spark-submit \
--class example.Example target/project-1.0-SNAPSHOT.jar
-->
<properties>
<oci-java-sdk-version>2.12.1</oci-java-sdk-version>
<spark-scope>provided</spark-scope>
<spark-download>${project.build.directory}/spark-3.2.1-bin-hadoop3.2/jars</spark-download>
</properties>
<dependencies>
 
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.1</version>
<scope>system</scope>
<systemPath>${basedir}/spark-3.2.1-bin-hadoop3.2.tgz</systemPath>
<classifier>bin</classifier>
<type>tgz</type>
</dependency>
 
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.30</version>
<scope>${spark-scope}</scope>
</dependency>
 
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-hdfs-connector</artifactId>
<version>3.3.1.0.3.2</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.woodstox</groupId>
<artifactId>woodstox-core</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
</exclusion>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
</exclusion>
<!-- this is required for com.oracle.bmc.auth -->
<!-- <exclusion>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-addons-apache</artifactId>
</exclusion>-->
<exclusion>
<groupId>io.netty</groupId>
<artifactId>netty-codec-http</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-servlet</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-security</artifactId>
</exclusion>
<exclusion>
<groupId>net.minidev</groupId>
<artifactId>json-smart</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>jul-to-slf4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15on</artifactId>
</exclusion>
<exclusion>
<groupId>org.bouncycastle</groupId>
<artifactId>bcpkix-jdk15on</artifactId>
</exclusion>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-util-ajax</artifactId>
</exclusion>
</exclusions>
</dependency>
 
<!-- Data Flow Spark runtime dependency upgrades that are not transitive dependencies of Spark in Maven -->
<dependency>
<groupId>org.apache.hadoop.thirdparty</groupId>
<artifactId>hadoop-shaded-guava</artifactId>
<version>1.1.1</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-web-proxy</artifactId>
<version>3.3.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>javax.servlet-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-server-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-common</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-yarn-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
</exclusion>
<exclusion>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15on</artifactId>
</exclusion>
<exclusion>
<groupId>org.bouncycastle</groupId>
<artifactId>bcpkix-jdk15on</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
<version>3.2.2</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>27.0-jre</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
<version>4.4.14</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.13.1</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.13.1</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.13.1</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-yaml</artifactId>
<version>2.13.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.12</artifactId>
<version>2.13.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>1.28</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>4.9.3</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.9.0</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>commons-dbcp</groupId>
<artifactId>commons-dbcp</artifactId>
<version>1.2</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>commons-pool</groupId>
<artifactId>commons-pool</artifactId>
</exclusion>
</exclusions>
</dependency>
 
<!-- Spark transitive dependencies exclude/upgrade section -->
<dependency>
<groupId>org.apache.curator</groupId>
<artifactId>curator-framework</artifactId>
<version>2.13.0</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>*</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.curator</groupId>
<artifactId>curator-recipes</artifactId>
<version>2.13.0</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>com.squareup.okio</groupId>
<artifactId>okio</artifactId>
<version>2.8.0</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
<version>1.21</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>org.objenesis</groupId>
<artifactId>objenesis</artifactId>
<version>2.6</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>jline</groupId>
<artifactId>jline</artifactId>
<version>2.14.6</version>
<scope>${spark-scope}</scope>
</dependency>
 
<!-- Spark 3.2.1 -->
<!-- ********************************************************************** -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
<exclusion>
<groupId>org.objenesis</groupId>
<artifactId>objenesis</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.curator</groupId>
<artifactId>curator-framework</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.curator</groupId>
<artifactId>curator-recipes</artifactId>
</exclusion>
<exclusion>
<groupId>com.squareup.okio</groupId>
<artifactId>okio</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
</exclusion>
<exclusion>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
</exclusion>
<exclusion>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.commons</groupId>
<artifactId>commons-compress</artifactId>
</exclusion>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>jakarta.validation</groupId>
<artifactId>jakarta.validation-api</artifactId>
<version>2.0.2</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.yetus</groupId>
<artifactId>audience-annotations</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.apache.hive</groupId>
<artifactId>hive-llap-client</artifactId>
</exclusion>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
<exclusion>
<groupId>jline</groupId>
<artifactId>jline</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.hive.shims</groupId>
<artifactId>hive-shims-0.23</artifactId>
</exclusion>
</exclusions>
</dependency>
 
<!-- ********************************************************************** -->
<!-- README
This section is not required. These dependencies are here for runtime completeness
when testing within an IDE, to exactly match all of the Data Flow classpath libraries.
-->
 
<dependency>
<groupId>org.apache.curator</groupId>
<artifactId>curator-client</artifactId>
<version>2.13.0</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-yarn_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib-local_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive-thriftserver_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-kubernetes_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-repl_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>javax.xml.bind</groupId>
<artifactId>jaxb-api</artifactId>
<version>2.2.11</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17-16</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-stdlib</artifactId>
<version>1.4.10</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.jetbrains</groupId>
<artifactId>annotations</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-stdlib-common</artifactId>
<version>1.4.0</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
<version>2.12.2</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
<version>1.4.01</version>
<scope>${spark-scope}</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>3.2.1</version>
<scope>${spark-scope}</scope>
<exclusions>
<exclusion>
<groupId>org.spark-project.spark</groupId>
<artifactId>unused</artifactId>
</exclusion>
</exclusions>
</dependency>
 
 
<!-- ********************************************************************** -->
 
<!-- ############################################## -->
<!-- README: Application dependencies should be added below -->
<!-- ############################################## -->
 
<!-- THESE ARE NOT REQUIRED, JUST AN EXAMPLE, UNCOMMENT AS NEEDED -->
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-core</artifactId>
<version>${oci-java-sdk-version}</version>
</dependency>
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-common</artifactId>
<version>${oci-java-sdk-version}</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.datatype</groupId>
<artifactId>jackson-datatype-jsr310</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-circuitbreaker</artifactId>
<version>${oci-java-sdk-version}</version>
</dependency>
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-objectstorage</artifactId>
<version>${oci-java-sdk-version}</version>
</dependency>
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-objectstorage-generated</artifactId>
<version>${oci-java-sdk-version}</version>
</dependency>
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-objectstorage-extensions</artifactId>
<version>${oci-java-sdk-version}</version>
</dependency>
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-secrets</artifactId>
<version>${oci-java-sdk-version}</version>
</dependency>
<dependency>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-vault</artifactId>
<version>${oci-java-sdk-version}</version>
</dependency>
<dependency>
<groupId>com.oracle.database.jdbc</groupId>
<artifactId>ojdbc8</artifactId>
<version>18.3.0.0</version>
</dependency>
<!-- README Add archive.zip jar dependencies here with <scope>${spark-scope}</scope> or <scope>system</scope>-->
 
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.0</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.1.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
<configuration>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
<exclude>META-INF/versions/11/org/roaringbitmap/ArraysShim.class</exclude>
<exclude>META-INF/versions/11/org/glassfish/jersey/internal/jsr166/**</exclude>
</excludes>
</filter>
</filters>
<artifactSet>
<excludes>
<exclude>org.bouncycastle:bcpkix-jdk15on</exclude>
<exclude>org.bouncycastle:bcprov-jdk15on</exclude>
<exclude>com.google.code.findbugs:jsr305</exclude>
</excludes>
</artifactSet>
</configuration>
</plugin>
<!-- Unpack Spark compressed tgz -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>3.1.1</version>
<executions>
<execution>
<phase>generate-resources</phase>
<goals>
<goal>unpack-dependencies</goal>
</goals>
<configuration>
<overWriteIfNewer>true</overWriteIfNewer>
<includeTypes>tgz</includeTypes>
<includeArtifactIds>spark-core_2.12</includeArtifactIds>
<outputDirectory>${project.build.directory}</outputDirectory>
<includes>**/**</includes>
<excludes>
**/jars/guava-14.0.1.jar,
**/jars/jackson-annotations-2.12.3.jar,
**/jars/jackson-core-2.12.3.jar,
**/jars/jackson-databind-2.12.3.jar,
**/jars/jackson-dataformat-yaml-2.12.3.jar,
**/jars/jackson-module-scala_2.12-2.12.3.jar,
**/jars/mesos-1.4.0-shaded-protobuf.jar,
**/jars/snakeyaml-1.27.jar,
**/jars/spark-mesos_2.12-3.2.1.jar,
**/jars/log4j-1.2.17.jar,
**/jars/okhttp-3.12.12.jar,
**/jars/okio-1.14.0.jar,
**/jars/gson-2.2.4.jar,
**/jars/commons-dbcp-1.4.jar,
**/examples/jars/*
</excludes>
</configuration>
</execution>
</executions>
</plugin>
<!--
We need to move the correct versions of the jars that we excluded above into the Spark
jars directory so that spark-submit from the command line will run correctly.
Note: some jars like mesos-1.4.0 are removed because Data Flow does not use Mesosphere.
-->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>3.1.1</version>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>copy</goal>
</goals>
<configuration>
<artifactItems>
<artifactItem>
<groupId>io.github.resilience4j</groupId>
<artifactId>resilience4j-circuitbreaker</artifactId>
<version>1.2.0</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>io.github.resilience4j</groupId>
<artifactId>resilience4j-core</artifactId>
<version>1.2.0</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.glassfish.jersey.media</groupId>
<artifactId>jersey-media-json-jackson</artifactId>
<version>2.27</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-core</artifactId>
<version>${oci-java-sdk-version}</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-common</artifactId>
<version>${oci-java-sdk-version}</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-circuitbreaker</artifactId>
<version>${oci-java-sdk-version}</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-objectstorage</artifactId>
<version>${oci-java-sdk-version}</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-objectstorage-generated</artifactId>
<version>${oci-java-sdk-version}</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-objectstorage-extensions</artifactId>
<version>${oci-java-sdk-version}</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-java-sdk-secrets</artifactId>
<version>${oci-java-sdk-version}</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.oracle.oci.sdk</groupId>
<artifactId>oci-hdfs-connector</artifactId>
<version>3.3.1.0.3.2</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.bouncycastle</groupId>
<artifactId>bcpkix-jdk15on</artifactId>
<version>1.60</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.bouncycastle</groupId>
<artifactId>bcprov-jdk15on</artifactId>
<version>1.60</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>27.0-jre</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.13.1</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.13.1</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.13.1</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-yaml</artifactId>
<version>2.13.1</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.12</artifactId>
<version>2.13.1</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>1.28</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.squareup.okhttp3</groupId>
<artifactId>okhttp</artifactId>
<version>4.9.3</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.squareup.okio</groupId>
<artifactId>okio</artifactId>
<version>2.8.0</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17-16</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.glassfish.jersey.connectors</groupId>
<artifactId>jersey-apache-connector</artifactId>
<version>2.34</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>io.vavr</groupId>
<artifactId>vavr</artifactId>
<version>0.10.0</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-stdlib</artifactId>
<version>1.4.10</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.jetbrains.kotlin</groupId>
<artifactId>kotlin-stdlib-common</artifactId>
<version>1.4.0</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
<version>2.12.2</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>xml-apis</groupId>
<artifactId>xml-apis</artifactId>
<version>1.4.01</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.9.0</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>3.2.1</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>commons-dbcp</groupId>
<artifactId>commons-dbcp</artifactId>
<version>1.2</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
<!-- README: Placeholder for your own application jar; replace the coordinates or remove.
<artifactItem>
<groupId>org.example</groupId>
<artifactId>Test</artifactId>
<version>2.0-SNAPSHOT</version>
<type>jar</type>
<outputDirectory>${spark-download}</outputDirectory>
</artifactItem>
-->
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
<profiles>
<profile>
<id>dev</id>
<activation>
<activeByDefault>false</activeByDefault>
</activation>
<properties>
<spark-scope>compile</spark-scope>
</properties>
<build>
<plugins>
<plugin>
<artifactId>maven-antrun-plugin</artifactId>
<version>1.8</version>
<executions>
<execution>
<phase>validate</phase>
<configuration>
<tasks>
<taskdef resource="net/sf/antcontrib/antcontrib.properties" />
<if>
<available file="archive.zip"/>
<then>
<unzip src="archive.zip" dest="${project.build.directory}/archive" />
</then>
<else>
<echo>The archive.zip does not exist</echo>
</else>
</if>
</tasks>
</configuration>
<goals>
<goal>run</goal>
</goals>
</execution>
</executions>
<dependencies>
<dependency>
<groupId>ant-contrib</groupId>
<artifactId>ant-contrib</artifactId>
<version>20020829</version>
</dependency>
</dependencies>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
Example Pom.xml File for Spark 3.0.2
<?xml version="1.0" encoding="UTF-8"?>
        <project xmlns="http://maven.apache.org/POM/4.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        
        <groupId>com.example</groupId>
        <artifactId>project</artifactId>
        <version>1.0-SNAPSHOT</version>
        
        <repositories>
        <repository>
        <id>project.local</id>
        <name>project</name>
        <url>file:${project.basedir}/repo</url>
        </repository>
        </repositories>
        
        <properties>
        <oci-java-sdk-version>1.25.2</oci-java-sdk-version>
        </properties>
        
        <dependencies>
        <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>oci-hdfs-connector</artifactId>
        <version>3.2.1.3</version>
        <scope>provided</scope>
        </dependency>
        <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>oci-java-sdk-core</artifactId>
        <version>${oci-java-sdk-version}</version>
        </dependency>
        <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>oci-java-sdk-objectstorage</artifactId>
        <version>${oci-java-sdk-version}</version>
        </dependency>
        <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>oci-java-sdk-secrets</artifactId>
        <version>${oci-java-sdk-version}</version>
        </dependency>
        
        <!-- Spark 3.0.2 -->
        <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.2</version>
        <scope>provided</scope>
        </dependency>
        <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>3.0.2</version>
        <scope>provided</scope>
        </dependency>
        </dependencies>
        
        <build>
        <plugins>
        <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.0</version>
        <configuration>
        <source>1.8</source>
        <target>1.8</target>
        </configuration>
        </plugin>
        <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.22.0</version>
        </plugin>
        <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.1</version>
        <executions>
        <execution>
        <phase>package</phase>
        <goals>
        <goal>shade</goal>
        </goals>
        </execution>
        </executions>
        <configuration>
        <filters>
        <filter>
        <artifact>*:*</artifact>
        <excludes>
        <exclude>META-INF/*.SF</exclude>
        <exclude>META-INF/*.DSA</exclude>
        <exclude>META-INF/*.RSA</exclude>
        </excludes>
        </filter>
        </filters>
        <relocations>
        <relocation>
        <pattern>org.apache.http</pattern>
        <shadedPattern>shaded.oracle.org.apache.http</shadedPattern>
        </relocation>
        <relocation>
        <pattern>org.apache.commons</pattern>
        <shadedPattern>shaded.oracle.org.apache.commons</shadedPattern>
        </relocation>
        <relocation>
        <pattern>com.fasterxml</pattern>
        <shadedPattern>shaded.oracle.com.fasterxml</shadedPattern>
        </relocation>
        <relocation>
        <pattern>com.google</pattern>
        <shadedPattern>shaded.oracle.com.google</shadedPattern>
        </relocation>
        <relocation>
        <pattern>javax.ws.rs</pattern>
        <shadedPattern>shaded.oracle.javax.ws.rs</shadedPattern>
        </relocation>
        <relocation>
        <pattern>org.glassfish</pattern>
        <shadedPattern>shaded.oracle.org.glassfish</shadedPattern>
        </relocation>
        <relocation>
        <pattern>org.jvnet</pattern>
        <shadedPattern>shaded.oracle.org.jvnet</shadedPattern>
        </relocation>
        <relocation>
        <pattern>javax.annotation</pattern>
        <shadedPattern>shaded.oracle.javax.annotation</shadedPattern>
        </relocation>
        <relocation>
        <pattern>javax.validation</pattern>
        <shadedPattern>shaded.oracle.javax.validation</shadedPattern>
        </relocation>
        <relocation>
        <pattern>com.oracle.bmc</pattern>
        <shadedPattern>shaded.com.oracle.bmc</shadedPattern>
        <includes>
        <include>com.oracle.bmc.**</include>
        </includes>
        <excludes>
        <exclude>com.oracle.bmc.hdfs.**</exclude>
        </excludes>
        </relocation>
        </relocations>
        <artifactSet>
        <excludes>
        <exclude>org.bouncycastle:bcpkix-jdk15on</exclude>
        <exclude>org.bouncycastle:bcprov-jdk15on</exclude>
        <exclude>com.google.code.findbugs:jsr305</exclude>
        </excludes>
        </artifactSet>
        </configuration>
        </plugin>
        </plugins>
        </build>
        </project>
Example Pom.xml File for Spark 2.4.4
<?xml version="1.0" encoding="UTF-8"?>
        <project xmlns="http://maven.apache.org/POM/4.0.0"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        
        <groupId>com.example</groupId>
        <artifactId>project</artifactId>
        <version>1.0-SNAPSHOT</version>
        
        <repositories>
        <repository>
        <id>project.local</id>
        <name>project</name>
        <url>file:${project.basedir}/repo</url>
        </repository>
        </repositories>
        
        <properties>
        <oci-java-sdk-version>1.15.4</oci-java-sdk-version>
        </properties>
        
        <dependencies>
        <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>oci-hdfs-connector</artifactId>
        <version>2.9.2.6</version>
        <scope>provided</scope>
        </dependency>
        <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>oci-java-sdk-core</artifactId>
        <version>${oci-java-sdk-version}</version>
        </dependency>
        <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>oci-java-sdk-objectstorage</artifactId>
        <version>${oci-java-sdk-version}</version>
        </dependency>
        <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>oci-java-sdk-secrets</artifactId>
        <version>${oci-java-sdk-version}</version>
        </dependency>
        
        <!-- Spark 2.4.4 -->
        <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.4</version>
        <scope>provided</scope>
        </dependency>
        <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.12</artifactId>
        <version>2.4.4</version>
        <scope>provided</scope>
        </dependency>
        </dependencies>
        
        <build>
        <plugins>
        <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.0</version>
        <configuration>
        <source>1.8</source>
        <target>1.8</target>
        </configuration>
        </plugin>
        <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.22.0</version>
        </plugin>
        <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>3.1.1</version>
        <executions>
        <execution>
        <phase>package</phase>
        <goals>
        <goal>shade</goal>
        </goals>
        </execution>
        </executions>
        <configuration>
        <filters>
        <filter>
        <artifact>*:*</artifact>
        <excludes>
        <exclude>META-INF/*.SF</exclude>
        <exclude>META-INF/*.DSA</exclude>
        <exclude>META-INF/*.RSA</exclude>
        </excludes>
        </filter>
        </filters>
        <relocations>
        <relocation>
        <pattern>org.apache.http</pattern>
        <shadedPattern>shaded.oracle.org.apache.http</shadedPattern>
        </relocation>
        <relocation>
        <pattern>org.apache.commons</pattern>
        <shadedPattern>shaded.oracle.org.apache.commons</shadedPattern>
        </relocation>
        <relocation>
        <pattern>com.fasterxml</pattern>
        <shadedPattern>shaded.oracle.com.fasterxml</shadedPattern>
        </relocation>
        <relocation>
        <pattern>com.google</pattern>
        <shadedPattern>shaded.oracle.com.google</shadedPattern>
        </relocation>
        <relocation>
        <pattern>javax.ws.rs</pattern>
        <shadedPattern>shaded.oracle.javax.ws.rs</shadedPattern>
        </relocation>
        <relocation>
        <pattern>org.glassfish</pattern>
        <shadedPattern>shaded.oracle.org.glassfish</shadedPattern>
        </relocation>
        <relocation>
        <pattern>org.jvnet</pattern>
        <shadedPattern>shaded.oracle.org.jvnet</shadedPattern>
        </relocation>
        <relocation>
        <pattern>javax.annotation</pattern>
        <shadedPattern>shaded.oracle.javax.annotation</shadedPattern>
        </relocation>
        <relocation>
        <pattern>javax.validation</pattern>
        <shadedPattern>shaded.oracle.javax.validation</shadedPattern>
        </relocation>
        <relocation>
        <pattern>com.oracle.bmc</pattern>
        <shadedPattern>shaded.com.oracle.bmc</shadedPattern>
        <includes>
        <include>com.oracle.bmc.**</include>
        </includes>
        <excludes>
        <exclude>com.oracle.bmc.hdfs.**</exclude>
        </excludes>
        </relocation>
        </relocations>
        <artifactSet>
        <excludes>
        <exclude>org.bouncycastle:bcpkix-jdk15on</exclude>
        <exclude>org.bouncycastle:bcprov-jdk15on</exclude>
        <exclude>com.google.code.findbugs:jsr305</exclude>
        </excludes>
        </artifactSet>
        </configuration>
        </plugin>
        </plugins>
        </build>
        </project>
Shading with Scala Build Tool

To shade your application using the Scala Build Tool (sbt), add the sbt-assembly plugin to your project, and define an assemblyMergeStrategy and assemblyShadeRules in your sbt build file.

The plugins.sbt file
In your project’s root directory, add this line to the project/plugins.sbt file. Create the file if it does not exist.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
The build.sbt file

Add the following items to your build.sbt file. While this configuration addresses the most common conflicts, you might need to add more items to the assemblyShadeRules section if you experience run-time failures.

For Spark 2.4.4:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.4" % "provided",
  "org.apache.spark" %% "spark-sql" % "2.4.4" % "provided",
  "com.oracle.oci.sdk" % "oci-hdfs-connector" % "2.9.2.1" % "provided",
  "com.oracle.oci.sdk" % "oci-java-sdk-core" % "1.15.4" % "provided",
  "com.oracle.oci.sdk" % "oci-java-sdk-objectstorage" % "1.15.4" % "provided",
  "com.oracle.oci.sdk" % "oci-java-sdk-secrets" % "1.15.4",
)

assemblyMergeStrategy in assembly := {
  case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
  case "module-info.class" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.http.**" -> "shaded.oracle.org.apache.http.@1").inAll,
  ShadeRule.rename("org.apache.commons.**" -> "shaded.oracle.org.apache.commons.@1").inAll,
  ShadeRule.rename("com.fasterxml.**" -> "shaded.oracle.com.fasterxml.@1").inAll,
  ShadeRule.rename("com.google.**" -> "shaded.oracle.com.google.@1").inAll,
  ShadeRule.rename("javax.ws.rs.**" -> "shaded.oracle.javax.ws.rs.@1").inAll,
  ShadeRule.rename("org.glassfish.**" -> "shaded.oracle.org.glassfish.@1").inAll,
  ShadeRule.rename("org.jvnet.**" -> "shaded.oracle.org.jvnet.@1").inAll,
  ShadeRule.rename("javax.annotation.**" -> "shaded.oracle.javax.annotation.@1").inAll,
  ShadeRule.rename("javax.validation.**" -> "shaded.oracle.javax.validation.@1").inAll,
  ShadeRule.rename("com.oracle.bmc.hdfs.**" -> "com.oracle.bmc.hdfs.@1").inAll,
  ShadeRule.rename("com.oracle.bmc.**" -> "shaded.com.oracle.bmc.@1").inAll,
  ShadeRule.zap("org.bouncycastle").inAll,
)
For Spark 3.0.2:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.2" % "provided",
  "org.apache.spark" %% "spark-sql" % "3.0.2" % "provided",
  "com.oracle.oci.sdk" % "oci-hdfs-connector" % "3.2.1.3" % "provided",
  "com.oracle.oci.sdk" % "oci-java-sdk-core" % "1.25.2" % "provided",
  "com.oracle.oci.sdk" % "oci-java-sdk-objectstorage" % "1.25.2" % "provided",
  "com.oracle.oci.sdk" % "oci-java-sdk-secrets" % "1.25.2",
)

assemblyMergeStrategy in assembly := {
  case PathList("javax", "inject", xs @ _*) => MergeStrategy.last
  case "module-info.class" => MergeStrategy.last
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}

assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.http.**" -> "shaded.oracle.org.apache.http.@1").inAll,
  ShadeRule.rename("org.apache.commons.**" -> "shaded.oracle.org.apache.commons.@1").inAll,
  ShadeRule.rename("com.fasterxml.**" -> "shaded.oracle.com.fasterxml.@1").inAll,
  ShadeRule.rename("com.google.**" -> "shaded.oracle.com.google.@1").inAll,
  ShadeRule.rename("javax.ws.rs.**" -> "shaded.oracle.javax.ws.rs.@1").inAll,
  ShadeRule.rename("org.glassfish.**" -> "shaded.oracle.org.glassfish.@1").inAll,
  ShadeRule.rename("org.jvnet.**" -> "shaded.oracle.org.jvnet.@1").inAll,
  ShadeRule.rename("javax.annotation.**" -> "shaded.oracle.javax.annotation.@1").inAll,
  ShadeRule.rename("javax.validation.**" -> "shaded.oracle.javax.validation.@1").inAll,
  ShadeRule.rename("com.oracle.bmc.hdfs.**" -> "com.oracle.bmc.hdfs.@1").inAll,
  ShadeRule.rename("com.oracle.bmc.**" -> "shaded.com.oracle.bmc.@1").inAll,
  ShadeRule.zap("org.bouncycastle").inAll,
)
After configuring, run the plugin's assembly task (sbt assembly) to produce the shaded fat JAR. For more information on shading with sbt, see the sbt-assembly plugin.

What's Next

Now you can start migrating your Spark applications to run in Data Flow.