Best Practices for Building Java Spark Streaming Applications

Tips and best practices for configuring Java build projects in Data Flow.

Maven is used for the following examples to show the various configuration possibilities.

Use a consistent version of Spark across your libraries and Data Flow
The easiest way to do this is define a common variable and reuse it. In the following example, the common variable is called spark.version.
<properties>
 ...
 <spark.version>3.0.2</spark.version>
 ...
</properties>
<dependencies>
 <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-core_2.12</artifactId>
     <version>${spark.version}</version>
     <scope>provided</scope>
 </dependency>
Note

Ensure the Scala version in artifactId is suitable for the version of Spark. In the example, artifactId is set to spark-core_2.12 for Scala 2.12.
Include Spark binaries that are already part of the standard Spark bundle with the scope of provided, to avoid any code duplication.
Expanding the previous example:
<properties>
 ...
 <spark.version>3.0.2</spark.version>
 ...
</properties>
<dependencies>
 <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-core_2.12</artifactId>
     <version>${spark.version}</version>
     <scope>provided</scope>
 </dependency>
 <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-sql_2.12</artifactId>
     <version>${spark.version}</version>
     <scope>provided</scope>
 </dependency>
...
Note

More information on dependency scope can be found in the Maven documentation.
Package Spark binaries that aren't part of the standard bundle with your Spark application.
This example includes spark-sql-kafka-0-10_2.12 in artifactId.
<properties>
 ...
 <spark.version>3.0.2</spark.version>
 ...
</properties>
<dependencies>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
    <version>${spark.version}</version>
</dependency>
...
The Oracle Cloud Infrastructure SDK might be compiled against a different version of third-party libraries, which might result in runtime failures. To avoid such failures, package the Oracle Cloud Infrastructure SDK and third-party libraries with your Spark application and move any newer third-party libraries into a shaded namespace.
For example:
<dependencies>
    <dependency>
        <groupId>com.oracle.oci.sdk</groupId>
        <artifactId>
</artifactId>
        <optional>false</optional>
        <version>1.36.1</version>
    </dependency>
    <dependency>
        <groupId>com.google.protobuf</groupId>
        <artifactId>protobuf-java</artifactId>
        <version>3.11.1</version>
    </dependency>
</dependencies>
...
<build>
 <plugins>
 ...
<plugin>
   <groupId>org.apache.maven.plugins</groupId>
   <artifactId>maven-shade-plugin</artifactId>
   <version>3.2.4</version>
   <configuration>
       <!-- The final uber jar file name will not have a version component. -->
       <finalName>${project.artifactId}</finalName>
       <createDependencyReducedPom>false</createDependencyReducedPom>
       <transformers>
           <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
           <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"/>
       </transformers>
       <relocations>
           <relocation>
               <pattern>com.google.</pattern>
               <shadedPattern>com.shaded.google.</shadedPattern>
           </relocation>
           <relocation>
               <pattern>com.oracle.bmc.</pattern>
               <shadedPattern>com.shaded.oracle.bmc.</shadedPattern>
           </relocation>
       </relocations>
       <!-- exclude signed Manifests -->
       <filters>
           <filter>
               <artifact>*:*</artifact>
               <excludes>
                   <exclude>META-INF/*.SF</exclude>
                   <exclude>META-INF/*.DSA</exclude>
                   <exclude>META-INF/*.RSA</exclude>
               </excludes>
           </filter>
       </filters>
   </configuration>
   <executions>
       <execution>
           <phase>package</phase>
           <goals>
               <goal>shade</goal>
           </goals>
       </execution>
   </executions>
</plugin>
...

Was this article helpful?