Getting Started with Spark-Submit and CLI
A tutorial to help you get started running a Spark application in Data Flow using spark-submit and the execute string at the CLI.
Follow the existing tutorial for Getting Started with Oracle Cloud Infrastructure Data Flow, but use the CLI to run spark-submit commands.
Before You Begin
Complete some prerequisites and set up authentication before you can use spark-submit commands in Data Flow with the CLI.
- Complete the prerequisites to use spark-submit with the CLI.
- Set up authentication to use spark-submit with the CLI.
Prerequisites to Use Spark-submit with the CLI
Complete these prerequisites to use spark-submit with the CLI.
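As a minimal sketch, assuming a working OCI CLI installation is among the prerequisites, the quickstart installer can be run from a Linux or macOS shell:
# Install the OCI CLI with the quickstart installer script.
$ bash -c "$(curl -L https://raw.githubusercontent.com/oracle/oci-cli/master/scripts/install/install.sh)"
# Verify the installation.
$ oci --version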
Authentication to Use Spark-submit with the CLI
Set up authentication to use spark-submit with the CLI.
$ oci session authenticate
- Select the intended region from the provided list of regions.
- Please switch to newly opened browser window to log in!
- Completed browser authentication process!
- Enter the name of the profile you would like to create: <profile_name> ex. oci-cli
- Config written to: ~/.oci/config
- Try out your newly created session credentials with the following example command:
$ oci iam region list --config-file ~/.oci/config --profile <profile_name> --auth security_token
The new profile and session token are written to the ~/.oci/config file. Use the profile name to run the tutorial.
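Session tokens expire after a limited time. Assuming the token lapses while you work through the tutorial, you can refresh it for the same profile:
# Refresh the session token for the profile created above.
$ oci session refresh --profile <profile_name>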
1. Create the Java Application Using Spark-Submit and CLI
Use Spark-submit and the CLI to complete the Java application exercise from the tutorial, as sketched below.
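A minimal sketch of submitting a Java application with an execute string; the compartment OCID, bucket, namespace, class, and JAR names are placeholders, not values from the tutorial:
# Submit a Java Spark application to Data Flow with an execute string.
# All OCIDs, bucket names, and class/JAR names are placeholders.
$ oci data-flow run submit \
    --compartment-id <compartment_ocid> \
    --execute "--class example.Example oci://<bucket>@<namespace>/example.jar" \
    --profile <profile_name> --auth security_token
The --profile and --auth security_token options reuse the session credentials created above.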
2. Machine Learning with PySpark
Use Spark-submit and the CLI to carry out machine learning with PySpark, as sketched below.
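A similar sketch for a PySpark application; again, the compartment OCID, paths, and script name are placeholder assumptions:
# Submit a PySpark application, passing Spark configuration in the execute string.
$ oci data-flow run submit \
    --compartment-id <compartment_ocid> \
    --execute "--conf spark.executor.instances=2 oci://<bucket>@<namespace>/ml_app.py" \
    --profile <profile_name> --auth security_token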
What's Next
Use Spark-submit and the CLI in other situations.
You can use spark-submit from the CLI to create and run Java, Python, or SQL applications with Data Flow, and explore the results. Data Flow handles all details of deployment, tear down, log management, security, and UI access. With Data Flow, you focus on developing Spark applications without worrying about the infrastructure.
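After a run is submitted, you can explore its progress from the same CLI session; the commands below follow the standard OCI CLI pattern for Data Flow runs, with placeholder OCIDs:
# List recent Data Flow runs in the compartment.
$ oci data-flow run list --compartment-id <compartment_ocid> \
    --profile <profile_name> --auth security_token
# Get the details and lifecycle state of a single run.
$ oci data-flow run get --run-id <run_ocid> \
    --profile <profile_name> --auth security_token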