Creating a SQL Data Flow Application

Follow these steps to create a SQL application in Data Flow.

  • Upload your Spark-submit files to Oracle Cloud Infrastructure Object Storage. See Set Up Object Store for details.
    1. Open the navigation menu, and click Analytics and AI. Under Data Lake click Data Flow.
    2. In the left-side menu, click Applications.
    3. Under List scope, select the compartment that you want to create the application in.
    4. On the Applications page, click Create application.
    5. In the Create application panel, enter a name for the application and an optional description that can help you search for it.
    6. Under Resource configuration, provide the following values. To help calculate the number of resources that you need, see Sizing the Data Flow Application.
      1. Select the Spark version.
      2. (Optional) Select a pool.
      3. For Driver shape, select the type of cluster node to use to host the Spark driver.
      4. (Optional) If you selected a flexible shape for the driver, customize the number of OCPUs and the amount of memory.
      5. For Executor shape, select the type of cluster node to use to host each Spark executor.
      6. (Optional) If you selected a flexible shape for the executor, customize the number of OCPUs and the amount of memory.
      7. (Optional) To enable use of Spark dynamic allocation (autoscaling), select Enable autoscaling.
      8. Enter the Number of executors you need. If you selected to use autoscaling, enter a minimum and maximum number of executors.
    7. Under Application configuration, provide the following values.
      1. (Optional) If the application is for Spark streaming, select Spark Streaming.
        Note

        You must have followed the steps in Getting Started with Spark Streaming for your streaming application to work.
      2. Don't select Use Spark-Submit Options.
      3. Select SQL from the Language options.
      4. Under Select a file, specify the file URL to the application in one of the following ways:
        • Select the file from the Object Storage file name list. Click Change compartment if the bucket is in a different compartment.
        • Select Enter the file URL manually and enter the file name and the path to it using this format:
           oci://<bucket_name>@<objectstore_namespace>/<file_name>
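If you script your setup, the manual file URL can be assembled from its parts. A minimal sketch, using hypothetical bucket, namespace, and file names that you would replace with your own:

```shell
# Hypothetical values -- substitute your own bucket, namespace, and file name.
BUCKET_NAME=my-bucket
OS_NAMESPACE=my-namespace
FILE_NAME=query.sql

# Assemble the Object Storage URL in the format the console expects.
FILE_URL="oci://${BUCKET_NAME}@${OS_NAMESPACE}/${FILE_NAME}"
echo "$FILE_URL"    # oci://my-bucket@my-namespace/query.sql
```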
      5. (Optional) Enter any parameters.
        • Enter the name and value of each parameter.
        • To add another parameter, click +Another parameter.
      6. (Optional) If you have an archive.zip file, upload the file to Oracle Cloud Infrastructure Object Storage and then populate Archive URI with the path to it. There are two ways to do this:
        • Select the file from the Object Storage file name list. Click Change compartment if the bucket is in a different compartment.
        • Click Enter the file path manually and enter the file name and the path to it using this format:
           oci://<bucket_name>@<namespace_name>/<file_name>
      7. Under Application log location, specify where Oracle Cloud Infrastructure Logging is ingested, in one of the following ways:
        • Select the dataflow-logs bucket from the Object Storage file name list. Click Change compartment if the bucket is in a different compartment.
        • Select Enter the bucket path manually and enter the bucket path to it using this format:
           oci://dataflow-logs@<namespace_name>
      8. (Optional) Select the metastore from the list. If the metastore is in a different compartment, click Change compartment. The Default managed table location is automatically populated based on the metastore.
    8. (Optional) To add tags to the application, select a tag namespace (for defined tags), then specify a tag key and value. Add more tags as needed. For more information about tagging, see Overview of Tagging.
    9. Add advanced configuration options.
      1. Click Show advanced options.
      2. (Optional) Select Use resource principal auth to let the run start faster, or if you expect the Run to last more than 24 hours.
      3. (Optional) Click Enable Spark Oracle data source to use Spark Oracle Datasource.
      4. Select a Delta Lake version. The value you choose is reflected in the Spark configuration properties Key/Value pair. See Data Flow and Delta Lake for information on Delta Lake.
      5. In the Logs section, select the log groups and the application logs for Oracle Cloud Infrastructure Logging. You can change the compartment if the log groups are in a different compartment.
      6. Enter the key of the Spark configuration property and a value.
        • If you're using Spark streaming, include a key of spark.sql.streaming.graceful.shutdown.timeout with a value of no more than 30 minutes (in milliseconds).
        • If you're using Spark Oracle Datasource, include a key of spark.oracle.datasource.enabled with a value of true.
      7. Click + Another property to add another configuration property.
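As a quick sketch of the streaming property above: spark.sql.streaming.graceful.shutdown.timeout takes its value in milliseconds, so the 30-minute ceiling converts as follows (property names are from this document; the arithmetic is just the unit conversion):

```shell
# Convert the 30-minute ceiling for spark.sql.streaming.graceful.shutdown.timeout
# into milliseconds, the unit the property expects.
TIMEOUT_MS=$(( 30 * 60 * 1000 ))
echo "spark.sql.streaming.graceful.shutdown.timeout=${TIMEOUT_MS}"   # =1800000
echo "spark.oracle.datasource.enabled=true"
```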
      8. (Optional) Override the default value for the warehouse bucket by populating Warehouse bucket URI in the following format:
        oci://<warehouse-name>@<tenancy>
      9. Choose the network access.
        1. If you're attaching a private endpoint to Data Flow, click Secure access to private subnet. Select the private endpoint from the resulting list.
          Note

          You can't use an IP address to connect to the private endpoint; you must use the FQDN.
        2. If you're not using a private endpoint, click Internet access (No subnet).
      10. For Max run duration in minutes, enter a value between 60 (1 hour) and 10080 (7 days). If you don't enter a value, the submitted run continues until it succeeds, fails, is canceled, or reaches its default maximum duration (24 hours).
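If you script run submissions, you may want to validate the duration bounds described above before submitting. A minimal sketch, where the sample value of 1440 minutes is hypothetical:

```shell
# Check that a proposed max run duration (in minutes) falls within the
# allowed range: 60 (1 hour) to 10080 (7 days).
MAX_DURATION=1440   # hypothetical example: 24 hours
if [ "$MAX_DURATION" -ge 60 ] && [ "$MAX_DURATION" -le 10080 ]; then
  RESULT=valid
else
  RESULT="out of range"
fi
echo "$RESULT"   # valid
```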
    10. Click Create to create the application, or click Save as stack to create it later.
      To change the values for language, name, and file URL in the future, see Editing an Application. You can't change Language if SQL is selected.
  • Use the create command and required parameters to create an application:

    oci data-flow application create [OPTIONS]
    For a complete list of flags and variable options for CLI commands, see the CLI Command Reference.
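A minimal invocation might look like the following sketch. Verify the flag names against the CLI Command Reference; the compartment OCID, display name, Spark version, bucket path, shapes, and executor count shown here are placeholder values to replace with your own:

```shell
# Create a SQL Data Flow application from the CLI (placeholder values).
oci data-flow application create \
  --compartment-id "ocid1.compartment.oc1..<unique_id>" \
  --display-name "my-sql-app" \
  --spark-version "3.2.1" \
  --language SQL \
  --file-uri "oci://<bucket_name>@<objectstore_namespace>/<file_name>.sql" \
  --driver-shape VM.Standard2.1 \
  --executor-shape VM.Standard2.1 \
  --num-executors 2
```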
  • Run the CreateApplication operation to create an application.