Working with OCI Data Flow Tasks

An Oracle Cloud Infrastructure Data Flow task lets you schedule, run, and monitor an OCI Data Flow application from within Data Integration.

An application that's created in OCI Data Flow consists of a Spark application and version, dependencies, default parameters, and a default runtime resource specification.

Creating and running OCI Data Flow tasks in Data Integration requires relevant permissions and IAM policies to access applications in OCI Data Flow. For details, see Required Policies and Setup.

The following pages describe how you can create, edit, and delete OCI Data Flow tasks in Data Integration:

The following pages describe other management tasks that can be performed after an OCI Data Flow task is created:

Required Policies and Setup

Before you create an OCI Data Flow task, use the following checklist to ensure that you have the setup and information required for using OCI Data Flow tasks in Data Integration.

Obtain access to Oracle Cloud Infrastructure Data Flow

This topic assumes you have already set up what you need to use OCI Data Flow and create applications in OCI Data Flow.

Getting Started with OCI Data Flow

The OCI Data Flow task you create in Data Integration is associated with an application you create in OCI Data Flow.

Create an application in OCI Data Flow

To use an OCI Data Flow task in Data Integration, you must have already created the application in OCI Data Flow for the language you want.

See The OCI Data Flow Library.

Obtain the details of an application in OCI Data Flow

Gather the following details of the application you created in OCI Data Flow:

  • Compartment in which the OCI Data Flow application is created
  • Name of the OCI Data Flow application
  • If applicable, the arguments to pass to the application's main class
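One way to gather these details is with the OCI CLI's `oci data-flow application list` command. The following is a minimal sketch that composes (but does not execute) that command for a compartment; the compartment OCID is a hypothetical placeholder, so substitute your own.

```python
# Hypothetical placeholder OCID; replace with your compartment's OCID.
COMPARTMENT_OCID = "ocid1.compartment.oc1..exampleuniqueID"

def list_apps_command(compartment_ocid):
    """Compose the argv for listing OCI Data Flow applications
    in a compartment (requires the OCI CLI to be installed and
    configured before you actually run it)."""
    return [
        "oci", "data-flow", "application", "list",
        "--compartment-id", compartment_ocid,
    ]

# Print the command you would run in a shell.
print(" ".join(list_apps_command(COMPARTMENT_OCID)))
```

The command output includes each application's name and OCID, which you note down along with the compartment.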

Create policies

To allow Data Integration to create and access applications in OCI Data Flow:

allow any-user to manage dataflow-application in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}

allow any-user to manage dataflow-run in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}

allow group <group-name> to read dataflow-application in compartment <compartment-name>

allow group <group-name> to manage dataflow-run in compartment <compartment-name>
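If you maintain these statements in scripts, a small Python sketch like the following can fill in the placeholders consistently. The workspace OCID, compartment name, and group name below are hypothetical placeholders; substitute your own values.

```python
# Hypothetical placeholder values; replace with your own.
WORKSPACE_OCID = "ocid1.disworkspace.oc1..exampleuniqueID"
COMPARTMENT = "my-compartment"
GROUP = "my-group"

# Condition that restricts the any-user statements to one
# Data Integration workspace.
workspace_clause = (
    "where ALL {request.principal.type = 'disworkspace', "
    f"request.principal.id = '{WORKSPACE_OCID}'}}"
)

statements = [
    f"allow any-user to manage dataflow-application in compartment {COMPARTMENT} {workspace_clause}",
    f"allow any-user to manage dataflow-run in compartment {COMPARTMENT} {workspace_clause}",
    f"allow group {GROUP} to read dataflow-application in compartment {COMPARTMENT}",
    f"allow group {GROUP} to manage dataflow-run in compartment {COMPARTMENT}",
]

for statement in statements:
    print(statement)
```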

To allow Data Integration to trigger OCI Data Flow tasks that have an OCI Data Catalog metastore configured, create two dynamic groups, and create a matching rule for each dynamic group:

Create the following matching rule in <dynamic-group-name>:

ANY {resource.id = '<workspace-ocid>'}

Create the following matching rule in <dynamic-group-name-1>:

ANY {resource.id = '<datacatalog-metastore-ocid>'}

Then add the following policies:

allow dynamic-group <dynamic-group-name> to manage data-catalog-metastores in compartment <compartment-name>

allow dynamic-group <dynamic-group-name> to use data-catalog-metastores in compartment <compartment-name>

allow dynamic-group <dynamic-group-name-1> to read object-family in compartment <compartment-name>
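The matching rules and metastore policy statements above can be filled in the same way. The following sketch shows how the placeholders fit together; the dynamic group names and OCIDs are hypothetical, so substitute your own values.

```python
# Hypothetical placeholder values; replace with your own.
WORKSPACE_OCID = "ocid1.disworkspace.oc1..exampleuniqueID"
METASTORE_OCID = "ocid1.datacatalogmetastore.oc1..exampleuniqueID"
COMPARTMENT = "my-compartment"
DG_WORKSPACE = "dg-dis-workspace"  # dynamic group matching the workspace
DG_METASTORE = "dg-metastore"      # dynamic group matching the metastore

# One matching rule per dynamic group.
matching_rules = {
    DG_WORKSPACE: f"ANY {{resource.id = '{WORKSPACE_OCID}'}}",
    DG_METASTORE: f"ANY {{resource.id = '{METASTORE_OCID}'}}",
}

# Policy statements granting the dynamic groups access.
statements = [
    f"allow dynamic-group {DG_WORKSPACE} to manage data-catalog-metastores in compartment {COMPARTMENT}",
    f"allow dynamic-group {DG_WORKSPACE} to use data-catalog-metastores in compartment {COMPARTMENT}",
    f"allow dynamic-group {DG_METASTORE} to read object-family in compartment {COMPARTMENT}",
]

for rule in matching_rules.values():
    print(rule)
for statement in statements:
    print(statement)
```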

Note

  • The policy statements provided in this topic are examples only. Ensure that you write policies that meet your own requirements.

  • Cross-tenancy policies are required if your resources (such as Object Storage objects and buckets) and your Data Integration workspace are in different tenancies. See Policy Examples and the Policies blog to identify the right policies for your needs.

  • After you add IAM components (for example, dynamic groups and policy statements), don't try to perform the associated tasks immediately. New IAM policies require about five to ten minutes to take effect.