Getting Started with Data Integration

Before you create a Data Integration workspace, review the prerequisites and list of tasks that you, the customer, are responsible for.

Customer Responsibility Checklist

You must have the following resources and minimum policies in the tenancy. If you don't have the proper rights, have the administrator create them for you.

Before You Begin

Before you start setting up the Data Integration service for use, you must have:

  • An Oracle Cloud Infrastructure account with administrator privileges
  • Access to the Data Integration service

List of Customer Tasks

This section summarizes the responsibilities of Data Integration customers before setting up and using Data Integration for the first time.

Task Description

Create Oracle Cloud Infrastructure resources for your Data Integration activities

In Oracle Cloud Infrastructure Identity and Access Management (IAM) with Identity Domains, create the compartments, users, and groups of users that you need.

Configure networking components for your data sources

You can set up virtual cloud networks (VCNs) and subnets in Oracle Cloud Infrastructure Networking for Data Integration. Only regional subnets are supported, and DNS hostnames must be used in the subnets. Depending on the location of the data sources that you're using, you might have to create other network objects such as service gateways, network security groups, and Network Address Translation (NAT) gateways.

For data sources in a private network, create a VCN with at least one regional subnet.

Create policies to access and use Data Integration

In Oracle Cloud Infrastructure Identity and Access Management (IAM) with Identity Domains, create the required policies that give groups of users proper access to Data Integration resources.

Data Integration must also have permission to manage the virtual networks and subnets that you set up for integration.

For reference and examples, see Data Integration Policies, and also ensure that you understand the relationship between Permissions and Verbs.

Create a workspace

When you create a workspace in Data Integration, you can enable the private network that you have set up.

After creating a workspace, see Typical Data Integration User Activities as a guide.

See also Data Security.

Shared Responsibilities Checklist

Learn how control plane and data plane management tasks for Data Integration are shared between Oracle and you, the customer.

Generally speaking, the control plane is responsible for provisioning OCI resources and managing metadata operations to get, create, update, and delete Data Integration workspaces. The data plane is responsible for design time and runtime operations related to data assets, data flows, pipelines, tasks, and applications in Data Integration.

Task Who Description

Workspace resources provisioning Oracle and Customer

Oracle is responsible for provisioning Oracle Cloud Infrastructure resources for Data Integration workspaces, including compute instances and their connectivity to a subnet (if provided) through a secondary VNIC.

You, the customer, are responsible for:

  • Setting up the infrastructure resources beforehand, such as creating a compartment and networking resources.
  • Creating the Data Integration workspaces that you need by specifying the appropriate configuration characteristics.

For the list of customer responsibilities to set up the Data Integration service before first use, see Customer Responsibility Checklist.

Backup and recovery of workspaces and applications Oracle and Customer

Oracle continuously backs up content solely for disaster recovery of Data Integration service resource metadata and of the operation of the service. These backups include customer workspace backups, but they aren't made available to customers.

You, the customer, are responsible for making backups of the application data, by copying the applications to the same workspace, another workspace, or another compartment. This is especially important for cross-region disaster recovery.

Service patching and upgrading Oracle

Oracle is responsible for patching and upgrading the Data Integration service and its agent components.

Scaling Oracle

Oracle is responsible for scaling the control and data planes.

You, the customer, can request scaling of the OCI resources in the data plane for agent computation.

Health monitoring Oracle and Customer

Oracle is responsible for monitoring the health of workspace resources and for ensuring their availability.

You, the customer, are responsible for monitoring the health and performance of tasks and applications at all levels, including the availability of dependent resources that are referenced in the data plane during task runs.

Application security Oracle and Customer

Oracle ensures that data stored in OCI is encrypted and ensures that connections to Data Integration require SSL encryption.

You, the customer, are responsible for the security of applications at all levels. This responsibility includes access to workspace resources, network access to those resources, and access to dependent data.

Auditing Oracle and Customer

Oracle is responsible for logging REST API calls that are made to workspace resources and for making those logs available to you for auditing purposes.

You, the customer, are responsible for configuring access to audit logs in the audit log service, and using the logs to audit usage and monitor activity within the tenancy.

Alerts and notifications Oracle and Customer

Oracle provides service events and notifications.

You, the customer, are responsible for configuring alerts and notifications for service events and for monitoring the alerts that might be of interest.

Creating Resources

To create resources for Data Integration activities:

  1. Create a compartment in the tenancy for Data Integration activities.

    For more information, see Managing Compartments.

  2. If the data sources are in a private network, create a VCN with at least one subnet in the compartment.
    Note

    The VCN and subnet you create here are the ones you select when you create a workspace. The subnet must be regional, spanning all availability domains.

    If you don't see the subnet listed, go back and check that it was created as a regional subnet.

    For more information, see VCNs and Subnets.

  3. Create a group for users in charge of workspaces, and then add users to the group.

    Take note of the group name. You create policies for the group in the next section. For more information, see Managing Groups.
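The subnet requirements in step 2 can be checked before you try to select the subnet in a workspace. The following is a minimal sketch, not part of any Oracle tool. It assumes a simplified, hypothetical subnet record in which `availability_domain` is empty for a regional subnet and `dns_label` is set when DNS hostnames are enabled; confirm both assumptions against the Networking documentation before relying on them.

```python
from typing import Dict, List, Optional

# Hypothetical, trimmed-down subnet record used only for this illustration.
SubnetInfo = Dict[str, Optional[str]]


def subnet_problems(subnet: SubnetInfo) -> List[str]:
    """Return the reasons a subnet might not be usable for a workspace."""
    problems: List[str] = []
    # Assumption: a regional subnet has no availability domain recorded.
    if subnet.get("availability_domain") is not None:
        problems.append("subnet is specific to one availability domain, not regional")
    # Assumption: a missing DNS label means DNS hostnames are not in use.
    if not subnet.get("dns_label"):
        problems.append("subnet has no DNS label, so DNS hostnames are unavailable")
    return problems


if __name__ == "__main__":
    candidate = {
        "display_name": "di-private-subnet",  # hypothetical name
        "availability_domain": None,
        "dns_label": "diprivate",
    }
    print(subnet_problems(candidate) or "subnet looks usable")
```

An empty result means the record passed both checks; it does not prove that the subnet appears in the workspace dialog.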

Creating Policies

To control non-administrator user access to Data Integration resources and functions, you create groups in Oracle Cloud Infrastructure Identity and Access Management (IAM) with Identity Domains. Then you write IAM policies that give the groups proper access.

You can use Data Integration policy templates in the IAM Policy Builder to create a policy, or you can manually enter the policy statements in the manual editor. See Writing Policy Statements with the Policy Builder for information about how to use the Policy Builder and policy templates.

To understand the syntax used in writing a policy statement, see Policy Syntax. Ensure that you understand the relationship between Permissions and Verbs.

You can create most of the Data Integration policies at the tenancy level or at the compartment level. The policies listed here are examples, which you can modify to suit access needs.

For more examples and reference, see Data Integration Policies.
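As an illustration of what such statements can look like, the following sketch assembles a small set of policy strings for one group and one compartment. The function name is hypothetical, and the resource-type names (`dis-workspaces`, `dis-work-requests`, `virtual-network-family`) and the service name `dataintegration` are assumptions that you should verify against Data Integration Policies before use.

```python
from typing import List


def workspace_policy_statements(group: str, compartment: str) -> List[str]:
    """Build example policy statements for a group that manages workspaces.

    The resource types and the service name below are assumptions; check
    them against the Data Integration policy reference.
    """
    return [
        # Let the group create and manage workspaces and track work requests.
        f"allow group {group} to manage dis-workspaces in compartment {compartment}",
        f"allow group {group} to manage dis-work-requests in compartment {compartment}",
        # Let the group select the VCN and subnet when creating a workspace.
        f"allow group {group} to use virtual-network-family in compartment {compartment}",
        # Let the service itself use the network resources in the compartment.
        f"allow service dataintegration to use virtual-network-family in compartment {compartment}",
    ]


if __name__ == "__main__":
    # "di-admins" and "di-compartment" are placeholder names.
    for statement in workspace_policy_statements("di-admins", "di-compartment"):
        print(statement)
```

You can paste the printed statements into the manual editor of the Policy Builder and adjust the verbs to suit access needs.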

Note

After you add IAM components (for example, dynamic groups and policy statements), don't try to perform the associated tasks immediately. New IAM policies require about 5 to 10 minutes to take effect.
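If you script the steps that follow a policy change, one way to allow for this delay is to retry the first dependent call until it is authorized. The sketch below is a generic helper under stated assumptions: `PermissionError` stands in for whatever authorization error your client raises, and the 10-minute timeout and 30-second interval are illustrative defaults, not documented values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until_authorized(
    action: Callable[[], T],
    timeout_s: float = 600.0,   # illustrative: upper end of the stated delay
    interval_s: float = 30.0,   # illustrative polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call action until it stops raising PermissionError or time runs out."""
    deadline = clock() + timeout_s
    while True:
        try:
            return action()
        except PermissionError:
            # Give up if another wait would pass the deadline.
            if clock() + interval_s > deadline:
                raise
            sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so that the helper can be exercised without real waiting.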

For Workspaces

For Data Assets

For Publishes

Creating a Workspace

Before you can get started with Data Integration, you or the administrator must first create a workspace for the data integration projects.

Create a workspace after the connectivity requirements for Data Integration are satisfied. See Creating Resources.

For other networking information, see the following topics:

Ensure that you also have the required policies for creating workspaces, as described in Creating Policies. For example, if you're creating a workspace that uses virtual cloud network (VCN) resources, you must create policies to allow Data Integration access to the VCN in the compartment.

Components in a Design

After creating data assets for the source and target data systems, you create the data integration processes for extracting, loading, and transforming data.

In Data Integration, to ingest and transform data, you create data loader tasks, data flows, integration tasks, and other tasks. To orchestrate a set of tasks in a sequence or in parallel, you create pipelines and pipeline tasks. You can use the following tasks as a guideline.

Task Description

Create a data loader task

Create a data loader task from the Tasks section of a project or folder details page. A data loader task takes data from a source, transforms the data, then loads the data into a target.

Create a data flow

Create a data flow from the Data Flows section of a project or folder details page.

Add operators

In the data flow designer, build the logical flow of data from source data assets to target data assets. Add data operators to specify the source and target data sources. Add shaping operators such as filter and join to cleanse, transform, and enrich data.

Add user-defined functions

Create and use custom functions.

Apply transformations

In the Data tab of an operator in the data flow designer, apply transformations to aggregate, cleanse, and shape data.

Assign parameters

In the Details tab of an operator in the data flow designer, assign parameters to externalize and override values. By using parameters, different configurations of sources, targets, and transformations can be reused at design time and runtime.

Create an integration task

After completing a data flow design, from the Tasks section of a project or folder details page, create an integration task that uses the data flow. Wrapping the data flow in an integration task lets you run the data flow, and you can choose the parameter values you want to use at runtime.

Create other tasks

If needed, you can create other types of tasks from the Tasks section of a project or folder details page.

Create a pipeline

Create a pipeline from the Pipelines section of a project or folder details page. In the pipeline designer, use operators to add the tasks and activities you want to orchestrate as a set of processes in a sequence or in parallel. You can also use parameters to override values at design time and runtime.

Create a pipeline task

After completing a pipeline design, from the Tasks section of a project or folder details page, create a pipeline task that uses the pipeline. Wrapping the pipeline in a pipeline task lets you run the pipeline, and you can choose the parameter values you want to use at runtime.
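The ideas of parameter overrides and of running tasks in a sequence or in parallel can be pictured with a small conceptual model. This sketch is not the Data Integration API: the `Task` class, the two `run_in_*` functions, and the parameter names are all hypothetical and exist only to show how default values set at design time might be overridden at runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    """Hypothetical stand-in for a task that wraps a design with parameters."""
    name: str
    run: Callable[[Dict[str, str]], str]
    defaults: Dict[str, str] = field(default_factory=dict)

    def execute(self, overrides: Dict[str, str]) -> str:
        # Runtime values take precedence over design-time defaults.
        params = {**self.defaults, **overrides}
        return self.run(params)


def run_in_sequence(tasks: List[Task], overrides: Dict[str, str]) -> List[str]:
    """Run tasks one after another, in list order."""
    return [task.execute(overrides) for task in tasks]


def run_in_parallel(tasks: List[Task], overrides: Dict[str, str]) -> List[str]:
    """Run tasks concurrently; results are returned in list order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda task: task.execute(overrides), tasks))


if __name__ == "__main__":
    load = Task(
        "load_orders",
        lambda p: f"load {p['SOURCE']} -> {p['TARGET']}",
        {"SOURCE": "orders_dev", "TARGET": "stage"},
    )
    # Override only SOURCE at runtime; TARGET keeps its default.
    print(run_in_sequence([load], {"SOURCE": "orders_prod"}))
```

In this model, reusing one task with different override dictionaries mirrors reusing one design with different source and target configurations.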