Getting Started with Data Integration
Before you create a Data Integration workspace, review the prerequisites and list of tasks.
Customer Responsibility Checklist
You must have the following resources and minimum policies in your tenancy. If you don't have the proper rights, have your administrator create them for you.
Before You Begin
Before you start setting up the Data Integration service for use, you must have:
- An Oracle Cloud Infrastructure account with administrator privileges
- Access to the Data Integration service
List of Tasks
This section summarizes the responsibilities of Data Integration customers.
Task | Description |
---|---|
Create Oracle Cloud Infrastructure resources for your Data Integration activities | In Oracle Cloud Infrastructure Identity and Access Management (IAM), create your compartments, users, and groups of users. You can also set up virtual cloud networks (VCNs) and subnets in Oracle Cloud Infrastructure Networking for Data Integration. Only regional subnets are supported, and DNS hostnames must be used in the subnets. Depending on the location of your data sources, you might have to create other network objects such as service gateways, network security groups, and Network Address Translation (NAT) gateways. For data sources in a private network, create a VCN with at least one regional subnet. |
Create policies | In Oracle Cloud Infrastructure Identity and Access Management (IAM), create the required policies that give groups of users proper access to Data Integration resources. Data Integration must also have permission to manage the virtual networks and subnets that you set up for integration. For reference and examples, see Data Integration Policies, and also ensure that you understand the relationship between permissions and verbs. |
Create a workspace | When you create a workspace in Data Integration, you can enable the private network that you have set up. After creating a workspace, you can refer to Typical Data Integration User Activities as a guide. |
See also Data Security.
Creating Resources
To create resources for Data Integration activities:
Creating Policies
To control non-administrator user access to Data Integration resources and functions, you create groups in Oracle Cloud Infrastructure Identity and Access Management (IAM). Then you write IAM policies that give the groups proper access.
You can use Data Integration policy templates in the IAM Policy Builder to create a policy, or you can manually enter the policy statements in the manual editor. See Writing Policy Statements with the Policy Builder for information about how to use the Policy Builder and policy templates.
To understand the syntax used in writing a policy statement, see Overview of Policy Syntax. Ensure that you understand the relationship between permissions and verbs.
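As a rough illustration of the statement shape used throughout this topic (allow, a subject, a verb, a resource type, a location, and an optional where clause), the following Python sketch splits a statement into its parts. The function name `parse_policy` and the regular expression are assumptions made for this example and are not part of any Oracle tooling. The sketch handles only the verb form, so statements that list permissions in braces are rejected.

```python
import re

# Basic shape only:
# allow <subject> to <verb> <resource-type> in <location> [where <conditions>]
POLICY_RE = re.compile(
    r"^allow\s+(?P<subject>.+?)\s+to\s+(?P<verb>inspect|read|use|manage)\s+"
    r"(?P<resource>[\w-]+)\s+in\s+(?P<location>tenancy|compartment\s+\S+)"
    r"(?:\s+where\s+(?P<conditions>.+))?$",
    re.IGNORECASE,
)


def parse_policy(statement: str) -> dict:
    """Split a verb-form policy statement into its named parts (hypothetical helper)."""
    match = POLICY_RE.match(statement.strip())
    if match is None:
        raise ValueError(f"unrecognized policy statement: {statement!r}")
    return match.groupdict()


parsed = parse_policy(
    "allow group <group-name> to manage dis-workspaces in compartment <compartment-name>"
)
print(parsed["subject"], "|", parsed["verb"], "|", parsed["resource"], "|", parsed["location"])
```

A check like this can catch a misplaced keyword before you paste a statement into the manual editor, but it is not a substitute for the validation that IAM performs.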
You can create most of the Data Integration policies at the tenancy level or at the compartment level. The policies listed here are examples, which you can modify to suit your access needs.
For more examples and reference, see Data Integration Policies.
After you add IAM components (for example, dynamic groups and policy statements), don't try to perform the associated tasks immediately. New IAM policies require about 5 to 10 minutes to take effect.
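Because new policies can take several minutes to apply, a script that runs right after a policy change might retry a task instead of failing on the first authorization error. The following Python sketch shows one way to do that. The function name `retry_until_authorized` and the use of `PermissionError` as a stand-in for an authorization failure are assumptions for illustration; substitute the error type that your client library raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until_authorized(
    task: Callable[[], T],
    attempts: int = 10,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task until it stops raising PermissionError or attempts run out.

    PermissionError is a placeholder for an authorization failure (assumption).
    """
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except PermissionError:
            if attempt == attempts:
                raise  # still unauthorized after the last attempt
            sleep(delay_seconds)  # wait before trying again
    raise ValueError("attempts must be at least 1")
```

With the defaults, the helper waits up to nine times for one minute each, which roughly covers the propagation window described above. The `sleep` parameter lets you substitute a no-op function when testing.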
For Workspaces
This policy gives permission to a group to create Data Integration workspaces.
allow group <group-name> to manage dis-workspaces in compartment <compartment-name>
Users with the inspect permission can only list dis-workspaces. Users with the manage permission for dis-workspaces can create and delete workspaces. Users with the use permission can only perform integration activities within workspaces. View more examples to create a policy specific to your requirements.
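IAM verbs are cumulative: inspect, read, use, and manage each include the access of the verbs before them. The following Python sketch models that ordering and renders a workspace policy statement for a chosen verb. The function names and the group and compartment names are hypothetical and used only for illustration.

```python
# IAM verbs in ascending order of access.
VERBS = ("inspect", "read", "use", "manage")


def verb_covers(granted: str, required: str) -> bool:
    """Return True if the granted verb includes the access of the required verb."""
    return VERBS.index(granted) >= VERBS.index(required)


def workspace_policy(group: str, verb: str, compartment: str) -> str:
    """Render a dis-workspaces policy statement for one group (hypothetical helper)."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    return f"allow group {group} to {verb} dis-workspaces in compartment {compartment}"


# Hypothetical group and compartment names.
print(workspace_policy("data-engineers", "use", "integration-dev"))
print(verb_covers("manage", "inspect"))
print(verb_covers("inspect", "use"))
```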
This policy gives permission to a group to check the status while creating a workspace.
allow group <group-name> to manage dis-work-requests in compartment <compartment-name>
This policy gives Data Integration access to list users' names in the Created by field when they create projects, data assets, and applications in the workspace.
allow service dataintegration to inspect users in tenancy
After creating workspaces, you can allow a specific group to manage a specific workspace and not any other workspace:
allow group <group-name> to manage dis-workspaces in compartment <compartment-name> where target.workspace.id = '<workspace-ocid>'
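If several groups each manage their own workspace, you might generate the scoped statements from a mapping rather than writing each one by hand. The following Python sketch assumes a hypothetical mapping of group names to workspace OCIDs; the function name and the placeholder values are not from the service.

```python
from typing import Dict, List


def scoped_workspace_policies(group_workspaces: Dict[str, str], compartment: str) -> List[str]:
    """Render one workspace-scoped statement per group (hypothetical helper)."""
    statements = []
    for group, workspace_ocid in sorted(group_workspaces.items()):
        if "'" in workspace_ocid:
            # A quote inside the OCID would break the where clause.
            raise ValueError(f"invalid workspace OCID: {workspace_ocid!r}")
        statements.append(
            f"allow group {group} to manage dis-workspaces in compartment {compartment} "
            f"where target.workspace.id = '{workspace_ocid}'"
        )
    return statements


# Placeholder group names and OCIDs for illustration only.
for statement in scoped_workspace_policies(
    {"sales-team": "<sales-workspace-ocid>", "finance-team": "<finance-workspace-ocid>"},
    "<compartment-name>",
):
    print(statement)
```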
This policy gives Data Integration access to move a workspace from one compartment to a target compartment.
allow service dataintegration to inspect compartments in compartment <target-compartment-name>
This policy gives permission to a group to move Data Integration workspaces.
allow group <group-name> to manage dis-workspaces in compartment <source-compartment-name>
allow group <group-name> to manage dis-workspaces in compartment <target-compartment-name>
This policy gives permission to a group to manage tag-namespaces and tags in Data Integration workspaces.
allow group <group-name> to manage tag-namespaces in compartment <compartment-name>
To add a defined tag, you must have permission to use the tag namespace. To learn more about tagging, see Resource Tags.
These policies give Data Integration access to search within workspaces in your tenancy.
allow service dataintegration to {TENANCY_INSPECT} in tenancy
allow service dataintegration to {DIS_METADATA_INSPECT} in tenancy
To check whether the subnet has enough IP addresses to allocate when you create a workspace with a private network enabled, add the following policy:
allow group <group-name> to inspect instance-family in compartment <compartment-name>
To restrict the permission to a specific API call, add the following policy:
allow group <group-name> to inspect instance-family in compartment <compartment-name> where ALL {request.operation = 'ListVnicAttachments'}
This policy gives Data Integration access to the virtual network resources in the compartment.
allow service dataintegration to use virtual-network-family in compartment <compartment-name>
The following policy gives permission to a group to manage networking resources in the compartment.
allow group <group-name> to manage virtual-network-family in compartment <compartment-name>
Or, for non-admin users:
allow group <group-name> to use virtual-network-family in compartment <compartment-name>
allow group <group-name> to inspect instance-family in compartment <compartment-name>
You can limit user activities within the network when you assign the inspect permission for VCNs and subnets within your compartment instead of manage. Users can then view existing VCNs and subnets and select them when creating a workspace. View more examples to create a policy specific to your requirements.
For Data Assets
Create these policies to allow Data Integration to access Object Storage resources, such as objects and buckets.
allow group <group-name> to use object-family in compartment <compartment-name>
allow any-user to use buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow any-user to manage objects in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
If your Data Integration workspace and Object Storage data source are in different tenancies, then you must also create the following policies for compartments:
In the workspace tenancy:
Endorse any-user to inspect compartments in tenancy <tenancy-name> where ALL {request.principal.type = 'disworkspace'}
In the Object Storage tenancy:
Admit any-user of tenancy <tenancy-name> to inspect compartments in tenancy
Different types of policies (resource principal and on behalf of) are required for Object Storage. Policies required also depend on whether the Object Storage instance and Data Integration instance are in the same tenancy or different tenancies, and whether you create the policies at the compartment level or tenancy level. Review more examples and this blog to identify the right policies for your needs.
Create these policies to allow Data Integration to access buckets and objects in Oracle Cloud Infrastructure Object Storage. The policies are required for staging extracted data, which needs pre-authentication to complete the operations.
allow group <group-name> to use object-family in compartment <compartment-name>
allow any-user to use buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow any-user to manage objects in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow any-user to manage buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>', request.permission = 'PAR_MANAGE'}
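The resource principal statements above share the same where clause, which identifies the workspace by principal type and OCID. As a minimal sketch, the following Python code renders the four staging statements from one set of inputs so that the clause is written only once. The function names are hypothetical; the statement text mirrors the policies listed above.

```python
from typing import List, Sequence


def workspace_principal_clause(workspace_ocid: str, extra: Sequence[str] = ()) -> str:
    """Build the ALL {...} clause that matches one workspace as the requesting principal."""
    conditions = [
        "request.principal.type = 'disworkspace'",
        f"request.principal.id = '{workspace_ocid}'",
        *extra,
    ]
    return "ALL {" + ", ".join(conditions) + "}"


def staging_policies(group: str, compartment: str, workspace_ocid: str) -> List[str]:
    """Render the group policy and the workspace policies used for staging (hypothetical helper)."""
    principal = workspace_principal_clause(workspace_ocid)
    par = workspace_principal_clause(workspace_ocid, ["request.permission = 'PAR_MANAGE'"])
    return [
        f"allow group {group} to use object-family in compartment {compartment}",
        f"allow any-user to use buckets in compartment {compartment} where {principal}",
        f"allow any-user to manage objects in compartment {compartment} where {principal}",
        f"allow any-user to manage buckets in compartment {compartment} where {par}",
    ]


for statement in staging_policies("<group-name>", "<compartment-name>", "<workspace-ocid>"):
    print(statement)
```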
Create this policy if you want to use OCI Vault to save sensitive information, such as user credentials.
allow any-user to read secret-bundles in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
Create this policy if you use an autonomous database as a target. Autonomous databases use Object Storage for staging data and need pre-authentication to complete operations.
allow any-user to manage buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>', request.permission = 'PAR_MANAGE'}
Create this policy if you want the autonomous database credentials to be retrieved automatically while creating an autonomous database data asset.
allow group <group-name> to read autonomous-database-family in compartment <compartment-name>
For Publishing to OCI Data Flow
Create these policies if you want to publish tasks from Data Integration to OCI Data Flow.
allow any-user to manage dataflow-application in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow group <group-name> to read dataflow-application in compartment <compartment-name>
allow group <group-name> to manage dataflow-run in compartment <compartment-name>
Create this policy for non-administrators if your tasks use data sources that are hosted in private networks and you want to publish to OCI Data Flow using a private endpoint.
allow group <group-name> to inspect dataflow-private-endpoint in compartment <compartment-name>
Creating a Workspace
Before you can get started with Data Integration, you or your administrator must first create a workspace for your data integration projects.
Create a workspace after the connectivity requirements for Data Integration are satisfied. See Creating Resources.
For other networking information, see the following topics:
- Configure networking components for data assets
- Blog: Understanding VCN configuration for Data Integration
- Blog: Using Network Path Analyzer (troubleshoot, verify, and validate)
Ensure that you also have the required policies for creating workspaces, as described in Creating Policies. For example, if you're creating a workspace using virtual cloud network (VCN) resources, you must allow Data Integration access to your VCN in the compartment.
Use the workspace to create design-time artifacts such as data assets, data flows, and tasks in one or more projects or folders. For information about using projects in a workspace, see Using Projects and Folders.
Use the oci data-integration workspace create command and required parameters to create a workspace:
oci data-integration workspace create [OPTIONS]
For a complete list of flags and variable options for CLI commands, see the Command Line Reference.
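If you script workspace creation, you might assemble the CLI arguments in code before running them. The following Python sketch builds the argument list only and doesn't run the command. The flags shown (`--compartment-id`, `--display-name`, `--is-private-network-enabled`, `--vcn-id`, `--subnet-id`) are assumptions based on common OCI CLI conventions; confirm them against the Command Line Reference before use. The function name is hypothetical.

```python
import shlex
from typing import List, Optional


def workspace_create_command(
    compartment_id: str,
    display_name: str,
    vcn_id: Optional[str] = None,
    subnet_id: Optional[str] = None,
) -> List[str]:
    """Assemble arguments for the workspace create command (flags are assumptions)."""
    args = [
        "oci", "data-integration", "workspace", "create",
        "--compartment-id", compartment_id,
        "--display-name", display_name,
    ]
    if vcn_id and subnet_id:
        # Private network: both the VCN and the subnet are supplied.
        args += [
            "--is-private-network-enabled", "true",
            "--vcn-id", vcn_id,
            "--subnet-id", subnet_id,
        ]
    elif vcn_id or subnet_id:
        raise ValueError("provide both vcn_id and subnet_id, or neither")
    return args


# Placeholder values; shlex.join quotes arguments so the line is safe to paste into a shell.
print(shlex.join(workspace_create_command("<compartment-ocid>", "my-workspace")))
```

To run the command from the same script, you could pass the returned list to `subprocess.run`.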
Run the CreateWorkspace operation to create a workspace.
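As a rough sketch of what a CreateWorkspace request body might contain, the following Python code builds the JSON payload without sending it. The field names (`compartmentId`, `displayName`, `isPrivateNetworkEnabled`, `vcnId`, `subnetId`) are assumptions that follow the usual camelCase convention of OCI REST APIs; check them against the API reference. The function name is hypothetical.

```python
import json
from typing import Optional


def create_workspace_body(
    compartment_id: str,
    display_name: str,
    vcn_id: Optional[str] = None,
    subnet_id: Optional[str] = None,
) -> str:
    """Build a JSON request body for CreateWorkspace (field names are assumptions)."""
    private = vcn_id is not None and subnet_id is not None
    if not private and (vcn_id is not None or subnet_id is not None):
        raise ValueError("provide both vcn_id and subnet_id, or neither")
    body = {
        "compartmentId": compartment_id,
        "displayName": display_name,
        "isPrivateNetworkEnabled": private,
    }
    if private:
        body["vcnId"] = vcn_id
        body["subnetId"] = subnet_id
    return json.dumps(body, indent=2)


# Placeholder OCIDs for illustration only.
print(create_workspace_body("<compartment-ocid>", "my-workspace", "<vcn-ocid>", "<subnet-ocid>"))
```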
Components in a Design
After creating data assets for the source and target data systems, you create the data integration processes for extracting, loading, and transforming data.
In Data Integration, to ingest and transform data, you create data loader tasks, data flows, integration tasks, and other tasks. To orchestrate a set of tasks in a sequence or in parallel, you create pipelines and pipeline tasks. You can use the following tasks as a guideline.
Task | Description |
---|---|
Create a data loader task | Create a data loader task from the Tasks section of a project or folder details page. A data loader task takes data from a source, transforms the data, then loads the data into a target. |
Create a data flow | Create a data flow from the Data Flows section of a project or folder details page. |
Add operators | In the data flow designer, build the logical flow of data from your source data assets to your target data assets. Add data operators to specify the source and target data sources. Add shaping operators such as filter and join to cleanse, transform, and enrich data. |
Add user-defined functions | Create and use custom functions. |
Apply transformations | In the Data tab of an operator in the data flow designer, apply transformations to aggregate, cleanse, and shape data. |
Assign parameters | In the Details tab of an operator in the data flow designer, assign parameters to externalize and override values. By using parameters, different configurations of your sources, targets, and transformations can be reused at design time and runtime. |
Create an integration task | After completing a data flow design, from the Tasks section of a project or folder details page, create an integration task that uses the data flow. Wrapping the data flow in an integration task lets you run the data flow, and you can choose the parameter values you want to use at runtime. |
Create other tasks | If needed, you can create other types of tasks from the Tasks section of a project or folder details page. |
Create a pipeline | Create a pipeline from the Pipelines section of a project or folder details page. In the pipeline designer, use operators to add the tasks and activities you want to orchestrate as a set of processes in a sequence or in parallel. You can also use parameters to override values at design time and runtime. |
Create a pipeline task | After completing a pipeline design, from the Tasks section of a project or folder details page, create a pipeline task that uses the pipeline. Wrapping the pipeline in a pipeline task lets you run the pipeline, and you can choose the parameter values you want to use at runtime. |
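To make the ideas of parameters and orchestration concrete, the following Python sketch models them outside the service. Design-time default parameter values are overridden at runtime, and a pipeline is a list of stages in which stages run in sequence and the tasks within a stage run in parallel. This is an illustration only, not the Data Integration runtime or API; all names and values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Task = Callable[[Dict[str, Any]], Any]


def resolve_parameters(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Apply runtime overrides on top of design-time defaults."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}


def run_pipeline(stages: List[List[Task]], params: Dict[str, Any]) -> List[List[Any]]:
    """Run stages one after another; run the tasks inside each stage in parallel."""
    results = []
    with ThreadPoolExecutor() as pool:
        for stage in stages:
            # pool.map returns results in the same order as the tasks in the stage.
            results.append(list(pool.map(lambda task: task(params), stage)))
    return results


# Hypothetical parameter names and tasks.
defaults = {"SOURCE_BUCKET": "raw", "TARGET_TABLE": "sales"}
params = resolve_parameters(defaults, {"TARGET_TABLE": "sales_test"})
stages = [
    [lambda p: f"load from {p['SOURCE_BUCKET']}"],
    [lambda p: f"write {p['TARGET_TABLE']}", lambda p: "refresh report"],
]
print(run_pipeline(stages, params))
```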