When you create a link to files in a cloud store bucket from your Oracle
Autonomous Database, you create an external table that links to the files in the cloud
store.
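Under the hood, a data link job of this kind calls the DBMS_CLOUD package to create the external table. The following is a minimal, hypothetical sketch only; the credential name, object store URI, table name, and column list are placeholders you would replace with your own values:

```sql
BEGIN
  -- Hypothetical example: link a CSV file in an object store bucket
  -- to an external table named SALES_EXT. The credential and URI
  -- are placeholders, not real values.
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'SALES_EXT',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage.example.com/n/ns/b/bucket/o/sales.csv',
    format          => json_object('type' value 'csv', 'skipheaders' value '1'),
    column_list     => 'SALE_ID NUMBER, SALE_DATE DATE, AMOUNT NUMBER(10,2)'
  );
END;
/
```

Because the table is external, the data stays in the cloud store; queries read the files in place rather than copying rows into the database.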
Configure and run a data link job from the Link Cloud Object page. To
open that page:
Open the Database Actions launchpad, click the Data Studio tab, and select the Data Load menu. See The Data Load Page.
Select LINK DATA and CLOUD STORE.
On the left side of the page is a navigator pane, where you choose a cloud store connection and the folders or files containing the data. On the right of the page is the data link cart, where you stage the files and folders for the data link job. You can set options for the data link job before running it. The Autonomous Database comes with predefined CPU/IO shares assigned to different consumer groups. You can set the consumer group to low, medium, or high when executing a data load job, depending on your workload.
Add files from the cloud store to the data link cart, where you can edit the
details of the data link job. To add the files:
From the list at the top of the navigator pane on the left, select
the bucket with your source data.
The list shows connections that were established on the Manage Cloud Storage page. If you haven't yet registered the cloud store you want to use, click the Connections button under the Data Load menu in the Data Studio suite of tools and register a connection.
Drag one or more items from the file navigator on the left and drop
them into the cart on the right.
You can add files, folders, or both. A card is added to the
cart for each file or folder you drag into it. The card lists the name
of the source file or folder and a proposed name for the target
table.
If you add a folder that contains multiple files, all the
files must be of the same type, for example, all CSV or all TXT.
When you add the folder to the cart, a prompt is displayed that asks if you want to load all the objects from the multiple source files into a single target table. Click Yes to continue or No to cancel.
When you add multiple individual files or multiple folders
to the cart, the data represented by each card will be loaded into a
separate table, but all the items in the cart will be processed as part
of the same data load job.
You can add files or folders from a different bucket, but if
you do that, you're prompted to remove all files that are already in the
cart before proceeding. To select files from a different bucket, select
the bucket from the drop-down list in the navigator pane on the left and
then add the file(s), as described above.
You can drop files or folders into the data link cart and
then navigate away from the Link Cloud Object page. When you return to
the page, those items remain on the page, but you may receive a message,
"Remove All Data Link Items. Changing to another Cloud storage location
requires all items to be removed from the data load job. Do you wish to
continue?" Click Yes to remove the items from the
cart. Click No to keep the items in the cart.
Then you can continue to work.
You can remove items from the cart before running the data link job:
To remove an item from the cart, select Remove on the card for that item.
To remove all items from the cart, click Remove All in the data link cart menu bar at the top of the pane.
Set details about the target table in the Table section.
Name: The name of the target table.
Partition Column:
List Partitions and Date-based partitions are the different types of partitions available in data linking.
List partitioning is required when you specifically want to map rows to partitions based on discrete values.
To partition according to a specific column, click the Partition Column drop-down list and select the column you want to use for the partitioning.
The linked files are grouped into partitions by the distinct values of the partition column you select; each partition value can map to one or more files.
Note
For linked files (from external tables) there is also a requirement that for each file, the list partitioning column can contain only a single distinct value across all of the rows.
If a file is list partitioned, the partitioning key can only consist of a single column of the table.
Date-based partitioning is available when you link a folder containing two or more data sources that have columns that contain date or timestamp data.
To partition according to date, click the Partition
Column drop-down list and select the
DATE or TIMESTAMP column
you want to use for the partitioning.
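For list partitioning, the generated SQL typically uses DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE, mapping each file to a partition value. A hedged sketch, with placeholder credential, URIs, table name, and columns:

```sql
BEGIN
  -- Hypothetical example: each file holds a single distinct value of
  -- the partition column DEPT_ID, as required for list partitioning
  -- of linked files.
  DBMS_CLOUD.CREATE_EXTERNAL_PART_TABLE(
    table_name      => 'DEPT_SALES_EXT',
    credential_name => 'OBJ_STORE_CRED',
    format          => json_object('type' value 'csv', 'skipheaders' value '1'),
    column_list     => 'DEPT_ID NUMBER, SALE_DATE DATE, AMOUNT NUMBER(10,2)',
    partitioning_clause =>
      'PARTITION BY LIST (DEPT_ID) (
         PARTITION p10 VALUES (10)
           LOCATION (''https://objectstorage.example.com/n/ns/b/bucket/o/dept_10.csv''),
         PARTITION p20 VALUES (20)
           LOCATION (''https://objectstorage.example.com/n/ns/b/bucket/o/dept_20.csv'')
       )'
  );
END;
/
```

Note that the partitioning key is a single column, matching the restriction described above.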
Validation Type: Validation examines the
source files and optional partitioning information, and reports rows that do not
match the format options specified. Select None for
no validation; select Sample to perform validation
based on a sample of the data; or select Full to
perform validation based on all the data.
Use Wildcard: Select this check box to use wildcard characters in the search condition to retrieve the group of files that matches the filter criteria.
A wildcard character such as an asterisk (*) filters and specifies groups of files, so that new files matching the pattern are detected and added to the external table.
For example, if you enter file*, then file01, file02, file03, and so on match the keyword. The asterisk (*) matches zero or more characters.
Note
Wildcard support is incompatible with partitioning. Validation of the source files fails if you use wildcards with partitioned data.
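In the generated SQL, the wildcard appears in the file URI passed to DBMS_CLOUD. A minimal sketch, assuming a bucket with files named file01.csv, file02.csv, and so on (all names and the URI are placeholders):

```sql
BEGIN
  -- Hypothetical example: the trailing * in file_uri_list matches
  -- file01.csv, file02.csv, ... and also any new files added later
  -- whose names match the same pattern.
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'FILES_EXT',
    credential_name => 'OBJ_STORE_CRED',
    file_uri_list   => 'https://objectstorage.example.com/n/ns/b/bucket/o/file*.csv',
    format          => json_object('type' value 'csv'),
    column_list     => 'ID NUMBER, PAYLOAD VARCHAR2(4000)'
  );
END;
/
```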
Settings Tab - Properties
Section
Specify options to control how the source data is interpreted, previewed,
and processed. These options vary, depending on the type of source data.
Encoding: Select a character encoding type
from the list. This option is available when the linked file is in plain
text format (CSV, TSV, or TXT). The default encoding type is UTF-8.
Text enclosure: Select the character for
enclosing text: " (double-quote character),
' (single-quote character) or
None. This option is visible only when the
selected file is in plain text format (CSV, TSV, or TXT).
Field delimiter: Select the delimiter
character used to separate columns in the source. For example, if the source
file uses semicolons to delimit the columns, select
Semicolon from this list. The default is
Comma. This option is visible only when the
selected file is in plain text format (CSV, TSV, or TXT).
Start processing data at row: Specifies the number of rows to skip when linking the source data to the target external table:
If you select the Column header row option under Source column name (see below) and if you enter a number greater than 0 in the Start processing data at row field, then that number of rows after the first row are not linked to the target.
If you deselect the Column header row option under Source column name, and if you enter a number greater than 0 in the Start processing data at row field, then that number of rows including the first row are not linked to the target.
Source column name: Select the Column header row checkbox to use the column names from the source file in the target table.
If you select this option, the first row in the file is
processed as column names. The rows in the
Mapping section, below, are filled with
those names (and with the existing data types, unless you change
them).
If you deselect this option, the first row is processed
as data. To specify column names manually, enter a name for each
target column in the Mapping section. (You will also have to
enter data types.)
Numeric column: Select the Convert
invalid data to null checkbox to convert an invalid numeric column
value into a null value.
Newlines included in data values: Select this option if the data fields contain newline or carriage-return characters. Selecting this option increases the time taken to process the load. If you do not select this option, rows with newlines in their fields are rejected. You can view the rejected rows in the Job Report panel.
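These Properties settings correspond roughly to options in the DBMS_CLOUD format parameter. A hedged sketch of that mapping (the option names are DBMS_CLOUD format options; the specific values are illustrative):

```sql
-- Hypothetical format fragment for a semicolon-delimited UTF-8 file
-- with a header row, double-quote text enclosure, and invalid
-- numeric values converted to null:
--   Encoding                      -> 'characterset'
--   Text enclosure                -> 'quote'
--   Field delimiter               -> 'delimiter'
--   Start processing data at row  -> 'skipheaders'
--   Convert invalid data to null  -> 'conversionerrors'
format => json_object(
            'characterset'     value 'AL32UTF8',
            'quote'            value '"',
            'delimiter'        value ';',
            'skipheaders'      value '1',
            'conversionerrors' value 'store_null'
          )
```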
Settings Tab - Mapping
Section
The settings in the Mapping section control how
data from the source files is linked to the rows of the target external table. For
each row, the data from the column listed under Source column
is linked to the column listed under Target column.
Source column: Lists the columns from the
source file.
If the Column header row option under Properties is selected, Source column shows the names of the columns in the source file. If the Column header row option is not selected, generic names like COLUMN_1, COLUMN_2, etc., are used. This field is always read only.
Two additional source columns are available: FILE$NAME and SYSTIMESTAMP. The FILE$NAME column enables you to locate the source file that contains a particular data record. For example, suppose you link a set of files whose names refer to department names across the organization; a finance.txt file contains data from the Finance department. In the mapping, you can use string functions to extract the department name from the file name column, and then use the extracted department name to process data differently for each department.
The SYSTIMESTAMP column lets you view the current timestamp in the database.
Note
FILE$NAME and SYSTIMESTAMP source columns are not included by default. You must check the Include check box and run the load for the target table to display these two columns.
When you are creating a livefeed, the FILE$NAME and SYSTIMESTAMP source columns appear in the Mapping table by default.
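Once FILE$NAME is included and mapped to a target column (assumed here to be named FILE_NAME in a table named DEPT_EXT; both names are hypothetical), the department example above might look like this:

```sql
-- Hypothetical query: derive the department from the source file
-- name (e.g. '.../finance.txt' -> 'finance') and count its rows.
SELECT REGEXP_SUBSTR(file_name, '([^/]+)\.txt', 1, 1, NULL, 1) AS department,
       COUNT(*)                                                AS row_count
FROM   dept_ext
GROUP  BY REGEXP_SUBSTR(file_name, '([^/]+)\.txt', 1, 1, NULL, 1);
```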
Target column: Lists the columns in the
target table.
If the Column header row option is selected, then the Target column uses the names of the columns in the source file. You can change the name of a target column by replacing the provided name with a new one. The target column must not be empty, its name must not duplicate that of another target column, and its name must not exceed 128 bytes (a database limit).
If the Column header row option is not selected, then generic names like COLUMN_1, COLUMN_2, etc., are used. You can change the name of a target column by replacing the provided name with a new one.
Note
If you're linking multiple files from a folder in a
single data link job, only the first file will be shown in the
Mapping section. However, as long as the
column names and data types match, the data from all source files
will be linked.
Data Type: Lists the data type to use for
data in that column. The contents change depending on whether the
Column header row option is selected.
If the Column header row option is selected, Data type shows the data types of the columns in the source file. If you want to change the data type for the target, click the name and select a different one from the list.
If the Column header row option is not selected, Data type shows all available data types. Select the data type to use for the target column from the list.
Length/Precision (Optional): For columns where the Data Type is NUMBER, enter the length/precision for the numbers in the column. Precision is the number of significant digits in a number. Precision can range from 1 to 38.
For columns where Data Type is VARCHAR2, the Auto value in the Length/Precision field enables the Auto Size feature, which automatically sizes the column to fit the largest value in the column. Select Auto from the Length/Precision drop-down list, or pick an explicit value from the list.
Scale (Optional): For columns where the
Data Type is NUMBER, enter
the scale for the numbers in the column. Scale is the number of digits to
the right (positive) or left (negative) of the decimal point. Scale can
range from -84 to 127.
Format: If the data type in the
Data type column is DATE
or one of the TIMESTAMP types, select a format for
that type from the Format drop-down
list.
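In the generated SQL, these mapping choices surface as the column_list (names, data types, length/precision, and scale) and as date masks in the format options. A hedged sketch with placeholder names:

```sql
-- Hypothetical fragments:
--   NUMBER(10,2)  -> precision 10, scale 2
--   VARCHAR2(400) -> an explicit length instead of Auto
--   'dateformat'  -> the mask chosen in the Format drop-down list
column_list => 'EMP_NAME VARCHAR2(400),
                SALARY   NUMBER(10,2),
                HIRED_ON DATE',
format      => json_object('type' value 'csv',
                           'dateformat' value 'YYYY-MM-DD')
```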
Preview Tab
The Load Preview menu in the Preview tab displays the source data in tabular form. The display reflects the settings you chose in the Properties section. The File menu displays the source data with the column names.
If you dragged a folder containing multiple files into the data link cart and then clicked Settings for that card, the Preview pane includes a Preview Object (File) drop-down list at the top of the pane that lists all the files in the folder. Select the source file you want to preview from that list.
Table Tab
The Table tab displays what the target table is
expected to look like after the data has been linked.
SQL Tab
The SQL tab displays the SQL commands that will be run to complete this data link job.
Note
You can see the SQL code even before the table is created.
Close Button - Save and Close the
Pane
After entering all the details for the data link job, click Close at the bottom of the page. This saves the details you entered and returns you to the Link Data from Cloud Storage pane.
Once you've added data sources to the data link cart and entered details
about the data link job, you can run the job.
To run the job:
If you haven't already done so, click the Close
button in the Link Data from Cloud Storage pane to save your
settings and close the pane. If any of the settings are invalid, an error message
reports the problem. Fix the problem and click Close.
Click Start in the data link cart menu bar. To stop the data
link job, click Stop.
When the data link job completes, the Data Load Dashboard page displays the results of the job under the Table and View Loads section.
Once the data link job starts, you can view the progress of the job in the Data Load dashboard.
View Details About the Data
Link Job After It Is Run
You can view the progress of the job on the Data Load dashboard.
When the data load job completes, the Data Load dashboard page displays the results of the job. In the header of the table load, you can view the name of the table along with the total number of columns in the table.
Click Job Report to view the total number of rows processed successfully and the count of rejected rows. You can also view the Start time. The SQL pane of the Job Report displays the equivalent SQL code of the job.
To view information about an item in the job, click the Actions icon on the Table Load.
To view a log of the load operation, click the Logging icon. You can save the log, clear it, or refresh it. Click OK to dismiss the log.
After running a data link job, you can view the table created by the data link job on the Data Load dashboard.
Fix your data load job. After your data load job, you might see errors that you want to correct, or upon inspection, realize that you wanted to name a column differently. In such cases, click the Reload option on the selected Table Load to reload cards from your recent cart and edit them as you did before your first attempt. The Reload icon reloads the source data with the fixes suggested by the tool. Click the Actions icon on the Table header, click Table, and select Edit to make any changes to the data load job (for example, change a column name).