Data Entities (in OCI Object Storage bucket)
Available data entities are listed on the schema (bucket) details page.
Select a data entity to view the entity details. Under Format options, select a File type. Then specify format options and click Get attributes to fetch the data entity's attributes. The Attributes and Data tables are empty if you do not fetch the attributes.
File format options:
- For CSV and JSON files, select the Compression type of the file. If you do not know which compression algorithm was used to compress the file, use Auto (Default). Also specify the Encoding to use to parse the attributes.
- For Parquet, Avro, and Excel files, the Auto (Default) Compression type cannot be changed.
- For CSV files, other format options you can select are:
  - If the first row in the file is a header row, select Yes for Has header.
  - If the values in the data rows span multiple lines, select Yes for Multi-line.
  - Specify the Escape character that escapes other characters found in data values. For example: \
  - Select the Delimiter character that separates data fields. For example: COLON (:), COMMA (,), PIPE (|), SEMICOLON (;), or TAB (\t).
  - If a column delimiter is included at the end of a data row in the file, select Yes for Trailing delimiter.
  - Specify the Quote character that treats other characters as literal characters. For example: "
- For JSON files:
  - Select Use custom schema to paste or upload a custom sample schema that's used to infer the entity shape. When this checkbox is selected, schema drift is no longer applicable in the source entity.
  - If you select Upload, drop a custom schema file in the box provided, or click Select a file to select the schema file to upload.
  - If you select Paste in schema, copy the schema text file content and paste it in the box provided.
  - After loading the custom schema file, click Get Attributes to view the attributes of the schema. If you add or remove attributes after the schema is added, click Get Attributes to get an updated list.
- For Excel files:
  - By default, Data Integration treats the first row in a file as a header row. If the first row in your file is not a header row, select No for Has header.
  - For Select entity by, choose Sheet name, Sheet index, or Table name as the criterion. Then enter a Value for the worksheet name, worksheet index, or table name. Sheet index is zero-based.
  - For Sheet name or Sheet index, enter the area of the file to use as the Data range for selection. If you don't enter a data range value, the default is the data range A1, which corresponds to the entire sheet. If the file has a header row, enter a value that starts from the header row, for example, A1:K56.
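To see how the CSV format options above interact, it can help to reproduce them outside the console. The following is a minimal sketch that uses Python's standard csv module, not Data Integration itself. The function name read_csv_entity, its parameter names, and the generated COL_n attribute names are assumptions made for illustration; the service may handle edge cases differently.

```python
import csv
import io


def read_csv_entity(text, *, delimiter=",", quote='"', escape="\\",
                    has_header=True, trailing_delimiter=False):
    """Parse CSV text with options that mirror the CSV format options.

    Returns (attribute_names, data_rows). Illustrative only.
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),  # newline="" keeps multi-line values intact
        delimiter=delimiter,            # Delimiter
        quotechar=quote,                # Quote character
        escapechar=escape,              # Escape character
    )
    rows = [row for row in reader if row]  # skip blank lines

    if trailing_delimiter:
        # A delimiter at the end of a row produces one extra empty field.
        rows = [row[:-1] if row[-1] == "" else row for row in rows]

    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # Hypothetical placeholder names when there is no header row.
        width = len(rows[0]) if rows else 0
        header = [f"COL_{i + 1}" for i in range(width)]
        data = rows
    return header, data


# PIPE delimiter, trailing delimiter on every row, one multi-line value,
# and one value that contains escaped quote characters.
sample = 'id|note|\n1|"line one\nline two"|\n2|"say \\"hi\\""|\n'
header, rows = read_csv_entity(sample, delimiter="|", trailing_delimiter=True)
print(header)
print(rows)
```

In this sketch, a quoted value that spans two lines is returned as a single field, which corresponds to selecting Yes for Multi-line, and the empty field created by the trailing delimiter is dropped before the header row is separated from the data rows.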
After the data entity's attributes have been successfully retrieved, click Data to list the data rows. In the Data table, click an attribute header to view the hierarchical data and attribute profile. See Hierarchical Data Types for more information.
A data and attribute profile is not supported on these attribute data types:
- BLOB
- RAW
- BINARY
- BINARY_DOUBLE
- BINARY_FLOAT
- CLOB
- NCLOB
- SDO_GEOMETRY
- XMLTYPE
- XMLFORMAT
- COMPLEX
- VARBINARY
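If you work with entity metadata in a script, the list above can be encoded as a simple lookup to predict which attributes will show a profile. This is a sketch under assumptions: the helper names supports_profile and profilable_attributes are hypothetical, and the comparison assumes plain type names without precision or length qualifiers.

```python
# Data types listed above as not supporting a data and attribute profile.
UNSUPPORTED_PROFILE_TYPES = frozenset({
    "BLOB", "RAW", "BINARY", "BINARY_DOUBLE", "BINARY_FLOAT", "CLOB",
    "NCLOB", "SDO_GEOMETRY", "XMLTYPE", "XMLFORMAT", "COMPLEX", "VARBINARY",
})


def supports_profile(data_type: str) -> bool:
    """Return True if the type name is not in the unsupported list."""
    return data_type.strip().upper() not in UNSUPPORTED_PROFILE_TYPES


def profilable_attributes(attributes: dict[str, str]) -> list[str]:
    """Return the attribute names whose data type supports a profile."""
    return [name for name, dtype in attributes.items()
            if supports_profile(dtype)]


# Hypothetical attribute-to-type mapping for illustration.
attrs = {"ORDER_ID": "NUMBER", "PHOTO": "BLOB", "NOTES": "clob", "NAME": "VARCHAR"}
print(profilable_attributes(attrs))
```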