Reading data from OCI Autonomous Data Warehouse(ADW)¶
Overview¶
Insights can read data from OCI ADW tables using user-provided SQL queries. Insights provides two Data Readers to do this:
ADWNativeDataReader : Read data from ADW tables using Native Execution Engine
ADWDaskDataReader: Read data from ADW tables using Dask Execution Engine
The general workflow for generating metrics when using the ADW reader is:
The OCI ADW instance contains tables to read data from.
An Appropriate authentication method is enabled. Currently ADW Readers only support mTLS.
Wallet files are made available to ADW Readers (either locally or using OCI Vault).
Use the Insights Config API or the Insights Builder API to configure ADW Reader.
Run the Builder API to generate the Insights Profile/Test Results.
Prerequisites and Installation¶
This section describes the prerequisites, installation and setup of ADW Reader
Prerequisites
OCI Autonomous Data Warehouse instance
Wallet files are either made available locally or using OCI Vault
If using OCI Vault, appropriate IAM policies are configured to allow custom application to read wallet files from Vault
Installation
See here for detailed instructions on installing the library
ADW Reader Authentication¶
ADW Readers connect to an OCI ADW instance using mTLS connection. This requires the use of wallet files which contain the required authentication information. Wallet files can be downloaded from OCI ADW and need to be made available to Insights library in one of the following ways:
Local file system.
OCI Vault.
Using Local file system¶
Insights ADW Reader can read wallet files from a local directory. The steps are:
Download the wallet files from OCI ADW and save them in a local directory in an unzipped format.
When using Insights Config API, add the wallet location, database username and database password in the reader configuration as follows:
"reader": { "type": "ADWNativeDataReader", "params": { "auth": { "wallet_location": "/Users/your_name/Downloads/Wallet_folder", "db_username":"<username>", "db_password": "<password>" }, } }
wallet_location: Set the path of the directory that contains the wallet files.
db_username: Set the username for the ADW instance.
db_password: Set the password for the ADW instance.
Note
We recommend using OCI Vault for production scenarios since using the local file system requires the username and password to be set in the configuration file
Using OCI Vault¶
Insights ADW Reader can read wallet files from OCI Vault. The steps are as below:
Upload each wallet file to OCI Vault. This creates a secret for each file
Write the appropriate policy to allow the principal (user/resource) to read secrets from OCI Vault
While using Insights Config API, add OCI Vault details in the reader config as shown below:
"reader": { "type": "ADWNativeDataReader", "params": { "auth": { "vault_ocid": "<vault_ocid>", "vault_tenancy_ocid": "<tanancy_ocid>" }, } }
vault_ocid: The OCID of the vault that contains the secret.
vault_tenancy_ocid: The OCID of the tenancy where vault resides
By default, The ADW Reader expects the following wallet files to be present in OCI Vault in base 64 encoded format:
[
"cwallet.sso",
"ewallet.pem",
"ewallet.p12",
"keystore.jks",
"ojdbc.properties",
"tnsnames.ora",
"truststore.jks",
"sqlnet.ora"
]
We recommend storing the username as
db_username
and password asdb_password
as secret names in OCI Vault for production scenarios for better security. However, for development use cases one can explicitly specify the username and password in the monitor configuration.
6. If the secret names in OCI Vault are not same as default file names , provide the names explicitly in the configuration. As shown below, user has uploaded the cwallet.sso file as custom-cwallet.sso in OCI Vault. In this case, the ADW Reader is configured to read the custom wallet file names from OCI Vault.
"reader": {
"type": "ADWNativeDataReader",
"params": {
"auth": {
"vault_ocid": "<vault_ocid>",
"vault_tenancy_ocid": "<tenancy_ocid>",
"vault_secret_names": {
"db_username": "custom_db_username",
"db_password": "custom_db_password",
"cwallet.sso": "custom-cwallet.sso",
"ewallet.pem": "custom-ewallet.pem",
"ewallet.p12": "custom-ewallet.p12",
"keystore.jks": "custom-keystore.jks",
"ojdbc.properties": "custom-ojdbc.properties",
"tnsnames.ora": "custom-tnsnames.ora",
"truststore.jks": "custom-truststore.jks",
"sqlnet.ora": "custom-sqlnet.ora"
}
}
}
}
Read data using ADW readers¶
ADW readers read data from OCI cloud hosted Autonomous Data Warehouse using SQL query as mandatory input, which in return, output the data into the dataframe on which you can run the data analytics specifying the metrics, test results
Read data using ADW Native Reader¶
Using Builder API¶
Begin by loading the required libraries and modules:
from mlm_insights.builder.insights_builder import InsightsBuilder
Set up the environment variable to enable the thick mode for reading the data from OCI ADW (if it isn’t already set during installation).
import os
os.environ['INSTANT_CLIENT_LIBRARY_PATH'] = '/Users/your_name/Downloads/instantclient_19_16'
Create
ADWNativeDataReader
instance using Vault Authentication. For other authentication options See ADW Reader Authentication.
from mlm_insights.mlm_native.readers.adw_native_data_reader import ADWNativeDataReader
params = {
"auth": {
"vault_ocid": "<vault_ocid>",
"vault_tenancy_ocid": "<tanancy_ocid>"
},
'sql_query': "select emp.employee_id as en ,emp.first_name as FN, emp.last_name as LN, dept.department_name as DN, emp.salary as salary, emp.hire_date as hire_date from EMPLOYEES emp , DEPARTMENTS dept where emp.department_id = dept.department_id",
'chunksize': 10000
}
adw_data_reader = ADWNativeDataReader(**params)
Create
Builder
component
run_result = InsightsBuilder().
with_input_schema(get_input_schema()).
with_metrics(metrics=get_metrics()).
with_reader(reader=adw_data_reader).
build().run()
The resultant profile can be extracted from the run result object.
profile = run_result.profile
Using Config API¶
Begin by loading the required libraries and modules:
from mlm_insights.config_reader.insights_config_reader import InsightsConfigReader
Set up the environment variable to enable the thick mode for reading the data from OCI ADW (if it isn’t already set during installation).
import os
os.environ['INSTANT_CLIENT_LIBRARY_PATH'] = '/Users/your_name/Downloads/instantclient_19_16'
Create the Insights builder using config and reference profile (optional)
run_result = InsightsConfigReader(config_location=<location of monitor config path>)
.get_builder()
.build()
.run()
The resultant profile can be extracted from the run result object.
profile = run_result.profile
A sample Json Configuration for ADW Native Data reader
sample_config.json
{
"input_schema": {
"salary": {
"data_type": "INTEGER",
"variable_type": "CONTINUOUS",
"column_type": "INPUT"
},
"fn": {
"data_type": "STRING",
"variable_type": "NOMINAL",
"column_type": "INPUT"
}
},
"dataset_metrics": [
{
"type": "RowCount"
}
],
"feature_metrics": {
"salary": [
{
"type": "Count"
},
{
"type": "Quartiles"
},
{
"type": "FrequencyDistribution"
},
{
"type": "ProbabilityDistribution"
},
{
"type": "DistinctCount"
},
{
"type": "Skewness"
},
{
"type": "Variance"
},
{
"type": "TypeMetric"
},
{
"type": "KolmogorovSmirnov"
}
],
"fn": [
{
"type": "Count"
},
{
"type": "TopKFrequentElements"
},
{
"type": "TypeMetric"
},
{
"type": "ChiSquare"
}
]
},
"reader": {
"type": "ADWNativeDataReader",
"params": {
"auth": {
"wallet_location": "",
"db_username":"",
"db_password": ""
},
"sql_query": "select emp.employee_id as en ,emp.first_name as FN, emp.last_name as LN, dept.department_name as DN, emp.salary as salary, emp.hire_date as hire_date from EMPLOYEES emp , DEPARTMENTS dept where emp.department_id = dept.department_id",
"chunksize": 10000
}
}
}
The user API parameters of ADWNativeDataReader
are defined in the following table
Parameter |
Description |
Examples |
---|---|---|
|
|
query example
|
|
|
bind variables example
|
|
|
|
|
|
|
Read data using ADW Dask Reader¶
Using Builder API¶
Begin by loading the required libraries and modules:
from mlm_insights.builder.insights_builder import InsightsBuilder
Set up the environment variable to enable the thick mode for reading the data from OCI ADW (if it isn’t already set during installation).
import os
os.environ['INSTANT_CLIENT_LIBRARY_PATH'] = '/Users/your_name/Downloads/instantclient_19_16'
Create
ADWDaskDataReader
instance using Vault Authentication. For other authentication options See ADW Reader Authentication.
from mlm_insights.mlm_dask.readers.adw_dask_data_reader import ADWDaskDataReader
params = {
"auth": {
"vault_ocid": "<vault_ocid>",
"vault_tenancy_ocid": "<tanancy_ocid>"
},
'sql_query': "select emp.employee_id as en ,emp.first_name as FN, emp.last_name as LN, dept.department_name as DN, emp.salary as salary, emp.hire_date as hire_date from EMPLOYEES emp , DEPARTMENTS dept where emp.department_id = dept.department_id",
'index_col': "en",
'partition_size': 10240
}
adw_data_reader = ADWDaskDataReader(**params)
Create
Builder
component
run_result = InsightsBuilder().
with_input_schema(get_input_schema()).
with_metrics(metrics=get_metrics()).
with_reader(reader=adw_data_reader).
build().run()
The resultant profile can be extracted from the run result object.
profile = run_result.profile
Using Config API¶
Begin by loading the required libraries and modules:
from mlm_insights.config_reader.insights_config_reader import InsightsConfigReader
Set up the environment variable to enable the thick mode for reading the data from OCI ADW (if it isn’t already set during installation)
import os
os.environ['INSTANT_CLIENT_LIBRARY_PATH'] = '/Users/your_name/Downloads/instantclient_19_16'
Create the Insights builder using config and reference profile (optional)
run_result = InsightsConfigReader(config_location=<location of monitor config path>)
.get_builder()
.build()
.run()
The resultant profile can be extracted from the run result object.
profile = run_result.profile
A sample Json Configuration for ADW Native Data reader
sample_config.json
{
"input_schema": {
"salary": {
"data_type": "INTEGER",
"variable_type": "CONTINUOUS",
"column_type": "INPUT"
},
"fn": {
"data_type": "STRING",
"variable_type": "NOMINAL",
"column_type": "INPUT"
}
},
"dataset_metrics": [
{
"type": "RowCount"
}
],
"feature_metrics": {
"salary": [
{
"type": "Count"
},
{
"type": "Quartiles"
},
{
"type": "FrequencyDistribution"
},
{
"type": "ProbabilityDistribution"
},
{
"type": "DistinctCount"
},
{
"type": "Skewness"
},
{
"type": "Variance"
},
{
"type": "TypeMetric"
},
{
"type": "KolmogorovSmirnov"
}
],
"fn": [
{
"type": "Count"
},
{
"type": "TopKFrequentElements"
},
{
"type": "TypeMetric"
},
{
"type": "ChiSquare"
}
]
},
"engine_detail": {
"engine_name": "dask"
},
"reader": {
"type": "ADWDaskDataReader",
"params": {
"auth": {
"wallet_location": "",
"db_username":"",
"db_password": ""
},
"sql_query": "select emp.employee_id as en ,emp.first_name as FN, emp.last_name as LN, dept.department_name as DN, emp.salary as salary, emp.hire_date as hire_date from EMPLOYEES emp , DEPARTMENTS dept where emp.department_id = dept.department_id",
"index_col": "en",
"partition_size": 10240
}
}
}
The user API parameters of ADWDaskDataReader
are defined in the following table
Parameter |
Description |
Examples |
---|---|---|
|
|
query example
|
|
|
bind variables example
|
|
|
|
|
|
index column example
|
|
|
number of partition example
|
|
|
partition size example
|
Note
If the
partition_size
andno_of_partitions
are not provided, the number of partitions are calculated using
the default value of partition_size
which is 256 MiB
.
If the
partition_size
andno_of_partitions
are both providedno_of_partitions
is honoured.