mlm_insights.mlm_native.readers package¶
Submodules¶
mlm_insights.mlm_native.readers.csv_native_data_reader module¶
- class mlm_insights.mlm_native.readers.csv_native_data_reader.CSVNativeDataReader(file_path: List[str] | str = '', data_source: DataSource | None = None, **kwargs: Any)¶
Bases: NativeDataReader
This data reader reads CSV files using the native (pandas) execution engine. It can read from both the local file system and OCI Object Storage.
Configuration¶
- file_path: Union[List[str], str]
The path or list of paths to CSV files.
- data_source: Optional[DataSource]
A DataSource object to read data from.
Sample code
For reading from file_path (a single path string or a list of paths):
    from mlm_insights.mlm_native.readers.csv_native_data_reader import CSVNativeDataReader

    test_files = [
        'data/csv/2000-01-01.csv',
        'data/csv/2000-01-30.csv'
    ]
    csv_reader = CSVNativeDataReader(file_path=test_files)
    actual_df = csv_reader.read(None)
For reading using a data source:
    # bucket_name, namespace and object_prefix must be defined by the caller.
    data_source_args = {
        'bucket_name': bucket_name,
        'namespace': namespace,
        'object_prefix': object_prefix,
        'file_type': 'csv',
        # storage_options is used to authenticate the file systems
        'storage_options': {"config": "~/.oci/config"}
    }
    file_location = 'oci://%s@%s/%s' % (bucket_name, namespace, object_prefix)
    # SomeDataSource stands in for a concrete DataSource class.
    ds = SomeDataSource(file_path=file_location, **data_source_args)
    csv_reader = CSVNativeDataReader(data_source=ds)
    actual_df = csv_reader.read(None)
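The sample above builds the object location with the pattern 'oci://%s@%s/%s', that is bucket name, then namespace, then object prefix. As a small sketch of that layout, the helpers below format and split such a location string. The function names build_oci_uri and split_oci_uri and the example values are hypothetical and are not part of mlm_insights.

```python
from typing import Tuple


def build_oci_uri(bucket_name: str, namespace: str, object_prefix: str) -> str:
    """Format a location as oci://<bucket>@<namespace>/<object_prefix>."""
    return f"oci://{bucket_name}@{namespace}/{object_prefix}"


def split_oci_uri(uri: str) -> Tuple[str, str, str]:
    """Split a location of the form above into (bucket, namespace, object_prefix)."""
    scheme = "oci://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an oci:// location: {uri!r}")
    # Everything before the first '/' is '<bucket>@<namespace>'.
    location, _, object_prefix = uri[len(scheme):].partition("/")
    bucket_name, separator, namespace = location.partition("@")
    if not separator:
        raise ValueError(f"missing '@<namespace>' in: {uri!r}")
    return bucket_name, namespace, object_prefix


# Hypothetical example values.
uri = build_oci_uri("my-bucket", "my-namespace", "data/csv/")
print(uri)
print(split_oci_uri(uri))
```

Keeping the formatting in one helper avoids repeating the '%s' pattern wherever a data source is configured.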
- classmethod create(config: Dict[str, Any]) → CSVNativeDataReader¶
Factory method to create an instance of CSVNativeDataReader from a configuration dictionary.
Parameters¶
- config (Dict[str, Any]):
A dictionary containing configuration information.
Configuration¶
- file_path: Union[List[str], str]
The path or list of paths to CSV files.
Returns¶
CSVNativeDataReader: An instance of CSVNativeDataReader.
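The factory method takes its inputs from a plain dictionary. The sketch below mimics that pattern with a stand-in class so the flow is visible without the library installed. SketchReader and the 'file_path' key name are assumptions made for illustration; the real reader may use different key constants and validation.

```python
from typing import Any, Dict, List, Union


class SketchReader:
    """Stand-in for a reader class; NOT the mlm_insights implementation."""

    def __init__(self, file_path: Union[List[str], str] = "") -> None:
        # Normalise a single path string to a one-element list.
        self.file_paths = [file_path] if isinstance(file_path, str) else list(file_path)

    @classmethod
    def create(cls, config: Dict[str, Any]) -> "SketchReader":
        # 'file_path' is an assumed key name for this sketch.
        if "file_path" not in config:
            raise ValueError("config must contain 'file_path'")
        return cls(file_path=config["file_path"])


config = {"file_path": ["data/csv/2000-01-01.csv", "data/csv/2000-01-30.csv"]}
reader = SketchReader.create(config)
print(reader.file_paths)
```

A dictionary-driven factory of this kind lets readers be declared in configuration files and instantiated uniformly.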
- read(schema_provider: SchemaProvider, **kwargs: Any) → DataFrame¶
Reads the data from the local file system or the OCI file system. To use ocifs, refer to https://docs.oracle.com/en-us/iaas/tools/ocifs-sdk/latest/index.html
Parameters¶
- schema_provider: SchemaProvider
The dtypes of the columns present.
Other parameters¶
- storage_options:
{"config": "~/.oci/config"} to authenticate the file systems
- kwargs:
Extra keyword arguments to forward to pandas.read_csv().
Returns¶
- pandas.DataFrame:
Result of reading the data from the local file system / OCI file system.
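Because extra keyword arguments are forwarded to pandas.read_csv(), any option that function accepts (for example sep or dtype) can be passed through read(). The sketch below calls pandas directly on in-memory buffers to show the effect of such arguments. The helper name read_csv_files and the choice to concatenate the per-file frames are assumptions for illustration, not a description of the reader's internals.

```python
import io
from typing import Any, Iterable

import pandas as pd


def read_csv_files(sources: Iterable[Any], **kwargs: Any) -> pd.DataFrame:
    """Read each source with pandas.read_csv and stack the rows (assumed behaviour)."""
    frames = [pd.read_csv(source, **kwargs) for source in sources]
    return pd.concat(frames, ignore_index=True)


# Two small semicolon-separated "files" held in memory.
first = io.StringIO("id;score\n1;0.5\n2;0.7\n")
second = io.StringIO("id;score\n3;0.9\n")

# sep and dtype are ordinary pandas.read_csv keyword arguments.
df = read_csv_files([first, second], sep=";", dtype={"id": "int64"})
print(df)
```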
mlm_insights.mlm_native.readers.jsonl_native_data_reader module¶
- class mlm_insights.mlm_native.readers.jsonl_native_data_reader.JsonlNativeDataReader(file_path: List[str] | str = '', data_source: DataSource | None = None, **kwargs: Any)¶
Bases: NativeDataReader
This data reader reads JSONL files using the native (pandas) execution engine. It can read from both the local file system and OCI Object Storage.
Configuration¶
- file_path: Union[List[str], str]
The path or list of paths to JSONL files.
- data_source: Optional[DataSource]
A DataSource object to read data from.
Sample code
For reading from file_path (a single path string or a list of paths):
    from mlm_insights.mlm_native.readers.jsonl_native_data_reader import JsonlNativeDataReader

    test_files = [
        'data/jsonl/2000-01-01.jsonl',
        'data/jsonl/2000-01-30.jsonl'
    ]
    jsonl_reader = JsonlNativeDataReader(file_path=test_files)
    actual_df = jsonl_reader.read(None)
For reading using a data source:
    # bucket_name, namespace and object_prefix must be defined by the caller.
    data_source_args = {
        'bucket_name': bucket_name,
        'namespace': namespace,
        'object_prefix': object_prefix,
        'file_type': 'jsonl',
        # storage_options is used to authenticate the file systems
        'storage_options': {"config": "~/.oci/config"}
    }
    file_location = 'oci://%s@%s/%s' % (bucket_name, namespace, object_prefix)
    # SomeDataSource stands in for a concrete DataSource class.
    ds = SomeDataSource(file_path=file_location, **data_source_args)
    jsonl_reader = JsonlNativeDataReader(data_source=ds)
    actual_df = jsonl_reader.read(None)
- classmethod create(config: Dict[str, Any]) → JsonlNativeDataReader¶
Factory method to create an instance of JsonlNativeDataReader from a configuration dictionary.
Parameters¶
- config (Dict[str, Any]):
A dictionary containing configuration information.
Configuration¶
- file_path: Union[List[str], str]
The path or list of paths to JSONL files.
Returns¶
JsonlNativeDataReader: An instance of JsonlNativeDataReader.
- read(schema_provider: SchemaProvider, **kwargs: Any) → DataFrame¶
Reads the data from the local file system or the OCI file system. To use ocifs, refer to https://docs.oracle.com/en-us/iaas/tools/ocifs-sdk/latest/index.html
Parameters¶
- schema_provider: SchemaProvider
The dtypes of the columns present.
Other parameters¶
- storage_options:
{"config": "~/.oci/config"} to authenticate the file systems
- kwargs:
Extra keyword arguments to forward to pandas.read_json().
Returns¶
- pandas.DataFrame:
Result of reading the data from the local file system / OCI file system.
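For readers unfamiliar with the input format: JSONL (JSON Lines) stores one complete JSON object per line, and each line becomes one row. The standard-library sketch below parses such content into row dictionaries purely to illustrate the layout. The reader itself relies on pandas, and the helper name parse_jsonl is hypothetical.

```python
import io
import json
from typing import Any, Dict, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse an iterable of JSONL lines into a list of row dictionaries."""
    rows = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            rows.append(json.loads(line))
    return rows


# An in-memory stand-in for a .jsonl file: one JSON object per line.
sample = io.StringIO('{"id": 1, "score": 0.5}\n{"id": 2, "score": 0.7}\n')
rows = parse_jsonl(sample)
print(rows)
```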
mlm_insights.mlm_native.readers.nested_json_native_data_reader module¶
- class mlm_insights.mlm_native.readers.nested_json_native_data_reader.NestedJsonNativeDataReader(file_path: List[str] | str = '', query: str = '', query_engine_name: str = '', data_source: DataSource | None = None, **kwargs: Any)¶
Bases: NativeDataReader
This data reader extracts data from nested JSON using the native (pandas) execution engine. It can read from both the local file system and OCI Object Storage.
Configuration¶
- file_path: Union[List[str], str]
The path or list of paths to JSON files.
- data_source: Optional[DataSource]
A DataSource object to read data from.
- query: str
A query string to extract data from the JSON files.
- query_engine_name: str
Name of query engine to run the query. Currently, only JMESPATH is supported.
Sample code:
For reading from file_path (a single path string or a list of paths):
    from mlm_insights.mlm_native.readers.nested_json_native_data_reader import NestedJsonNativeDataReader

    test_files = [
        'data/json/2000-01-01.json',
        'data/json/2000-01-30.json'
    ]
    query = "user defined query"
    query_engine_name = "JMESPATH"
    nested_json_reader = NestedJsonNativeDataReader(file_path=test_files, query=query, query_engine_name=query_engine_name)
    actual_df = nested_json_reader.read(None)
For reading using a data source:
    # bucket_name, namespace and object_prefix must be defined by the caller.
    data_source_args = {
        'bucket_name': bucket_name,
        'namespace': namespace,
        'object_prefix': object_prefix,
        'file_type': 'jsonl',
        # storage_options is used to authenticate the file systems
        'storage_options': {"config": "~/.oci/config"}
    }
    file_location = 'oci://%s@%s/%s' % (bucket_name, namespace, object_prefix)
    # SomeDataSource stands in for a concrete DataSource class.
    ds = SomeDataSource(file_path=file_location, **data_source_args)
    query = "user defined query"
    query_engine_name = "JMESPATH"
    nested_json_reader = NestedJsonNativeDataReader(data_source=ds, query=query, query_engine_name=query_engine_name)
    actual_df = nested_json_reader.read(None)
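To make the "user defined query" placeholder more concrete, consider a JMESPath expression such as records[*].{id: id, city: address.city}, which projects every element of records into a flat object. The sketch below reproduces the result of that one expression by hand in plain Python for well-formed input, without the jmespath package. The document shape, the expression, and the helper name extract_rows are hypothetical examples, and nothing here describes how the reader evaluates queries internally.

```python
from typing import Any, Dict, List

# Hypothetical nested JSON document, already parsed into Python objects.
document = {
    "records": [
        {"id": 1, "address": {"city": "Austin", "zip": "73301"}},
        {"id": 2, "address": {"city": "Pune", "zip": "411001"}},
    ]
}


def extract_rows(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hand-written equivalent of: records[*].{id: id, city: address.city}"""
    rows = []
    for record in doc.get("records", []):
        rows.append(
            {
                "id": record.get("id"),
                # A missing nested value becomes None, like JMESPath's null.
                "city": record.get("address", {}).get("city"),
            }
        )
    return rows


rows = extract_rows(document)
print(rows)
```

Each flattened row maps naturally to one DataFrame row, which is why a projection of this shape suits tabular extraction from nested JSON.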
- classmethod create(config: Dict[str, Any]) → NestedJsonNativeDataReader¶
Factory method to create an instance of NestedJsonNativeDataReader from a configuration dictionary.
Parameters¶
- config (Dict[str, Any]):
A dictionary containing configuration inputs such as FILE_PATH_KEY or DATA_SOURCE, QUERY and QUERY_ENGINE_NAME.
Configuration¶
- file_path: Union[List[str], str]
The path or list of paths to JSON files.
- data_source: Optional[DataSource]
A DataSource object to read data from.
- query: str
A query string to extract data from the JSON files.
- query_engine_name: str
Name of query engine to run the query. Currently, only JMESPATH is supported.
Returns¶
NestedJsonNativeDataReader: An instance of NestedJsonNativeDataReader.
- read(schema_provider: SchemaProvider, **kwargs: Any) → DataFrame¶
Reads the data from the local file system or the OCI file system. To use ocifs, refer to https://docs.oracle.com/en-us/iaas/tools/ocifs-sdk/latest/index.html
Parameters¶
- schema_provider: SchemaProvider
The dtypes of the columns present.
Other parameters¶
- kwargs:
Extra keyword arguments to forward to pandas.DataFrame.
Returns¶
- pandas.DataFrame:
Result of reading the data from the local file system / OCI file system.