An Oracle Cloud Infrastructure account. Trial accounts can be used
to demonstrate Data Flow.
A Service Administrator role for your Oracle Cloud services. When the service is
activated, the credentials and URL are sent to the chosen Account Administrator. The
Account Administrator creates an account for each user who needs access to the
service.
A supported browser, such as:
Microsoft Internet Explorer 11.x or later
Mozilla Firefox ESR 38 or later
Google Chrome 42 or later
Note
For the Spark UI, only use Google Chrome.
Data for processing loaded into Object Storage (see the upload sketch after these prerequisites).
The data can be read from external data sources or cloud services, but Data Flow SQL Endpoints optimize performance
and security for data stored in Object Storage.
Note
Avoid entering confidential information when assigning descriptions, tags, or friendly
names to your cloud resources through the Oracle Cloud Infrastructure
Console, API, or CLI. This also applies when creating or
editing applications in Data Flow.
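To satisfy the Object Storage prerequisite, data can be uploaded with the Console, the CLI, or an SDK. The following is a minimal sketch using the OCI Java SDK; the configuration profile, namespace, bucket, and object names are placeholders, and the exact builder methods can vary between SDK versions.

    import com.oracle.bmc.auth.ConfigFileAuthenticationDetailsProvider;
    import com.oracle.bmc.objectstorage.ObjectStorageClient;
    import com.oracle.bmc.objectstorage.requests.PutObjectRequest;

    import java.io.FileInputStream;

    public class UploadToObjectStorage {
        public static void main(String[] args) throws Exception {
            // Authenticate with the DEFAULT profile from ~/.oci/config (IAM credentials).
            ConfigFileAuthenticationDetailsProvider provider =
                    new ConfigFileAuthenticationDetailsProvider("DEFAULT");

            try (ObjectStorageClient client = ObjectStorageClient.builder().build(provider);
                 FileInputStream data = new FileInputStream("sales.parquet")) {
                // Placeholder namespace, bucket, and object names.
                client.putObject(PutObjectRequest.builder()
                        .namespaceName("<namespace>")
                        .bucketName("<bucket>")
                        .objectName("sales/sales.parquet")
                        .putObjectBody(data)
                        .build());
            }
        }
    }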
Understanding SQL Endpoints
A Data Flow SQL Endpoint is a service entity that uses
long-running compute clusters in your tenancy. You pick a compute shape and the number of
instances to use, and each cluster runs until an administrator stops it. Spark runs
in the cluster: its SQL engine is fast, integrates with Data Flow, and supports unstructured data. You
connect using ODBC or JDBC and authenticate with IAM
credentials, as in the sketch below.
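As a minimal connection sketch, the following Java fragment opens a JDBC connection and runs a trivial query. It assumes the Spark JDBC driver JAR is on the classpath; the URL format and the authentication property names shown here are placeholders, so take the real values from the SQL Endpoint's details page in the Console and from the driver documentation.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class SqlEndpointConnect {
        public static void main(String[] args) throws Exception {
            // Placeholder URL; copy the real JDBC URL from the SQL Endpoint details page.
            String url = "jdbc:spark://<endpoint-host>:443/default;transportMode=http;ssl=1";

            // Placeholder keys; the driver documentation lists the exact property
            // names for passing IAM credentials.
            Properties props = new Properties();
            props.setProperty("AuthMech", "<see driver documentation>");

            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }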
What are Data Flow SQL Endpoints?
Data Flow SQL Endpoints are designed for developers, data
scientists, and advanced analysts to interactively query data directly where it lives in
the data lake. This data can be relational, semi-structured, or unstructured, such as logs,
sensor streams, and video streams, and is typically stored in Object Storage. As the volume
and complexity of data grow, tools for exploring and analyzing data in the data lake in
its native formats, rather than transforming or moving it, become important. Using Data Flow SQL Endpoints, you can economically process
large amounts of raw data, with cloud native security controlling access. You can
access the insights you need in a self-service way, with no need to coordinate complex
IT projects or worry about stale data. Queries in Data Flow SQL Endpoints seamlessly interoperate with Data Flow Batch for scheduled production pipelines. They
enable fast data analytics and use long-running compute clusters that are
fixed in size and run until stopped by an administrator.
Data Flow SQL Endpoints:
Provide interactive analytics directly against the data lake.
Are built on Spark for scale-out, easy reading and writing of unstructured data, and
interoperability with existing Data Flow applications.
Use SQL to make analytics easier.
Support major Business Intelligence (BI) tools using ODBC or JDBC connections
with IAM credentials.
Use data for processing loaded into Object Storage; the data can also be read from
external data sources or cloud services.
Data Flow SQL Endpoints support all the same file types that
Spark supports, for example, JSON, Parquet, CSV, and Avro, as in the sketch below.
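As an illustrative sketch, and assuming a java.sql.Connection conn obtained as in the earlier connection example, the following fragment maps a Parquet dataset in Object Storage to a table with Spark SQL and queries it. The bucket, namespace, table, and column names are placeholders.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class QueryParquet {
        static void queryParquet(Connection conn) throws Exception {
            try (Statement stmt = conn.createStatement()) {
                // Expose a Parquet dataset in Object Storage as a Spark SQL table.
                stmt.execute(
                        "CREATE TABLE IF NOT EXISTS sales USING parquet "
                        + "LOCATION 'oci://<bucket>@<namespace>/sales/'");

                // Query the table like any relational source.
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region")) {
                    while (rs.next()) {
                        System.out.println(rs.getString("region") + "\t" + rs.getDouble("total"));
                    }
                }
            }
        }
    }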