Before You Begin With Data Flow SQL Endpoints

To use Data Flow SQL Endpoints, you must have:

  • An Oracle Cloud Infrastructure account. Trial accounts can be used to demonstrate Data Flow.
  • A Service Administrator role for the Oracle Cloud services. When the service is activated, the credentials and URL are sent to the chosen Account Administrator. The Account Administrator creates an account for each user who needs access to the service.
  • A supported browser, such as:
    • Microsoft Internet Explorer 11.x or later
    • Mozilla Firefox ESR 38 or later
    • Google Chrome 42 or later

    Note

    For the Spark UI, only use Google Chrome.
  • Data for processing loaded into Object Storage. The data can be read from external data sources or cloud services. Data Flow SQL Endpoints optimize performance and security for data stored in Object Storage.

Note

Avoid entering confidential information when assigning descriptions, tags, or friendly names to your cloud resources through the Oracle Cloud Infrastructure Console, API, or CLI. This also applies when creating or editing applications in Data Flow.

Understanding SQL Endpoints

A Data Flow SQL Endpoint is a service entity that uses long-running compute clusters in your tenancy. You choose a compute shape and the number of instances to use, and each cluster runs until an administrator stops it. Spark runs in the cluster; its SQL engine is fast, integrates with Data Flow, and supports unstructured data. You connect using ODBC or JDBC and authenticate with IAM credentials.
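
For illustration, the following is a minimal sketch of connecting to a SQL Endpoint over JDBC from Java. It assumes a Spark-compatible JDBC driver is on the classpath; the connection URL, property names, and credential values shown are placeholders rather than the actual strings, which you would take from your endpoint's connection details and IAM authentication setup.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class SqlEndpointSmokeTest {
        public static void main(String[] args) throws Exception {
            // Placeholder connection string: replace with the JDBC URL
            // published for your SQL Endpoint. Host, port, and options
            // here are illustrative only.
            String url = "jdbc:spark://<endpoint-host>:443/default;transportMode=http";

            // Placeholder credentials: the exact property names depend on
            // the driver and the IAM authentication method you configure
            // (for example, an auth token).
            Properties props = new Properties();
            props.setProperty("user", "<iam-user>");
            props.setProperty("password", "<auth-token>");

            // Open a connection, run a trivial query, and print the result.
            try (Connection conn = DriverManager.getConnection(url, props);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }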

What are Data Flow SQL Endpoints?

Data Flow SQL Endpoints are designed for developers, data scientists, and advanced analysts to interactively query data directly where it lives in the data lake. This data can be relational, semi-structured, or unstructured, such as logs, sensor streams, and video streams, and is typically stored in the object store. As the volume and complexity of data grow, tools to explore and analyze data in the data lake in its native formats, rather than transforming or moving it, become important. Using Data Flow SQL Endpoints, you can economically process large amounts of raw data, with cloud native security used to control access. Users can access the insights they need in a self-service way, with no need to coordinate complex IT projects or worry about stale data. Queries in Data Flow SQL Endpoints seamlessly interoperate with Data Flow Batch for scheduled production pipelines. They enable fast data analytics and use long-running compute clusters that are fixed in size and run until stopped by the administrator.

Data Flow SQL Endpoints:
  • Provide interactive analytics directly against the data lake.
  • Are built on Spark for scale-out, easy reading and writing of unstructured data, and interoperability with existing Data Flow applications.
  • Use SQL to make analytics easier.
  • Support major Business Intelligence (BI) tools using ODBC or JDBC connections with IAM credentials.
  • Use data loaded into Object Storage for processing. The data can also be read from external data sources or cloud services.

Data Flow SQL Endpoints support all the same file types that Spark supports, for example, JSON, Parquet, CSV, and Avro.
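
As an illustration of querying these formats in place, the following sketch assumes an open JDBC connection to a SQL Endpoint (as in the earlier example) and standard Spark SQL path-based querying; the bucket, namespace, and path are placeholders.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ObjectStorageQuery {
        // Assumes "conn" is an open java.sql.Connection to the SQL Endpoint,
        // obtained as in the earlier sketch.
        static void previewParquet(Connection conn) throws Exception {
            // Query Parquet files in Object Storage directly by path; the same
            // pattern applies to JSON and CSV (and Avro where the Avro data
            // source is available).
            String sql = "SELECT * FROM parquet.`oci://<bucket>@<namespace>/sales/` LIMIT 10";
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }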