Overview

Big Data Service provides enterprise-grade Hadoop as a service, with end-to-end security, high performance, and ease of management and upgradeability.

Big Data Service is an Oracle Cloud Infrastructure service designed for a diverse set of big data use cases and workloads. From short-lived clusters used to tackle specific tasks to long-lived clusters that manage large data lakes, Big Data Service scales to meet an organization's requirements at a low cost and with the highest levels of security.

Note

Data at rest in the Block Volumes used by Big Data Service is encrypted by default.

Big Data Service includes:

  • A choice of Hadoop technology stacks. You can choose to create a cluster based on either of the following:
    • A Hadoop stack that includes an installation of Oracle Distribution including Apache Hadoop (ODH). ODH includes Apache Ambari, Apache Hadoop, Apache HBase, Apache Hive, Apache Spark, and other services for working with and securing big data.

      For a detailed list of what's in ODH, see About Oracle Distribution Including Apache Hadoop (ODH).

    • A Hadoop stack that includes a complete installation of the Cloudera Distribution including Apache Hadoop (CDH). CDH includes Cloudera Manager, Apache Flume, Apache Hadoop, Apache HBase, Apache Hive, Hue, Apache Kafka, Apache Pig, Apache Sentry, Apache Solr, Apache Spark, and other services for working with and securing big data.

      The current version of Big Data Service includes CDH 6.3.3. See CDH 6.3.3 Packaging in the "Cloudera Enterprise 6.x Release Notes" for a complete list of the included components.

  • Oracle Cloud Infrastructure features and resources, including identity management, networking, compute, storage, and monitoring.
  • A REST API for creating and managing clusters.
  • bda-oss-admin CLI for managing storage providers.
  • odcp CLI for copying and moving data.
    Note

    ODCP is available only in clusters that use Cloudera Distribution including Apache Hadoop (CDH).
  • The ability to create clusters of any size, based on native Oracle Cloud Infrastructure shapes. For example, you can create small, short-lived clusters in flexible virtual environments; very large, long-running clusters on dedicated hardware; or anything in between.
  • Optional secure, high availability (HA) clusters.
  • Oracle Cloud SQL integration, for analyzing data across Apache Hadoop, Apache Kafka, NoSQL, and object stores using the Oracle SQL query language.
  • Full access to customize what is deployed on your Big Data Service clusters.

About Oracle Distribution Including Apache Hadoop (ODH)

ODH is built from the ground up and natively integrated into Oracle's data platform. It is fully managed and includes the same Hadoop components you know and build on today. ODH is available in three versions: ODH 2.x, ODH 1.x, and ODH 0.9.

ODH 2.x Based on Apache Hadoop 3.3.3

The following table lists the components included in ODH 2.x and their versions.

Component Version
Apache Ambari 2.7.5
Apache Flume 1.10.0
Apache Hadoop (HDFS, YARN, MR) 3.3.3
Apache HBase 2.2.6
Apache Hive 3.1.3
Apache Kafka 3.2.0
Apache Livy 0.7.1
Apache Oozie 5.2.0
JupyterHub 2.1.1
Apache Ranger and InfraSolr 2.10.0 and 0.1.0
Apache Spark 3.2.1
Apache Sqoop 1.4.7
Apache Tez 0.10.0
Apache Zookeeper 3.5.9
Delta Lake¹ 1.2.1
Hue 4.10.0
Trino 389
Additional value-added services
Data Studio included
Cloud SQL included
Data Catalog Metastore included

¹ With ODH 2.0, Big Data Service also supports Delta Lake 1.2.1 as part of the Big Data Service Apache Spark service. Delta Lake provides an ACID-compliant storage layer over cloud object stores for the Big Data Service Apache Spark service.

ODH 1.x Based on Apache Hadoop 3.1

The following table lists the components included in ODH 1.x and their versions.

Component Version
Apache Ambari 2.7.5
Apache Flink 1.15.2
Apache Flume 1.10.0
Apache Hadoop (HDFS, YARN, MR) 3.1.2
Apache HBase 2.2.6
Apache Hive 3.1.2
Apache Kafka 3.2.0
Apache Livy 0.7.1
Apache Oozie 5.2.0
JupyterHub 2.1.1
Apache Ranger and InfraSolr 2.1.0 and 0.1.0
Apache Spark 3.0.2
Apache Sqoop 1.4.7
Apache Tez 0.10.0
Apache Zookeeper 3.5.9
Hue 4.10.0
Trino 360
Additional value-added services
Data Studio included
Cloud SQL included
Data Catalog Metastore included

ODH 0.9 Based on Apache Hadoop 3.0

The table below lists the components included in ODH 0.9 and their versions.

Component Version
Apache Ambari 2.7.5
Apache Avro 1.8.2
Apache Flume 1.9.0
Apache Hadoop (HDFS, YARN, MR) 3.0.0
Apache HBase 2.1.4
Apache Hive 2.1.1
JupyterHub 2.1.1
Apache Kafka 2.2.1
Apache Livy 0.7.1
Apache Oozie 5.1.0
Apache Parquet or Apache Parquet MR 1.10.1 or 1.9.0
Parquet Format 2.4.0
Apache Ranger and InfraSolr 2.1.0 and 0.1.0
Apache Spark 2.4.4
Apache Sqoop 1.4.7
Apache Tez 0.10.0
Apache Zookeeper 3.4.14
Hue 4.10.0
Trino 360*
Additional value-added services
Data Studio included
Cloud SQL included
Data Catalog Metastore included

Note

Apache Hive supports functions for data masking, which may use weak algorithms. For strong encryption, custom functions (UDFs) can be written. For more information, see the Apache Hive UDF Reference at: hive/languagemanual+udf.

See Big Data Service Versions for details of the components included in each version of ODH.

Accessing Big Data Service

You access Big Data Service using the Console, OCI CLI, REST APIs, or SDKs.

  • The OCI Console is an easy-to-use, browser-based interface. To access the Console, you must use a supported browser.
  • The OCI CLI provides both quick access and full functionality without the need for programming. Use the Cloud Shell environment to run your CLI commands.
  • The REST APIs provide the most functionality but require programming expertise. API Reference and Endpoints provides endpoint details and links to the available API reference documents, including the Big Data Service API.
  • OCI provides SDKs that interact with Big Data Service without the need to create a framework.
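
For example, the following sketch uses the OCI SDK for Python to list the Big Data Service clusters in a compartment. The compartment OCID is a placeholder, and the response fields shown are typical but may vary by SDK version.

    import oci

    # Load credentials from the default OCI configuration file (~/.oci/config).
    config = oci.config.from_file()

    # The Big Data Service client exposes the same operations as the REST API.
    bds_client = oci.bds.BdsClient(config)

    # Placeholder compartment OCID; replace with your own.
    compartment_id = "ocid1.compartment.oc1..exampleuniqueid"

    # List the Big Data Service clusters in the compartment.
    response = bds_client.list_bds_instances(compartment_id=compartment_id)
    for cluster in response.data:
        print(cluster.display_name, cluster.lifecycle_state)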

Resource Identifiers

Big Data Service resources, like most types of resources in Oracle Cloud Infrastructure, have a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID).

For information about the OCID format and other ways to identify your resources, see Resource Identifiers.
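
As a rough illustration, the following sketch splits a hypothetical OCID into the parts described in Resource Identifiers. The sample value is not a real resource identifier, and its resource-type segment is only a placeholder.

    # Hypothetical OCID following the documented format:
    # ocid1.<RESOURCE TYPE>.<REALM>.[REGION][.FUTURE USE].<UNIQUE ID>
    sample_ocid = "ocid1.bdsinstance.oc1.phx.exampleuniqueid"

    # Split on "." to inspect the individual parts.
    version, resource_type, realm, region, unique_id = sample_ocid.split(".", 4)
    print(version, resource_type, realm, region)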

Service Limits

When you sign up for Oracle Cloud Infrastructure (OCI), a set of service limits is configured for your tenancy. The service limit is the quota or allowance set on a resource. These limits might be increased for you automatically based on your OCI resource usage and account standing. See Service Limits.

Default Service Limits

Among the limits set on your tenancy are limits on the number of Big Data Service cluster nodes you can create. More specifically, you're restricted to a certain number of nodes of a certain shape.

The following table shows the default limits for various cluster shapes. These are your limits if you didn't make other arrangements when you bought your subscription and haven't already requested an increase.

Resource Monthly universal credits Pay-as-you-go
VM.Standard2.1 12 instances (12 OCPUs) 8 instances (8 OCPUs)
VM.Standard2.2 12 instances (24 OCPUs) 8 instances (16 OCPUs)
VM.Standard2.4 12 instances (48 OCPUs) 8 instances (32 OCPUs)
VM.Standard2.8 8 instances (64 OCPUs) Contact us
VM.Standard2.16 8 instances (128 OCPUs) Contact us
VM.Standard2.24 8 instances (192 OCPUs) Contact us

VM.DenseIO2.8 Contact us Contact us
VM.DenseIO2.16 Contact us Contact us
VM.DenseIO2.24 Contact us Contact us
VM.DenseIO.E4 Contact us Contact us
BM.HPC2.36 Contact us Contact us
BM.DenseIO2.52 Contact us Contact us
BM.DenseIO.E4 Contact us Contact us
BM.Optimized3 Contact us Contact us
BM.Standard2.52 Contact us Contact us
BM.Standard3.64 Contact us Contact us
BM.Standard.E4 Contact us Contact us

Units Shown

In practice, you increase the number of nodes, or instances, in a cluster. ("Nodes" and "instances" mean the same thing in this context. OCI services usually use the term "instance," but Big Data Service follows the Hadoop convention of using the term "node.")

However, the limits are usually expressed as a number of Oracle Compute Units (OCPUs). Each type of Big Data Service node shape has a set number of OCPUs. The number at the end of the shape name (after the final period) indicates the number of OCPUs in a single node of that shape. For example, a VM.Standard2.1 node has one OCPU, a VM.Standard2.4 node has four OCPUs, and a BM.DenseIO2.52 node has 52 OCPUs.

For example, if your subscription uses monthly universal credits, the default limit for node shape VM.Standard2.4 is 48 OCPUs, which equals 12 nodes. The calculation is as follows: 48 OCPUs service limit divided by 4 OCPUs per node equals 12 nodes.
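
The same calculation as a small helper, for illustration (the shape-to-OCPU mapping below covers only the shapes mentioned above):

    # OCPUs per node for a few of the shapes discussed above.
    OCPUS_PER_NODE = {
        "VM.Standard2.1": 1,
        "VM.Standard2.4": 4,
        "BM.DenseIO2.52": 52,
    }

    def max_nodes(shape: str, ocpu_limit: int) -> int:
        # Convert an OCPU service limit into a node count for the given shape.
        return ocpu_limit // OCPUS_PER_NODE[shape]

    print(max_nodes("VM.Standard2.4", 48))  # 48 OCPUs / 4 OCPUs per node = 12 nodes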

Service Quotas

Big Data Service administrators can set quota policies to enforce restrictions on users by limiting the resources that they can create.

For information about how Oracle Cloud Infrastructure handles quotas, see Overview of Compartment Quotas.

Use the following information to create quotas:

Service name: big-data

Quotas:
Quota Name Scope Description
vm-standard-2-1-ocpu-count Regional Number of VM.Standard2.1 OCPUs
vm-standard-2-2-ocpu-count Regional Number of VM.Standard2.2 OCPUs
vm-standard-2-4-ocpu-count Regional Number of VM.Standard2.4 OCPUs
vm-standard-2-8-ocpu-count Regional Number of VM.Standard2.8 OCPUs
vm-standard-2-16-ocpu-count Regional Number of VM.Standard2.16 OCPUs
vm-standard-2-24-ocpu-count Regional Number of VM.Standard2.24 OCPUs
vm-dense-io-2-8-ocpu-count Regional Number of VM.DenseIO2.8 OCPUs
vm-dense-io-2-16-ocpu-count Regional Number of VM.DenseIO2.16 OCPUs
vm-dense-io-2-24-ocpu-count Regional Number of VM.DenseIO2.24 OCPUs
bm-hpc2-36-ocpu-count Regional Number of BM.HPC2.36 OCPUs
bm-dense-io-2-52-ocpu-count Regional Number of BM.DenseIO2.52 OCPUs
bm-standard-2-52-ocpu-count Regional Number of BM.Standard2.52 OCPUs

Big Data Service quota policy examples:

  • Limit the number of VM.Standard2.4 OCPUs that users can allocate to services they create in the mycompartment compartment to 40.

    Set big-data quota vm-standard-2-4-ocpu-count to 40 in Compartment mycompartment

  • Limit the number of BM.DenseIO2.52 OCPUs that users can allocate to services they create in the testcompartment compartment to 20.

    Set big-data quota bm-dense-io-2-52-ocpu-count to 20 in Compartment testcompartment

  • Don't allow users to create any VM.Standard2.4 OCPUs in the examplecompart compartment.

    Zero big-data quota vm-standard-2-4-ocpu-count in Compartment examplecompart
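
If you prefer to manage quotas programmatically, the following sketch creates the first example policy with the OCI SDK for Python and the Limits service. The tenancy OCID is a placeholder, and the quota resource is typically created in the root compartment.

    import oci

    config = oci.config.from_file()

    # The Limits service manages compartment quotas.
    quotas_client = oci.limits.QuotasClient(config)

    # Placeholder tenancy (root compartment) OCID; replace with your own.
    tenancy_id = "ocid1.tenancy.oc1..exampleuniqueid"

    details = oci.limits.models.CreateQuotaDetails(
        compartment_id=tenancy_id,
        name="bds-ocpu-quota",
        description="Limit VM.Standard2.4 OCPUs in mycompartment",
        statements=[
            "Set big-data quota vm-standard-2-4-ocpu-count to 40 in Compartment mycompartment"
        ],
    )

    quota = quotas_client.create_quota(details).data
    print(quota.id)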

Integrated OCI Services

Big Data Service is integrated with various OCI services and features.

Service Events

Certain actions performed on Big Data Service clusters emit events.

You can define rules that trigger a specific action when an event occurs. For example, you might define a rule that sends a notification to administrators when someone deletes a resource. See Overview of Events and Getting Started with Events.

The following table lists Big Data Service event types.

Friendly Name Event Type
Create Instance Begin com.oraclecloud.bds.cp.createinstance.begin
Create Instance End com.oraclecloud.bds.cp.createinstance.end
Terminate Instance Begin com.oraclecloud.bds.cp.terminateinstance.begin
Terminate Instance End com.oraclecloud.bds.cp.terminateinstance.end
Add Worker Node Begin com.oraclecloud.bds.cp.addnode.begin
Add Worker Node End com.oraclecloud.bds.cp.addnode.end
Add Block Storage Begin com.oraclecloud.bds.cp.addblockstorage.begin
Add Block Storage End com.oraclecloud.bds.cp.addblockstorage.end
Configure Cloud SQL Begin com.oraclecloud.bds.cp.addcloudsql.begin
Configure Cloud SQL End com.oraclecloud.bds.cp.addcloudsql.end
Disable Cloud SQL Begin com.oraclecloud.bds.cp.removecloudsql.begin
Disable Cloud SQL End com.oraclecloud.bds.cp.removecloudsql.end
Disable ODH Service Certificate Begin com.oraclecloud.bds.cp.disableodhservicecertificate.begin
Disable ODH Service Certificate End com.oraclecloud.bds.cp.disableodhservicecertificate.end
Enable ODH Service Certificate Begin com.oraclecloud.bds.cp.enableodhservicecertificate.begin
Enable ODH Service Certificate End com.oraclecloud.bds.cp.enableodhservicecertificate.end
Renew ODH Service Certificate Begin com.oraclecloud.bds.cp.renewodhservicecertificate.begin
Renew ODH Service Certificate End com.oraclecloud.bds.cp.renewodhservicecertificate.end
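
As an illustration, the following sketch uses the OCI SDK for Python to create an Events rule that publishes a notification when a cluster finishes being created. The compartment and Notifications topic OCIDs are placeholders, and the exact model names may vary by SDK version.

    import oci

    config = oci.config.from_file()
    events_client = oci.events.EventsClient(config)

    # Placeholder OCIDs; replace with your own compartment and Notifications topic.
    compartment_id = "ocid1.compartment.oc1..exampleuniqueid"
    topic_id = "ocid1.onstopic.oc1..exampleuniqueid"

    # Match the "Create Instance End" event type listed above.
    condition = '{"eventType": "com.oraclecloud.bds.cp.createinstance.end"}'

    rule_details = oci.events.models.CreateRuleDetails(
        display_name="notify-on-bds-cluster-create",
        description="Notify administrators when a Big Data Service cluster is created",
        is_enabled=True,
        condition=condition,
        compartment_id=compartment_id,
        actions=oci.events.models.ActionDetailsList(
            actions=[
                oci.events.models.CreateNotificationServiceActionDetails(
                    action_type="ONS",
                    is_enabled=True,
                    topic_id=topic_id,
                )
            ]
        ),
    )

    rule = events_client.create_rule(rule_details).data
    print(rule.id, rule.lifecycle_state)
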
Asynchronous Work Requests

The following Big Data Service operations create work requests. You can view these work requests in a Big Data Service cluster's detail page.

Big Data Service API Work Request Operation
CreateBdsInstance CREATE_BDS
UpdateBdsInstance UPDATE_BDS
DeleteBdsInstance DELETE_BDS
AddBlockStorage ADD_BLOCK_STORAGE
AddWorkerNodes ADD_WORKER_NODES
AddCloudSql ADD_CLOUD_SQL
RemoveCloudSql REMOVE_CLOUD_SQL
ChangeBdsInstanceCompartment CHANGE_COMPARTMENT_FOR_BDS
ChangeShape CHANGE_SHAPE
RestartNode RESTART_NODE
AddAutoScalingConfiguration UPDATE_INFRA
UpdateAutoScalingConfiguration UPDATE_INFRA
RemoveAutoScalingConfiguration UPDATE_INFRA

Each work request can have one of the following statuses: ACCEPTED, IN_PROGRESS, FAILED, SUCCEEDED, CANCELING, or CANCELED.
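
For example, you can poll the work request returned by one of these operations with the OCI SDK for Python until it reaches a terminal status; the work request OCID below is a placeholder.

    import time

    import oci

    config = oci.config.from_file()
    bds_client = oci.bds.BdsClient(config)

    # Placeholder work request OCID, returned in the opc-work-request-id header
    # of operations such as CreateBdsInstance or AddWorkerNodes.
    work_request_id = "ocid1.bdsworkrequest.oc1..exampleuniqueid"

    # Poll until the work request reaches a terminal status.
    while True:
        work_request = bds_client.get_work_request(work_request_id).data
        print(work_request.operation_type, work_request.status)
        if work_request.status in ("SUCCEEDED", "FAILED", "CANCELED"):
            break
        time.sleep(30)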

Additional Resources

Take a Getting Started Workshop to learn Big Data Service

If you're new to Big Data Service and want to get up and running quickly, try one of the Using Cloudera Distribution including Hadoop with Big Data Service workshops. (There's one for a highly-available (HA) cluster and one for a non-HA cluster.) A series of step-by-step labs guide you through the process of setting up a simple environment and creating a small cluster.

Get started with Big Data (HA Cluster)
Learn about Big Data Service. Set up the Oracle Cloud Infrastructure environment and create a highly available (HA) and secure cluster with Cloud SQL support.
Get started with Big Data (Non-HA Cluster)
Learn about Big Data Service. Set up the Oracle Cloud Infrastructure environment and create a non-HA cluster with Cloud SQL support.
Use a load balancer to access services on Big Data (HA Cluster)
Create a load balancer to be used as a front end for securely accessing Cloudera Manager, Hue, and Oracle Data Studio on your highly available (HA) Big Data Service cluster.
Use a load balancer to access services on Big Data (Non-HA Cluster)
Create a load balancer to be used as a front end for securely accessing Cloudera Manager, Hue, and Oracle Data Studio on your non-highly-available (non-HA) Big Data Service cluster.
Connecting Oracle DataSource for Apache Hadoop and Big Data to Autonomous Data Warehouse
Learn how to connect Oracle DataSource for Apache Hadoop (OD4H) on Big Data Service to Autonomous Data Warehouse (ADW).

Learn Hadoop

Watch the videos on this Oracle Learning playlist to learn about Apache Hadoop and the Hadoop Ecosystem.