Big Data Service provides enterprise-grade Hadoop as a service, with end-to-end security, high performance, and ease of management and upgradeability.
Big Data Service is an Oracle Cloud Infrastructure service designed for a diverse set of big data use cases and workloads. From short-lived clusters used to tackle specific tasks to long-lived clusters that manage large data lakes, Big Data Service scales to meet an organization's requirements at a low cost and with the highest levels of security.
Note
The data at rest in Block Volumes used by Big Data Service is encrypted by default.
Big Data Service includes:
A Hadoop stack that includes an installation of Oracle Distribution including Apache Hadoop (ODH). ODH includes Apache Ambari, Apache Hadoop, Apache HBase, Apache Hive, Apache Spark, and other services for working with and securing big data.
Oracle Cloud Infrastructure features and resources, including identity management, networking, compute, storage, and monitoring.
A REST API for creating and managing clusters.
The ability to create clusters of any size, based on native Oracle Cloud Infrastructure shapes. For example, you can create small, short-lived clusters in flexible virtual environments, very large, long-running clusters on dedicated hardware, or any combination in between.
Optional secure, high availability (HA) clusters.
Oracle Cloud SQL integration, for analyzing data across Apache Hadoop, Apache Kafka, NoSQL, and object stores using the Oracle SQL query language.
Full access to customize what is deployed on your Big Data Service clusters.
Big Data Service releases patches that are visible in the OCI Console. These patches must be applied to keep your Big Data Service clusters up to date and supported. See Patching in Big Data Service for more details on the Big Data Service release patch.
About Oracle Distribution Including Apache Hadoop (ODH)
ODH is built from the ground up, natively integrated into Oracle's data platform. ODH is fully managed, with the same Hadoop components you know and build on today. ODH is available as versions ODH 2.x and ODH 1.x.
Apache Hive supports data masking functions, some of which may use weak algorithms. If you need strong encryption, you can write custom functions. For more information, see the Apache Hive UDF Reference at: hive/languagemanual+udf.
Big Data Service releases software feature updates and patches on a quarterly cadence. A release can include one or more of the following: ODH updates (component version updates and bug fixes), CVE (Common Vulnerabilities and Exposures) fixes, and operating system (OS) updates, upgrades, and bug fixes.
For the latest releases, see Big Data Service release notes.
Big Data Service users are supported if their Big Data Service software version is the latest Big Data Service release (N) or one of the two preceding releases (N-1 or N-2).
The following table lists the ODH, JDK, and OS versions included in each Big Data Service release.
| Big Data Service Release | ODH Version | JDK Version | OS Version |
|---|---|---|---|
| 3.0.29 | ODH 2.0.10.22 | JDK 1.8.0_411 | OS 1.29.0 |
| 3.0.28 | ODH 2.0.9.41, ODH 1.1.13.21 | JDK 1.8.0_411 | OS 1.28.0 |
| 3.0.27 | ODH 2.0.8.45, ODH 1.1.12.16, ODH 0.9.10.6 | JDK 1.8.0_411 | OS 1.27.0 |
| 3.0.26 | ODH 2.0.7.11, ODH 1.1.11.7, ODH 0.9.9.7 | JDK 1.8.0_381 | OS 1.26.0 |
| 3.0.25 | ODH 2.0.6.5, ODH 1.1.10.4, ODH 0.9.8.3 | JDK 1.8.0_381 | OS 1.25.0 |
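The N, N-1, N-2 support window can be checked mechanically. The following is a minimal sketch, not an Oracle-provided tool: it assumes that the releases listed in the table are the complete set of known releases and that 3.0.29 is the latest, which is only true as of this table. The names `RELEASES`, `version_key`, and `is_supported` are illustrative.

```python
# Releases taken from the table above. Assumption: this list is complete
# and 3.0.29 is the latest release (N).
RELEASES = ["3.0.29", "3.0.28", "3.0.27", "3.0.26", "3.0.25"]


def version_key(release: str) -> tuple:
    """Turn '3.0.29' into (3, 0, 29) so releases sort numerically."""
    return tuple(int(part) for part in release.split("."))


def is_supported(release: str, known_releases=RELEASES, window: int = 2) -> bool:
    """Return True if release is N, N-1, ..., N-window among known_releases."""
    ordered = sorted(set(known_releases), key=version_key, reverse=True)
    return release in ordered[: window + 1]


if __name__ == "__main__":
    for release in RELEASES:
        status = "supported" if is_supported(release) else "not supported"
        print(release, status)
```

Under these assumptions, 3.0.29, 3.0.28, and 3.0.27 fall inside the window, and 3.0.26 and 3.0.25 fall outside it.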
ODH 2.x Based on Apache Hadoop 3.3.3
The following table lists the components included in ODH 2.x and their versions.
| Component | Version |
|---|---|
| Apache Ambari | 2.7.5 |
| Apache Flink | 1.15.2 |
| Apache Flume | 1.10.0 |
| Apache Hadoop (HDFS, YARN, MR) | 3.3.3 |
| Apache HBase | 2.4.13 |
| Apache Hive | 3.1.3 |
| Apache Hue | 4.10.0 |
| Apache JupyterHub | 2.1.1 |
| Apache Kafka | 3.2.0 |
| Apache Livy | 0.7.1 |
| Apache Oozie | 5.2.1 |
| Apache Parquet MR | 1.10 |
| Apache Ranger and InfraSolr | 2.3.0 and 0.1.0 |
| Apache Spark | 3.2.1 |
| Apache Sqoop | 1.4.7 |
| Apache Tez | 0.10.2 |
| Apache ZooKeeper | 3.7.1 |
| Kerberos | 1.1-15 |
| ODH Utilities | 1.0 |
| Schema Registry | 1.0.0 |
| Trino | 389 |
| Additional value-added service: ORAAH | Included |
ODH 1.x Based on Apache Hadoop 3.1
The following table lists the components included in ODH 1.x and their versions.
| Component | Version |
|---|---|
| Apache Ambari | 2.7.5 |
| Apache Flink | 1.15.2 |
| Apache Flume | 1.10.0 |
| Apache Hadoop (HDFS, YARN, MR) | 3.1.2 |
| Apache HBase | 2.2.6 |
| Apache Hive | 3.1.2 |
| Apache Hue | 4.10.0 |
| Apache JupyterHub | 2.1.1 |
| Apache Kafka | 3.2.0 |
| Apache Livy | 0.7.1 |
| Apache Oozie | 5.2.0 |
| Apache Parquet MR | 1.10 |
| Apache Ranger and InfraSolr | 2.1.0 and 0.1.0 |
| Apache Spark | 3.0.2 |
| Apache Sqoop | 1.4.7 |
| Apache Tez | 0.10.0 |
| Apache ZooKeeper | 3.5.9 |
| Kerberos | 1.1-15 |
| ODH Utilities | 1.0 |
| Schema Registry | 1.0.0 |
| Trino | 360 |
| Additional value-added service: ORAAH | Included |
Accessing Big Data Service
You access Big Data Service using the Console, OCI CLI, REST APIs, or SDKs.
The OCI Console is an easy-to-use, browser-based interface. To access the Console, you must use a supported browser.
The OCI CLI provides both quick access and full functionality without the need for programming. You can use the Cloud Shell environment to run CLI commands.
OCI provides SDKs that let you interact with Big Data Service without having to create a framework.
Resource Identifiers
Big Data Service resources, like most types of resources in Oracle Cloud Infrastructure, have a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID).
For information about the OCID format and other ways to identify your resources, see Resource Identifiers.
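As a rough illustration of working with OCIDs in scripts, the sketch below splits an OCID into its dot-separated fields. It assumes the general layout `ocid1.<resource type>.<realm>.[region][.future use].<unique ID>`; the helper name `parse_ocid` and the sample OCID values are hypothetical and do not refer to real resources.

```python
from typing import NamedTuple


class Ocid(NamedTuple):
    version: str        # currently "ocid1"
    resource_type: str  # for example "bigdataservice" (assumed for clusters)
    realm: str          # for example "oc1"
    region: str         # may be empty for resources that are not regional
    unique_id: str      # the last field of the OCID


def parse_ocid(ocid: str) -> Ocid:
    """Split an OCID into its fields; raise ValueError if it looks malformed."""
    parts = ocid.split(".")
    if len(parts) < 5 or parts[0] != "ocid1":
        raise ValueError(f"not a recognizable OCID: {ocid!r}")
    # Any "future use" field sits between the region and the unique ID,
    # so the unique ID is always the last part.
    return Ocid(parts[0], parts[1], parts[2], parts[3], parts[-1])


if __name__ == "__main__":
    # Hypothetical OCID, for illustration only.
    print(parse_ocid("ocid1.bigdataservice.oc1.iad.exampleuniqueid"))
```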
Regions and Availability Domains
Regions and availability domains indicate the physical and logical organization of your Big Data Service resources. A region is a localized geographic area, and an availability domain is one or more data centers located within a region.
When you sign up for Oracle Cloud Infrastructure (OCI), a set of service limits is configured for your tenancy. The service limit is the quota or allowance set on a resource. These limits might be increased for you automatically based on your OCI resource usage and account standing. See Service Limits.
Default Service Limits
Among the limits set on your tenancy are limits on the number of Big Data Service cluster nodes you can create. More specifically, you're restricted to a certain number of nodes of a certain shape.
The following table shows the default limits for various cluster shapes. These are your limits if you didn't make other arrangements when you bought your subscription and if you haven't already asked for an increase.
In practice, you increase the number of nodes, or instances, in a cluster. ("Nodes" and "instances" mean the same thing in this context. OCI services usually use the term "instance," but Big Data Service follows the Hadoop convention of using the term "node.")
However, the limits are usually expressed as a number of Oracle Compute Units (OCPUs). Each type of Big Data Service node shape has a set number of OCPUs. The number after the decimal in the node shape name indicates the number of OCPUs in a single node of that shape. For example, a VM.Standard2.1 node has one OCPU, a VM.Standard2.4 node has four OCPUs, and a BM.DenseIO2.52 node has 52 OCPUs.
For example, if your subscription uses monthly universal credits, the default limit for node shape VM.Standard2.4 is 48 OCPUs, which equals 12 nodes. The calculation is as follows: 48 OCPUs service limit divided by 4 OCPUs per node equals 12 nodes.
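The conversion from an OCPU limit to a node count can be sketched as follows. This is a minimal example that assumes the shape name ends in a numeric OCPU count, as in the shapes named above; the function names are illustrative, and shape names that don't end in a number are rejected rather than guessed at.

```python
def ocpus_per_node(shape: str) -> int:
    """Read the OCPU count from the number after the last period in the shape name."""
    suffix = shape.rsplit(".", 1)[-1]
    if not suffix.isdigit():
        raise ValueError(f"cannot read an OCPU count from shape name {shape!r}")
    return int(suffix)


def max_nodes(ocpu_limit: int, shape: str) -> int:
    """Number of whole nodes of the given shape that fit within the OCPU limit."""
    return ocpu_limit // ocpus_per_node(shape)


if __name__ == "__main__":
    # 48 OCPUs divided by 4 OCPUs per VM.Standard2.4 node
    print(max_nodes(48, "VM.Standard2.4"))  # prints 12
```

Integer division is used because a partial node can't be created: a limit of 50 OCPUs would still allow only 12 VM.Standard2.4 nodes.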
Finding Tenancy Limits
Note
You must have permission to view limits and usage. See "To view your tenancy's limits and usage" under Service Limits.
Big Data Service is integrated with OCI Search. Search lets you find resources within a tenancy, along with important information about clusters and configuration objects such as API keys, metastore configurations, and lake configurations.
Examples of search queries:
Example 1: Search for all Big Data Service resources
query bigdataservice resources
Example 2: Search for all active Big Data Service clusters
query bigdataservice resources where lifecycleState = 'ACTIVE'
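The same queries can also be issued programmatically. The sketch below builds the two query strings shown above and, as one possible approach, passes them to the Search service through the OCI Python SDK. It assumes that the `oci` package is installed and that a default configuration file exists; the helper names `build_bds_query` and `search_bds` are illustrative.

```python
from typing import Optional


def build_bds_query(lifecycle_state: Optional[str] = None) -> str:
    """Build a structured search query for Big Data Service resources."""
    query = "query bigdataservice resources"
    if lifecycle_state is not None:
        query += f" where lifecycleState = '{lifecycle_state}'"
    return query


def search_bds(lifecycle_state: Optional[str] = None):
    """Run the query with the OCI Python SDK (assumes a default OCI config file)."""
    import oci  # imported here so the query builder works without the SDK

    config = oci.config.from_file()
    client = oci.resource_search.ResourceSearchClient(config)
    details = oci.resource_search.models.StructuredSearchDetails(
        query=build_bds_query(lifecycle_state),
        type="Structured",
    )
    return client.search_resources(details).data.items


if __name__ == "__main__":
    print(build_bds_query())
    print(build_bds_query("ACTIVE"))
```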
Big Data Service is fully integrated with OCI Search and supports specific resource types.
Certain actions performed on Big Data Service clusters emit events.
You can define rules that trigger a specific action when an event occurs. For example, you might define a rule that sends a notification to administrators when someone deletes a resource. See Overview of Events and Getting Started with Events.
The following table lists Big Data Service event types.