Big Data Service provides enterprise-grade Hadoop as a service, with end-to-end security, high performance, and ease of management and upgradeability.
Big Data Service is an Oracle Cloud Infrastructure service designed for a diverse set of big data use cases and workloads. From short-lived clusters used to tackle specific tasks to long-lived clusters that manage large data lakes, Big Data Service scales to meet an organization's requirements at a low cost and with the highest levels of security.
Note
Data at rest in the Block Volumes used by Big Data Service is encrypted by default.
Big Data Service includes:
A Hadoop stack that includes an installation of Oracle Distribution Including Apache Hadoop (ODH). ODH includes Apache Ambari, Apache Hadoop, Apache HBase, Apache Hive, Apache Spark, and other services for working with and securing big data.
Oracle Cloud Infrastructure features and resources, including identity management, networking, compute, storage, and monitoring.
A REST API for creating and managing clusters.
The ability to create clusters of any size, based on native Oracle Cloud Infrastructure shapes. For example, you can create small, short-lived clusters in flexible virtual environments, very large, long-running clusters on dedicated hardware, or any combination in between.
Optional secure, highly available (HA) clusters.
Oracle Cloud SQL integration for analyzing data across Apache Hadoop, Apache Kafka, NoSQL, and object stores using the Oracle SQL query language.
Full access to customize what is deployed on your Big Data Service clusters.
Big Data Service releases patches that are visible in the OCI Console. You must apply these patches to keep your Big Data Service clusters up to date and supported. See Patching in Big Data Service for more details on Big Data Service release patches.
About Oracle Distribution Including Apache Hadoop (ODH)
ODH is built from the ground up and natively integrated into Oracle's data platform. ODH is fully managed and includes the same Hadoop components you know and build on today. ODH is available in two versions: ODH 2.x and ODH 1.x.
Apache Hive provides data-masking functions, some of which may use weak algorithms. If you need strong encryption, you can write custom functions (UDFs). For more information, see the Apache Hive UDF reference at: hive/languagemanual+udf.
OCI provides SDKs for interacting with Big Data Service, so you don't need to build your own framework.
Resource Identifiers
Big Data Service resources, like most types of resources in Oracle Cloud Infrastructure, have a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID).
For information about the OCID format and other ways to identify your resources, see Resource Identifiers.
Regions and Availability Domains
Regions and availability domains indicate the physical and logical organization of your Big Data Service resources. A region is a localized geographic area, and an availability domain is one or more data centers located within a region.
When you sign up for Oracle Cloud Infrastructure (OCI), a set of service limits is configured for your tenancy. The service limit is the quota or allowance set on a resource. These limits might be increased for you automatically based on your OCI resource usage and account standing. See Service Limits.
Default Service Limits
Among the limits set on your tenancy are limits on the number of Big Data Service cluster nodes you can create. More specifically, you're restricted to a certain number of nodes of a certain shape.
The following table shows the default limits for various cluster shapes. These are your limits if you didn't make other arrangements when you bought your subscription and if you haven't already asked for an increase.
In practice, you increase the number of nodes, or instances, in a cluster. ("Nodes" and "instances" mean the same thing in this context. OCI services usually use the term "instance," but Big Data Service follows the Hadoop convention of using the term "node.")
However, the limits are usually expressed as a number of Oracle Compute Units (OCPUs). Each Big Data Service node shape has a fixed number of OCPUs, and the number after the last period in the shape name indicates the number of OCPUs in a single node of that shape. For example, a VM.Standard2.1 node has one OCPU, a VM.Standard2.4 node has four OCPUs, and a BM.DenseIO2.52 node has 52 OCPUs.
For example, if your subscription uses monthly universal credits, the default limit for node shape VM.Standard2.4 is 48 OCPUs, which equals 12 nodes. The calculation is as follows: 48 OCPUs service limit divided by 4 OCPUs per node equals 12 nodes.
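The node-count arithmetic in the example above can be sketched in Python. This is a minimal, illustrative sketch, not part of any OCI API: the function names `ocpus_per_node` and `max_nodes` are hypothetical, and the code assumes the naming convention described above, where the number after the final period of a shape name is its OCPU count. Shapes that don't follow that convention (for example, flexible shapes, whose OCPU count is chosen when you create them) are rejected rather than guessed.

```python
def ocpus_per_node(shape: str) -> int:
    """Return the OCPU count encoded in a fixed shape name.

    Assumption: the numeric suffix after the last period is the OCPU
    count, e.g. "VM.Standard2.4" -> 4, "BM.DenseIO2.52" -> 52.
    """
    suffix = shape.rsplit(".", 1)[-1]
    if not suffix.isdecimal():
        # Names such as flexible shapes don't encode an OCPU count.
        raise ValueError(f"cannot infer OCPUs from shape name: {shape!r}")
    return int(suffix)


def max_nodes(ocpu_limit: int, shape: str) -> int:
    """Largest whole number of nodes of `shape` that fits in `ocpu_limit` OCPUs."""
    return ocpu_limit // ocpus_per_node(shape)


if __name__ == "__main__":
    # Worked example from the text: 48 OCPUs / 4 OCPUs per node = 12 nodes.
    print(max_nodes(48, "VM.Standard2.4"))
```

Floor division is used so that a limit that isn't an exact multiple of the per-node OCPU count (for example, 50 OCPUs for VM.Standard2.4) yields only the nodes that fully fit, which is still 12.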
Finding Tenancy Limits
Note
You must have permission to view limits and usage. See "To view your tenancy's limits and usage" under Service Limits.
Big Data Service is integrated with OCI Search. Search lets you find resources within a tenancy, along with important information about clusters and configuration objects such as API keys, metastore configurations, and lake configurations.
Examples of search queries:
Example 1: Search for all Big Data Service resources
query bigdataservice resources
Example 2: Search for all active Big Data Service clusters
query bigdataservice resources where lifecycleState = 'ACTIVE'
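The two example queries differ only in their optional `where` clause, so a script that needs them can generate both from one helper. The Python sketch below is illustrative only: `bds_search_query` is a hypothetical name, it covers just the `lifecycleState` filter shown in the examples, and it assumes that restricting the state to letters and underscores is enough to embed it safely between single quotes. It does not implement the full OCI Search query grammar.

```python
import re
from typing import Optional

BASE_QUERY = "query bigdataservice resources"


def bds_search_query(lifecycle_state: Optional[str] = None) -> str:
    """Build OCI Search query text for Big Data Service resources.

    With no argument, returns the "all resources" query (Example 1);
    with a state such as "ACTIVE", adds a lifecycleState filter (Example 2).
    """
    if lifecycle_state is None:
        return BASE_QUERY
    # Assumption of this sketch: only plain identifiers are accepted,
    # so no quote escaping is needed inside the '...' literal.
    if not re.fullmatch(r"[A-Za-z_]+", lifecycle_state):
        raise ValueError(f"unexpected lifecycle state: {lifecycle_state!r}")
    return f"{BASE_QUERY} where lifecycleState = '{lifecycle_state}'"


if __name__ == "__main__":
    print(bds_search_query())
    print(bds_search_query("ACTIVE"))
```

Rejecting unexpected input, instead of trying to escape it, keeps the sketch from depending on quoting rules it does not define.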
Big Data Service is fully integrated with OCI Search and supports specific resource types.
Certain actions performed on Big Data Service clusters emit events.
You can define rules that trigger a specific action when an event occurs. For example, you might define a rule that sends a notification to administrators when someone deletes a resource. See Overview of Events and Getting Started with Events.
The following table lists Big Data Service event types.