Creating and Modifying Clusters
Creating a Cluster
Create a Big Data Service cluster from the Oracle Cloud Console.
Before you can create a cluster, you must have:
- An Oracle Cloud account or trial, with your sign-in credentials. See Request and Manage Free Oracle Cloud Promotions.
- A Virtual Cloud Network (VCN), with subnets. See Network Resources.
- Permission to create a cluster. See Creating Policies.
- A Secure Shell (SSH) key pair. See Creating a Key Pair.
The Cluster creation wizard asks you to provide information about your network and to make choices based on your network. To prepare for those questions, have the name of your network, its compartment, and its regional subnet name ready.
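You can also create a cluster programmatically. The following is a minimal sketch using the OCI SDK for Python; every OCID, the shape, the CIDR block, and the password are placeholders, and the model fields shown reflect my reading of the oci.bds reference, so verify them against your SDK version.

import base64
from pathlib import Path

import oci

bds = oci.bds.BdsClient(oci.config.from_file())

details = oci.bds.models.CreateBdsInstanceDetails(
    compartment_id="ocid1.compartment.oc1..example",       # placeholder
    display_name="my-bds-cluster",
    cluster_version="ODH1",        # Oracle Distribution including Apache Hadoop
    cluster_public_key=Path.home().joinpath(".ssh", "id_rsa.pub").read_text(),
    # The BDS API expects the cluster admin password base64-encoded.
    cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
    is_high_availability=False,
    is_secure=False,
    network_config=oci.bds.models.NetworkConfig(
        is_nat_gateway_required=True, cidr_block="10.0.0.0/16"),
    nodes=[
        # A minimal non-HA layout: one master, one utility, three workers.
        oci.bds.models.CreateNodeDetails(
            node_type=node_type, shape="VM.Standard2.4",
            subnet_id="ocid1.subnet.oc1..example",          # placeholder
            block_volume_size_in_gbs=150)
        for node_type in ("MASTER", "UTILITY", "WORKER", "WORKER", "WORKER")
    ],
)
bds.create_bds_instance(details)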
Adding Worker and Edge Nodes to a Cluster
When you add worker, compute only worker, or edge nodes to a cluster, you expand both compute and storage. The new nodes use the same instance shape and amount of block storage as the existing worker nodes in the cluster.
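If you script cluster scaling, adding nodes is a single call with the OCI SDK for Python. A sketch follows; the cluster OCID and password are placeholders, and the node_type field (which selects worker, compute only worker, or edge nodes) is an assumption to verify against your oci.bds version.

import base64

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.add_worker_nodes(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.AddWorkerNodesDetails(
        # The BDS API expects the cluster admin password base64-encoded.
        cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
        number_of_worker_nodes=2,
        node_type="WORKER",        # or COMPUTE_ONLY_WORKER / EDGE (assumption)
    ),
)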
Adding Block Storage to Worker Nodes
Block storage is a network-attached storage volume that you can use like a regular hard drive. You can attach extra block storage to the worker nodes of a cluster.
Nodes in a cluster can have remote, network-attached, block storage or local, direct-attached, Non-Volatile Memory Express (NVMe) storage. Remote block storage is flexible and economical, while local NVMe storage provides the highest performance. The default storage type is determined when the cluster is created, based on the shape chosen for the cluster. The high-performance bare metal nodes and dense I/O virtual machine nodes are created with NVMe storage. Other kinds of virtual machine nodes are created with block storage.
You can attach extra storage to any cluster. You can't remove storage.
To add a block volume to the cluster:
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, find the cluster to which you want to add block storage.
- Click the actions menu, and select Add Block Storage.
- In the Add Block Storage dialog box, enter information as follows:
- Additional Block Storage per Node (in GB) - Enter the number of gigabytes of block storage to add, from 150 GB to 32 TB, in increments of 50 GB.
- Cluster Admin Password - Enter the administration password for the cluster.
- Click Add.
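The same operation can be scripted. A minimal sketch with the OCI SDK for Python, assuming placeholder OCID and password values:

import base64

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.add_block_storage(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.AddBlockStorageDetails(
        # The BDS API expects the cluster admin password base64-encoded.
        cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
        block_volume_size_in_gbs=150,                       # added per worker node
    ),
)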
Adding Cloud SQL to a Cluster
You can add Oracle Cloud SQL to a cluster so you can use SQL to query your big data sources.
Cloud SQL is not included with Big Data Service. You must pay an extra fee for use of Cloud SQL.
When you add Cloud SQL support to a cluster, a query server node is added and big data cell servers are created on all worker nodes.
For clusters with Oracle Distribution including Apache Hadoop, Cloud SQL is only supported for non-HA clusters.
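Scripted, adding Cloud SQL is also one call in the OCI SDK for Python. A sketch, with the OCID, shape, and password as placeholders:

import base64

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.add_cloud_sql(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.AddCloudSqlDetails(
        shape="VM.Standard2.4",                             # shape for the query server node
        # The BDS API expects the cluster admin password base64-encoded.
        cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
    ),
)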
Configuring an External Metastore
Data Catalog provides a highly available and scalable metastore for Hive implementations.
You can configure this Data Catalog metastore as an external metastore for your Big Data Service cluster.
You can create one external metastore for your Big Data Service cluster.
To create an external metastore configuration for Big Data Service, you must have the permissions and policies to access and manage a valid Data Catalog metastore.
When you create an external metastore for the Big Data Service cluster, the local metastore for the cluster is made inactive and the new external Data Catalog metastore is activated.
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of your cluster.
- In the left panel, under Resources, click Metastore Configurations.
- Click Create external metastore configuration.
- In the Create external metastore configuration panel, review the Metastore name of the external Data Catalog metastore that is available in the specified compartment for you to configure.
- Select the API key alias you specified when you created the Object Storage API key.
- Enter the API key passphrase. You specified this passphrase when you created the API key.
- Enter the Cluster admin password for the cluster. You specified this password when you created the cluster.
- Click Create.
After successful creation of the external metastore for the Big Data Service cluster, the local metastore for the cluster is made inactive and the new external Data Catalog metastore is activated.
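If you automate this configuration, the equivalent call with the OCI SDK for Python looks roughly as follows. The detail field names (metastore_id, bds_api_key_id, bds_api_key_passphrase) are my reading of the BDS API reference and should be verified against oci.bds.models; all OCIDs and passwords are placeholders.

import base64

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.create_bds_metastore_configuration(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.CreateBdsMetastoreConfigurationDetails(
        display_name="my-external-metastore",
        metastore_id="ocid1.datacatalogmetastore.oc1..example",  # Data Catalog metastore
        bds_api_key_id="ocid1.bdsapikey.oc1..example",      # Object Storage API key
        bds_api_key_passphrase="MyKeyPassphrase#1",
        # The BDS API expects the cluster admin password base64-encoded.
        cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
    ),
)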
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of your cluster.
- In the left panel, under Resources, click Metastore Configurations.
- The default local metastore is listed. An external metastore is also listed if you have created one. Click the local or external metastore name to view additional details.
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of your cluster.
- In the left panel, under Resources, click Metastore Configurations.
- Click the actions menu for the metastore configuration you want to test and select Test configuration.
- In the Test configuration dialog, enter the Cluster admin password for your cluster and click Test configuration.
- Review the connection status and when done, click Close.
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of your cluster.
- In the left panel, under Resources, click Metastore Configurations.
- Click the actions menu for the external metastore configuration and select Update API key.
- In the Update API key panel, select the new API key.
- Enter the API key passphrase for the new API key.
- Enter the Cluster admin password for the cluster.
- Click Update.
When you activate the local metastore, the external metastore becomes inactive.
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of your cluster.
- In the left panel, under Resources, click Metastore Configurations.
- Click the actions menu for the local metastore configuration and select Activate.
- In the Activate local metastore dialog, enter the Cluster admin password for the cluster.
- Click Activate.
You can delete an inactive external metastore. The external metastore becomes inactive when you activate the local metastore.
You cannot delete the default local metastore for your cluster.
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of your cluster.
- In the left panel, under Resources, click Metastore Configurations.
- Click the actions menu for the inactive external metastore configuration and select Delete. You cannot delete an active external metastore. To make the external metastore inactive, activate the local metastore.
Modifying a Cluster
Renaming a Cluster
You can change the name of any cluster.
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- Click the action menu for the cluster you want to rename, and select Rename Cluster.
- In the Rename Cluster dialog box, enter a new name for the cluster and click Rename.
Changing the Shape of Cluster Nodes
You can change the shapes of the nodes of a cluster after the cluster is created, with the following restrictions:
- All the master nodes and utility nodes in a cluster must use the same shape, and all worker nodes must use the same shape. However, the master nodes and utility nodes can use a different shape than the worker nodes. Therefore, when you change the shapes of nodes, you can change the shape of all the master and utility nodes together, and you can change the shape of all the worker nodes together.
- You can change the shape only of nodes that use standard shapes, and you can change them only to other standard shapes. For information about standard shapes, see Compute Shapes. For information about shapes supported in Big Data Service, see Planning the Cluster Layout, Shape, and Storage.
To change the shape of a cluster:
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of your cluster.
- Click Change Shape.
- In the Change Shape panel, do the following:
- Choose node type - The available options are as follows:
  - Master/utility: Select this option to change the shape of all the master and utility nodes.
  - Worker: Select this option to change the shape of all the worker nodes.
  - Compute only worker: Select this option to change the shape of all the compute only worker nodes.
  - Edge: Select this option to change the shape of all the edge nodes.
  - Cloud SQL: Select this option to change the shape of all the Cloud SQL nodes, if installed.
- Existing Shape - This read-only field shows the current shape of the nodes of the type you selected for Choose node type, above.
- New Shape - Select a new shape for the nodes of the selected type.
- Cluster Admin Password - Enter the admin password for the cluster. (The password was assigned when the cluster was created.)
- Click Change Shape.
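The same change can be scripted. A sketch with the OCI SDK for Python follows; the ChangeShapeNodes field names mirror the node types above and are assumptions to verify against oci.bds.models, and the OCID, shape, and password are placeholders.

import base64

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.change_shape(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.ChangeShapeDetails(
        # The BDS API expects the cluster admin password base64-encoded.
        cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
        nodes=oci.bds.models.ChangeShapeNodes(worker="VM.Standard2.8"),
    ),
)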
Autoscaling a Cluster
You can create an autoscale configuration for a cluster so that the shape or number of worker nodes is automatically increased or decreased, based on CPU utilization thresholds.
Autoscaling allows you to maintain optimum performance of your cluster, while keeping costs as low as possible. Autoscaling monitors your CPU utilization and automatically adjusts the CPU capacity, based on the configuration parameters you set.
When a cluster is autoscaled, the new details should be reflected in Apache Ambari or Cloudera Manager. To register that change with Apache Ambari or Cloudera Manager, a new cluster admin password is created when you create an autoscale configuration. The password is deleted when the autoscale configuration is deleted.
You can have one autoscale configuration per supported node type. Therefore, on a cluster with both worker and compute only worker nodes, you can have up to two autoscale policies.
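As a sketch, a threshold-based autoscale configuration created through the OCI SDK for Python might look like the following. The policy model names (AutoScalePolicy, AutoScalePolicyRules, AutoScalePolicyMetricRule, MetricThresholdRule) and the thresholds shown are assumptions based on the BDS API reference; check them against oci.bds.models. OCIDs and passwords are placeholders.

import base64

import oci

m = oci.bds.models

def cpu_rule(action, operator, value):
    # One threshold rule: act when CPU utilization crosses `value` percent
    # for 30 minutes (model names are assumptions; see the lead-in).
    return m.AutoScalePolicyRules(
        action=action,
        metric=m.AutoScalePolicyMetricRule(
            metric_type="CPU_UTILIZATION",
            threshold=m.MetricThresholdRule(
                duration_in_minutes=30, operator=operator, value=value)))

bds = oci.bds.BdsClient(oci.config.from_file())
bds.add_auto_scaling_configuration(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    m.AddAutoScalingConfigurationDetails(
        display_name="worker-autoscale",
        node_type="WORKER",
        is_enabled=True,
        # The BDS API expects the cluster admin password base64-encoded.
        cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
        policy=m.AutoScalePolicy(
            policy_type="THRESHOLD_BASED",
            rules=[cpu_rule("CHANGE_SHAPE_SCALE_UP", "GT", 80),
                   cpu_rule("CHANGE_SHAPE_SCALE_DOWN", "LT", 20)]),
    ),
)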
Using Object Storage API keys
Big Data Service uses the OCI API signing key mechanism to connect to Object Storage. You can access Object Storage buckets encrypted with user-managed keys from the Big Data Service cluster nodes. For details on data encryption, see Securing Object Storage.
To use Object Storage API keys, you must create a Big Data Service cluster with version 3.0.4 or later. The Big Data Service version is displayed on the Cluster Information tab of the Cluster Details page.
Also ensure that the following IAM policy is in place; it allows users to manage their own API keys:
allow any-user to {USER_INSPECT, USER_READ, USER_UPDATE, USER_APIKEY_ADD, USER_APIKEY_REMOVE} in tenancy where request.principal.id = target.user.id
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of your cluster.
- In the left panel, under Resources, click Object Storage API keys.
- Click Create key.
- In the Create API key panel, enter a key alias to uniquely identify this key in the cluster.
- Enter the OCID of the user who can use this API key. To retrieve the user OCID, from the Console navigate to Identity & Security → Users. From the actions menu for the user, click Copy OCID.
- Enter and confirm a passphrase. This passphrase is used to encrypt the API key and cannot be changed later.
- Select a default region that is used to establish the Object Storage endpoint name.
- Click Create.
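Scripted, creating the key is one call with the OCI SDK for Python. A sketch, with the OCIDs, alias, passphrase, and region as placeholders:

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.create_bds_api_key(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.CreateBdsApiKeyDetails(
        key_alias="my-os-key",                              # unique within the cluster
        user_id="ocid1.user.oc1..example",                  # user who can use this key
        passphrase="MyKeyPassphrase#1",                     # encrypts the key; can't be changed later
        default_region="us-ashburn-1",                      # sets the Object Storage endpoint
    ),
)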
You can view and copy the public key of the Object Storage API key from its configuration file.
- Access the cluster details page of the cluster that has the API key you want to view.
- In the left panel, under Resources, click Object Storage API keys.
- From the actions menu of the API key you want to view, click View configuration file.
- The public key details of the API key are displayed in the View configuration file dialog.
- View the configuration file details or copy the public key.
- Access the cluster details page of the cluster that has the API key you want to test.
- In the left panel, under Resources, click Object Storage API keys.
- From the actions menu of the API key you want to test, click Test connection.
- Enter the Object Storage URI for the bucket you want to connect to, in the format oci://MyBucket@MyNamespace/.
- Enter the passphrase of the API key. You specified this passphrase when you created the API key.
- Click Test connection. The status of the test connection is displayed.
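The same test can be run programmatically. A sketch with the OCI SDK for Python; the cluster and API key OCIDs and the bucket URI are placeholders.

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
response = bds.test_bds_object_storage_connection(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    "ocid1.bdsapikey.oc1..example",                         # placeholder API key OCID
    oci.bds.models.TestBdsObjectStorageConnectionDetails(
        object_storage_uri="oci://MyBucket@MyNamespace/",
        passphrase="MyKeyPassphrase#1",
    ),
)
print(response.status)                                      # HTTP status of the test call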
When an Object Storage API key is deleted, all user access to run Object Storage jobs on the Big Data Service cluster is revoked.
- Access the cluster details page of the cluster that has the API key you want to delete.
- In the left panel, under Resources, click Object Storage API keys.
- From the actions menu of the API key you want to delete, click Delete.
- To confirm the deletion, enter the key alias of the key you want to delete.
- Click Delete.
Restarting a Cluster Node
You can restart a node in a running cluster.
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of the cluster with the node you want to restart.
- On the Cluster Details page, under List of cluster nodes, click the action menu for the node you want to restart. Select Restart Node from the menu.
- In the Restart Node dialog box, enter the name of the node to restart, and click Restart.
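A sketch of the equivalent call with the OCI SDK for Python follows. It assumes RestartNodeDetails identifies the node by OCID, whereas the console dialog takes the node name; verify against oci.bds.models.

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.restart_node(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.RestartNodeDetails(
        node_id="ocid1.bdsnode.oc1..example",               # placeholder; see lead-in
    ),
)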
Deleting a Cluster Node
You can delete a node in a running cluster.
Removing a node might fail during decommissioning. There's an upper limit of 40 minutes; if decommissioning doesn't complete within 40 minutes, the request fails. The time taken to decommission a node depends on the number of blocks that must be moved off the node. Therefore, we recommend that you decommission the node first, and then remove it from the OCI Console.
To decommission a node in Ambari, do the following:
- Access Apache Ambari.
- From the side toolbar, click Hosts, and then select the worker node to be decommissioned.
- From the list of components installed on that host, select DataNode.
- Click the ... Action icon, and then select Decommission.
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- In the list of clusters, click the name of the cluster with the node you want to delete.
- On the Cluster Details page, under List of cluster nodes, click the action menu for the node you want to delete. Select Delete Node from the menu.
- In the Delete Node dialog box, enter the name of the node to delete, and click Delete.
- Select the Force delete even when decommissioning fails checkbox if you want to delete the node even when decommissioning of the node fails.
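Scripted, node removal is a single call with the OCI SDK for Python. A sketch follows; the force-delete field name below mirrors the console checkbox and is an assumption to verify against oci.bds.models, and the OCIDs and password are placeholders.

import base64

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.remove_node(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.RemoveNodeDetails(
        # The BDS API expects the cluster admin password base64-encoded.
        cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
        node_id="ocid1.bdsnode.oc1..example",               # placeholder node OCID
        is_force_delete_after_decommissioning_timeout=True, # assumption; see lead-in
    ),
)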
Removing Cloud SQL from a Cluster
Oracle Cloud SQL can be added to a Big Data Service cluster, for an extra fee. If Cloud SQL has been added to a cluster, you can remove it, and you'll no longer be charged for Cloud SQL on the cluster.
Removing Cloud SQL from a cluster terminates the query server node and deletes any files on that node. This is an irreversible action.
Removing Cloud SQL from the cluster:
- Removes Cloud SQL cells from the cluster worker nodes
- Terminates the query server node and deletes any files or work that you have on that host. (The VM is terminated.)
- Has no impact on Hive metadata or the sources that Cloud SQL accesses.
- Ends the billing for Cloud SQL. You no longer pay for Cloud SQL once it is removed.
To remove Cloud SQL from a cluster:
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- Click the action menu for the cluster you want to modify and select Remove Cloud SQL.
- In the Remove Cloud SQL dialog box, enter the cluster admin password and click Remove.
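The same removal can be scripted with the OCI SDK for Python. A sketch, with the OCID and password as placeholders:

import base64

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
bds.remove_cloud_sql(
    "ocid1.bdsinstance.oc1..example",                       # placeholder cluster OCID
    oci.bds.models.RemoveCloudSqlDetails(
        # The BDS API expects the cluster admin password base64-encoded.
        cluster_admin_password=base64.b64encode(b"MyClusterPassword#1").decode(),
    ),
)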
Installing Available Updates
To install the available updates for a cluster, follow these steps:
Terminating a Cluster
You can terminate any cluster.
Terminating a cluster deletes the cluster and removes all the data contained in local storage or block storage. This is an irreversible action.
To terminate the cluster:
- Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
- In the left panel, under Compartment, select the compartment that hosts your cluster.
- Click the action menu for the cluster you want to terminate and select Terminate Big Data Cluster.
- In the Terminate Big Data Cluster dialog box, enter the name of the cluster and click Terminate.
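Programmatically, terminating a cluster maps to deleting the BDS instance. A sketch with the OCI SDK for Python, assuming delete_bds_instance is the operation (mirroring the API's DeleteBdsInstance) and using a placeholder OCID:

import oci

bds = oci.bds.BdsClient(oci.config.from_file())
# Irreversible: deletes the cluster and all data in local or block storage.
bds.delete_bds_instance("ocid1.bdsinstance.oc1..example")   # placeholder cluster OCID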