Creating and Modifying Clusters

Creating a Cluster

Create a Big Data Service cluster from the Oracle Cloud Console.

Before you can create a cluster, you must have a Virtual Cloud Network (VCN) with a regional subnet in a compartment you can access.

The Cluster creation wizard asks you to provide information about your network and to make choices based on your network. To prepare for those questions, have the name of your network, its compartment, and its regional subnet name ready.

To create a cluster:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select a compartment to host the cluster.
  3. Click Create Cluster.
  4. Enter the following information:
    • Cluster Name: Enter a name to identify the cluster.

    • Cluster Admin Password: Enter a string to be used as the cluster password. You need this password to sign in to Apache Ambari or Cloudera Manager (depending on your cluster version) and to perform certain actions on the cluster through the Cloud Console.

    • Confirm Cluster Admin Password: Reenter the password.

    • Secure and Highly Available (HA): Check this box to make the cluster secure and highly available. A secure cluster has the full Hadoop security stack, including HDFS Transparent Encryption, Kerberos, and Apache Sentry. This setting can't be changed for the life of the cluster.

    • Kerberos realm name: This field appears when you select the Secure and Highly Available (HA) check box. The default value is BDSCLOUDSERVICE.ORACLE.COM. However, you can provide a different value. Typically, the realm name is the same as your DNS domain name except that the realm name is in uppercase. This convention helps differentiate problems with the Kerberos service from problems with the DNS namespace, while keeping a name that is familiar.

      A valid Kerberos realm name must consist of 2-32 ASCII characters and must be a combination of uppercase letters, numbers, dashes (-), and dots (.). It must also start and end with uppercase letters. If you plan to integrate your BDS cluster with an existing Active Directory server, you must ensure that the Kerberos realm name of your BDS cluster is different from the DNS names of the Active Directory domains.

    • Cluster Version: Select the Hadoop distribution and version to use for your cluster. Choose one of the following:

      • Select ODH 2.0, ODH 1.0, or ODH 0.9 (Oracle Distribution including Apache Hadoop) to use Oracle's implementation of Hadoop.
      • Select a version of the Cloudera Distribution Including Apache Hadoop (CDH) software to use for this cluster. The listed versions are fully supported. See the Cloudera documentation for descriptions of the features in each release.
    • Cluster Profile: Select the cluster profile for the cluster (available for ODH 2.0 and ODH 1.0 versions only). See Understanding Instance Types and Shapes for more information. If you select a Kafka profile, you must also complete step 7.
  5. In the Hadoop Nodes section of the page, configure the types, shapes, and numbers of the compute instances (servers) to host the master and worker nodes of the cluster. For information about the choices you can make, see Understanding Instance Types and Shapes. Not all shapes are available by default, although you can request those not listed. See Requesting a Service Limit Increase.

    Enter details for Master/Utility Nodes:

    • Choose Instance Type: Click the Virtual Machine box or the Bare Metal box to indicate what type of compute instances you want for the master nodes.

    • Choose Master/Utility Node Shape: Select the shape for the master and utility nodes. See Understanding Instance Types and Shapes and Requesting a Service Limit Increase for details about the available shapes.

    • Block Storage Size per Master/Utility Node (in GB): Enter the block storage size, in gigabytes (GB), for each master and utility node.

    • Number of Master & Utility Nodes: A high-availability (HA) cluster always has 4 master/utility nodes, and a non-HA cluster always has 2 master/utility nodes. Therefore, this read-only field shows 4 nodes for an HA cluster or 2 nodes for a non-HA cluster.

  6. Enter details for Worker Nodes:
    • Choose Instance Type: Click the Virtual Machine box or the Bare Metal box to indicate what kind of compute instances you want.

    • Choose Worker Node Shape: Select the shape for the worker nodes. See Understanding Instance Types and Shapes and Requesting a Service Limit Increase for details about the available shapes.

    • Block Storage Size per Worker Node: Enter the block storage size, in gigabytes (GB), for each worker node.

    • Number of Worker Nodes: Enter the number of worker nodes for the cluster, with a minimum of three nodes.

  7. Enter details for Kafka Broker Nodes. This step applies only if you selected a Kafka profile in step 4.
    • Choose Instance Type: Click the Virtual Machine box or the Bare Metal box to indicate what kind of compute instances you want.

    • Choose Kafka Broker Node Shape: Select the shape for the Kafka broker nodes. See Understanding Instance Types and Shapes and Requesting a Service Limit Increase for details about the available shapes.

    • Block Storage Size per Kafka Broker Node: Enter the block storage size, in gigabytes (GB), for each Kafka broker node.

    • Number of Kafka Broker Nodes: Enter the number of Kafka broker nodes for the cluster, with a minimum of three for secure clusters and one for nonsecure clusters.

  8. In the Network Settings section, complete the network details for your cluster.
    • Cluster Private Network: Enter a CIDR block for the cluster private network that will be created for the cluster.

      The cluster private network is created in the Oracle tenancy (not your customer tenancy), and it's used exclusively for private communication among the nodes of the cluster. No other traffic travels over this network, it isn't accessible by outside hosts, and you can't modify it once it's created. All ports are open on this network.

      In the CIDR Block field, enter a CIDR block to assign the range of contiguous IP addresses available for this private network, or accept the default 10.0.0.0/16. This CIDR block range cannot overlap the CIDR block range in your customer network, discussed in the next step.

    • Customer Network: Enter information to add the cluster to your Virtual Cloud Network (VCN) and a regional subnet in that VCN.

      Choose VCN in <compartment>: Accept the current compartment, or click Change Compartment to select a different one. Then select the name of an existing VCN in that compartment to use for the cluster. The VCN must contain a regional subnet.

      Choose Regional Subnet in <compartment>: Choose a regional subnet to use for the cluster.

      Important: If you plan to make any of the IP addresses in the subnet public (to allow access to a node from the internet), you must select a public subnet for the cluster. For more information, see VCNs and Subnets.

    • Network Options
      Deploy Oracle-managed Service gateway and NAT gateway (Quick Start): Select this option to simplify your network configuration by allowing Oracle to provide and manage these communication gateways. When you select this option, a service gateway and a Network Address Translation (NAT) gateway are deployed for private use by the cluster. These gateways are created in the Oracle tenancy and can't be modified after the cluster is created.
      • A NAT gateway enables nodes without public IP addresses to initiate connections to and receive responses from the internet but not to receive inbound connections initiated from the internet. See NAT Gateway.
      • A service gateway enables nodes without public IP addresses to privately access Oracle services, without exposing the data to an internet gateway or a NAT gateway. See Service Gateway.

      Follow these guidelines:

      • Choose this option to give all nodes in the cluster private network full outbound access to the public internet. When you select this option, you won't be able to limit that access in any way (for example by restricting egress to only a few IP ranges).

        If you select this option, your cluster won't be able to use service gateways or NAT gateways in your customer network.

      • If you don't choose this option, you must create gateways in your customer VCN. When you do this, you can also create security rules to limit egress to specified IP ranges.

      • If you map the private IP addresses of the cluster nodes to public IP addresses, then a NAT gateway isn't needed. See Map a Private IP Address to a Public IP Address.

      Use the gateways in your selected Customer VCN (Customizable): Select this option to permit the cluster to use the gateways in your customer VCN. You must create and configure these resources yourself.

      Note

      If you create your network by using one of the network creation wizards in the console, gateways are created for you, but you might need to configure them further to suit your needs. See Virtual Networking Quickstart. For complete information about setting up a gateway, see Service Gateway.

    • Encryption: Select one of the following:
      • Encrypt using Oracle-managed keys: Select this option to leave all encryption-related matters to Oracle.

      • Encrypt using customer-managed keys: Select this option if you have access to and want to use a valid customer-managed encryption key. Then specify the following:

        • Choose Vault in <compartment>: Accept the current compartment, or click Change Compartment to select a different compartment. Then select the name of an existing vault in that compartment.
        • Choose Master encryption key in <compartment>: Accept the current compartment, or click Change Compartment to select a different compartment. Then select an existing master encryption key in that compartment.

      For additional information on creating and managing vaults, see To create a new vault and Managing Vaults. For additional information on creating and managing master encryption keys, see To create a new master encryption key and Managing Keys.

  9. Under Additional Options, do the following:
    • SSH Key: Enter an SSH public key in any of the following ways:

      • Select Choose SSH Key File, then either

        • Drag and drop a public SSH key file into the box,

        • or click select one... and navigate to and choose a public SSH key file from your local file system.

      • Select Paste SSH Key and paste the contents from a public SSH key file into the box.

    • Bootstrap script URL: Enter a publicly accessible URL for the bootstrap script. The script runs on all the cluster nodes after a cluster is created, when the shape of a cluster changes, or when you add or remove nodes from a cluster. You can use this script to install, configure, and manage custom components in a cluster.
  10. Tags: Use tags to help you organize and list resources. Enter tags as described in Tagging Overview.
  11. Click Create Cluster.
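
If you prefer to script cluster creation, the console steps above map to a single create-instance call. The following is a minimal sketch using the OCI Python SDK; the oci.bds model and field names shown (CreateBdsInstanceDetails, CreateNodeDetails, NetworkConfig), the version string, shapes, and all OCIDs are placeholders and assumptions to verify against the SDK reference.

  import base64

  import oci

  # Load ~/.oci/config and create a Big Data Service client.
  config = oci.config.from_file()
  client = oci.bds.BdsClient(config)

  # The API expects the cluster admin password base64-encoded (assumption).
  admin_password = base64.b64encode(b"MyClusterPassword1#").decode()

  details = oci.bds.models.CreateBdsInstanceDetails(
      compartment_id="ocid1.compartment.oc1..example",   # placeholder OCID
      display_name="my-bds-cluster",
      cluster_version="ODH1",                            # placeholder version string
      cluster_public_key="ssh-rsa AAAA... user@host",    # your SSH public key
      cluster_admin_password=admin_password,
      is_high_availability=True,
      is_secure=True,
      network_config=oci.bds.models.NetworkConfig(
          cidr_block="10.0.0.0/16",        # cluster private network CIDR
          is_nat_gateway_required=True,    # Oracle-managed gateways (Quick Start)
      ),
      # One entry per node. An HA cluster needs 2 master, 2 utility,
      # and at least 3 worker nodes.
      nodes=[
          oci.bds.models.CreateNodeDetails(
              node_type="MASTER",
              shape="VM.Standard2.4",
              block_volume_size_in_gbs=150,
              subnet_id="ocid1.subnet.oc1..example",  # your regional subnet
          ),
          # ...repeat for the remaining MASTER, UTILITY, and WORKER nodes...
      ],
  )

  response = client.create_bds_instance(details)
  # Cluster creation is asynchronous; track it through the work request.
  print(response.headers.get("opc-work-request-id"))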

Adding Worker and Edge Nodes to a Cluster

When you add worker, compute only worker, or edge nodes to a cluster, you expand both compute and storage. The new nodes use the same instance shape and amount of block storage as the existing worker nodes in the cluster.

To add nodes to a cluster:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, find the cluster to which you want to add nodes, click its actions menu, and select Add Nodes.
  4. In the Add nodes panel, enter the following details:
    Node type
    Select the node type. The available options are as follows:
    • Worker: A cluster must have at least three worker nodes.
    • Compute only worker: You can't add a compute only worker node while creating a cluster. Therefore, when you add a compute only worker node for the first time, you can update the shape and block storage size. After it's updated, these fields become read-only.
    • Edge: You can't add an edge node while creating a cluster. Therefore, when you add an edge node for the first time, you can update the shape and block storage size. After it's updated, these fields become read-only.
    Node shape
    This read-only field displays the shape used for the existing worker nodes. This shape is used for all the nodes you add. For information about the shapes, see Understanding Instance Types and Shapes.
    Block storage per node
    This read-only field displays the block storage used for the existing worker nodes. The same amount of storage is used for all the nodes you add.
    Number of worker nodes
    Enter the number of worker nodes or compute only worker nodes to add to the cluster. A cluster can have from 3 to 256 worker nodes. An ODH cluster can have from 0 to 256 compute only worker nodes.
    Cluster admin password
    Enter the administration password for the cluster.
  5. Click Add.
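
Adding nodes can likewise be scripted. A minimal sketch with the OCI Python SDK, reusing the client and admin_password from the cluster-creation example above; AddWorkerNodesDetails, its fields, and the node-type strings are assumptions to check against the oci.bds SDK reference.

  details = oci.bds.models.AddWorkerNodesDetails(
      cluster_admin_password=admin_password,  # base64-encoded, as above
      number_of_worker_nodes=2,               # how many nodes to add
      node_type="WORKER",   # assumed enum; compute only worker and edge types also exist
  )
  client.add_worker_nodes("ocid1.bdsinstance.oc1..example", details)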

Adding Block Storage to Worker Nodes

Block storage is a network-attached storage volume that you can use like a regular hard drive. You can attach extra block storage to the worker nodes of a cluster.

Note

Nodes in a cluster can have remote, network-attached, block storage or local, direct-attached, Non-Volatile Memory Express (NVMe) storage. Remote block storage is flexible and economical, while local NVMe storage provides the highest performance. The default storage type is determined when the cluster is created, based on the shape chosen for the cluster. The high-performance bare metal nodes and dense I/O virtual machine nodes are created with NVMe storage. Other kinds of virtual machine nodes are created with block storage.

You can attach extra storage to any cluster. You can't remove storage.

To add a block volume to the cluster:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, find the cluster to which you want to add block storage.
  4. Click the actions menu, and select Add Block Storage.
  5. In the Add Block Storage dialog box, enter information, as follows:
    • Additional Block Storage per Node (in GB) - Enter the number of gigabytes of block storage to add, from 150 GB to 32 TB, in increments of 50 GB.
    • Cluster Admin Password - Enter the administration password for the cluster.
  6. Click Add.
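
The same operation is exposed through the API. A hedged sketch with the OCI Python SDK, reusing the client and admin_password from the first example (AddBlockStorageDetails and its fields are assumptions to verify):

  details = oci.bds.models.AddBlockStorageDetails(
      cluster_admin_password=admin_password,  # base64-encoded cluster admin password
      block_volume_size_in_gbs=500,           # additional storage added per worker node
  )
  client.add_block_storage("ocid1.bdsinstance.oc1..example", details)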

Adding Cloud SQL to a Cluster

You can add Oracle Cloud SQL to a cluster so you can use SQL to query your big data sources.

Note

Cloud SQL is not included with Big Data Service. You must pay an extra fee for use of Cloud SQL.

When you add Cloud SQL support to a cluster, a query server node is added and big data cell servers are created on all worker nodes.

For information about using Cloud SQL with Big Data Service, see Using Cloud SQL with Big Data.
Note

For clusters with Oracle Distribution including Apache Hadoop, Cloud SQL is only supported for non-HA clusters.
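
Programmatically, adding Cloud SQL corresponds to a single API call. A minimal sketch with the OCI Python SDK, reusing the client and admin_password from the earlier examples; AddCloudSqlDetails, its fields, and the shape are assumptions to verify against the SDK reference.

  details = oci.bds.models.AddCloudSqlDetails(
      shape="VM.Standard2.4",                 # placeholder shape for the query server node
      cluster_admin_password=admin_password,  # base64-encoded
  )
  client.add_cloud_sql("ocid1.bdsinstance.oc1..example", details)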

Configuring an External Metastore

Data Catalog provides a highly available and scalable metastore for Hive implementations.

You can configure this Data Catalog metastore as an external metastore for your Big Data Service cluster.

Creating an external metastore

You can create one external metastore for your Big Data Service cluster.

Prerequisites:

To create an external metastore configuration for Big Data Service, you must have the permissions and policies to access and manage a valid Data Catalog metastore.

Note

When you create an external metastore for the Big Data Service cluster, the local metastore for the cluster is made inactive and the new external Data Catalog metastore is activated.
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Metastore Configurations.
  5. Click Create external metastore configuration.
  6. In the Create external metastore configuration panel, review the Metastore name of the external Data Catalog metastore that is available in the specified compartment for you to configure.
  7. Select the API key alias you specified when you created the Object Storage API key.
  8. Enter the API key passphrase. You specified this passphrase when you created the API key.
  9. Enter the Cluster admin password for the cluster. You specified this password when you created the cluster.
  10. Click Create.

After successful creation of the external metastore for the Big Data Service cluster, the local metastore for the cluster is made inactive and the new external Data Catalog metastore is activated.
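
If you script this, the console panel corresponds to a create-metastore-configuration call. A sketch under the assumption that the oci.bds module exposes CreateBdsMetastoreConfigurationDetails with the fields shown; every name and OCID here is a placeholder to verify against the SDK reference.

  details = oci.bds.models.CreateBdsMetastoreConfigurationDetails(
      metastore_id="ocid1.datacatalogmetastore.oc1..example",  # Data Catalog metastore OCID
      bds_api_key_id="ocid1.bdsapikey.oc1..example",           # Object Storage API key
      bds_api_key_passphrase="MyKeyPassphrase1#",              # chosen at key creation
      cluster_admin_password=admin_password,                   # base64-encoded
  )
  client.create_bds_metastore_configuration("ocid1.bdsinstance.oc1..example", details)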

Viewing metastore details
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Metastore Configurations.
  5. The default local metastore is listed. An external metastore is also listed if you have created it. Click the local or external metastore name to view additional details.
From the actions menu for each metastore, you can test the metastore configuration, activate a metastore, update the API key, or delete the metastore. You cannot delete the local metastore.
Testing the local or external metastore configuration
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Metastore Configurations.
  5. Click the actions menu for the metastore configuration you want to test and select Test configuration.
  6. In the Test configuration dialog, enter the Cluster admin password for your cluster and click Test configuration.
  7. Review the connection status and when done, click Close.
Updating API key for an external metastore
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Metastore Configurations.
  5. Click the actions menu for the external metastore configuration and select Update API key.
  6. In the Update API key panel, select the new API key.
  7. Enter the API key passphrase for the new API key.
  8. Enter the Cluster admin password for the cluster.
  9. Click Update.
Activating the local metastore

When you activate the local metastore, the external metastore becomes inactive.

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Metastore Configurations.
  5. Click the actions menu for the local metastore configuration and select Activate.
  6. In the Activate local metastore dialog, enter the Cluster admin password for the cluster.
  7. Click Activate.
When you activate the local metastore, the external metastore is automatically made inactive.
Deleting an external metastore

You can delete an inactive external metastore. The external metastore becomes inactive when you activate the local metastore.

Note

You cannot delete the default local metastore for your cluster.
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Metastore Configurations.
  5. Click the actions menu for the inactive external metastore configuration and select Delete. You cannot delete an active external metastore. To make the external metastore inactive, activate the local metastore.

Modifying a Cluster

Renaming a Cluster

You can change the name of any cluster.

To rename the cluster:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. Click the action menu for the cluster you want to rename, and select Rename Cluster.
  4. In the Rename Cluster dialog box, enter a new name for the cluster and click Rename.

Changing the Shape of Cluster Nodes

You can change the shapes of the nodes of a cluster after the cluster is created, with the following restrictions:

  • All the master nodes and utility nodes in a cluster must use the same shape, and all worker nodes must use the same shape. However, the master nodes and utility nodes can use a different shape than the worker nodes. Therefore, when you change the shapes of nodes, you can change the shape of all the master and utility nodes together, and you can change the shape of all the worker nodes together.
  • You can change the shapes only of nodes using standard shapes, and you can only change them to other standard shapes. For information about standard shapes, see Compute Shapes. For information about shapes supported in Big Data Service, see Planning the Cluster Layout, Shape, and Storage.

To change the shape of a cluster:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. Click Change Shape.
  5. In the Change Shape panel, do the following:
    • Choose node type - The available options are as follows:
      • Master/utility: Select this option to change the shape of all the master and utility nodes.
      • Worker: Select this option to change the shape of all the worker nodes.
      • Compute only worker: Select this option to change the shape of compute only worker nodes.
      • Edge: Select this option to change the shape of edge nodes.
      • Cloud SQL: Select this option to change the shape of all the Cloud SQL nodes, if Cloud SQL is installed.
    • Existing Shape - This read-only field shows the current shape of the nodes of the type you selected for Choose node type, above.
    • New Shape - Select a new shape for the nodes of the selected type.
    • Cluster Admin Password - Enter the admin password for the cluster. (The password was assigned when the cluster was created.)
  6. Click Change Shape.
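
The change-shape operation can also be driven from the SDK. A minimal sketch reusing the client and admin_password from the first example; ChangeShapeDetails, ChangeShapeNodes, and the shape string are assumptions to verify against the oci.bds SDK reference.

  details = oci.bds.models.ChangeShapeDetails(
      cluster_admin_password=admin_password,  # base64-encoded
      nodes=oci.bds.models.ChangeShapeNodes(
          worker="VM.Standard2.8",  # new shape for all worker nodes
      ),
  )
  client.change_shape("ocid1.bdsinstance.oc1..example", details)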

Autoscaling a Cluster

You can create an autoscale configuration for a cluster so that the compute shapes and numbers of the worker nodes are automatically increased or decreased, based on the CPU utilization thresholds.

Autoscaling allows you to maintain optimum performance of your cluster, while keeping costs as low as possible. Autoscaling monitors your CPU utilization and automatically adjusts the CPU capacity, based on the configuration parameters you set.

Note

When a cluster is autoscaled, the new details must be reflected in Apache Ambari or Cloudera Manager. To register that change, the cluster admin password you provide is stored when you create an autoscale configuration; it's deleted when the autoscale configuration is deleted.
Create an Autoscale Configuration

You can have one autoscale configuration per supported node type. Therefore, on a cluster with both worker and compute only worker nodes, you can have up to two autoscale configurations.

To create an autoscale configuration:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Autoscale Configurations.
  5. Click Create Autoscale Configuration.
  6. In the Create Autoscale Configuration panel, enter the following details:
    Autoscale configuration name: Enter a name for this configuration.
    Cluster admin password: Enter the admin password for the cluster. (You assign an admin password when you create a cluster.)
    Node type: Select the type of node. The available options are as follows:
    • Worker: Select this option to add block storage and compute to your cluster.
    • Compute only worker: Select this option to add only compute to your cluster.
    Autoscale type: The available options are as follows:
    • Horizontal: Select this option to add or remove nodes from the cluster. This option is available only if you select the Compute only worker node type.
    • Vertical: Select this option to change the shape of the cluster nodes.
    Trigger type: Select what initiates the autoscale action. A metric-based trigger fires when the specified metric meets or exceeds its threshold; a schedule-based trigger fires at the times you schedule.
    Performance Metrics: Measures the average CPU utilization of the cluster nodes over a period of time.
    Scale-up rule: This field appears when you select the Vertical autoscale type. The scale-up rule sets the conditions for scaling up the cluster (use a larger compute shape). Enter the following details:
    • Average CPU threshold percentage: Set a percentage of average CPU utilization, so that when the CPU operates at that average percentage (or higher) for the minimum duration in minutes, the cluster is scaled up.
    • Minimum duration in minutes: Set a duration, so that when the average CPU threshold percentage operates for that duration, the cluster is scaled up.
    If you selected a Flex shape, then also specify the following details:
    • Maximum limits: Enter the maximum number of OCPUs and memory the cluster can have.
    • Step size: Specify the number of OCPUs and memory that should be added to the cluster each time the autoscale is triggered.
    For example, if the scale-up rule is:
    • Threshold percentage = 80%
    • Minimum duration = 30 minutes
    and the worker nodes use the VM.Standard2.4 shape, then when the CPU utilization averages 80% or more for 30 minutes, the worker nodes are scaled up to VM.Standard2.8. For Flex shapes, the worker node shape changes according to the OCPUs and memory specified.
    Scale-down rule: This field appears when you select the Vertical autoscale type. The scale-down rule sets the conditions for scaling down the cluster (use a smaller compute shape). Enter the following details:
    • Average CPU threshold percentage: Set a percentage of average CPU utilization, so that when the CPU operates at that average percentage (or lower) for the minimum duration in minutes, the cluster is scaled down.
    • Minimum duration in minutes: Set a duration, so that when the average CPU threshold percentage operates for that duration, the cluster is scaled down.
    If you selected a Flex shape, then also specify the following details:
    • Minimum limits: Enter the minimum number of OCPUs and memory the cluster can have.
    • Step size: Specify the number of OCPUs and memory that should be removed from the cluster each time the autoscale is triggered.
    For example, if the scale-down rule is:
    • Threshold percentage = 20%
    • Minimum duration = 30 minutes
    and the worker nodes use the VM.Standard2.8 shape, then when the CPU utilization averages 20% or less for 30 minutes, the worker nodes are scaled down to VM.Standard2.4. For Flex shapes, the worker node shape changes according to the OCPUs and memory specified.
    Scale-out rule: This field appears when you select the Horizontal autoscale type.

    Horizontal scaling applies only to compute only worker nodes.

    The scale-out rule sets the conditions for scaling out the cluster. Enter the following details:
    • Average CPU threshold percentage: Set a percentage of average CPU utilization, so that when the CPU operates at that average percentage (or higher) for the minimum duration in minutes, the cluster is scaled out.
    • Minimum duration in minutes: Set a duration, so that when the average CPU threshold percentage operates for that duration, the cluster is scaled out.
    • Maximum number of nodes: Enter the maximum number of nodes that the cluster can have. When autoscale is triggered, the number of nodes specified in Step size is added to the cluster only if the resulting total is less than or equal to this maximum.
    • Step size: The number of nodes to add to the cluster when autoscale is triggered.
    Scale-in rule: This field appears when you select the Horizontal autoscale type.

    Horizontal scaling applies only to compute only worker nodes.

    The scale-in rule sets the conditions for scaling in the cluster (removing nodes). Enter the following details:
    • Average CPU threshold percentage: Set a percentage of average CPU utilization, so that when the CPU operates at that average percentage (or lower) for the minimum duration in minutes, the cluster is scaled in.
    • Minimum duration in minutes: Set a duration, so that when the average CPU threshold percentage operates for that duration, the cluster is scaled in.
    • Minimum number of nodes: Enter the minimum number of nodes that the cluster can have. When autoscale is triggered, the number of nodes specified in Step size is removed from the cluster only if the resulting total is greater than or equal to this minimum.
    • Step size: The number of nodes to remove from the cluster when autoscale is triggered.
    For example, if the scale-in rule is:
    • Threshold percentage = 20%
    • Minimum duration = 30 minutes
    • Minimum number of nodes = 1
    • Step size = 1

    then when the CPU utilization averages 20% or less for 30 minutes, one node is removed from the cluster at each trigger, until only one node remains.

    Schedule-based Horizontal condition: This field appears when you select the schedule-based Horizontal autoscale type.

    Note: Schedule-based Horizontal autoscaling applies only to compute only worker nodes.

    For schedule-based Horizontal autoscaling, enter the following details:

    • Days: Enter the days of the week that you want the condition to run.
    • Time: Enter a trigger time for the condition to begin. For example, 9:00 AM.
    • Time Optional: Enter a second trigger time on the same day. For example, 4:00 PM.
    • Number of nodes: Enter the target number of nodes at the first trigger.
    • Number of nodes Optional: Enter the target number of nodes at the second trigger.

    For example, if the schedule-based Horizontal condition is:

    • Days = Monday, Tuesday, Friday
    • Time = 9:00 AM
    • Number of nodes = 10
    • Time Optional = 4:00 PM
    • Number of nodes Optional = 1

    In this example, there are 10 available nodes between 9:00 AM and 4:00 PM on the specified days. However, when the time reaches 4:00 PM, the nodes drop to one and remain at one until another trigger is reached.

    Schedule-based Vertical condition: This field appears when you select the schedule-based Vertical autoscale type.

    For schedule-based Vertical autoscaling, enter the following details:

    • Days: Enter the days of the week that you want the condition to run.
    • Time: Enter a trigger time for the condition to begin. For example, 9:00 AM.
    • Time Optional: Enter a second trigger time on the same day. For example, 4:00 PM.
    • Target shape: Select a target shape from the list. If you select a flex shape, you must customize the following resources:
      • Number of OCPUs
      • Amount of memory (GB)
    • Target shape Optional: Select a target shape for the second trigger.

    For example, if the schedule-based Vertical condition is:

    • Days = Monday, Tuesday, Friday
    • Time = 9:00 AM
    • Target shape = VM.Standard2.8
    • Time Optional = 4:00 PM
    • Target shape Optional = VM.Standard2.4

    In this example, the first trigger is set for 9:00 AM with a target shape of VM.Standard2.8. The second trigger is set for 4:00 PM with a target shape of VM.Standard2.4.

  7. Click Create.

    It takes a few minutes for the configuration to take effect. During this time, the cluster is in the Updating state.

    When an autoscale event is triggered, the worker nodes are updated on a rolling basis; that is, one node is updated at a time.
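
Autoscale configurations can also be created through the API. The following sketch shows a metric-based vertical policy with the OCI Python SDK, reusing the client and admin_password from the first example; the policy model names (AddAutoScalingConfigurationDetails, AutoScalePolicy, AutoScalePolicyRule, AutoScalePolicyMetricRule, MetricThresholdRule) and the enum strings are assumptions to verify against the oci.bds SDK reference.

  details = oci.bds.models.AddAutoScalingConfigurationDetails(
      display_name="worker-vertical-autoscale",
      node_type="WORKER",
      is_enabled=True,
      cluster_admin_password=admin_password,  # base64-encoded
      policy=oci.bds.models.AutoScalePolicy(
          policy_type="THRESHOLD_BASED",      # assumed enum for metric-based triggers
          rules=[
              # Scale up when average CPU utilization is 80% or more for 30 minutes.
              oci.bds.models.AutoScalePolicyRule(
                  action="CHANGE_SHAPE_SCALE_UP",
                  metric=oci.bds.models.AutoScalePolicyMetricRule(
                      metric_type="CPU_UTILIZATION",
                      threshold=oci.bds.models.MetricThresholdRule(
                          duration_in_minutes=30,
                          operator="GT",
                          value=80,
                      ),
                  ),
              ),
          ],
      ),
  )
  client.add_auto_scaling_configuration("ocid1.bdsinstance.oc1..example", details)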

Using Object Storage API keys

Big Data Service uses the OCI API signing key mechanism to connect to Object Storage. You can access Object Storage buckets encrypted with user-managed keys from the Big Data Service cluster nodes. For details on data encryption, see Securing Object Storage.

Note

To use the Object Storage API keys, you must create a Big Data Service cluster with version 3.0.4 or later. The Big Data Service version is displayed on the Cluster Information tab of the Cluster Details page.
Create access policy
Users in a tenancy's administrator group can manage API keys for any user. To allow other users to create and manage Object Storage API keys for themselves, create a policy with the following statement in the root compartment:
allow any-user to {USER_INSPECT, USER_READ, USER_UPDATE, USER_APIKEY_ADD, USER_APIKEY_REMOVE} in tenancy where request.principal.id = target.user.id
Create an Object Storage API key
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of your cluster.
  4. In the left panel, under Resources click Object Storage API keys.
  5. Click Create key.
  6. In the Create API key panel, enter a key alias to uniquely identify this key in the cluster.
  7. Enter the OCID of the user who can use this API key. To retrieve the user OCID, go to Identity & Security → Users in the Console. From the actions menu for the user, click Copy OCID.
  8. Enter and confirm a passphrase. This passphrase is used to encrypt the API key and cannot be changed later.
  9. Select a default region that is used to establish the Object Storage endpoint name.
  10. Click Create.
The API key is listed on the Object Storage API keys page. When the API key is successfully created, its status changes to Active.
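
Creating the key can also be done through the SDK. A minimal sketch reusing the client from the first example; CreateBdsApiKeyDetails, its fields, and all values are placeholders and assumptions to verify against the oci.bds SDK reference.

  details = oci.bds.models.CreateBdsApiKeyDetails(
      user_id="ocid1.user.oc1..example",  # user the key is created for
      key_alias="my-objectstore-key",
      passphrase="MyKeyPassphrase1#",     # encrypts the key; can't be changed later
      default_region="us-ashburn-1",      # sets the Object Storage endpoint
  )
  client.create_bds_api_key("ocid1.bdsinstance.oc1..example", details)
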
View the configuration file

You can view and copy the public key of the Object Storage API key from its configuration file.

  1. Access the cluster details page of the cluster that has the API key you want to view.
  2. In the left panel, under Resources click Object Storage API keys.
  3. From the actions menu of the API key you want to view, click View configuration file.
  4. In the View configuration file dialog, view the configuration file details or copy the public key.
Test the connection to Object Storage
  1. Access the cluster details page of the cluster that has the API key you want to test.
  2. In the left panel, under Resources click Object Storage API keys.
  3. From the actions menu of the API key you want to test, click Test connection.
  4. Enter the Object Storage URI for the bucket you want to connect to, in the format oci://MyBucket@MyNamespace/.
  5. Enter the passphrase of the API key. You specified this passphrase when you created the API key.
  6. Click Test connection. The status of the test connection is displayed.
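
The connection test has a matching API operation. A hedged sketch reusing the client from the first example; TestBdsObjectStorageConnectionDetails, the method signature, and the OCIDs are assumptions to verify against the oci.bds SDK reference.

  details = oci.bds.models.TestBdsObjectStorageConnectionDetails(
      object_storage_uri="oci://MyBucket@MyNamespace/",
      passphrase="MyKeyPassphrase1#",  # chosen when the key was created
  )
  client.test_bds_object_storage_connection(
      "ocid1.bdsinstance.oc1..example",  # cluster OCID
      "ocid1.bdsapikey.oc1..example",    # API key OCID
      details,
  )
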
Delete an Object Storage API key

When an Object Storage API key is deleted, all user access to run Object Storage jobs on the Big Data Service clusters is revoked.

  1. Access the cluster details page of the cluster that has the API key you want to delete.
  2. In the left panel, under Resources click Object Storage API keys.
  3. From the actions menu of the API key you want to delete, click Delete.
  4. To confirm the deletion, enter the key alias of the key you want to delete.
  5. Click Delete.

Restarting a Cluster Node

You can restart a node in a running cluster.

To restart a node:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of the cluster with the node you want to restart.
  4. On the Cluster Details page, under List of cluster nodes, click the action menu for the node you want to restart. Select Restart Node from the menu.
  5. In the Restart Node dialog box, enter the name of the node to restart, and click Restart.
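
A node restart can also be requested through the SDK. A minimal sketch reusing the client from the first example; RestartNodeDetails and its field are assumptions to verify against the oci.bds SDK reference.

  details = oci.bds.models.RestartNodeDetails(
      node_id="ocid1.instance.oc1..example",  # placeholder OCID of the node to restart
  )
  client.restart_node("ocid1.bdsinstance.oc1..example", details)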

Deleting a Cluster Node

You can delete a node in a running cluster.

Note

Removing a node might fail during decommissioning. There's an upper limit of 40 minutes; if decommissioning doesn't complete within 40 minutes, the request fails. The time taken to decommission a node depends on the number of blocks that must be moved off the node. Therefore, we recommend decommissioning the node first, and then removing it from the OCI console.

To decommission a node in Ambari, do the following:

  1. Access Apache Ambari.
  2. From the side toolbar, click Hosts, and then select the worker node to be decommissioned.
  3. From the list of components installed on that host, select DataNode.
  4. Click the ... Action icon, and then select Decommission.
To delete a node from the OCI console:
  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. In the list of clusters, click the name of the cluster with the node you want to delete.
  4. On the Cluster Details page, under List of cluster nodes, click the action menu for the node you want to delete. Select Delete Node from the menu.
  5. In the Delete Node dialog box, enter the name of the node to delete.
  6. Select the Force delete even when decommissioning fails checkbox if you want to delete the node even when decommissioning fails.
  7. Click Delete.
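
Node removal is also available through the SDK. A hedged sketch reusing the client and admin_password from the first example; RemoveNodeDetails and its fields are assumptions to verify against the oci.bds SDK reference.

  details = oci.bds.models.RemoveNodeDetails(
      node_id="ocid1.instance.oc1..example",  # placeholder OCID of the node to remove
      cluster_admin_password=admin_password,  # base64-encoded
  )
  client.remove_node("ocid1.bdsinstance.oc1..example", details)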

Removing Cloud SQL from a Cluster

Oracle Cloud SQL can be added to a Big Data Service cluster, for an extra fee. If Cloud SQL has been added to a cluster, you can remove it, and you'll no longer be charged for Cloud SQL on the cluster.

Note

Removing Cloud SQL from a cluster terminates the query server node and deletes any files on that node. This is an irreversible action.

Removing Cloud SQL from the cluster:

  • Removes Cloud SQL cells from the cluster worker nodes
  • Terminates the query server node and deletes any files or work that you have on that host. (The VM is terminated.)
  • Has no impact on Hive metadata or the sources that Cloud SQL accesses.
  • Ends the billing for Cloud SQL. You no longer pay for Cloud SQL once it is removed.

To remove Cloud SQL from a cluster:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. Click the action menu for the cluster you want to modify and select Remove Cloud SQL.
  4. In the Remove Cloud SQL dialog box, enter the cluster admin password and click Remove.
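
Removing Cloud SQL maps to a single API call. A minimal sketch reusing the client and admin_password from the first example; RemoveCloudSqlDetails and its field are assumptions to verify against the oci.bds SDK reference.

  details = oci.bds.models.RemoveCloudSqlDetails(
      cluster_admin_password=admin_password,  # base64-encoded
  )
  client.remove_cloud_sql("ocid1.bdsinstance.oc1..example", details)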

Installing Available Updates

To install the available updates for a cluster, follow these steps:

  1. Select the cluster for which you want to install the updates and navigate to its details page.
  2. On the left side of the page, under Resources, click Updates. A list of available updates is displayed.
  3. In the Available updates section, click Install next to the patch version that you want to install.
  4. In the Install patch dialog box that appears, enter the cluster admin password and click Install.
    After the update is installed, the Cluster information box displays the latest version of the cluster.

Terminating a Cluster

You can terminate any cluster.

Caution

Terminating a cluster deletes the cluster and removes all the data contained in local storage or block storage. This is an irreversible action.

To terminate the cluster:

  1. Open the navigation menu and click Analytics and AI. Under Data Lake, click Big Data Service.
  2. In the left panel, under Compartment, select the compartment that hosts your cluster.
  3. Click the action menu for the cluster you want to terminate and select Terminate Big Data Cluster.
  4. In the Terminate Big Data Cluster dialog box, enter the name of the cluster and click Terminate.
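
Termination can also be scripted. A minimal sketch reusing the client from the first example; the delete_bds_instance method name and the OCID are assumptions to verify against the oci.bds SDK reference.

  # Terminating is irreversible and deletes all data in local and block storage.
  client.delete_bds_instance("ocid1.bdsinstance.oc1..example")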