Creating Network Resources

Before you can create and use Big Data Service clusters, you must create and configure a network. Oracle Cloud Infrastructure Networking service provides a wide range of features for establishing a secure networking topology for your Big Data Service.

For complete documentation about Oracle Cloud Infrastructure Networking, see Overview of Networking and the subsequent networking topics in the Oracle Cloud Infrastructure documentation. The following sections discuss networking details that are specific to Big Data Service.

Terminology

The term network refers to a Virtual Cloud Network (VCN) or a subnet in a VCN. When the difference is pertinent, the specific term, VCN or subnet, is used.

The terms instance, host, and node are used interchangeably. However, because the hosts that make up a Hadoop cluster are called nodes, the term nodes is used throughout this documentation.

Understanding Networking

In Oracle Cloud Infrastructure, a network consists of at least one Virtual Cloud Network (VCN) with at least one subnet, along with Virtual Network Interface Cards (VNICs), gateways, route tables, security rules, and other virtual networking features. For a simple development environment, you may only need a single VCN with a single subnet in a single region, possibly with access to the public internet. For a complex production environment, you may want to connect your VCN to an on-premises network, and you may want to peer with other VCNs in other regions.

A network used for Big Data Service must meet the general requirements for any Oracle Cloud Infrastructure network, as described in Overview of Networking and the subsequent networking topics in the Oracle Cloud Infrastructure documentation. In addition to those requirements, consider the following information specific to Big Data Service:

Creating and Using Subnets

Subnets divide a VCN by assigning ranges of IP addresses that don't overlap with other subnets in the VCN. Consider the following when creating your network for Big Data Service:

A subnet must be regional and it may be public:

  • In Oracle Cloud Infrastructure, a subnet can exist in a single availability domain or across an entire region. A regional subnet is required for Big Data Service. Therefore, when you create the VCN for Big Data Service, you must create at least one regional subnet in it.

  • Cluster nodes are by default private. If you plan to make your cluster available for access from the public internet, you must use a public subnet. In that case, when you create the VCN, the regional subnet you create (see above) must also be public. See Making Nodes Reachable.

You specify which VCN and which subnet to use for a cluster when you create the cluster. See Creating a Cluster.
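The non-overlap requirement for subnets can be checked before you create anything. The following is a minimal sketch that uses only Python's standard ipaddress module; the function name check_subnets and the CIDR values in the usage note are illustrative assumptions, not values from this documentation.

```python
import ipaddress
from itertools import combinations


def check_subnets(vcn_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout is consistent."""
    vcn = ipaddress.ip_network(vcn_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []

    # Every subnet must fall inside the address range of the VCN.
    for subnet in subnets:
        if not subnet.subnet_of(vcn):
            problems.append(f"{subnet} is outside VCN {vcn}")

    # No two subnets in the same VCN may share addresses.
    for first, second in combinations(subnets, 2):
        if first.overlaps(second):
            problems.append(f"{first} overlaps {second}")

    return problems
```

For example, check_subnets("10.0.0.0/16", ["10.0.0.0/24", "10.0.1.0/24"]) returns an empty list, while replacing the second subnet with "10.0.0.128/25" reports an overlap.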

Making Nodes Reachable

As mentioned above, cluster nodes are private by default. Nodes are created with private IP addresses, and all ports are closed by default (with the exception of port 22, which is open for SSH access). Therefore, you must configure the network to allow access to the nodes.
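One way to confirm which ports on a node are reachable from a given machine is a simple TCP connection probe. The sketch below uses only Python's standard socket module; the helper name is_port_open is an assumption made for illustration, and the probe reports only whether a TCP connection succeeds, not why it fails.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (the IP address is a placeholder, not a value from this documentation):
# is_port_open("10.0.0.10", 22)
```

A False result can mean that the port is closed, that a security rule blocks the traffic, or that there is no route from the probing machine to the node.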

Configuring Security Rules for Nodes

A security rule allows a particular type of traffic in or out of a VNIC (which connects nodes to the network). Each security rule specifies direction (ingress or egress), stateful or stateless, source type and source (for ingress rules), destination type and destination (for egress rules), IP protocol, source port, destination port, and ICMP type and code.
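To make the parts of a security rule concrete, the sketch below models them as a small Python data class. It is an illustration of the fields listed above, not the Oracle Cloud Infrastructure API; the class name, field names, and example values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SecurityRule:
    direction: str                            # "ingress" or "egress"
    stateless: bool                           # False means the rule is stateful
    protocol: str                             # IP protocol, for example "tcp"
    source_type: Optional[str] = None         # ingress rules only
    source: Optional[str] = None              # ingress rules only
    destination_type: Optional[str] = None    # egress rules only
    destination: Optional[str] = None         # egress rules only
    source_ports: Optional[Tuple[int, int]] = None       # (min, max)
    destination_ports: Optional[Tuple[int, int]] = None  # (min, max)
    icmp_type: Optional[int] = None
    icmp_code: Optional[int] = None

    def __post_init__(self):
        if self.direction not in ("ingress", "egress"):
            raise ValueError("direction must be 'ingress' or 'egress'")
        if self.direction == "ingress" and self.source is None:
            raise ValueError("an ingress rule needs a source")
        if self.direction == "egress" and self.destination is None:
            raise ValueError("an egress rule needs a destination")


# Hypothetical stateful ingress rule: TCP from an example CIDR to port 8443.
example_rule = SecurityRule(
    direction="ingress",
    stateless=False,
    protocol="tcp",
    source_type="CIDR_BLOCK",
    source="203.0.113.0/24",
    destination_ports=(8443, 8443),
)
```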

To allow network traffic to and from a cluster node, you must configure the security rules for the node. Do this for all nodes that you want to make reachable, whether from the public internet, from a private network, or from both.

See Defining Security Rules in this documentation and Security Rules in the Oracle Cloud Infrastructure documentation.

Making Nodes Accessible From the Internet

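To make nodes publicly accessible from the internet, you must use a public regional subnet for the cluster and configure security rules that allow the required traffic, as described in the preceding sections.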

See also Access to the Internet in the Oracle Cloud Infrastructure documentation.

Customer Network and the Cluster Private Network

Big Data Service clusters are dual-homed. The nodes of the cluster are connected both to a cluster private network in the Oracle tenancy and to a customer network in the customer tenancy.

About the Cluster Private Network

The cluster private network is a Virtual Cloud Network (VCN) and is created in the Oracle tenancy when a cluster is created. Characteristics of this network are:

  • When you create a cluster, you're prompted to specify a CIDR block to allocate a range of IP addresses for the network. This CIDR block can't overlap the CIDR block of the customer private network.
  • The private IP addresses of the cluster nodes are assigned from the CIDR block of the private subnet in this VCN.
  • The network is used exclusively for private communication among the nodes of the cluster, for example, for distributed data processing and service monitoring. All ports are open by default.

  • You can choose to deploy a service gateway and a network address translation (NAT) gateway on this network, but you can't otherwise configure gateways, routing tables, or security lists on this network to control network traffic to and from your cluster. See Networking Gateway Options When Creating a Cluster, below.
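Because the CIDR block of the cluster private network can't overlap the CIDR block of the customer network, it can help to screen candidate blocks before you create a cluster. The sketch below uses Python's standard ipaddress module; the function name first_non_overlapping and the candidate values are hypothetical.

```python
import ipaddress


def first_non_overlapping(candidates, customer_cidr):
    """Return the first candidate CIDR that shares no addresses with customer_cidr."""
    customer = ipaddress.ip_network(customer_cidr)
    for candidate in candidates:
        if not ipaddress.ip_network(candidate).overlaps(customer):
            return candidate
    # No candidate is usable with this customer network.
    return None
```

For example, with a customer network of "10.0.0.0/16", the candidates ["10.0.0.0/16", "10.1.0.0/16"] yield "10.1.0.0/16".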

About the Customer Network

The customer network is a VCN in the customer tenancy. The VCN must already exist (and must have a regional subnet) before a cluster can be created. Details about this network are:

  • When you create a cluster, you're prompted to choose an existing VCN and subnet to associate with the cluster.

  • The subnet you choose for the cluster must be a regional subnet. If you want to make any of the nodes available to traffic from the public internet, you must choose a public subnet. If you're using IPSec VPN or Oracle Cloud Infrastructure FastConnect to connect to your on-premises network, you can use a private subnet, but that means traffic through the public internet won't be allowed.
  • You can configure gateways, routing tables, and security lists on this network to control network traffic to and from your cluster.
  • In your customer VCN, some ports are open so that Hadoop components can communicate with each other. We recommend that you encrypt the network communication over these ports with a strong encryption algorithm, such as AES-256.
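The subnet requirements above can be summarized as a small pre-flight check. This is a sketch of the rules as stated in this section, not a call into any Oracle tool; the function and parameter names are assumptions.

```python
def check_customer_subnet(is_regional, is_public, needs_internet_access):
    """Return a list of reasons the subnet is unsuitable; empty means it fits."""
    problems = []

    # Big Data Service requires a regional subnet in every case.
    if not is_regional:
        problems.append("the subnet must be regional")

    # Access from the public internet requires a public subnet.
    if needs_internet_access and not is_public:
        problems.append("a public subnet is required for access from the public internet")

    return problems
```

A private regional subnet passes the check when needs_internet_access is False, which matches the IPSec VPN and FastConnect case described above.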

Networking Gateway Options When Creating a Cluster

When you create a cluster, you must choose between these two options:
  • Choose Deploy Oracle-managed Service gateway and NAT gateway (Quick Start) to deploy a service gateway and a NAT gateway in the cluster private network.
    • A NAT gateway enables nodes without public IP addresses to initiate connections to and receive responses from the internet but not to receive inbound connections initiated from the internet. See NAT Gateway.
    • A service gateway enables nodes without public IP addresses to privately access Oracle services, without exposing the data to an internet gateway or a NAT gateway. See Service Gateway.

    With this option, you can't limit the access that these gateways provide, for example, by restricting egress to only a few IP ranges. When you choose this option:

    • The service gateway and the NAT gateway are used for all the operations described above, for the lifetime of the cluster. You can't change this choice after the cluster has been created, and any service gateways or NAT gateways in your customer network are ignored.
    • This NAT gateway gives all nodes in the cluster private network full outbound access to the public internet.
    • You can't further restrict traffic that's directed to the NAT gateway or the service gateway. For example, you can't redirect traffic to or from specific IP addresses.
  • Choose Use the gateways in your selected Customer VCN (Customizable) to use a service gateway and a NAT gateway in your customer network.

    When you choose this option:

    • You have complete control over the routing of network traffic to and from your cluster.
    • You must create and configure the gateways yourself. See Service Gateway.

      If you create your network by using one of the network creation wizards in the console, some gateways are created for you, but you may have to configure them to suit your needs. See Virtual Networking Quickstart.

    • You must create and configure security rules to restrict traffic through the gateways.
    • You can change the configuration at any time.
    • If you map the private IP addresses of the cluster nodes to public IP addresses, a NAT gateway isn't needed. See Map a Private IP Address to a Public IP Address.
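The trade-off between the two options can be sketched as a simple decision helper. The option labels are taken from this section; the enum, the function name, and the two yes/no inputs are illustrative assumptions that cover only the differences described here.

```python
from enum import Enum


class GatewayOption(Enum):
    ORACLE_MANAGED = "Deploy Oracle-managed Service gateway and NAT gateway (Quick Start)"
    CUSTOMER_VCN = "Use the gateways in your selected Customer VCN (Customizable)"


def choose_gateway_option(must_restrict_traffic, must_change_later):
    """Pick a gateway option from two requirements described in this section."""
    # The Oracle-managed gateways can't be restricted and can't be changed
    # after the cluster is created, so either requirement rules them out.
    if must_restrict_traffic or must_change_later:
        return GatewayOption.CUSTOMER_VCN
    return GatewayOption.ORACLE_MANAGED
```

For example, choose_gateway_option(must_restrict_traffic=True, must_change_later=False) selects the customizable option, in which case you create and configure the gateways and security rules yourself.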