Using JupyterHub in Big Data Service 3.0.27 or Later

Use JupyterHub to manage Big Data Service 3.0.27 or later ODH 2.x notebooks for groups of users.

Accessing JupyterHub

Access JupyterHub through the browser for Big Data Service 3.0.27 or later ODH 2.x clusters.
  1. Access Apache Ambari.
  2. From the side toolbar, under Services, click JupyterHub.

Spawning Notebooks

The following Spawner configurations are supported on Big Data Service 3.0.27 and later ODH 2.x clusters.

Sign in using one of the following methods:

  1. Native Authentication:
    1. Sign in using user credentials.
    2. Enter the username.
    3. Enter the password.
  2. Using SamlSSOAuthenticator:
    1. Click Sign in with SSO.
    2. Complete the sign-in with the configured SSO application.

Spawning Notebooks on an HA Cluster

For an AD-integrated cluster:

  1. Sign in using either of the preceding methods. The authorization works only if the user is present on the Linux host. JupyterHub searches for the user on the Linux host while trying to spawn the notebook server.
  2. You're redirected to a Server Options page where you must request a Kerberos ticket. You can request this ticket using either the Kerberos principal and keytab file, or the Kerberos password; the cluster admin can provide these. The Kerberos ticket is needed to access the HDFS directories and other big data services that you want to use.
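
For reference, the equivalent ticket request from a terminal looks like the following sketch. The principal, realm, and keytab path are placeholder values; your cluster admin provides the actual ones.

    # Request a ticket with a Kerberos principal and keytab file
    kinit -kt /path/to/user.keytab user@EXAMPLE.COM
    # Or request a ticket with the Kerberos password (you're prompted for it)
    kinit user@EXAMPLE.COM
    # Verify the ticket
    klist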

Spawning Notebooks on a non-HA Cluster

For an AD-integrated cluster:

Sign in using either of the preceding methods. The authorization works only if the user is present on the Linux host. JupyterHub searches for the user on the Linux host while trying to spawn the notebook server.

Manage JupyterHub

A JupyterHub admin user can perform the following tasks to manage notebooks in JupyterHub on Big Data Service 3.0.27 or later ODH 2.x nodes.

To manage Oracle Linux 7 services with the systemctl command, see Working With System Services.

To sign in to an Oracle Cloud Infrastructure instance, see Connecting to Your Instance.
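
For example, a minimal systemctl check of the JupyterHub service might look like the following. The service name jupyterhub is an assumption; confirm it on your installation.

    # Check the status of, and restart, the JupyterHub service
    sudo systemctl status jupyterhub
    sudo systemctl restart jupyterhub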

Manage Users and Permissions

Use one of the two authentication methods to authenticate users to JupyterHub so that they can create notebooks, and optionally administer JupyterHub on Big Data Service 3.0.27 or later ODH 2.x clusters.

For non-Active Directory (AD) Big Data Service clusters, where users aren't automatically synced across all cluster nodes, JupyterHub users must be added as OS users on all cluster nodes. Administrators can use the JupyterHub user management script to add users and groups before signing in to JupyterHub.

Prerequisite

Complete the following before accessing JupyterHub:

  1. Use SSH to sign in to the node where JupyterHub is installed.
  2. Navigate to /usr/odh/current/jupyterhub/install.
  3. Provide the details of all users and groups in the sample_user_groups.json file, and then run:
    sudo python3 UserGroupManager.py sample_user_groups.json
  4. Verify user creation by running:
    id <any-user-name>
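
The schema of sample_user_groups.json is defined by the installed script; the following is only a hypothetical sketch of a file that lists users and their groups. Check the sample file shipped in /usr/odh/current/jupyterhub/install for the actual format.

    {
      "users": [
        {"name": "analyst1", "groups": ["jupyterhub_users"]},
        {"name": "admin1", "groups": ["jupyterhub_admins"]}
      ]
    }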

Supported Authentication Types

  • NativeAuthenticator: This authenticator is used for small or medium-sized JupyterHub applications. Sign up and authentication are implemented as native to JupyterHub without relying on external services.
  • SSOAuthenticator: This authenticator provides a subclass of jupyterhub.auth.Authenticator that acts as a SAML2 Service Provider. Direct it to an appropriately configured SAML2 Identity Provider to enable single sign-on for JupyterHub.
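
For context, JupyterHub selects its authenticator through the authenticator_class setting in jupyterhub_config.py. A minimal sketch, assuming the standard jupyterhub-nativeauthenticator package (the config file location varies by installation):

    # In jupyterhub_config.py (location varies by installation)
    c.JupyterHub.authenticator_class = 'nativeauthenticator.NativeAuthenticator'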

Mounting Oracle Object Storage Bucket Using rclone with User Principal Authentication

You can mount an Oracle Object Storage bucket on a Big Data Service cluster node using rclone and fuse3 with User Principal Authentication (API keys), tailored for JupyterHub users.

Complete this procedure on Big Data Service 3.0.28 or later ODH 2.x clusters to access and manage Object Storage directly from your JupyterHub environment.
  1. Access Apache Ambari.
  2. From the side toolbar, under Services, click JupyterHub.
  3. Click Summary, and then click JUPYTERHUB_SERVER.
  4. Note the host name from the host information displayed.
  5. Sign in to the Big Data Service host using the SSH credentials used while creating the cluster. For more information, see Connecting to a Cluster Node Using SSH.
  6. To verify the installation of rclone and fuse3 on the node, run:
    rclone version 
    # Ensure version is v1.66
    
    fusermount3 --version 
    # Ensure FUSE version 3 is installed
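
    If either tool is missing, install it before continuing. A sketch for Oracle Linux, using the OS package manager and the upstream rclone install script (not ODH-specific):
    sudo yum install -y fuse3
    curl https://rclone.org/install.sh | sudo bash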
  7. Create an API key. For more information, see Set Up Authentication with an OCI User and API Key and Obtain the OCI Tenancy Namespace and Bucket Compartment.
  8. Set up the rclone configuration. For more information, see Configure Rclone for OCI Object Storage.
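
    For reference, a user principal auth remote in the rclone configuration file looks roughly like the following sketch. The remote name, tenancy namespace, compartment OCID, and region are placeholders.
    [remote_name]
    type = oracleobjectstorage
    provider = user_principal_auth
    namespace = <tenancy_namespace>
    compartment = <compartment_ocid>
    region = <region>
    config_file = ~/.oci/config
    config_profile = DEFAULT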
  9. To mount the Object Storage bucket, run the following command as the signed-in user.

    The following runs the mount operation as the signed-in user jupyterhub. The daemon process runs as a Linux process on the node where this operation is triggered.

    sudo -u jupyterhub rclone mount remote_name:bucket1 /home/jupyterhub/mount_dir --allow-non-empty --file-perms 0666 --dir-perms 0777 --vfs-cache-mode=full --dir-cache-time=30s --vfs-write-back=2m --cache-info-age=60m --daemon
    
    Note

    To work with Jupyter notebooks, ensure the mount location is inside the signed-in user's home directory, and ensure the mount directory is empty:
    sudo -u jupyterhub ls -ltr /home/jupyterhub/mount_dir
  10. (Optional) To verify that the mount succeeded, run the following. This example lists the contents of the bucket mounted at mount_dir.
    sudo -u jupyterhub ls -ltr /home/jupyterhub/mount_dir
    
  11. Run cleanup procedures.

    When rclone runs in daemon mode, you must stop the mount manually. Use the following cleanup operations when JupyterHub and the notebook servers aren't in use.

    On Linux:
    sudo -u jupyterhub fusermount3 -u /home/jupyterhub/mount_dir
    The unmount operation can fail, for example when the mount point is busy. If that happens, stop the mount manually:
    sudo -u jupyterhub umount -l /home/jupyterhub/mount_dir   # lazy unmount
    sudo -u jupyterhub umount -f /home/jupyterhub/mount_dir   # force unmount
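    To confirm the bucket is no longer mounted, check the mount table; no output means the directory is no longer a mount point.
    findmnt /home/jupyterhub/mount_dir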
    

Manage Conda Environments in JupyterHub

Note

You can manage Conda environments on Big Data Service 3.0.28 or later ODH 2.x clusters.
  • Create a conda environment with specific dependencies, and create four kernels (Python/PySpark/Spark/SparkR) that point to the created conda environment.
  • Conda environments and kernels created using this operation are available to all notebook server users.
  • Conda environment creation is a separate operation so that it doesn't require a service restart. A command-line sketch follows this list.
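
As an illustration only, creating an environment and a matching Python kernel by hand looks like the following sketch. The environment name and package list are placeholders, and your ODH release may expose this operation through Ambari instead.

    # Create a conda environment with specific dependencies (ipykernel included
    # so a kernel can be registered)
    sudo conda create -y -n myenv python=3.9 ipykernel numpy pandas
    # Register a Python kernel pointing at the environment, visible to all
    # notebook server users
    sudo conda run -n myenv python -m ipykernel install --prefix /usr/local --name myenv --display-name "Python (myenv)"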

Create a Load Balancer and Backend Set

For more information about creating backend sets, see Creating a Load Balancer Backend Set.

Launch Trino-SQL Kernels

The JupyterHub PyTrino kernel provides a SQL interface that lets you run Trino queries using JupySQL. This is available for Big Data Service 3.0.28 or later ODH 2.x clusters.
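
For example, a minimal sketch of notebook cells in the PyTrino kernel, assuming JupySQL's sql magic is available and Trino listens on its default port. The user, host, catalog, schema, and table names are placeholders.

    %sql trino://<user>@<trino-host>:8080/<catalog>
    %sql SELECT * FROM <schema>.<table> LIMIT 10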