From the side toolbar, under Services click JupyterHub.
Spawning Notebooks
The following Spawner configurations are supported on Big Data Service 3.0.27 and later ODH 2.x clusters.
Complete the following:
Native Authentication:
Sign in using user credentials:
Enter username.
Enter password.
Using SamlSSOAuthenticator:
Sign in using SSO.
Complete sign in with the configured SSO application.
Spawning Notebooks on an HA Cluster
For an AD-integrated cluster:
Sign in using either of the preceding methods. The authorization works only if the user is present on the Linux host. JupyterHub searches for the user on the Linux host while trying to spawn the notebook server.
You're redirected to a Server Options page where you must request a Kerberos ticket. You can request this ticket using either the Kerberos principal and keytab file, or the Kerberos password; the cluster admin can provide these. The Kerberos ticket is needed to access the HDFS directories and other big data services that you want to use.
Spawning Notebooks on a non-HA Cluster
For an AD-integrated cluster:
Sign in using either of the preceding methods. The authorization works only if the user is present on the Linux host. JupyterHub searches for the user on the Linux host while trying to spawn the notebook server.
Manage JupyterHub
A JupyterHub admin user can perform the following tasks to manage notebooks in JupyterHub on Big Data Service 3.0.27 or later ODH 2.x nodes.
As an admin, you can stop or disable JupyterHub so it doesn't consume resources, such as memory. Restarting might also help with unexpected issues or behavior.
Note
Stop or start JupyterHub through Ambari for Big Data Service 3.0.27 or later clusters.
From the side toolbar, under Services click JupyterHub.
Click Actions, and then click Run Service Check.
Manage Users and Permissions
Use one of the two authentication methods to authenticate users to JupyterHub so that they can create notebooks, and optionally administer JupyterHub on Big Data Service 3.0.27 or later ODH 2.x clusters.
JupyterHub users must be added as OS users on all Big Data Service cluster nodes for Non-Active Directory (AD) Big Data Service clusters, where users aren't automatically synced across all cluster nodes. Administrators can use the JupyterHub User Management script to add users and groups before signing in to JupyterHub.
Prerequisite
Complete the following before accessing JupyterHub:
SSH sign in to the node where JupyterHub is installed.
Navigate to /usr/odh/current/jupyterhub/install.
Provide the details of all users and groups in the sample_user_groups.json file, and then run:
Copy
sudo python3 UserGroupManager.py sample_user_groups.json
Verify user creation by executing the following command:
id <any-user-name>
Supported Authentication Types
NativeAuthenticator: This authenticator is used for small or medium-sized JupyterHub applications. Sign up and authentication are implemented as native to JupyterHub without relying on external services.
SSOAuthenticator: This authenticator provides a subclass of jupyterhub.auth.Authenticator that acts as a SAML2 Service Provider. Direct it to an appropriately configured SAML2 Identity Provider and it enables single sign-on for JupyterHub.
These prerequisites must be met to authorize a user in a Big Data Service HA cluster using native authentication.
The user must exist on the Linux host. Run the following command to add a new Linux user on all the nodes of the cluster.
# Add linux user
dcli -C "useradd -d /home/<username> -m -s /bin/bash <username>"
To start a notebook server, a user must provide the principal and the keytab file path or password, and request a Kerberos ticket from the JupyterHub interface. To create a keytab, the cluster admin must add the Kerberos principal with a password or with a keytab file. Run the following commands on the first master node (mn0) in the cluster.
# Create a kdc principal with password or give access to existing keytabs.
kadmin.local -q "addprinc <principalname>"
Password Prompt: Enter password
# Create a kdc principal with keytab file or give access to existing keytabs.
kadmin.local -q 'ktadd -k /etc/security/keytabs/<principal>.keytab <principal>'
The new user must have correct Ranger permissions to store files in the HDFS directory hdfs:///users/<username> as the individual notebooks are stored in /users/<username>/notebooks. The cluster admin can add the required permission from the Ranger interface by opening the following URL in a web browser.
https://<un0-host-ip>:6182
The new user must have correct permissions on Yarn, Hive, and Object Storage to read and write data, and run Spark jobs. Alternatively, users can use Livy impersonation (run Big Data Service jobs as the Livy user) without getting explicit permissions on Spark, Yarn, and other services.
Run the following command to give the new user access to the HDFS directory.
# Give access to hdfs directory
# kdc realm is by default BDSCLOUDSERVICE.ORACLE.COM
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-<clustername>@<kdc_realm>
sudo su hdfs -c "hdfs dfs -mkdir /user/<username>"
sudo su hdfs -c "hdfs dfs -chown -R <username> /user/<username>"
These prerequisites must be met to authorize a user in a Big Data Service non-HA cluster using native authentication.
The user must exist on the Linux host. Run the following command to add a new Linux user on all the nodes of the cluster.
# Add linux user
dcli -C "useradd -d /home/<username> -m -s /bin/bash <username>"
The new user must have correct permissions to store files in the HDFS directory hdfs:///users/<username>. Run the following command to give the new user access to the HDFS directory.
# Give access to hdfs directory
sudo su hdfs -c "hdfs dfs -mkdir /user/<username>"
sudo su hdfs -c "hdfs dfs -chown -R <username> /user/<username>"
Admin users are responsible for configuring and managing JupyterHub. Admin users are also responsible for authorizing newly signed up users on JupyterHub.
Single logout URL, Logout response URL: https://<Jupyterhub-Host>:<Port>/hub/logout
Activate the application.
Assign users to the application.
Navigate to the created application and click Download Identity Provider Metadata. Copy this metadata file to the JupyterHub host and ensure it has read access for all users (see the example that follows).
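For example, a sketch of copying the downloaded metadata file to the JupyterHub node and making it world-readable, assuming the default opc SSH user and the /tmp/IDCSMetadata.xml path used in the configuration example later in this section (host and key path are placeholders):
# Copy the IDP metadata to the JupyterHub node and grant read access to all users
scp -i <path-to-private-key> IDCSMetadata.xml opc@<jupyterhub-host>:/tmp/IDCSMetadata.xml
ssh -i <path-to-private-key> opc@<jupyterhub-host> "chmod 644 /tmp/IDCSMetadata.xml"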
From the side toolbar, under Services click JupyterHub.
Click Config > Settings > Notebook Server Authenticator.
Select SamlSSOAuthenticator.
Click Save.
Click Advanced.
Update the parameters in the Advanced Jupyter-config SamlSSOAuthenticator-Configs section:
c.Saml2Authenticator.saml2_metadata_filename: The path to the Identity Provider (IDP) metadata file on the node where JupyterHub is installed. For example: '/tmp/IDCSMetadata.xml'.
c.Saml2Authenticator.saml2_entity_id: A unique identifier for maintaining the mapping from the Identity Provider (IDP) to the Service Provider (JupyterHub). This identifier must be the same in both the IDP application configurations and the Service Provider (JupyterHub). For example: https://myjupyterhub/saml2_auth/ent
c.Saml2Authenticator.saml2_login_URL: The Single Sign-On (SSO) sign-in URL. For Oracle IDCS, users can obtain this from the IDP metadata.xml file: search for the AssertionConsumerService tag and get the value of the Location attribute. For OKTA, copy the value of the sign-in URL present on the sign-in tab. For example: https://idcs-1234.identity.oraclecloud.com/fed/v1/sp/sso
#c.Saml2Authenticator.saml2_metadata_URL: Optional. The URL of the Identity Provider (IDP) metadata file. Be sure the provided URL is reachable from the node where JupyterHub is installed. Either saml2_metadata_filename or saml2_metadata_URL is required. For example: https://idcs-1234.identity.oraclecloud.com/sso/saml/metadata
#c.Saml2Authenticator.saml2_attribute_username: Optional. Specify an attribute to be considered as the user from the SAML assertion. If no attribute is specified, the sign-in username is treated as the user. For example: 'Email'.
#c.Saml2Authenticator.saml2_private_file_path and #c.Saml2Authenticator.saml2_public_file_path: Optional. If the Identity Provider (IDP) encrypts assertion data, the Service Provider (SP), JupyterHub, must provide the necessary private and public keys to decrypt the assertion data. For example:
From the side toolbar, under Services click JupyterHub.
Click Configs. The following configs are supported:
Spawner Configuration:
ODHSystemdSpawner: A custom spawner used to spawn single-user notebook servers using systemd on the local node where JupyterHub Server is installed.
ODHYarnSpawner: A custom Spawner for JupyterHub that launches notebook servers on YARN clusters. This is the default spawner used by Big Data Service.
Common Configuration: Configurations such as the binding IP address and port where JupyterHub runs.
Authenticator Configuration: Two authenticators are supported for authenticating users signing in to JupyterHub. For more information on the authentication types, see Manage Users and Permissions.
Persistence mode:
HDFS: This allows you to persist notebooks in HDFS.
Git: This allows you to use a JupyterLab extension for version control with Git, enabling persistence of notebooks on remote servers through Git.
As an admin user, you can store the individual user notebooks in Object Storage instead of HDFS. When you change the content manager from HDFS to Object Storage, the existing notebooks aren't copied over to Object Storage. The new notebooks are saved in Object Storage.
Mounting Oracle Object Storage Bucket Using rclone with User Principal Authentication
You can mount an Oracle Object Storage bucket on a Big Data Service cluster node using rclone with User Principal Authentication (API keys) and fuse3, tailored for JupyterHub users.
Complete this procedure for Big Data Service 3.0.28 or later ODH 2.x clusters to enable seamless access and management of Object Storage directly from your JupyterHub environment, enhancing your data handling capabilities.
From the side toolbar, under Services click JupyterHub.
Click Summary, and then click JUPYTERHUB_SERVER.
Obtain the host name from the host information displayed.
Sign in to the Big Data Service host using the SSH credentials used while creating the cluster. For more information, see Connecting to a Cluster Node Using SSH.
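For example, a connection sketch assuming the default opc user (key path and host are placeholders):
ssh -i <path-to-private-key> opc@<jupyterhub-node-host>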
To verify the installation of rclone and fuse3 on the node, run:
Copy
rclone version
# Ensure version is v1.66
fusermount3 --version
# Ensure FUSE version 3 is installed
To mount the Object Storage bucket, run the mount operation as the signed-in user (see the sketch after the directory check below).
The mount runs as the signed-in user `jupyterhub`, and the daemon process runs as a Linux process on the node where the operation is triggered.
To work with Jupyter notebooks, ensure the mount location is inside the signed-in user's home directory and that the mount directory is empty. For example, to confirm the directory is empty:
sudo -u jupyterhub ls -ltr /home/jupyterhub/mount_dir
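The exact mount command depends on how the rclone remote is configured. The following is a sketch only, assuming a remote named ociobj created with the oracleobjectstorage backend, user principal authentication (API keys), and an OCI configuration file readable by the jupyterhub user; all names, paths, and OCIDs are placeholders.
# Create an rclone remote for Object Storage using user principal (API key) authentication
sudo -u jupyterhub rclone config create ociobj oracleobjectstorage \
    provider user_principal_auth \
    namespace <object-storage-namespace> \
    compartment <compartment-ocid> \
    region <region> \
    config_file /home/jupyterhub/.oci/config \
    config_profile DEFAULT
# Mount the bucket into the signed-in user's home directory; --daemon runs it in the background
sudo -u jupyterhub rclone mount ociobj:<bucket-name> /home/jupyterhub/mount_dir \
    --daemon --vfs-cache-mode writes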
(Optional)
To verify that the mount is successful, run the following. This example lists the contents of the bucket mounted at mount_dir.
Copy
sudo -u jupyterhub ls -ltr /home/jupyterhub/mount_dir
Run cleanup procedures.
When running in background mode, you must stop the mount manually. Use the following cleanup operations when JupyterHub and notebook servers aren't in use.
The umount operation can fail, for example when the mountpoint is busy. When that happens, you must resolve the issue and stop the mount manually.
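A minimal cleanup sketch, assuming the mount path used above:
# Unmount the FUSE mount created by rclone
sudo -u jupyterhub fusermount3 -u /home/jupyterhub/mount_dir
# If the mountpoint is busy, find the rclone daemon process, stop it, and retry the unmount
ps -ef | grep "rclone mount"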
You can manage Conda environments on Big Data Service 3.0.28 or later ODH 2.x clusters.
Create a conda environment with specific dependencies and create four kernels (Python/PySpark/Spark/SparkR) which point to the created conda environment.
Conda environments and kernels created using this operation are available to all notebook server users.
The conda environment creation is a separate operation to decouple it from a service restart.
From the side toolbar, under Services click JupyterHub.
Click Configs, and then click Advanced.
Scroll to the jupyterhub-conda-env-configs section.
Update the following fields:
Conda Additional Configurations: This field is used to provide additional parameters to be appended to the default conda creation command. The default conda creation command is 'conda create -y -p conda_env_full_path -c conda-forge pip python=3.8'.
If the additional configurations are given as '--override-channels --no-default-packages --no-pin -c pytorch', then, the final conda creation command run is 'conda create -y -p conda_env_full_path -c conda-forge pip python=3.8 --override-channels --no-default-packages --no-pin -c pytorch'.
Conda Environment Name: This field is to provide a unique name for the conda environment. Provide a unique conda environment name each time a new environment is created.
Python Dependencies: This field lists all Python, R, Ruby, Lua, Scala, Java, JavaScript, C, C++, FORTRAN, and other dependencies accessible from your conda channels, in the standard requirements.txt format.
This operation creates a conda environment with specified dependencies and creates the specified kernel (Python/PySpark/Spark/SparkR) pointing to the created conda environment.
If the specified conda environment already exists, the operation proceeds directly to the kernel creation step.
Conda environments or kernels created using this operation are available only to the specific user.
Manually run the python script kernel_install_script.py in sudo mode:
Dependencies are provided in the standard requirements.txt format.
Manually delete conda envs or kernels for any user.
Available Configs for Customization
--user (mandatory): OS and JupyterHub user for whom the kernel and conda environment are created.
--conda_env_name (mandatory): Provide a unique name for the conda environment each time a new environment is created for --user.
--kernel_name: (mandatory) Provide a unique kernel name.
--kernel_type: (mandatory) Must be one of the following (python / PySpark / Spark / SparkR).
--custom_requirements_txt_file_path: (optional) If any Python/R/Ruby/Lua/Scala/Java/JavaScript/C/C++/FORTRAN and so on dependencies are installed using conda channels, you must specify those libraries in a requirements.txt file and provide the full path.
--conda_additional_configs: (optional) This field provides additional parameters to be appended to the default conda creation command.
The default conda creation command is: 'conda create -y -p conda_env_full_path -c conda-forge pip python=3.8'.
If --conda_additional_configs is given as '--override-channels --no-default-packages --no-pin -c pytorch', then, the final conda creation command run is 'conda create -y -p conda_env_full_path -c conda-forge pip python=3.8 --override-channels --no-default-packages --no-pin -c pytorch'.
Setting Up User-Specific Conda Environment
Verify that JupyterHub is installed through the Ambari UI.
SSH into the cluster, and then navigate to /var/lib/ambari-server/resources/mpacks/odh-ambari-mpack-2.0.8/stacks/ODH/1.1.12/services/JUPYTER/package/scripts/.
This sample script execution with the given parameters creates a conda environment conda_jupy_env_1 for the user bds, installs custom dependencies into conda_jupy_env_1, and creates a Spark kernel named spark_bds_1. After successful completion of this operation, the spark_bds_1 kernel is displayed in the JupyterHub UI of the bds user only.
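A sketch of such an invocation, based on the flags listed earlier (the requirements file path is a placeholder):
cd /var/lib/ambari-server/resources/mpacks/odh-ambari-mpack-2.0.8/stacks/ODH/1.1.12/services/JUPYTER/package/scripts/
# Creates conda env conda_jupy_env_1 for user bds and a Spark kernel named spark_bds_1
sudo python3 kernel_install_script.py \
    --user bds \
    --conda_env_name conda_jupy_env_1 \
    --kernel_name spark_bds_1 \
    --kernel_type Spark \
    --custom_requirements_txt_file_path /tmp/requirements.txt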
For more information on creating a public Load Balancer, see Creating a Load Balancer, and complete the following details.
Open the navigation menu, click Networking, and then click Load balancers. Click Load balancer. The Load balancers page appears.
Under List scope, select the Compartment where the cluster is located.
In the Load balancer name field enter a name to identify the Load Balancer. For example, JupyterHub-LoadBalancer.
In the Choose Visibility type section, select Public.
In the Assign a public IP address section, select Reserved IP address.
Select Create new reserved IP address.
In the Public IP name field, enter a name. For example, jupyterhub-ip
In the Create in compartment, select the compartment where the cluster is located.
In the Choose networking section, complete the following:
In the Virtual cloud network <Compartment> section, select the VCN used by the cluster.
In the Subnet in <Compartment> field, select the subnet used by the cluster.
Click Next. The Choose backends page appears.
In the Specify a load balancing policy section, select IP hash.
Note
Don't add Backends at this point.
In the Specify health check policy section, complete the following:
In the Port field, enter 8000.
In the URL Path (URI), enter //hub/api.
Select Use SSL.
In the Certificate resource section, complete the following:
Select Load balancer managed certificate from the dropdown.
Select Paste SSL certificate.
In the SSL certificate field, copy and paste a certificate directly into this field.
Select Paste CA certificate.
In the CA certificate field, enter the Oracle certificate by using /etc/security/serverKeys/bdsOracleCA.crt, which is present in the cluster. For public certificate authorities (CAs), this certificate can be obtained directly from their site.
(Optional)
Select Specify private key.
Select Paste private key.
In the Private key field, paste a private key directly into this field.
Click Show advanced options to access more options.
Click the Backend set tab, and then enter the Backend set name. For example, JupyterHub-Backends.
Click Session persistence, and then select Enable load balancer cookie persistence. Cookies are auto generated.
Click Next. The Configure listener page appears. Complete the following:
In the Listener name field, enter a name for the listener. For example: JupyterHub-Listener.
For Specify the type of traffic your listener handles, select HTTPS.
In the Specify the port your listener monitors for ingress traffic field, enter 8000.
Select Paste SSL certificate.
In the SSL certificate field, copy and paste a certificate directly into this field.
Select Load balancer managed certificate from the dropdown.
Select Paste CA certificate.
In the CA certificate field, enter CA certificate of the cluster.
Select Specify private key.
Select Paste private key.
In the Private key field, paste a private key directly into this field.
After an add node operation, the cluster admin must manually update the Load Balancer host entry on the newly added nodes. This applies to all node additions to the cluster, for example, worker nodes and compute only nodes.
The certificate must be manually updated on the Load Balancer in case of expiry. This step ensures the Load Balancer isn't using stale certificates and avoids health check or communication failures to backend sets. For more information, see Updating an Expiring Load Balancer Certificate.
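To sanity-check the listener and the /hub/api health check path through the Load Balancer, a request such as the following can help (the IP address is a placeholder; -k skips certificate verification for self-signed certificates):
curl -k https://<load-balancer-public-ip>:8000/hub/api
# A JSON response containing the JupyterHub version indicates the backend is reachable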
Launch Trino-SQL Kernels
JupyterHub PyTrino kernel provides an SQL interface that allows you to run Trino queries using JupySQL. This is available for Big Data Service 3.0.28 or later ODH 2.x clusters.
# Define the limit variable in a Python cell
top_threshold = 3

# In a separate cell, use the %%sql magic command to execute a multi-line SQL query with the limit variable
%%sql
SELECT custkey, name, acctbal
FROM tpch.sf1.customer
WHERE acctbal > 1000
ORDER BY acctbal DESC LIMIT {{top_threshold}}
Trino session parameters can be configured from the JupyterHub Ambari UI. These session parameters are applied to all user sessions. For more information on session parameters, see Properties reference.
SqlMagic configurations provide you with flexible control over the behavior and appearance of SQL operations run in Jupyter notebooks. These parameters can be configured from the JupyterHub Ambari UI and are applied to all user sessions.