Working with Notebooks

Oracle Big Data Service uses the Big Data Studio notebook application as its notebook interface and coding environment.

Creating a Notebook

To create a notebook:
  1. Access the notebook application. See Accessing Big Data Studio.
  2. Click the Notebooks icon on the left.
  3. Click Create or Create Notebook and specify the following:
    • Name: Name of the notebook.

    • Description: Description of the notebook.

    • Tags: Keywords for the notebook. These keywords act as search tags when searching for the notebook. You can also add tags later by modifying the notebook.

    • Type: Select the notebook type. Choices are:

      • Default: Applies the look and feel of Zeppelin, where paragraphs can be resized and positioned next to each other.
      • Jupyter: Applies the look and feel of Jupyter, where paragraphs are displayed in a single column.
  4. Click Create.

You can perform many actions in a notebook. For information about some of them, see Overview of the Notebooks User Interface.

After creating a notebook, you can create paragraphs in the notebook. See Creating Paragraphs.

Overview of the Notebooks User Interface

Below are some of the most common screen elements you'll see when working within a notebook. Mouse over icons in the user interface to see their functions.

Icon Description

Modify Notebook

Modifies details of a notebook, such as name, description, and tags.

Run Paragraphs

Executes all paragraphs in a notebook in sequential order.

Invalidate Session

Resets any connection or code executed in a notebook.

Delete Notebook

Deletes a notebook.

Hide/Show Code

Hides or shows the code section in all paragraphs in a notebook.

Hide/Show Result

Hides or shows the results section in all paragraphs in a notebook.


Sets the notebook to read-only or write mode.

Hide/Show Panel

Shows or hides the paragraph settings bar commands, results toolbar, and settings dialog for a selected paragraph in a panel to the right of the notebook.


Versioning

Creates a version of a notebook or displays versions.

Clear Result

Clears results for all paragraphs in a notebook.

Clear Paragraph Dependencies

Removes all defined paragraph dependencies.

Open as Iframe

Opens a notebook in an Iframe. This allows a notebook to be embedded inside another web page.

Share Notebook

Shares a notebook.

Clone Notebook

Creates a copy of a notebook.

Export Notebook

Exports a notebook to your computer as a .dsnb file.


Sets the preferred layout of a notebook (Zeppelin or Jupyter layout format).

Default Template/another template

Applies the overall look and feel of the notebook using the default template or another template.

Importing a Notebook

You can import notebooks from your local computer into the Big Data Studio notebook application. This includes Jupyter notebooks.

To import a notebook:
  1. Access the notebook application. See Accessing Big Data Studio.
  2. On the Notebooks page, click the Import Notebooks icon in the upper right.
  3. Drag and drop or browse for the notebook you want to import. Valid file extensions are .json, .dsnb, and .ipynb.
  4. Click Import.
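As a rough illustration of step 3, a pre-check like the following could filter local files by the extensions that the import dialog accepts. The function name is_importable is hypothetical and is not part of Big Data Studio; only the three extensions come from the procedure above.

```python
from pathlib import Path

# Extensions accepted by the import dialog, per the procedure above.
VALID_EXTENSIONS = {".json", ".dsnb", ".ipynb"}

def is_importable(filename: str) -> bool:
    """Return True if the file name ends in an accepted notebook extension."""
    # Case-insensitive comparison is an assumption about how the UI behaves.
    return Path(filename).suffix.lower() in VALID_EXTENSIONS

candidates = ["analysis.ipynb", "sales.dsnb", "notes.txt"]
importable = [name for name in candidates if is_importable(name)]
print(importable)
```

The check looks only at the file name. It does not validate the notebook contents.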

Cloning a Notebook

When you clone a notebook you make a copy of it.

To clone a notebook:
  1. Access the notebook application. See Accessing Big Data Studio.
  2. From the Notebooks page, open the notebook you want to clone.
  3. Click the Clone Notebook icon and create the copy.

    Unless the name is changed, the cloned notebook is named Copy of current notebook name.

  4. If the copy is read-only and you want to run or change it, click the lock icon in the upper right of the open notebook to unlock it.

Supported Interpreters

The Big Data Studio notebook application provides interpreters that execute code in different languages. The interpreters listed below are supported.

Each interpreter has a set of properties that can be customized by an administrator and applied across all notebooks. See Configuring Interpreters.

Start a paragraph with %interpreter as shown below and then code your paragraph with the corresponding syntax.


The Spark and PySpark interpreters support YARN client mode. In YARN client mode, Big Data Studio starts the Spark driver, and Spark application commands are sent to YARN node managers to be run. This way, YARN manages memory instead of the cluster node.

Type Interpreter Usage



Specify the language type as shown below and then use the syntax for that language.






Database settings must be configured correctly. Administrators should go to Interpreters > jdbc > jdbc(default) in the Big Data Studio web UI and verify or configure the following settings:

  • default.url: database connection string

  • default.user: user with permission to connect to the database

  • default.password: password for the database
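Before saving the interpreter settings, an administrator might sanity-check that all three values are filled in. The sketch below is only an illustration: the helper missing_settings and the sample values are hypothetical, and the dotted names default.user and default.password are assumed to follow the default.url naming.

```python
# Settings that the jdbc(default) interpreter needs, per the list above.
REQUIRED_SETTINGS = ("default.url", "default.user", "default.password")

def missing_settings(settings: dict) -> list:
    """Return the required keys that are absent or blank."""
    return [key for key in REQUIRED_SETTINGS
            if not str(settings.get(key) or "").strip()]

# Hypothetical values for illustration only.
jdbc_settings = {
    "default.url": "jdbc:oracle:thin:@//dbhost.example.com:1521/service_name",
    "default.user": "report_user",
    "default.password": "",
}
print(missing_settings(jdbc_settings))
```

An empty list means that every required setting has a value. It does not confirm that the database accepts the connection.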


(Cloud SQL)



To use this interpreter, Cloud SQL must be added to the cluster. See Adding Cloud SQL. Cloud SQL enables you to use SQL to query your big data sources. See Using Cloud SQL with Big Data.

Administrators must also change the default password used with the interpreter:

  1. Connect to the Query Server node as root user and run the following:
    # su - oracle
    sqlplus / as sysdba
    sql> alter session set container=bdsqlusr;
    sql> alter user datastudio_user identified by "new_password";
    This sets the new password. If you want to create a superuser (optional), use this:
    # su - oracle
    sqlplus / as sysdba
    sql> alter session set container=bdsqlusr;
    sql> create user new_user_name identified by "new_password";
    sql> GRANT ALL PRIVILEGES TO new_user_name;
  2. Go to Interpreters > jdbc > jdbc.cloudsql in the Big Data Studio web UI, enter the new password in the default.password field, and click Update. If you created a superuser, you need to update the default.user field. If Cloud SQL hasn't been added to the cluster, you won't see the jdbc.cloudsql tab.





On unsecured clusters, no additional configuration is needed.

On secured clusters, the datastudio user needs to be added to the group that has permissions to write in the database. For more information, see Granting Access to Hive Tables.

md (markdown)









Cluster administrators can install other Python packages on the cluster host. Use SSH to connect to the cluster, and then use pip to install Python packages.

When packages are installed and the new libraries are located in one of the following locations, the libraries become available to the module search path of the Python interpreter in the notebook.



%r and then R syntax

Cluster administrators can install other R packages on the cluster host. Use SSH to connect to the cluster, and then use R CMD INSTALL from the shell or install.packages from an R session to install R packages:

R CMD INSTALL [options] package_name.tar.gz
install.packages(url_for_packages, repos = NULL)

When packages are installed, the new libraries are located in /usr/lib64/R/. Libraries in this directory are added to the R interpreter. Libraries in other directories are not available to the notebook application.




Creating Paragraphs

Paragraphs are used in notebooks to interactively explore data. Paragraphs contain executable code that returns results in many different formats.

You can use different interpreters in the same notebook. Interpreter settings can be configured by an administrator. See Configuring Interpreters.

To create a paragraph:

  1. Access the notebook application. See Accessing Big Data Studio.
  2. From the Notebooks page, open the notebook for which you want to add paragraphs (or create a new notebook).
  3. Hover your mouse above or below a paragraph to display the interpreter toolbar. The toolbar appears as a row of circular icons, one of which is a plus sign (+).

  4. Click an icon in the toolbar to select an interpreter and add a new paragraph that uses that interpreter. When you click some of the icons, the paragraph that's created includes example code that you can run.

    Icons are not provided for all interpreters. For interpreters for which there aren't icons, click the plus sign (+) and start the paragraph with %interpreter. For example, for Spark, click + in the toolbar and start the paragraph with %spark. For a list of supported interpreters, see Supported Interpreters.

  5. Code the paragraph as desired. Changes are saved automatically.

For information about some of the actions you can perform in paragraphs, see Overview of Paragraphs User Interface.
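To make the %interpreter convention from step 4 concrete, the following sketch splits a paragraph into its interpreter name and its code body. It is an illustration only: split_directive is a hypothetical helper, and the sketch does not describe how Big Data Studio itself parses paragraphs.

```python
from typing import Optional, Tuple

def split_directive(paragraph: str) -> Tuple[Optional[str], str]:
    """Split a paragraph into (interpreter name, code body).

    Returns (None, paragraph) when the text does not start with '%'.
    """
    stripped = paragraph.lstrip()
    if not stripped.startswith("%"):
        return None, paragraph
    # The directive is the first whitespace-delimited token, for example "%spark".
    parts = stripped.split(None, 1)
    directive = parts[0]
    body = parts[1] if len(parts) > 1 else ""
    return directive[1:], body

name, body = split_directive("%spark\nval x = 1 + 1")
print(name)
print(body)
```

For the sample paragraph, the interpreter name is spark and the body is the Scala line that follows the directive.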

Overview of Paragraphs User Interface

Below are some of the most common screen elements you'll see when working with paragraphs. Mouse over icons in the user interface to see their functions.

Icon Description

Execute Paragraph

Executes the code or query in a paragraph.

Enter Dependency Mode

Adds or removes dependent paragraphs.


Adds comments to a paragraph.


Expands a paragraph and shows the paragraph in full-screen mode, or collapses that view.

Show/Hide Line Numbers

Shows or hides line numbers in the code in a paragraph (applies only to the code section).


Manages the visibility settings in a paragraph. This controls how a paragraph can be viewed by the author and other users who have access to the notebook.


Provides a number of actions. Use this to:
  • Move a paragraph up or down

  • Clear the paragraph result

  • Open a paragraph as an Iframe

  • Clone a paragraph

  • Delete a paragraph

Accessing HDFS from Spark and PySpark

To access HDFS in a notebook and read and write to HDFS, you need to grant access to your folders and files to the user that the Big Data Studio notebook application will access HDFS as.

When Big Data Studio accesses HDFS (and other Hadoop cluster services), these users are used:

  • interpreteruser is the user and group used with unsecured clusters.

  • datastudio is the user and group used with secured clusters (Kerberos-enabled).

Note that these users are not the users that are used to access the Big Data Studio web UI.

On a secured cluster, you need to obtain a Kerberos ticket for your user before running any hdfs commands. For example, run kinit oracle for the user oracle.

Granting Access to HDFS Folders

You can grant read and write access to HDFS folders. You can also remove that access.

To grant access to HDFS folders:
  1. Create an HDFS directory to which you want to provide access if you don't already have one. In this example, the directory is /user/oracle/test.
    hdfs dfs -mkdir /user/oracle/test
  2. Provide access through HDFS ACLs.
    1. Check the current ACL access. For example:
      hdfs dfs -getfacl /user/oracle/test

      You'll see something like this:

      # file: /user/oracle/test
      # owner: oracle
      # group: oracle
    2. Make the files and subdirectories within the directory readable by the group:
      • For unsecured clusters, set permissions to the group interpreteruser. For example:

        hdfs dfs -setfacl -m group:interpreteruser:rwx /user/oracle/test
      • For secured clusters, set permissions to the group datastudio. For example:

        hdfs dfs -setfacl -m group:datastudio:rwx /user/oracle/test
    3. Set the default ACL setting for the parent directory:
      • For unsecured clusters, set default permissions to the group interpreteruser. For example:

        hdfs dfs -setfacl -m default:group:interpreteruser:rwx /user/oracle/test
      • For secured clusters, set default permissions to the group datastudio. For example:

        hdfs dfs -setfacl -m default:group:datastudio:rwx /user/oracle/test
    4. Add the correct permissions for the user. You can also remove access that has already been granted. For example:
      • Set read permission

        • Unsecured clusters:

          hdfs dfs -setfacl -m user:interpreteruser:r-x /user/oracle/test
        • Secured clusters:

          hdfs dfs -setfacl -m user:datastudio:r-x /user/oracle/test
      • Set write permission

        • Unsecured clusters:

          hdfs dfs -setfacl -m user:interpreteruser:rwx /user/oracle/test
        • Secured clusters:

          hdfs dfs -setfacl -m user:datastudio:rwx /user/oracle/test
      • Remove access

        • Unsecured clusters:

          hdfs dfs -setfacl -m user:interpreteruser:--- /user/oracle/test
        • Secured clusters:

          hdfs dfs -setfacl -m user:datastudio:--- /user/oracle/test
  3. Check the current ACL access. For example:
    hdfs dfs -getfacl /user/oracle/test

    You'll see something like the following. In this example, permissions are shown for interpreteruser.

    # file: /user/oracle/test
    # owner: oracle
    # group: oracle

  4. Test the configuration by running the following script in a notebook. This example uses Spark. You should now be able to read and write in HDFS.
        import scala.sys.process.Process
        import org.apache.spark.SparkConf
        import org.apache.hadoop.fs.{FileSystem, Path}
        //Get the Name Node Service
        val df_name_xml = Process(Seq("bash", "-c", "grep -A1  -- dfs.nameservices  /etc/hadoop/conf/hdfs-site.xml  | grep -v dfs ") ).lineStream.mkString
        var df_name = df_name_xml.trim.replaceAll("<value>|</value>", "")
        println (df_name)
        val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
        var namenode=df_name //NameNode Nameservice
        var testfile="test.csv"
        var dir = "/user/oracle/test"
        val rdd = sc.parallelize(List(
            (0, 60),
            (0, 56),
            (0, 54),
            (0, 62),
            (0, 61),
            (0, 53),
            (0, 55),
            (0, 62),
            (0, 64),
            (1, 73),
            (1, 78),
            (1, 67),
            (1, 68),
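            (1, 78)
        ))
        // Write the RDD to HDFS so that there is a file to read back
        rdd.saveAsTextFile(s"hdfs://$namenode:8020$dir/$testfile")
        // Read the file back and count its lines
        var input = sc.textFile(s"hdfs://$namenode:8020$dir/$testfile")
        println(input.count())
        // Clean up; saveAsTextFile creates a directory, so delete recursively
        fs.delete(new Path(s"$dir/$testfile"), true)

Granting Access to HDFS Files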

You can grant read access to specific HDFS files. You can also remove that access.
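The folder and file procedures both use the same pattern: hdfs dfs -setfacl -m, followed by an ACL entry and a path. As a minimal sketch, the helper below assembles those command lines as strings so that the user name always matches the cluster type. The function names are hypothetical, and the commands are only built, not run.

```python
def service_user(secured: bool) -> str:
    """User and group that Big Data Studio uses to access HDFS."""
    return "datastudio" if secured else "interpreteruser"

def setfacl_command(path: str, secured: bool, perms: str,
                    scope: str = "user", default: bool = False) -> str:
    """Build an 'hdfs dfs -setfacl -m' command line without executing it."""
    if scope not in ("user", "group"):
        raise ValueError("scope must be 'user' or 'group'")
    # Each position must be its letter (r, w, x) or a dash, as in 'r-x' or '---'.
    if len(perms) != 3 or any(c not in (letter, "-") for c, letter in zip(perms, "rwx")):
        raise ValueError("perms must look like 'r-x', 'rwx', or '---'")
    prefix = "default:" if default else ""
    entry = f"{prefix}{scope}:{service_user(secured)}:{perms}"
    return f"hdfs dfs -setfacl -m {entry} {path}"

print(setfacl_command("/user/oracle/test", secured=False, perms="rwx", scope="group"))
print(setfacl_command("/user/oracle/test", secured=True, perms="rwx", scope="group", default=True))
print(setfacl_command("/user/oracle/test.txt", secured=True, perms="r--"))
```

Passing "---" as the permissions produces the command that removes access.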

To grant access to HDFS files:
  1. If you don't already have an HDFS file to which you want to provide access, you can create one. This example uses /user/oracle/test.txt.
    $ ls -l / > /tmp/test.txt
    $ hdfs dfs -put /tmp/test.txt  /user/oracle
    $ hdfs dfs -chmod o-r   /user/oracle/test.txt
  2. Set the correct permissions, or remove access. For example:
    • Set read permission

      • Unsecured clusters:

        hdfs dfs -setfacl -m user:interpreteruser:r--  /user/oracle/test.txt
      • Secured clusters:

        hdfs dfs -setfacl -m user:datastudio:r--  /user/oracle/test.txt
    • Remove access

      • Unsecured clusters:

        hdfs dfs -setfacl -m user:interpreteruser:--- /user/oracle/test.txt
      • Secured clusters:

        hdfs dfs -setfacl -m user:datastudio:--- /user/oracle/test.txt
  3. Test the configuration by running the following script in a notebook. This example uses Spark.
    import scala.sys.process.Process
    import org.apache.spark.SparkConf
    //Get the Name Node Service
    val df_name_xml = Process(Seq("bash", "-c", "grep -EA1 -- 'dfs.namenode.servicerpc-address<|dfs.nameservices' /etc/hadoop/conf/hdfs-site.xml | grep -v dfs ") ).lineStream.mkString
    var df_name = df_name_xml.trim.replaceAll("<value>|</value>|:8022", "")
    println (df_name)
    var namenode=df_name //NameNode Nameservice
    var testfile="test.txt"
    var dir = "/user/oracle"
    var x = sc.textFile(s"hdfs://$namenode:8020$dir/$testfile")
    // textFile is lazy; print a few lines to force the read
    x.take(5).foreach(println)

PySpark Examples for HDFS

The following examples show how to use PySpark with HDFS. You need access to HDFS as described in the previous topics. Make sure you're using the %pyspark interpreter.
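The Scala test scripts in the previous topics read the name service from /etc/hadoop/conf/hdfs-site.xml with grep. In a Python paragraph, the same value can be read with the standard XML parser, as in the following sketch. The helper name hadoop_property and the sample XML are hypothetical. Port 8020 follows the Scala scripts and might differ on your cluster.

```python
import xml.etree.ElementTree as ET
from typing import Optional

def hadoop_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the named <property> in Hadoop-style XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None

# Hypothetical content; on a cluster node, read /etc/hadoop/conf/hdfs-site.xml instead.
SAMPLE = """<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>examplecluster-ns</value>
  </property>
</configuration>"""

nameservice = hadoop_property(SAMPLE, "dfs.nameservices")
hdfs_path = f"hdfs://{nameservice}:8020/user/oracle/test.txt"
print(hdfs_path)
```

The resulting string can be passed to the read and write calls in the examples that follow.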

Get a Spark Session

You need a Spark session to work with HDFS.

from pyspark.sql import SparkSession
sparkSession = SparkSession.builder.appName("SparkSessionName").getOrCreate()

Read in HDFS

To read using SparkSession:

# Read CSV data from an HDFS folder
df_load = sparkSession.read.csv('/hdfs/full/path/to/folder')
df_load.show()

Write to HDFS

Use a DataFrame to write to HDFS. To write using SparkSession:

data = [('One', 1), ('Two', 2), ('Three', 3)]
# Create DataFrame
df = sparkSession.createDataFrame(data)
# Write into HDFS
# /hdfs/full/path/to/folder is created when df.write is executed
df.write.csv("/hdfs/full/path/to/folder")

The df.write.mode options specify the behavior of the save operation when data already exists:
  • append: Appends contents of this DataFrame to existing data.

  • overwrite: Overwrites existing data.

  • ignore: Ignores this operation if data already exists.

  • error or errorifexists: Throws an exception if data already exists.
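The four modes can be pictured with a small model in plain Python. This is only a sketch of the semantics listed above, not Spark code: apply_save_mode is a hypothetical function, rows are held in lists, and FileExistsError stands in for the exception that Spark raises.

```python
def apply_save_mode(existing, new_rows, mode="error"):
    """Return the rows stored at a target after a write with the given mode.

    existing is None when nothing has been written to the target yet.
    """
    if mode not in ("append", "overwrite", "ignore", "error", "errorifexists"):
        raise ValueError(f"unknown mode: {mode}")
    if existing is None or mode == "overwrite":
        return list(new_rows)
    if mode == "append":
        return list(existing) + list(new_rows)
    if mode == "ignore":
        return list(existing)
    # "error" and "errorifexists" behave the same way.
    raise FileExistsError("data already exists at the target")

old = [("One", 1)]
new = [("Two", 2)]
print(apply_save_mode(None, new, "error"))
print(apply_save_mode(old, new, "append"))
print(apply_save_mode(old, new, "overwrite"))
print(apply_save_mode(old, new, "ignore"))
```

When the target is empty, every mode writes the new rows. The modes differ only when data is already present.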

Example 1: Write to a Directory

This example writes to a /tmp directory to which all HDFS users have access.

from pyspark.sql import SparkSession
sparkSession = SparkSession.builder.appName("pyspark-write").getOrCreate()
# Create data
data = [('One', 1), ('Two', 2), ('Three', 3), ('Four', 4), ('Five', 5)]
df = sparkSession.createDataFrame(data)
# Write into HDFS
df.write.mode("overwrite").csv("/tmp/testpysparkcsv")

Example 2: Read the Information

This example reads the information in the /tmp/testpysparkcsv directory created in the previous example.

from pyspark.sql import SparkSession
sparkSession = SparkSession.builder.appName("pyspark-read").getOrCreate()
# Read from HDFS
df_load = sparkSession.read.csv('/tmp/testpysparkcsv')
df_load.show()

Accessing Hive from Spark and PySpark

To access Hive in a notebook and read and write to Hive, you need to grant access to your folders and files to the user that the Big Data Studio notebook application will access Hive as.

When Big Data Studio accesses Hive, these users are used:

  • interpreteruser is the user and group used with unsecured clusters.

  • datastudio is the user and group used with secured clusters (Kerberos-enabled).

Note that these users are not the users that are used to access the Big Data Studio web UI.

Granting Access to Hive Tables

You can enable Big Data Studio to access all Hive tables by adding the datastudio user to the hive admin group. The user performing these steps must be logged into a node of the cluster as the root user.


In secured CDH clusters, the datastudio user and group can be granted more fine-grained access to specific tables by creating and using Sentry roles. For more information, see Hive SQL Syntax for Use with Sentry in the Cloudera documentation.

To grant access to Hive tables:

  1. Add the interpreteruser user (unsecured clusters) or the datastudio user (secured clusters) to a group with access to Hive. In the following examples, the group is hive.

    Unsecured clusters:

    dcli -C usermod -a -G hive interpreteruser

    Secured clusters:

    dcli -C usermod -a -G hive datastudio
  2. Verify that the user is part of a group with access to Hive.

    Unsecured cluster:

    dcli -C id interpreteruser

    Secured cluster:

    dcli -C id datastudio
  3. Test the configuration by running the following script in a notebook. This example uses Spark. You should now be able to read and write in Hive databases.
    import org.apache.spark.sql.SparkSession
    val conf = sc.getConf
    val customSql = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
    customSql.sql(s"use default").show()
    customSql.sql(s"show tables").show()
    customSql.sql("CREATE TABLE IF NOT EXISTS testds ( eid int, name String,  salary String, destination String)").show()
    customSql.sql(s"describe  testds").show()

Removing Access to Hive

To remove access to Hive, remove the user from the group that has access. In this example, the group is hive.

Unsecured clusters:

dcli -C gpasswd -d interpreteruser hive

Secured clusters:

dcli -C gpasswd -d datastudio hive
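After adding or removing the user, you can confirm the result from the id output used in the verification step of the grant procedure. The sketch below parses one line in the standard Linux id format. The helper name and the sample line, including its numeric IDs, are hypothetical. Any host name that dcli prints before the id output is ignored.

```python
import re

def groups_from_id_output(line: str) -> set:
    """Extract group names from one line of id output."""
    match = re.search(r"groups=(\S+)", line)
    if not match:
        return set()
    # Each entry looks like "985(hive)"; keep the name inside the parentheses.
    return set(re.findall(r"\(([^)]+)\)", match.group(1)))

# Hypothetical output line; the numeric IDs are made up.
sample = "uid=1001(datastudio) gid=1001(datastudio) groups=1001(datastudio),985(hive)"
print("hive" in groups_from_id_output(sample))
```

The membership test is true while the user belongs to the hive group and false after the user is removed.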
PySpark Examples for Hive

The following examples show how to use PySpark with Hive. You need access to Hive as described in the previous topics. Make sure you're using the %pyspark interpreter.

Use spark.sql to Work with Hive

You use spark.sql to work with Hive. The general form of a spark.sql call is:

# Run the SQL instruction in quiet mode
spark.sql("SQL instruction")
# Show some output
spark.sql("SQL instruction").show()

Example of SQL instructions that do not show information:

spark.sql ("DROP TABLE IF EXISTS hive_table")
spark.sql("CREATE TABLE IF NOT EXISTS hive_table (number int, Ordinal_Number string, Cardinal_Number string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ")
spark.sql("load data inpath '/tmp/pysparktestfile.csv' into table pyspark_numbers_from_file")
spark.sql("insert into table pyspark_numbers_from_file2 select * from pyspark_numbers_from_file")
spark.sql("CREATE TABLE IF NOT EXISTS pyspark_numbers (number int, Ordinal_Number string, Cardinal_Number string) USING com.databricks.spark.csv OPTIONS \
(path \"/full/path/hdfs/dataframe/folder\", header \"false\")")

Example of SQL instructions that show information:

spark.sql("show databases").show()
spark.sql("show tables").show()
spark.sql("describe hive_table ").show()
spark.sql("select * from hive_table limit 5").show()

Example 1: Create a Table

spark.sql("CREATE TABLE IF NOT EXISTS pyspark_test_table ( eid int, name String,  salary String, destination String)")
spark.sql("describe pyspark_test_table ").show()

Example 2: Remove a Table

spark.sql ("DROP TABLE IF EXISTS pyspark_test_table")

Example 3: Create a File with 9000 Rows

from pyspark.sql import SparkSession
import re

single_digit = ["","one", "two", "three" , "four" , "five" , "six" , "seven" , "eight" , "nine"] 
ten_to_nineteen  =["ten","eleven","twelve","thirteen" ,"fourteen","fifteen","sixteen","seventeen","eighteen","nineteen"]
two_digits  =["","","twenty","thirty","forty","fifty","sixty","seventy","eighty","ninety"]
ten_power = ["hundred","thousand"]
data = []

def cardinalNum(number):
    # Spell out a number from 0 to 9999; 0 maps to an empty string
    if len(str(number)) == 1:
        return single_digit[number]
    elif len(str(number)) == 2:
        if str(number)[0] == "1":
            return ten_to_nineteen[int(str(number)[1])]
        else:
            return two_digits[int(str(number)[0])] + " " + single_digit[int(str(number)[1])]
    elif len(str(number)) == 3:
        return single_digit[int(str(number)[0])] + " " + ten_power[0] + " " + cardinalNum(int(str(number)[1:3]))
    elif len(str(number)) == 4:
        return cardinalNum(int(str(number)[0:1])) + " " + ten_power[1] + " " + cardinalNum(int(str(number)[1:4]))
    else:
        return str(number)

def ordinalNum(number):
    if re.match(".*1[0-9]$", str(number)):
        return(str(number) + 'th')
    elif re.match(".*[04-9]$", str(number)):
        return(str(number) + 'th')
    elif re.match(".*1$", str(number)):
        return(str(number) + 'st')
    elif re.match(".*2$", str(number)):
        return(str(number) + 'nd')
    elif re.match(".*3$", str(number)):
        return(str(number) + 'rd')

sparkSession = SparkSession.builder.appName("pyspark-write").getOrCreate()

# Create data
for number in range(1, 9001):
    tmpdata = (number, ordinalNum(number), cardinalNum(number))
    data.append(tmpdata)
    #print ( str(number) + ',' + ordinalNum(number) + ',' + cardinalNum(number))

df = sparkSession.createDataFrame(data)

# Write into HDFS as a single CSV part file
df.repartition(1).write.mode("overwrite").csv("/tmp/testpyspark")

Example 4: Load from the DataFrame Folder

spark.sql("CREATE TABLE IF NOT EXISTS pyspark_numbers (number int, Ordinal_Number string, Cardinal_Number string) USING com.databricks.spark.csv OPTIONS \
(path \"/tmp/testpyspark\", header \"false\")")
spark.sql ("describe pyspark_numbers").show()
spark.sql ("select * from pyspark_numbers").show()
spark.sql("select count(*) from pyspark_numbers").show()

Example 5: Load from a .csv File

spark.sql("CREATE TABLE IF NOT EXISTS pyspark_numbers_from_file (number int, Ordinal_Number string, Cardinal_Number string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ")
spark.sql("load data inpath '/tmp/pysparktestfile.csv' into table pyspark_numbers_from_file")

To create the pysparktestfile.csv file, copy the file generated in Example 3 above, for example:

# Command listing files
hdfs dfs  -ls /tmp/testpyspark/

# Output listing files
Found 2 items
-rw-r--r--   3 datastudio supergroup          0 2020-08-21 16:50 /tmp/testpyspark/_SUCCESS
-rw-r--r--   3 datastudio supergroup     416250 2020-08-21 16:50 /tmp/testpyspark/part-00000-5bf7c92a-e2ad-4f79-802e-c84c0c3b4cc0-c000.csv

# Command copying file
hdfs dfs -cp -p /tmp/testpyspark/part-00000-5bf7c92a-e2ad-4f79-802e-c84c0c3b4cc0-c000.csv /tmp/pysparktestfile.csv
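The part file name changes on every write, so it can be convenient to pick it out of the listing instead of typing it. The following sketch parses hdfs dfs -ls output as text and builds the copy command. The function name csv_part_files is hypothetical, the permission column of the second sample line is assumed, and paths that contain spaces are not handled.

```python
from typing import List

def csv_part_files(ls_output: str) -> List[str]:
    """Return the paths of part-*.csv files found in 'hdfs dfs -ls' output."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        # File entries have 8 fields:
        # permissions, replication, owner, group, size, date, time, path.
        if len(fields) != 8:
            continue
        path = fields[-1]
        name = path.rsplit("/", 1)[-1]
        if name.startswith("part-") and name.endswith(".csv"):
            paths.append(path)
    return paths

listing = """Found 2 items
-rw-r--r--   3 datastudio supergroup          0 2020-08-21 16:50 /tmp/testpyspark/_SUCCESS
-rw-r--r--   3 datastudio supergroup     416250 2020-08-21 16:50 /tmp/testpyspark/part-00000-5bf7c92a-e2ad-4f79-802e-c84c0c3b4cc0-c000.csv"""

source = csv_part_files(listing)[0]
print(f"hdfs dfs -cp -p {source} /tmp/pysparktestfile.csv")
```

The printed line matches the copy command shown above and can be run from a shell on the cluster.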

Example 6: Insert from One Table to Another

spark.sql("CREATE TABLE IF NOT EXISTS pyspark_numbers_new (number int, Ordinal_Number string, Cardinal_Number string) ")
spark.sql("insert into table pyspark_numbers_new  select * from pyspark_numbers_from_file")
spark.sql("select * from  pyspark_numbers_new").show()
spark.sql("select count(*) from pyspark_numbers_new").show()
spark.sql("select count(*) from pyspark_numbers_from_file").show()

Creating Versions of a Notebook

You can create a version of a notebook, which serves as a snapshot of the notebook when the version was created.

To create a version of a notebook:
  1. Access the notebook application. See Accessing Big Data Studio.
  2. From the Notebooks page, open the notebook you want to version.
  3. Click Versioning in the upper left.
  4. Select Create Version to create a new version of the notebook, or View Version History to view and access versions that have already been created.

Exporting a Notebook

You can export notebooks from the Big Data Studio notebook application to your local computer. This includes Jupyter notebooks.

To export a notebook:
  1. Access the notebook application. See Accessing Big Data Studio.
  2. On the Notebooks page, click the Select Notebooks icon.
  3. Select the notebook you want to export.
  4. Click the Export Notebooks icon in the upper right.
  5. Change settings if desired, and then click Export. The file will be exported as a .dsnb file.
  6. Choose from export options that display, and then click OK to proceed with the export.

Deleting a Notebook

You can delete notebooks for which you have delete permissions.

To delete a notebook:
  1. Access the notebook application. See Accessing Big Data Studio.
  2. On the Notebooks page, click the Select Notebooks icon.
  3. Select the notebook you want to delete.
  4. Click the Delete Notebooks icon in the upper right, and then click Delete to confirm the action.

You can also delete a notebook by opening the notebook and clicking the Delete Notebook icon at the top.

Keyboard Shortcuts for Notebooks

You can use keyboard shortcuts to perform certain actions, such as selecting paragraphs and cloning and exporting notebooks.

To see a list of available shortcuts, open a notebook, click your user name in the top right, and select Keyboard Shortcuts from the menu. You can open an overview of all shortcuts, or search for shortcuts. If you're on a page that doesn't have any shortcuts, you won't see the Keyboard Shortcuts option.

You can also search for shortcuts by pressing Ctrl+Shift+F.