Working with Notebooks
Oracle Big Data Service uses the Big Data Studio notebook application as its notebook interface and coding environment.
Creating a Notebook
You can perform many actions in a notebook. For information about some of them, see Overview of the Notebooks User Interface.
After creating a notebook, you can create paragraphs in the notebook. See Creating Paragraphs.
Overview of the Notebooks User Interface
Below are some of the most common screen elements you'll see when working within a notebook. Mouse over icons in the user interface to see their functions.
Icon | Description |
---|---|
Modify Notebook | Modifies details of a notebook, such as name, description, and tags. |
Run Paragraphs | Executes all paragraphs in a notebook in sequential order. |
Invalidate Session | Resets any connection or code executed in a notebook. |
Delete Notebook | Deletes a notebook. |
Hide/Show Code | Hides or shows the code section in all paragraphs in a notebook. |
Hide/Show Result | Hides or shows the results section in all paragraphs in a notebook. |
Readonly/Write | Sets the notebook to read-only or write mode. |
Hide/Show Panel | Shows or hides the paragraph settings bar commands, results toolbar, and settings dialog for a selected paragraph in a panel to the right of the notebook. |
Versioning | Creates a version of a notebook or displays existing versions. |
Clear Result | Clears results for all paragraphs in a notebook. |
Clear Paragraph Dependencies | Removes all defined paragraph dependencies. |
Open as Iframe | Opens a notebook in an iframe, which allows the notebook to be embedded inside another web page. |
Share Notebook | Shares a notebook. |
Clone Notebook | Creates a copy of a notebook. |
Export Notebook | Exports a notebook to your computer as a .dsnb file. |
Zeppelin/Jupyter | Sets the preferred layout of a notebook (Zeppelin or Jupyter layout format). |
Default Template/another template | Applies the overall look and feel of the notebook using the default template or another template. |
Importing a Notebook
You can import notebooks from your local computer into the Big Data Studio notebook application. This includes Jupyter notebooks.
- Access the notebook application. See Accessing Big Data Studio.
- On the Notebooks page, click the Import Notebooks icon in the upper right.
- Drag and drop or browse for the notebook you want to import. Valid file extensions are .json, .dsnb, and .ipynb.
- Click Import.
Searching for a Notebook
You can search for notebooks by title, description, or tag.
- Access the notebook application. See Accessing Big Data Studio.
- On the Notebooks page, enter text in the search box or click a tag. Clicking a tag displays all notebooks with that tag.
Cloning a Notebook
When you clone a notebook, you make a copy of it.
Supported Interpreters
The Big Data Studio notebook application provides interpreters that execute code in different languages. The interpreters listed below are supported.
Each interpreter has a set of properties that can be customized by an administrator and applied across all notebooks. See Configuring Interpreters.
Start a paragraph with %interpreter as shown below, and then write your paragraph with the corresponding syntax.
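For instance, a paragraph that uses the python interpreter could look like the following minimal sketch (the code itself is illustrative):

```
%python
# A minimal paragraph using the python interpreter (illustrative code)
print("Hello from Big Data Studio")
```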
The Spark and PySpark interpreters support YARN client mode. Running the interpreters in YARN client mode means that a Spark driver is started by Big Data Studio and Spark application commands are sent to YARN node managers to be run. This way YARN manages memory instead of the cluster node.
Type | Usage |
---|---|
graalvm | Specify the language type and then use the syntax for that language: %python-graalvm, %js-graalvm, %java-graalvm, %ruby-graalvm |
jdbc (default) | Database settings must be configured correctly. Administrators should go to Interpreters > jdbc > jdbc(default) in the Big Data Studio web UI and verify or configure the required settings. |
jdbc (Cloud SQL) | To use this interpreter, Cloud SQL must be added to the cluster. See Adding Cloud SQL. Cloud SQL enables you to use SQL to query your big data sources. See Using Cloud SQL with Big Data. Administrators must also change the default password used with the interpreter. |
jdbc (Hive) | On unsecured clusters, no additional configuration is needed. On secured clusters, additional configuration is required. |
md (markdown) | |
pyspark | |
python | Cluster administrators can install other Python packages on the cluster host. Use SSH to connect to the cluster, and then use pip to install the packages. When the installed libraries are located in /usr/bin, /usr/lib, or /usr/lib64, they become available to the module search path of the Python interpreter in the notebook. |
rserveinterpreter | Cluster administrators can install other R packages on the cluster host. Use SSH to connect to the cluster, and then install packages with R CMD INSTALL [options] package_name.tar.gz or, from an R session, install.packages(url_for_packages, NULL). Once installed, the new libraries are available to the rserve interpreter in the notebook. |
spark | |
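As an illustration of the python interpreter note above, a cluster administrator might install an additional package roughly like this (the package name is illustrative):

```
# On a cluster node (connect over SSH first), install a package system-wide
# so its libraries land under /usr/lib and are visible to the notebook's
# python interpreter. Package name is illustrative.
sudo pip install pandas
```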
Creating Paragraphs
Paragraphs are used in notebooks to interactively explore data. Paragraphs contain executable code that returns results in many different formats.
You can use different interpreters in the same notebook. Interpreter settings can be configured by an administrator. See Configuring Interpreters.
To create a paragraph:
For information about some of the actions you can perform in paragraphs, see Overview of Paragraphs User Interface.
Overview of Paragraphs User Interface
Below are some of the most common screen elements you'll see when working with paragraphs. Mouse over icons in the user interface to see their functions.
Icon | Description |
---|---|
Execute Paragraph | Executes the code or query in a paragraph. |
Enter Dependency Mode | Adds or removes dependent paragraphs. |
Comments | Adds comments to a paragraph. |
Expand/Collapse | Expands a paragraph and shows the paragraph in full-screen mode, or collapses that view. |
Show/Hide Line Numbers | Shows or hides line numbers in the code in a paragraph (applies only to the code section). |
Visibility | Manages the visibility settings in a paragraph. This controls how a paragraph can be viewed by the author and other users who have access to the notebook. |
Settings | Provides a number of additional actions for the paragraph. |
Accessing HDFS from Spark and PySpark
To access HDFS in a notebook and read and write to HDFS, you need to grant access to your folders and files to the user that the Big Data Studio notebook application will access HDFS as.
When Big Data Studio accesses HDFS (and other Hadoop cluster services), these users are used:
- interpreteruser is the user and group used with unsecured clusters.
- datastudio is the user and group used with secured clusters (Kerberos-enabled).
Note that these users are not the users that are used to access the Big Data Studio web UI.
On a secured cluster, you must obtain a Kerberos ticket for your user before running any hdfs commands. For example, run kinit oracle for the user oracle.
You can grant read and write access to HDFS folders. You can also remove that access.
You can grant read access to specific HDFS files. You can also remove that access.
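A minimal sketch of granting and removing that access with HDFS ACL commands, assuming a secured cluster (user datastudio; use interpreteruser on unsecured clusters) and illustrative paths:

```
# Grant read/write access to a folder and its contents
hdfs dfs -setfacl -R -m user:datastudio:rwx /user/oracle/mydata
hdfs dfs -setfacl -R -m default:user:datastudio:rwx /user/oracle/mydata

# Grant read access to a specific file
hdfs dfs -setfacl -m user:datastudio:r-- /user/oracle/mydata/data.csv

# Remove the access again
hdfs dfs -setfacl -R -x user:datastudio /user/oracle/mydata
```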
The following examples show how to use PySpark with HDFS. You need access to HDFS as described in the previous topics. Make sure you're using the %pyspark interpreter.
Get a Spark Session
You need a Spark session to work with HDFS.
```
%pyspark

from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.appName("SparkSessionName").getOrCreate()
```
Read in HDFS
To read using SparkSession:

```
%pyspark

df_load = sparkSession.read.csv('/hdfs/full/path/to/folder')
df_load.show()
```
Write to HDFS
Use a DataFrame to write to HDFS. To write using SparkSession:

```
%pyspark

data = [('One', 1), ('Two', 2), ('Three', 3)]

# Create DataFrame
df = sparkSession.createDataFrame(data)

# Write into HDFS
# /hdfs/full/path/to/folder is created when df.write is executed
df.write.mode("overwrite").csv("/hdfs/full/path/to/folder")
```
The df.write.mode options specify the behavior of the save operation when data already exists:
- append: Appends the contents of this DataFrame to existing data.
- overwrite: Overwrites existing data.
- ignore: Ignores this operation if data already exists.
- error or errorifexists: Throws an exception if data already exists.
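For example, re-running the write above with append instead of overwrite adds the new rows to whatever is already stored in the folder (path is illustrative):

```
%pyspark

# Append to existing data instead of overwriting it
df.write.mode("append").csv("/hdfs/full/path/to/folder")
```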
Example 1: Write to a Directory
This example writes to a /tmp directory to which all HDFS users have access.

```
%pyspark

from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.appName("pyspark-write").getOrCreate()

# Create data
data = [('One', 1), ('Two', 2), ('Three', 3), ('Four', 4), ('Five', 5)]
df = sparkSession.createDataFrame(data)

# Write into HDFS
df.write.mode("overwrite").csv("/tmp/testpysparkcsv")
```
Example 2: Read the Information
This example reads the information in the /tmp/testpysparkcsv directory created in the previous example.

```
%pyspark

from pyspark.sql import SparkSession

sparkSession = SparkSession.builder.appName("pyspark-read").getOrCreate()

# Read from HDFS
df_load = sparkSession.read.csv('/tmp/testpysparkcsv')
df_load.show()
```
Accessing Hive from Spark and PySpark
To access Hive in a notebook and read and write to Hive, you need to grant access to your folders and files to the user that the Big Data Studio notebook application will access Hive as.
When Big Data Studio accesses Hive, these users are used:
- interpreteruser is the user and group used with unsecured clusters.
- datastudio is the user and group used with secured clusters (Kerberos-enabled).
Note that these users are not the users that are used to access the Big Data Studio web UI.
You can enable Big Data Studio to access all Hive tables by adding the datastudio user to the hive admin group. The user performing these steps must be logged into a node of the cluster as the root user.

In secured CDH clusters, the datastudio user and group can be granted more fine-grained access to specific tables by creating and using Sentry roles. For more information, see Hive SQL Syntax for Use with Sentry in the Cloudera documentation.

To grant access to Hive tables:
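A minimal sketch of the grant, mirroring the removal commands shown below: it adds the notebook user to the hive group on every node and is run as root on a cluster node.

```
# Unsecured clusters
dcli -C gpasswd -a interpreteruser hive

# Secured clusters
dcli -C gpasswd -a datastudio hive
```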
To remove access to Hive, remove the user from the group that has access. In this example, the group is hive.

Unsecured clusters:

```
dcli -C gpasswd -d interpreteruser hive
```

Secured clusters:

```
dcli -C gpasswd -d datastudio hive
```
The following examples show how to use PySpark with Hive. You need access to Hive as described in the previous topics. Make sure you're using the %pyspark interpreter.
Use spark.sql to Work with Hive
You use spark.sql to work with Hive. A spark.sql function call looks like this:

```
%pyspark

# Run the SQL instruction in quiet mode
spark.sql("SQL instruction")

# Show some output
spark.sql("SQL instruction").show()
```
Example of SQL instructions that do not show information:
%pyspark spark.sql ("DROP TABLE IF EXISTS hive_table") spark.sql("CREATE TABLE IF NOT EXISTS hive_table (number int, Ordinal_Number string, Cardinal_Number string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ") spark.sql("load data inpath '/tmp/pysparktestfile.csv' into table pyspark_numbers_from_file") spark.sql("insert into table pyspark_numbers_from_file2 select * from pyspark_numbers_from_file") spark.sql("CREATE TABLE IF NOT EXISTS pyspark_numbers (number int, Ordinal_Number string, Cardinal_Number string) USING com.databricks.spark.csv OPTIONS \ (path \"/full/path/hdfs/dataframe/folder\", header \"false\")")
Example of SQL instructions that show information:
%pyspark spark.sql("show databases").show() spark.sql("show tables").show() spark.sql("describe hive_table ").show() spark.sql("select * hive_Table limit 5").show()
Example 1: Create a Table
%pyspark spark.sql("CREATE TABLE IF NOT EXISTS pyspark_test_table ( eid int, name String, salary String, destination String)") spark.sql("describe pyspark_test_table ").show()
Example 2: Remove a Table
%pyspark spark.sql ("DROP TABLE IF EXISTS pyspark_test_table")
Example 3: Create a File with 9000 Rows
```
%pyspark

from pyspark.sql import SparkSession
import re

single_digit = ["", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
ten_to_nineteen = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
two_digits = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]
ten_power = ["hundred", "thousand"]
data = []

def cardinalNum(number):
    if len(str(number)) == 1:
        return single_digit[number]
    elif len(str(number)) == 2:
        if str(number)[0] == "1":
            return ten_to_nineteen[int(str(number)[1])]
        else:
            return two_digits[int(str(number)[0])] + " " + single_digit[int(str(number)[1])]
    elif len(str(number)) == 3:
        return single_digit[int(str(number)[0])] + " " + ten_power[0] + " " + cardinalNum(int(str(number)[1:3]))
    elif len(str(number)) == 4:
        return cardinalNum(int(str(number)[0:1])) + " " + ten_power[1] + " " + cardinalNum(int(str(number)[1:4]))
    else:
        return str(number)

def ordinalNum(number):
    if re.match(".*1[0-9]$", str(number)):
        return str(number) + 'th'
    elif re.match(".*[04-9]$", str(number)):
        return str(number) + 'th'
    elif re.match(".*1$", str(number)):
        return str(number) + 'st'
    elif re.match(".*2$", str(number)):
        return str(number) + 'nd'
    elif re.match(".*3$", str(number)):
        return str(number) + 'rd'
    else:
        return str(number)

sparkSession = SparkSession.builder.appName("pyspark-write").getOrCreate()

# Create data
for number in range(1, 9001):
    tmpdata = [(number), (ordinalNum(number)), (cardinalNum(number))]
    data.append(tmpdata)
    # print(str(number) + ',' + ordinalNum(number) + ',' + cardinalNum(number))

df = sparkSession.createDataFrame(data)

# Write into HDFS
df.write.mode("overwrite").csv("/tmp/testpyspark")
```
Example 4: Load from the DataFrame Folder
%pyspark spark.sql("CREATE TABLE IF NOT EXISTS pyspark_numbers (number int, Ordinal_Number string, Cardinal_Number string) USING com.databricks.spark.csv OPTIONS \ (path \"/tmp/testpyspark\", header \"false\")") spark.sql ("describe pyspark_numbers").show() spark.sql ("select * from pyspark_numbers").show() spark.sql("select count(*) from pyspark_numbers").show()
Example 5: Load from a .csv File

```
%pyspark

spark.sql("CREATE TABLE IF NOT EXISTS pyspark_numbers_from_file (number int, Ordinal_Number string, Cardinal_Number string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n' ")
spark.sql("load data inpath '/tmp/pysparktestfile.csv' into table pyspark_numbers_from_file")
```
To create the pysparktestfile.csv file, copy the file generated in Example 3 above, for example:

```
# Command listing files
hdfs dfs -ls /tmp/testpyspark/

# Output listing files
Found 2 items
-rw-r--r--   3 datastudio supergroup       0 2020-08-21 16:50 /tmp/testpyspark/_SUCCESS
-rw-r--r--   3 datastudio supergroup  416250 2020-08-21 16:50 /tmp/testpyspark/part-00000-5bf7c92a-e2ad-4f79-802e-c84c0c3b4cc0-c000.csv

# Command copying file
hdfs dfs -cp -p /tmp/testpyspark/part-00000-5bf7c92a-e2ad-4f79-802e-c84c0c3b4cc0-c000.csv /tmp/pysparktestfile.csv
```
Example 6: Insert from One Table to Another
%pyspark spark.sql("CREATE TABLE IF NOT EXISTS pyspark_numbers_new (number int, Ordinal_Number string, Cardinal_Number string) ") spark.sql("insert into table pyspark_numbers_new select * from pyspark_numbers_from_file") spark.sql("select * from pyspark_numbers_new").show() spark.sql("select count(*) from pyspark_numbers_new").show() spark.sql("select count(*) from pyspark_numbers_from_file").show()
Creating Versions of a Notebook
You can create a version of a notebook, which serves as a snapshot of the notebook when the version was created.
- Access the notebook application. See Accessing Big Data Studio.
- From the Notebooks page, open the notebook you want to version.
- Click Versioning in the upper left.
- Select Create Version to create a new version of the notebook, or View Version History to view and access versions that have already been created.
Exporting a Notebook
You can export notebooks from the Big Data Studio notebook application to your local computer. This includes Jupyter notebooks.
- Access the notebook application. See Accessing Big Data Studio.
- On the Notebooks page, click the Select Notebooks icon.
- Select the notebook you want to export.
- Click the Export Notebooks icon in the upper right.
- Change settings if desired, and then click Export. The file will be exported as a .dsnb file.
- Choose from export options that display, and then click OK to proceed with the export.
Deleting a Notebook
You can delete notebooks for which you have delete permissions.
- Access the notebook application. See Accessing Big Data Studio.
- On the Notebooks page, click the Select Notebooks icon.
- Select the notebook you want to delete.
- Click the Delete Notebooks icon in the upper right, and then click Delete to confirm the action.
You can also delete a notebook by opening the notebook and clicking the Delete Notebook icon at the top.
Keyboard Shortcuts for Notebooks
You can use keyboard shortcuts to perform certain actions, such as select paragraphs and clone and export notebooks.
To see a list of available shortcuts, open a notebook, click your user name in the top right, and select Keyboard Shortcuts from the menu. You can open an overview of all shortcuts, or search for shortcuts. If you're on a page that doesn't have any shortcuts, you won't see the Keyboard Shortcuts option.
You can also search for shortcuts by pressing ctrl+shift+f.