PySpark 3.0 and Data Flow conda environment is introduced
- Services: Data Science
- Release Date: February 09, 2022
With the PySpark 3.0 and Data Flow CPU on Python 3.7 (version 3.0) conda environment, you can apply the power of Apache Spark and MLlib to train models at scale. PySparkSQL uses parallel processing to analyze large quantities of structured and semi-structured data from within a notebook. For larger jobs, you can develop Spark applications and then submit them to the Data Flow service. The slug name is pyspark30_p37_cpu_v3.
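As a sketch, the environment can typically be installed by its slug from a notebook session terminal using the `odsc conda` CLI (assuming you are working inside an OCI Data Science notebook session where `odsc` is available):

```shell
# Install the conda environment by its slug (run in a notebook session terminal).
odsc conda install --slug pyspark30_p37_cpu_v3

# After installation, the environment appears as a selectable kernel
# in the notebook session; list installed environments to confirm.
odsc conda list
```

Once installed, select the corresponding kernel in your notebook to use PySpark and MLlib from that environment.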
For more information, see Data Science, ADS SDK, and ocifs SDK. Take a look at our Data Science blog.