The PySpark 3.0 and Data Flow conda environment is introduced

With the PySpark 3.0 and Data Flow CPU on Python 3.7 (version 3.0) conda environment, you can apply the power of Apache Spark and MLlib to train models at scale. PySparkSQL uses parallel processing to analyze large quantities of structured and semi-structured data from within a notebook. For larger jobs, you can develop Spark applications and then submit them to the Data Flow service. The slug name is pyspark30_p37_cpu_v3.
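As a minimal sketch of in-notebook usage, the following shows how PySparkSQL can query a DataFrame with parallel processing, assuming the pyspark30_p37_cpu_v3 conda environment is active; the application name, file name, and column names are illustrative assumptions, not part of the environment.

```python
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session inside the notebook.
spark = SparkSession.builder.appName("notebook-example").getOrCreate()

# Load a semi-structured CSV file into a DataFrame, inferring the schema.
# The file "sales.csv" and its columns are hypothetical.
df = spark.read.csv("sales.csv", header=True, inferSchema=True)

# Register the DataFrame as a temporary view and query it with Spark SQL.
df.createOrReplaceTempView("sales")
totals = spark.sql(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
totals.show()
```

The same DataFrame code can later be packaged as a standalone Spark application and submitted to the Data Flow service for larger workloads.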

For more information, see Data Science, the ADS SDK, and the ocifs SDK. Take a look at our Data Science blog.