PySpark 3.0 and Data Flow conda environment is introduced
- Services: Data Science
- Release Date: February 09, 2022
With the PySpark 3.0 and Data Flow CPU on Python 3.7 (version 3.0) conda environment, you can apply the power of Apache Spark and MLlib to train models at scale. PySparkSQL uses parallel processing to analyze large quantities of structured and semi-structured data from within a notebook. For larger jobs, you can develop Spark applications and then submit them to the Data Flow service. The slug name is pyspark30_p37_cpu_v3.
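As a sketch, the environment can typically be installed by its slug from a notebook session terminal using the `odsc conda` CLI (assuming you are working inside an OCI Data Science notebook session where `odsc` is available):

```shell
# Install the conda environment by its slug (run in a notebook session terminal).
odsc conda install --slug pyspark30_p37_cpu_v3

# After installation, the environment appears as a selectable kernel
# in the notebook session; list installed environments to confirm.
odsc conda list
```

Once installed, select the corresponding kernel in your notebook to use PySpark and MLlib from that environment.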
For more information, see Data Science, ADS SDK, and ocifs SDK. Take a look at our Data Science blog.