Data Science has updated conda environments plus a new NLP environment

The Data Science service provides conda environments that are specialized for different workflows. This release adds six new environments, updates four, and deprecates four. All environments released today are based on Python 3.7 and include the Oracle Accelerated Data Science (ADS) SDK v2.2.1 library.

Slug names are the names that uniquely identify each conda environment. A new naming standard is being introduced so that you can quickly identify the key components and their versions. The slugs now have the following format:

<software_name><software_version>_p<python_version>_<cpu|gpu>_v<conda_version>

For example, the first release of the PySpark v2.4 environment using Python 3.7 on a CPU has the slug pyspark24_p37_cpu_v1.
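The naming format above can be split into its components with a regular expression. The following is a minimal sketch, not part of the Data Science service or ADS: the names SLUG_PATTERN, SlugParts, and parse_slug are hypothetical, and the pattern assumes that the software version and the trailing _v<conda_version> part may be absent, because some slugs in this release (for example, database_p37_cpu_v1 and nlp_p37_cpu) omit one of them.

```python
import re
from typing import NamedTuple, Optional


class SlugParts(NamedTuple):
    """Components of a conda environment slug (hypothetical helper type)."""
    software_name: str
    software_version: Optional[str]  # raw digits, e.g. "24" for v2.4
    python_version: str              # raw digits, e.g. "37" for Python 3.7
    hardware: str                    # "cpu" or "gpu"
    conda_version: Optional[str]     # release number of the environment


# Assumption: names are lowercase letters and underscores; the software
# version and the trailing "_v<n>" are treated as optional.
SLUG_PATTERN = re.compile(
    r"^(?P<name>[a-z_]+?)(?P<software_version>\d+)?"
    r"_p(?P<python_version>\d+)"
    r"_(?P<hardware>cpu|gpu)"
    r"(?:_v(?P<conda_version>\d+))?$"
)


def parse_slug(slug: str) -> Optional[SlugParts]:
    """Return the slug components, or None if the slug does not follow the format."""
    match = SLUG_PATTERN.match(slug)
    if match is None:
        return None
    return SlugParts(
        software_name=match.group("name"),
        software_version=match.group("software_version"),
        python_version=match.group("python_version"),
        hardware=match.group("hardware"),
        conda_version=match.group("conda_version"),
    )


if __name__ == "__main__":
    print(parse_slug("pyspark24_p37_cpu_v1"))
    print(parse_slug("dbv1"))  # an old-style slug does not match the new format
```

Slugs that predate the naming standard, such as dbv1, do not match the pattern, so parse_slug returns None for them.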

New conda environments:

  • Natural language processing (NLP): These environments are designed for working with text datasets and performing NLP tasks. They include ADS v2.2.1, which has new features for working with text. There are environments for CPUs (slug: nlp_p37_cpu) and GPUs (slug: nlp_p37_gpu).
  • PyTorch V1.8: These environments specialize in machine learning and are mainly used for computer vision and NLP. They provide high-level features for tensor computing and deep neural networks. There are environments for CPUs (slug: pytorch18_p37_cpu) and GPUs (slug: pytorch18_p37_gpu).
  • TensorFlow V2.3: This is an ecosystem of tools to create state-of-the-art machine learning models. Use TensorFlow to train and deploy deep neural networks for image recognition, NLP, recurrent neural networks and other machine learning applications. There are environments for CPUs (slug: tensorflow23_p37_cpu) and GPUs (slug: tensorflow23_p37_gpu).

Updated conda environments:

  • Data Exploration: Improvements in ADS for faster calculation of correlations and asynchronous hyperparameter tuning. Slug: data_expl_p37_cpu_v1.
  • Database: Updated JDBC drivers and improvements in ADS to secure your database credentials in the Oracle Vault. Slug: database_p37_cpu_v1.
  • PySpark V2.4: Use this conda environment to create Data Flow jobs or run PySpark locally. It provides new support for working with the Oracle Autonomous Database and Snappy compression in Parquet files. There are environments for CPUs. Slug: pyspark24_p37_cpu_v1.
  • ONNX V1.7: Upgraded ONNX from V1.3 to V1.7. Slug: onnx17_p37_cpu_v1.

Deprecated conda environments:

  • Deprecated conda environments are still available but will only receive critical security updates for one year from the deprecation date.
  • All deprecated conda environments have been replaced with updated versions.
  • The deprecated slugs are:
    Deprecated Slug    Replacement Slug
    dbv1               database_p37_cpu_v1
    explv1             data_expl_p37_cpu_v1
    onnx13v1           onnx17_p37_cpu_v1
    pyspv10            pyspark24_p37_cpu_v1
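
If scripts or notebooks still refer to a deprecated slug, the mapping above can be encoded as a lookup table. This is an illustrative sketch only: DEPRECATED_SLUGS and resolve_slug are hypothetical names, not part of the service or of ADS, and the dictionary holds only the four pairs listed in this release.

```python
# Deprecated slug -> replacement slug, copied from the list in this release note.
DEPRECATED_SLUGS = {
    "dbv1": "database_p37_cpu_v1",
    "explv1": "data_expl_p37_cpu_v1",
    "onnx13v1": "onnx17_p37_cpu_v1",
    "pyspv10": "pyspark24_p37_cpu_v1",
}


def resolve_slug(slug: str) -> str:
    """Return the replacement for a deprecated slug, or the slug unchanged."""
    return DEPRECATED_SLUGS.get(slug, slug)


if __name__ == "__main__":
    for old in ("dbv1", "onnx13v1", "nlp_p37_cpu"):
        print(old, "->", resolve_slug(old))
```

Slugs that are not deprecated pass through unchanged, so the helper can be applied to any slug before it is used.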