Data Science now supports pipelines and pipeline runs.

Services: Data Science
Release Date: Jan. 25, 2023

Machine learning pipelines are a crucial component of the modern data science workflow. They help automate the process of building, training, and deploying machine learning models, which allows data scientists to focus on tasks like data exploration and model evaluation. Machine learning is, by nature, a highly repetitive, iterative process. As data changes and evolves, models must be continuously retrained to keep prediction accuracy high. The workflow itself, or at least parts of it, remains mostly the same.

At a high level, a machine learning pipeline consists of several steps, each of which performs a specific task, working together to complete a workflow. For example, the first step might be data preprocessing, where raw data is cleaned and transformed into a format that you can feed into a machine learning algorithm. The next step might be model training, where the algorithm is trained on the processed data to learn the patterns and relationships within it. You can run steps in sequence or in parallel; running independent steps in parallel shortens the time to complete the workflow. One of the key advantages of using machine learning pipelines is the ability to easily rerun the entire workflow. This helps ensure that results are reliable and reproducible, and it makes it easier to experiment with different algorithms and parameters to find the best model for a given problem.
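
To make the idea of sequential and parallel steps concrete, here is a minimal sketch in plain Python. It doesn't use the Data Science service or its SDK; the step functions (preprocess, train_mean_model, train_median_model) and the run_pipeline helper are hypothetical names invented for illustration, and the "models" are toy statistics rather than real training code. It only shows the shape of a workflow in which one step must finish before two independent steps run side by side.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical step functions; names and logic are illustrative only.
def preprocess(raw):
    """Clean raw records: drop missing values and convert to floats."""
    return [float(x) for x in raw if x is not None]


def train_mean_model(data):
    """Toy 'model' that predicts the mean of the training data."""
    return sum(data) / len(data)


def train_median_model(data):
    """Toy 'model' that predicts the median of the training data."""
    ordered = sorted(data)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def run_pipeline(raw):
    # Step 1 runs first because the later steps depend on its output.
    data = preprocess(raw)
    # Steps 2a and 2b are independent of each other, so they run in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        mean_future = pool.submit(train_mean_model, data)
        median_future = pool.submit(train_median_model, data)
        return {"mean": mean_future.result(), "median": median_future.result()}


if __name__ == "__main__":
    print(run_pipeline([1, None, 2, 3, 10]))
```

Because the whole workflow is captured in one function, rerunning it on new data or swapping in a different training step is a small change, which is the reproducibility benefit described above.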

Pipeline steps that use custom scripts don't yet report metrics data to the Monitoring service. In the meantime, you can use a job as a step instead of a script. Metrics are available on the job run metrics page, which is linked from the pipeline step run page in the Console.

For more information, see Data Science and take a look at our Data Science blog.