Batch Inference for Jobs

Learn how to use the various types of batch inference with jobs.

Traditional batch inference is an asynchronous process that runs predictions based on existing models and observations, and then stores the output. It runs as a single virtual machine job with Data Science jobs.

Typically, the workload varies, but it's bigger than a mini batch inference workload and can take several hours or days to finish. This type of workload doesn't need to produce near real-time or real-time results, but it can have extensive CPU or GPU and memory requirements.

For best performance, use the AI and ML model directly rather than calling it over HTTP or another network protocol. Using the model directly is especially important when you require heavy processing of large datasets, for example, image processing.

Figure: A data set processed by a batch job, using a model from the model catalog, with the results stored.
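The following is a minimal sketch of the single-job pattern in Python. It assumes a scikit-learn style model artifact that has already been downloaded from the model catalog to a local path and a CSV dataset staged on the job's file system; the paths, column layout, and chunk size are illustrative only, not the actual job configuration.

```python
import joblib
import pandas as pd

# Assumed locations; a real job would first download the model artifact from the
# model catalog and the observations from object storage to these paths.
MODEL_PATH = "/opt/ml/model.joblib"         # hypothetical model artifact
INPUT_PATH = "/opt/data/observations.csv"   # hypothetical batch input
OUTPUT_PATH = "/opt/data/predictions.csv"   # hypothetical scored output

def main() -> None:
    # Load the model once and call it directly, with no HTTP endpoint in between.
    model = joblib.load(MODEL_PATH)

    scored = []
    # Stream the dataset in chunks so a large file doesn't have to fit in memory at once.
    # The chunk columns are assumed to match the model's expected features.
    for chunk in pd.read_csv(INPUT_PATH, chunksize=100_000):
        chunk["prediction"] = model.predict(chunk)
        scored.append(chunk)

    # Persist the scored output; a real job would typically upload it back to object storage.
    pd.concat(scored).to_csv(OUTPUT_PATH, index=False)

if __name__ == "__main__":
    main()
```

Loading the model once and scoring in chunks keeps the VM's memory and CPU or GPU as the only bottleneck, which is the point of calling the model directly instead of over the network.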

Mini Batch Inference

Mini batch inference is similar to batch inference, except that the work is split into small batches, either across several jobs or in one job that runs several small tasks simultaneously.

Because the tasks are small and the mini batches run regularly, they usually take only several minutes. This type of workload is run regularly using schedulers or triggers to work on small groups of data. Mini batching helps you incrementally load and process small portions of data.

You can run mini batches against a model from the model catalog when the best performance is required, or against a deployed model, because the workloads and data inputs are usually not heavy.

Figure: A data set processed by multiple mini batch jobs, using multiple models from the model catalog, with the results stored.
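As a hedged illustration of the mini batch pattern against a deployed model, the following Python sketch posts one small group of records to a prediction endpoint. The endpoint URL, payload shape, and the omission of authentication are assumptions for illustration, not the actual model deployment API; a scheduler or trigger would normally start the job with the latest small slice of data.

```python
import requests

# Hypothetical prediction endpoint; a real deployed model endpoint also requires
# authentication headers, which are omitted here for brevity.
ENDPOINT = "https://modeldeployment.example.com/predict"

def score_mini_batch(records: list) -> list:
    """Send one small group of records to the deployed model and return its predictions."""
    response = requests.post(ENDPOINT, json={"data": records}, timeout=30)
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Inlined mini batch for illustration; in practice a scheduler or trigger
    # supplies the small group of data to be scored on each run.
    mini_batch = [
        {"feature_a": 1.2, "feature_b": 0.4},
        {"feature_a": 0.7, "feature_b": 1.9},
    ]
    print(score_mini_batch(mini_batch))
```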

Distributed Batch Inference

You use distributed batch inference for heavy-duty workloads.

Don't confuse distributed batch inference with distributed model training; they are different. It's also not a model deployment type of inference, because typically you want to provision and use the infrastructure only for the duration of the batch inference, and automatically destroy it when the inference is complete.

Distributed batch inference is required for large datasets and heavy inference that can't be processed in a timely manner on a single VM or bare metal (BM) instance and so requires horizontal scaling. You can have one or several job configurations running (1+n) job runs on various types of infrastructure, with the dataset split across them. This type of workload provides the best performance when the job runs work against the AI and ML model directly from the model catalog, using the infrastructure's memory and CPU or GPU to the maximum.

Figure: A data set processed by multiple distributed batch jobs, using multiple models from the model catalog, with the results stored.
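The following is a minimal sketch of how several job runs could split one dataset, assuming each run receives a hypothetical SHARD_INDEX and SHARD_COUNT through environment variables (jobs let you pass environment variables to each job run) and reads the same staged CSV; the paths and variable names are illustrative, not a prescribed interface.

```python
import os

import joblib
import pandas as pd

# Hypothetical per-run settings: each of the (1+n) job runs is started with a
# different SHARD_INDEX so the runs together cover the whole dataset.
SHARD_INDEX = int(os.environ.get("SHARD_INDEX", "0"))
SHARD_COUNT = int(os.environ.get("SHARD_COUNT", "1"))

MODEL_PATH = "/opt/ml/model.joblib"          # assumed model artifact location
INPUT_PATH = "/opt/data/observations.csv"    # assumed full dataset
OUTPUT_PATH = f"/opt/data/predictions_{SHARD_INDEX}.csv"

def main() -> None:
    # Call the model directly from the downloaded catalog artifact.
    model = joblib.load(MODEL_PATH)

    scored = []
    # Each run scores only every SHARD_COUNT-th chunk, starting at its own offset,
    # so no two runs score the same rows.
    for i, chunk in enumerate(pd.read_csv(INPUT_PATH, chunksize=100_000)):
        if i % SHARD_COUNT == SHARD_INDEX:
            chunk["prediction"] = model.predict(chunk)
            scored.append(chunk)

    # Write this run's shard of results; a downstream step would merge the shards.
    if scored:
        pd.concat(scored).to_csv(OUTPUT_PATH, index=False)

if __name__ == "__main__":
    main()
```

Because each job run owns a disjoint slice of the data, the runs can be provisioned on whatever infrastructure fits the slice and destroyed as soon as their shard is written.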

Compare Batch Inference Workloads

A high-level comparison between the different types of workloads and the corresponding batch inference types:

| Characteristic | Batch Inference | Mini Batch Inference | Distributed Batch Inference |
|---|---|---|---|
| Infrastructure | Large | Light to medium | Very large |
| VMs | Single | Single or multiple (at a small scale) | Multiple |
| Provisioning speed required | Medium | Fast | Average to slow |
| Scheduler required | Yes | Yes | Use case dependent |
| Trigger required | Yes | Yes | No |
| Workloads | Large | Light | Large or heavy |
| Dataset size | Large | Small | Extremely large or auto-scaling |
| Batch process time (estimate; varies by use case) | Medium to very long (usually from tens of minutes up to hours or days) | Short to near real-time | Medium to very long (usually from a few hours up to days) |
| Model deployment | Not required | Yes, but not required | Not required |
| Endpoints | No | No | No |