Model Limitations in Generative AI
Review the following model requirements for the OCI Generative AI custom and base models to get the most out of your models.
For key features of the pretrained base models, see Pretrained Foundational Models in Generative AI.
Matching Base Models to Clusters
Expand the following sections to review the dedicated AI cluster unit size and units that match each foundational model.
Base Model | Fine-Tuning Cluster | Hosting Cluster | Pricing Page Information | Request Cluster Limit Increase |
---|---|---|---|---|
|
|
|
|
|
|
Not available for fine-tuning |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Not available for fine-tuning |
|
|
|
You must request a limit increase to use the following resources:
Meta Llama Family
- To host a Meta Llama 3.1 (405B) model, you must request dedicated-unit-llama2-70-count to increase by 8.
- To host a Meta Llama 3.1 (70B) model, you must request dedicated-unit-llama2-70-count to increase by 2.
- To fine-tune a Meta Llama 3.1 (70B) model, you must request dedicated-unit-llama2-70-count to increase by 4.
Cohere Command R Family
- To host a Cohere Command R+ model, you must request dedicated-unit-large-cohere-count to increase by 2.
- To host a Cohere Command R model, you must request dedicated-unit-small-cohere-count to increase by 1.
- To fine-tune a Cohere Command R model, you must request dedicated-unit-small-cohere-count to increase by 8.
References: Service Limits for Generative AI and Request Cluster Limit Increase
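The limit increases listed above can be kept in a small lookup table when planning a request. The following Python sketch is illustrative only: the limit names and counts are copied from the lists above, but the dictionary layout, the function name total_increase, and the assumption that increases for several clusters simply add up are not taken from the service documentation.

```python
# Limit increases copied from the lists above.
# Key: (model, action). Value: (service limit name, units to request).
LIMIT_INCREASES = {
    ("Meta Llama 3.1 (405B)", "host"): ("dedicated-unit-llama2-70-count", 8),
    ("Meta Llama 3.1 (70B)", "host"): ("dedicated-unit-llama2-70-count", 2),
    ("Meta Llama 3.1 (70B)", "fine-tune"): ("dedicated-unit-llama2-70-count", 4),
    ("Cohere Command R+", "host"): ("dedicated-unit-large-cohere-count", 2),
    ("Cohere Command R", "host"): ("dedicated-unit-small-cohere-count", 1),
    ("Cohere Command R", "fine-tune"): ("dedicated-unit-small-cohere-count", 8),
}


def total_increase(plans):
    """Sum the units to request per limit name for a list of (model, action) pairs.

    Assumption: clusters that run at the same time need their increases added
    together. Confirm this against Service Limits for Generative AI.
    """
    totals = {}
    for model, action in plans:
        limit_name, units = LIMIT_INCREASES[(model, action)]
        totals[limit_name] = totals.get(limit_name, 0) + units
    return totals


if __name__ == "__main__":
    plan = [("Meta Llama 3.1 (70B)", "host"), ("Meta Llama 3.1 (70B)", "fine-tune")]
    print(total_increase(plan))
```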
Base Model | Fine-tuning Cluster | Hosting Cluster | Pricing Page Product Name | Request Cluster Limit Increase |
---|---|---|---|---|
|
Not available for fine-tuning |
|
|
|
|
Not available for fine-tuning |
|
|
|
The generation models are being deprecated. We recommend that you use the chat models instead.
Base Model | Fine-tuning Cluster | Hosting Cluster | Pricing Page Product Name | Request Cluster Limit Increase |
---|---|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
Not available for fine-tuning |
|
|
|
The cohere.command summarization model is being deprecated. We recommend that you use the chat models, which offer the same summarization capabilities, including control over summary length and style.
Base Model | Fine-tuning Cluster | Hosting Cluster | Pricing Page Product Name | Request Cluster Limit Increase |
---|---|---|---|---|
|
Not available for fine-tuning |
|
|
|
- Units for Fine-Tuning Clusters
  Creating a fine-tuning dedicated AI cluster automatically provisions a fixed number of units based on the base model: 8 units for cohere.command-r-16k and 2 units for other models. You can't change this number, but you can use the same cluster to fine-tune several models.
- Units for Hosting Clusters
  - When creating a cluster, by default, one unit is created for the selected base model.
  - You can increase throughput or requests per minute (RPM) by adding model replicas. For example, 2 replicas require 2 units. You can add model replicas when creating or editing a hosting cluster.
  - Host up to 50 models on the same cluster, with the following restrictions:
    - Host up to 50 of the same version of a fine-tuned or a pretrained model on the same cluster.
    - Host different versions of the same base model only if using the T-Few fine-tuning method for the cohere.command and cohere.command-light base models.
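As a rough illustration of the unit rules above, the following Python sketch computes the units for each cluster type. It follows only the statements in this section (8 units for cohere.command-r-16k, 2 units for other models, and one hosting unit per replica). The function and constant names are hypothetical, and the sketch doesn't model unit sizes or service limits.

```python
# Fine-tuning units are fixed per base model, as stated above.
FINE_TUNING_UNITS = {"cohere.command-r-16k": 8}
DEFAULT_FINE_TUNING_UNITS = 2  # "2 units for other models"


def fine_tuning_units(base_model):
    """Units provisioned automatically for a fine-tuning cluster."""
    return FINE_TUNING_UNITS.get(base_model, DEFAULT_FINE_TUNING_UNITS)


def hosting_units(replicas=1):
    """Units for a hosting cluster: one unit per model replica."""
    if replicas < 1:
        raise ValueError("a hosting cluster needs at least one replica")
    return replicas


if __name__ == "__main__":
    print(fine_tuning_units("cohere.command-r-16k"))
    print(fine_tuning_units("cohere.command-light"))
    print(hosting_units(replicas=2))
```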
Instead of committing to dedicated AI clusters, you can pay as you go for on-demand inferencing. With on-demand inferencing, you reach the foundational models either in the Console playground or through the API. For on-demand features, see Calculating Cost in Generative AI.
Adding Endpoints to Hosting Clusters
To host a model for inference on a hosting dedicated AI cluster, you must create an endpoint for that model. Then, you can add either a custom model or a pretrained foundational model to that endpoint. A hosting dedicated AI cluster can have up to 50 endpoints. Use these endpoints for the following use cases:
- Creating Endpoint Aliases
  Create aliases with many endpoints. These 50 endpoints must either point to the same base model or the same version of a custom model. Creating many endpoints that point to the same model makes it easier to manage the endpoints, because you can use the endpoints for different users or different purposes.
- Stack Serving
  Host several versions of a custom model on one cluster. This applies to cohere.command and cohere.command-light models that are fine-tuned with the T-Few training method. Hosting various versions of a fine-tuned model can help you to assess the custom models for different use cases.
You can increase the instance count to increase the call volume supported by a hosting cluster.
Expand the following sections to review the requirements for hosting models on the same cluster.
Hosting Cluster Unit Size | Matching Rules |
---|---|
Large Generic |
To host the same pretrained base model through several endpoints on the same cluster:
To host several custom models on the same cluster:
|
Large Generic |
To host the same pretrained base model through several endpoints on the same cluster:
To host several custom models on the same cluster:
|
Large Generic 4 |
To host the same pretrained base model through several endpoints on the same cluster:
|
Large Cohere V2_2 |
To host the same pretrained base model through several endpoints on the same cluster:
|
Small Cohere V2 |
To host the same pretrained base model through several endpoints on the same cluster:
To host several custom models on the same cluster:
You can't host different versions of a custom model trained on the |
All the text generation models are being deprecated. To decide which chat model to use instead, review the pretrained models. If you're hosting the text generation models on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules that match your base model.
Hosting Cluster Unit Size | Matching Rules |
---|---|
Small Cohere |
To host the same pretrained base model through several endpoints on the same cluster:
To host different custom models on the same cluster:
|
Large Cohere |
To host the same pretrained base model through several endpoints on the same cluster:
To host different custom models on the same cluster:
|
Llama2 70 | To host the same pretrained base model through several endpoints on the same cluster:
|
The cohere.command summarization model is being deprecated. To decide which chat model to use instead, review the pretrained models. If you're hosting the pretrained cohere.command summarization model on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules.
Hosting Cluster Unit Size | Matching Rules |
---|---|
Large Cohere |
To host the same pretrained base model through several endpoints on the same cluster:
To host different custom models on the same cluster:
|
For hosting the embedding models on a hosting dedicated AI cluster, use the following cluster unit size and endpoint rules.
Hosting Cluster Unit Size | Matching Rules |
---|---|
Embed Cohere | To host the same pretrained base model through several endpoints on the same cluster:
|
Training Data
Datasets for training custom models have the following requirements:
- A maximum of one fine-tuning dataset is allowed per custom model. This dataset is randomly split in an 80:20 ratio for training and validation.
- Each file must have at least 32 prompt/completion pair examples.
- The file format is JSONL.
- Each line in the JSONL file has the following format: {"prompt": "<a prompt>", "completion": "<expected response given the prompt>"}\n
- The file must be stored in an OCI Object Storage bucket.
Learn about Training Data Requirements in Generative AI.
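Before uploading a dataset to Object Storage, it can help to check it locally against the requirements above. The following Python sketch is one possible pre-flight check, not part of the service: the function names validate_jsonl and split_80_20 are hypothetical, skipping blank lines is an assumption, and the split function only illustrates the 80:20 ratio (the service performs its own random split).

```python
import json
import random

MIN_EXAMPLES = 32  # minimum prompt/completion pairs per file, from the list above


def validate_jsonl(lines):
    """Parse JSONL lines and return the examples, or raise ValueError."""
    examples = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines, such as a trailing newline, are ignored
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {number}: not valid JSON") from err
        if not isinstance(record, dict):
            raise ValueError(f"line {number}: expected a JSON object")
        for key in ("prompt", "completion"):
            if not isinstance(record.get(key), str):
                raise ValueError(f"line {number}: missing string field {key!r}")
        examples.append(record)
    if len(examples) < MIN_EXAMPLES:
        raise ValueError(f"found {len(examples)} examples, need at least {MIN_EXAMPLES}")
    return examples


def split_80_20(examples, seed=0):
    """Illustrate a random 80:20 training/validation split."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    sample = [json.dumps({"prompt": f"question {i}", "completion": f"answer {i}"})
              for i in range(40)]
    train, validation = split_80_20(validate_jsonl(sample))
    print(len(train), len(validation))
```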
Input Data for Text Embeddings
Input data for creating text embeddings has the following requirements:
- You can add sentences, phrases, or paragraphs for embeddings either one phrase at a time, or by uploading a file.
- Only files with a .txt extension are allowed.
- If you use an input file, each input sentence, phrase, or paragraph in the file must be separated with a newline character.
- A maximum of 96 inputs are allowed for each run.
- Each input must be less than 512 tokens. If an input is too long, select whether to cut off the start or the end of the text to fit within the token limit by setting the Truncate parameter to Start or End. If an input exceeds the 512 token limit and the Truncate parameter is set to None, you get an error message.
Learn about Creating text embeddings in OCI Generative AI.
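The input rules above can be approximated locally before a run. The following Python sketch is a rough illustration under stated assumptions: it counts whitespace-separated words as a stand-in for the model's tokenizer, so real token counts will differ, and it treats inputs longer than 512 tokens as too long (the exact boundary isn't clear from the text above). The function name prepare_inputs is hypothetical. The truncate values mirror the Truncate parameter described above.

```python
MAX_INPUTS = 96   # maximum inputs per run, from the list above
MAX_TOKENS = 512  # token limit per input, from the list above


def prepare_inputs(file_text, truncate="None"):
    """Split newline-separated inputs and apply the limits described above.

    truncate: "Start" cuts off the start, "End" cuts off the end,
    "None" raises an error for inputs that are too long.
    """
    inputs = [line for line in file_text.split("\n") if line.strip()]
    if len(inputs) > MAX_INPUTS:
        raise ValueError(f"{len(inputs)} inputs, maximum is {MAX_INPUTS}")
    prepared = []
    for index, text in enumerate(inputs, start=1):
        tokens = text.split()  # assumption: whitespace words stand in for model tokens
        if len(tokens) > MAX_TOKENS:
            if truncate == "Start":
                tokens = tokens[-MAX_TOKENS:]  # drop the start, keep the end
            elif truncate == "End":
                tokens = tokens[:MAX_TOKENS]   # drop the end, keep the start
            else:
                raise ValueError(f"input {index} exceeds {MAX_TOKENS} tokens")
            text = " ".join(tokens)
        prepared.append(text)
    return prepared


if __name__ == "__main__":
    long_input = " ".join(str(i) for i in range(600))
    result = prepare_inputs("a short phrase\n" + long_input, truncate="End")
    print(len(result), len(result[1].split()))
```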