Pretrained Foundational Models in Generative AI
You can use the following pretrained foundational models in OCI Generative AI.
Chat Models (New)
Ask questions and get conversational responses through an AI chat interface.
- Cohere Models
Model | Available in These Regions | Key Features
cohere.command-r-08-2024
- Brazil East (Sao Paulo)
- Germany Central (Frankfurt)
- Japan Central (Osaka)
- UK South (London)
- US Midwest (Chicago)
- User prompt can be up to 128,000 tokens, and response can be up to 4,000 tokens for each run.
- When you fine-tune this model, user prompt for the custom model can be up to 16,000 tokens and response can be up to 4,000 tokens for each run.
- Optimized for complex tasks, offers advanced language understanding, higher capacity, and more nuanced responses, and can maintain context from its long conversation history of 128,000 tokens. Also ideal for question-answering, sentiment analysis, and information retrieval.
- Improved math, coding, and reasoning skills.
- Enhanced multilingual retrieval-augmented generation (RAG) feature with customizable citation options.
- You can fine-tune this model with your dataset.
cohere.command-r-plus-08-2024
- Brazil East (Sao Paulo)
- Germany Central (Frankfurt)
- Japan Central (Osaka)
- UK South (London)
- US Midwest (Chicago)
- User prompt can be up to 128,000 tokens, and response can be up to 4,000 tokens for each run.
- Optimized for complex tasks, offers advanced language understanding, higher capacity, and more nuanced responses, and can maintain context from its long conversation history of 128,000 tokens. Also ideal for question-answering, sentiment analysis, and information retrieval.
- Improved math, coding, and reasoning skills.
- Enhanced multilingual retrieval-augmented generation (RAG) feature with customizable citation options.
cohere.command-r-16k (deprecated)
- Brazil East (Sao Paulo)
- Germany Central (Frankfurt)
- Japan Central (Osaka) (dedicated AI cluster only)
- UK South (London)
- US Midwest (Chicago)
- User prompt can be up to 16,000 tokens, and response can be up to 4,000 tokens for each run.
- Optimized for conversational interaction and long context tasks. Ideal for text generation, summarization, translation, and text-based classification.
- You can fine-tune this model with your dataset.
- For dedicated inferencing in the Japan Central (Osaka) region, create a dedicated AI cluster and endpoint and host the model on the cluster.
cohere.command-r-plus (deprecated)
- Brazil East (Sao Paulo)
- Germany Central (Frankfurt)
- UK South (London)
- US Midwest (Chicago)
- User prompt can be up to 128,000 tokens, and response can be up to 4,000 tokens for each run.
- Optimized for complex tasks, offers advanced language understanding, higher capacity, and more nuanced responses, and can maintain context from its long conversation history of 128,000 tokens. Also ideal for question-answering, sentiment analysis, and information retrieval.
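The token limits listed above can be captured in a small lookup table so that an application can check a request before sending it. The following is a minimal Python sketch, not part of any OCI SDK: the names `TokenLimits`, `COHERE_CHAT_LIMITS`, `FINE_TUNED_LIMITS`, and `fits_limits` are hypothetical, and the token counts are assumed to come from whatever tokenizer you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenLimits:
    max_prompt_tokens: int
    max_response_tokens: int


# Limits copied from the Cohere model list above (per run).
COHERE_CHAT_LIMITS = {
    "cohere.command-r-08-2024": TokenLimits(128_000, 4_000),
    "cohere.command-r-plus-08-2024": TokenLimits(128_000, 4_000),
    "cohere.command-r-16k": TokenLimits(16_000, 4_000),
    "cohere.command-r-plus": TokenLimits(128_000, 4_000),
}

# The list gives a separate prompt limit only for custom models
# fine-tuned from cohere.command-r-08-2024.
FINE_TUNED_LIMITS = {
    "cohere.command-r-08-2024": TokenLimits(16_000, 4_000),
}


def fits_limits(model_id, prompt_tokens, response_tokens, fine_tuned=False):
    """Return True if the requested token counts fit the listed limits."""
    table = FINE_TUNED_LIMITS if fine_tuned else COHERE_CHAT_LIMITS
    limits = table.get(model_id)
    if limits is None:
        raise ValueError(f"No listed limits for {model_id!r}")
    return (
        prompt_tokens <= limits.max_prompt_tokens
        and response_tokens <= limits.max_response_tokens
    )
```

For example, a 100,000-token prompt fits the base `cohere.command-r-08-2024` model but not a custom model fine-tuned from it, because the fine-tuned prompt limit is 16,000 tokens.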
- Meta Llama Models
Model | Available in These Regions | Key Features
meta.llama-3.2-11b-vision-instruct
- Brazil East (Sao Paulo) (dedicated AI cluster only)
- Japan Central (Osaka) (dedicated AI cluster only)
- US Midwest (Chicago) (dedicated AI cluster only)
- Multimodal support: Input text and images and get a text output.
- English is the only supported language for the image plus text option.
- Multilingual option supported for the text only option.
- In the Console, input a .png or .jpg image of 5 MB or less.
- Submitting an image without a prompt doesn't work. When you submit an image, you must submit a prompt about that image in the same request. You can then submit follow-up prompts and the model keeps the context of the conversation.
- In the playground, to add the next image and text, you must clear the chat, which loses the context of the previous conversation.
- For API, input a base64 encoded image in each run. A 512 x 512 image is converted to about 1,610 tokens.
- Model has 11 billion parameters.
- User prompt and response can be up to 128,000 tokens for each run.
- Dedicated mode only. (On-demand inferencing not available.) For dedicated inferencing, create a dedicated AI cluster and endpoint and host the model on the cluster.
meta.llama-3.2-90b-vision-instruct
- Brazil East (Sao Paulo)
- Japan Central (Osaka)
- US Midwest (Chicago)
- Multimodal support: Input text and images and get a text output.
- English is the only supported language for the image plus text option.
- Multilingual option supported for the text only option.
- In the Console, input a .png or .jpg image of 5 MB or less.
- Submitting an image works only when you submit a prompt about that image in the same request.
- In the playground, to add the next image and text, you must clear the chat, which loses the context of the previous conversation.
- For API, input a base64 encoded image in each run. A 512 x 512 image is converted to about 1,610 tokens.
- Model has 90 billion parameters.
- User prompt and response can be up to 128,000 tokens for each run.
meta.llama-3.1-70b-instruct
- Brazil East (Sao Paulo)
- Germany Central (Frankfurt)
- Japan Central (Osaka)
- UK South (London)
- US Midwest (Chicago)
- Text-only model: text input with text output.
- Model has 70 billion parameters.
- User prompt and response can be up to 128,000 tokens for each run.
- You can fine-tune this model with your dataset.
meta.llama-3.1-405b-instruct
- Brazil East (Sao Paulo) (dedicated AI cluster only)
- Germany Central (Frankfurt) (dedicated AI cluster only)
- Japan Central (Osaka) (dedicated AI cluster only)
- UK South (London) (dedicated AI cluster only)
- US Midwest (Chicago)
- Model has 405 billion parameters.
- User prompt and response can be up to 128,000 tokens for each run.
- On-demand inferencing is only available in the US Midwest (Chicago) region. Other regions require that you create your own dedicated AI clusters and endpoints to host this model on those clusters for inferencing.
meta.llama-3-70b-instruct (deprecated)
- Brazil East (Sao Paulo)
- Germany Central (Frankfurt)
- UK South (London)
- US Midwest (Chicago)
- Model has 70 billion parameters.
- User prompt and response can be up to 8,000 tokens for each run.
- You can fine-tune this model with your dataset.
- Has broad general knowledge, useful for generating ideas, refining text analysis, and drafting written content such as emails, blog posts, and descriptions.
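For the vision models listed above (meta.llama-3.2-11b-vision-instruct and meta.llama-3.2-90b-vision-instruct), the API expects a base64 encoded image in each run, and a 512 x 512 image is converted to about 1,610 tokens. The Python sketch below shows one way to prepare the encoded string and to estimate how much of the 128,000-token budget remains. The helper names are hypothetical, the sketch assumes that image tokens count against the same 128,000-token limit that the prompt and response share, and it doesn't show how the encoded string is attached to a request.

```python
import base64

# Figures taken from the model list above.
CONTEXT_LIMIT_TOKENS = 128_000  # prompt and response combined, per run
TOKENS_PER_512_IMAGE = 1_610    # approximate cost of one 512 x 512 image


def encode_image(image_bytes: bytes) -> str:
    """Return the image bytes as a base64 string (ASCII text)."""
    return base64.b64encode(image_bytes).decode("ascii")


def remaining_tokens(text_prompt_tokens: int, num_512_images: int = 0) -> int:
    """Estimate the tokens left for the response.

    Assumption: each 512 x 512 image consumes about 1,610 tokens of the
    shared 128,000-token budget.
    """
    used = text_prompt_tokens + num_512_images * TOKENS_PER_512_IMAGE
    return max(CONTEXT_LIMIT_TOKENS - used, 0)
```

Under these assumptions, a 1,000-token text prompt with two 512 x 512 images leaves about 123,780 tokens for the response.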
Embedding Models
Convert text to vector embeddings to use in applications for semantic searches, text classification, or text clustering.
Model | Available in These Regions | Key Features |
---|---|---|
cohere.embed-english-v3.0 | | |
cohere.embed-multilingual-v3.0 | | |
cohere.embed-english-light-v3.0 | | |
cohere.embed-multilingual-light-v3.0 | | |
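As an illustration of the semantic search use case mentioned above, the following Python sketch ranks documents by cosine similarity between a query vector and document vectors. It doesn't call any embedding model: the three-dimensional vectors and document names are made up for the example, and real embeddings returned by a model would have many more dimensions. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up vectors standing in for embeddings of three documents.
docs = {
    "billing-faq": [0.9, 0.1, 0.0],
    "gpu-shapes": [0.1, 0.8, 0.3],
    "release-notes": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # stand-in for the embedded search query

ranking = rank_documents(query, docs)
print([doc_id for doc_id, _ in ranking])
```

The same ranking step applies to text classification or clustering pipelines that compare embeddings, although those typically use a library for nearest-neighbor search rather than a plain loop.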
Generation Models (Deprecated)
Give instructions to generate text or extract information from text.
Important
- Not Available on-demand: All OCI Generative AI foundational pretrained models supported for the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired. We recommend that you use the chat models instead.
- Can be hosted on clusters: If you host a summarization or a generation model such as cohere.command on a dedicated AI cluster (dedicated serving mode), you can continue to use that model until it's retired. These models, when hosted on a dedicated AI cluster, are only available in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions.
Model | Available in These Regions | Key Features |
---|---|---|
cohere.command (deprecated) | | |
cohere.command-light (deprecated) | | |
meta.llama-2-70b-chat (deprecated) | | |
The Summarization Model (Deprecated)
Summarize text with your instructed format, length, and tone.
Important
The cohere.command model supported for the on-demand serving mode is now retired, and this model is deprecated for the dedicated serving mode. If you're hosting cohere.command on a dedicated AI cluster (dedicated serving mode) for summarization, you can continue to use this hosted model replica with the summarization API and in the playground until the cohere.command model retires for the dedicated serving mode. These models, when hosted on a dedicated AI cluster, are only available in US Midwest (Chicago). See Retiring the Models for retirement dates and definitions. We recommend that you use the chat models instead, which offer the same summarization capabilities, including control over summary length and style.
Model | Available in These Regions | Key Features |
---|---|---|
cohere.command (deprecated) | | |