Paying for On-Demand Inferencing
You get the following benefits when you use on-demand inferencing in OCI Generative AI:
- Low barrier to start using Generative AI.
- Access to all available Generative AI foundational models.
- Great for experimenting and evaluating the models.
- Pay as you go for transactions. See the following note for details.
With on-demand inferencing, you pay as you go for the following character lengths:
- Chat: prompt length (in characters) + response length (in characters)
- Text generation: prompt length (in characters) + response length (in characters)
- Summarization: prompt length (in characters) + response length (in characters)
- Text Embeddings: input length (in characters)
On the pricing page, 1 character is calculated as 1 transaction.
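As a rough illustration of these rules, the following Python sketch counts the billable transactions for a single call. The function name and the call-type labels are hypothetical, and the sketch assumes that Python's `len()` (which counts Unicode code points) matches how OCI counts characters, which may not hold for every non-ASCII input.

```python
def billable_transactions(call_type: str, prompt: str, response: str = "") -> int:
    """Return the on-demand transactions billed for one call (1 character = 1 transaction).

    call_type is a hypothetical label: "chat", "text_generation",
    "summarization", or "embedding". Assumes len() matches OCI's character count.
    """
    if call_type in ("chat", "text_generation", "summarization"):
        # Both the prompt and the response characters are billed.
        return len(prompt) + len(response)
    if call_type == "embedding":
        # Only the input text is billed for embeddings.
        return len(prompt)
    raise ValueError(f"unknown call type: {call_type!r}")


print(billable_transactions("chat", "Hello", "Hi there!"))   # 5 + 9 = 14
print(billable_transactions("embedding", "contract text"))   # 13
```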
If you're hosting foundational models or fine-tuning them on dedicated AI clusters, you're charged by the unit hour rather than by transaction. In this case, see Paying for Dedicated AI Clusters to learn how to calculate the dedicated AI cluster costs.
Matching Models to On-Demand Prices
See the following tables to match a foundational model to its product name on the pricing page. The pricing page lists the price for 10,000 on-demand transactions when using the playground, API, or CLI for inferencing. Then, review the examples in this section to learn how to calculate the cost based on the number of input and output characters.
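Because the pricing page quotes a price per 10,000 transactions, the cost of any number of transactions is a simple proportion. A minimal sketch, using `Decimal` to avoid floating-point error in currency arithmetic; the function name is hypothetical, the price is a made-up placeholder, and the result is not rounded the way an actual invoice may be.

```python
from decimal import Decimal


def on_demand_cost(transactions: int, price_per_10k: Decimal) -> Decimal:
    """Return the cost of `transactions` when the price is quoted per 10,000 transactions."""
    if transactions < 0:
        raise ValueError("transactions must be non-negative")
    return Decimal(transactions) / Decimal(10_000) * price_per_10k


# 10,000 transactions cost exactly the listed price; 5,000 cost half of it.
# The price below is a hypothetical placeholder, not an OCI list price.
placeholder_price = Decimal("2.00")
print(on_demand_cost(10_000, placeholder_price))
print(on_demand_cost(5_000, placeholder_price))
```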
Chat Models
Model Name | OCI Model Name | Pricing Page Product Name |
---|---|---|
Cohere Command R | cohere.command-r-16k | Small Cohere |
Command R 08-2024 | cohere.command-r-08-2024 | Small Cohere |
Cohere Command R+ | cohere.command-r-plus | Large Cohere |
Command R+ 08-2024 | cohere.command-r-plus-08-2024 | Large Cohere |
Meta Llama 3 | meta.llama-3-70b-instruct (deprecated) | Large Meta |
Meta Llama 3.1 (70B) | meta.llama-3.1-70b-instruct | Large Meta |
Meta Llama 3.1 (405B) | meta.llama-3.1-405b-instruct | Meta Llama 3.1 405B |
Meta Llama 3.2 11B Vision | meta.llama-3.2-11b-vision-instruct | Large Meta |
Meta Llama 3.2 90B Vision | meta.llama-3.2-90b-vision-instruct | Large Meta |
The summarization and text generation models supported for the on-demand mode are now retired. We recommend that you use the chat models instead.
Embedding Models
Model Name | OCI Model Name | Pricing Page Product Name |
---|---|---|
Cohere English Embed V3 | cohere.embed-english-v3.0 | Embed Cohere |
Cohere Multilingual Embed V3 | cohere.embed-multilingual-v3.0 | Embed Cohere |
Cohere English Light Embed V3 | cohere.embed-english-light-v3.0 | Embed Cohere |
Cohere Multilingual Light Embed V3 | cohere.embed-multilingual-light-v3.0 | Embed Cohere |
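If you script cost estimates, the chat and embedding tables in this section can be captured as a lookup from OCI model name to pricing page product name. A minimal sketch; the dictionary and helper names are hypothetical, and the mapping has to be kept in sync with the tables as models are added, deprecated, or retired.

```python
# OCI model name -> pricing page product name, copied from the tables in this section.
PRICING_PRODUCT = {
    "cohere.command-r-16k": "Small Cohere",
    "cohere.command-r-08-2024": "Small Cohere",
    "cohere.command-r-plus": "Large Cohere",
    "cohere.command-r-plus-08-2024": "Large Cohere",
    "meta.llama-3-70b-instruct": "Large Meta",  # deprecated
    "meta.llama-3.1-70b-instruct": "Large Meta",
    "meta.llama-3.1-405b-instruct": "Meta Llama 3.1 405B",
    "meta.llama-3.2-11b-vision-instruct": "Large Meta",
    "meta.llama-3.2-90b-vision-instruct": "Large Meta",
    "cohere.embed-english-v3.0": "Embed Cohere",
    "cohere.embed-multilingual-v3.0": "Embed Cohere",
    "cohere.embed-english-light-v3.0": "Embed Cohere",
    "cohere.embed-multilingual-light-v3.0": "Embed Cohere",
}


def pricing_product(model_id: str) -> str:
    """Return the pricing page product name for an on-demand OCI model name."""
    try:
        return PRICING_PRODUCT[model_id]
    except KeyError:
        raise KeyError(f"{model_id!r} is not in the on-demand pricing tables") from None


print(pricing_product("meta.llama-3.1-70b-instruct"))  # Large Meta
```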
Chat Example
Paul calls the meta.llama-3.1-70b-instruct model with the following prompt, which is 220 characters long:
Generate a product pitch for a USB connected compact microphone that can record surround sound. The microphone is most useful in recording music or conversations. The microphone can also be useful for recording podcasts.
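The response from the model is 2,205 characters long. Paul wants to know the cost for this call. Here are the steps to calculate the cost:
- Add the prompt and response lengths: 220 + 2,205 = 2,425 characters, which is 2,425 transactions.
- In the chat models table, find the pricing page product name for meta.llama-3.1-70b-instruct: Large Meta.
- Divide the transactions by 10,000 and multiply by the Large Meta price per 10,000 transactions on the pricing page: 2,425 / 10,000 = 0.2425 × the Large Meta price.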
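The same calculation for Paul's call, as a Python sketch. The Large Meta price below is a hypothetical placeholder, not a quoted OCI price; substitute the current value from the pricing page.

```python
from decimal import Decimal

PROMPT = (
    "Generate a product pitch for a USB connected compact microphone that can "
    "record surround sound. The microphone is most useful in recording music or "
    "conversations. The microphone can also be useful for recording podcasts."
)
RESPONSE_CHARS = 2_205  # response length given in the example

# Hypothetical placeholder: replace with the current "Large Meta" price
# per 10,000 transactions from the pricing page.
LARGE_META_PRICE_PER_10K = Decimal("0.0150")

prompt_chars = len(PROMPT)                    # 220
transactions = prompt_chars + RESPONSE_CHARS  # 2,425 (1 character = 1 transaction)
cost = Decimal(transactions) / Decimal(10_000) * LARGE_META_PRICE_PER_10K
print(prompt_chars, transactions, cost)
```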
Instead of calculating the price by hand, you can also estimate it with the OCI Cost Estimator: select the AI and Machine Learning category and load the cost estimator for OCI Generative AI.
Text Embeddings Example
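Gina is converting customer contracts into embeddings for a new semantic search application. On average, Gina ingests 16 documents every hour. Each document is about 1,000 characters long. Gina wants to get an estimate of the monthly bill for generating those embeddings. Here are the steps to calculate the cost:
- Embeddings are billed on input length only: 16 documents × 1,000 characters = 16,000 transactions per hour.
- Multiply by the number of hours in the month. For a 31-day month (744 hours): 16,000 × 744 = 11,904,000 transactions.
- All embedding models map to the Embed Cohere product, so divide by 10,000 and multiply by the Embed Cohere price per 10,000 transactions: 11,904,000 / 10,000 = 1,190.4 × the Embed Cohere price.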
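A Python sketch of Gina's monthly estimate. It assumes ingestion runs around the clock in a 31-day month (744 hours), and the Embed Cohere price is a hypothetical placeholder, not a quoted OCI price; substitute the current value from the pricing page.

```python
from decimal import Decimal

DOCS_PER_HOUR = 16
CHARS_PER_DOC = 1_000
HOURS_PER_MONTH = 744  # assumption: 31 days x 24 hours of continuous ingestion

# Hypothetical placeholder: replace with the current "Embed Cohere" price
# per 10,000 transactions from the pricing page.
EMBED_COHERE_PRICE_PER_10K = Decimal("0.0010")

# Embeddings bill input characters only: 1 character = 1 transaction.
transactions_per_hour = DOCS_PER_HOUR * CHARS_PER_DOC           # 16,000
monthly_transactions = transactions_per_hour * HOURS_PER_MONTH  # 11,904,000
monthly_cost = (
    Decimal(monthly_transactions) / Decimal(10_000) * EMBED_COHERE_PRICE_PER_10K
)
print(transactions_per_hour, monthly_transactions, monthly_cost)
```

For a shorter month, change `HOURS_PER_MONTH` (for example, 720 for a 30-day month).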