Paying for On-Demand Inferencing

On-demand inferencing in OCI Generative AI gives you the following benefits:

  • Low barrier to start using Generative AI.
  • Access to all available Generative AI foundational models.
  • Great for experimenting and evaluating the models.
  • Pay as you go for transactions. See the following note for details.
Note

With on-demand inferencing you pay as you go for the following character lengths:

  • Chat: prompt length (in characters) + response length (in characters)
  • Text generation: prompt length (in characters) + response length (in characters)
  • Summarization: prompt length (in characters) + response length (in characters)
  • Text Embeddings: input length (in characters)

On the Pricing page, 1 character is calculated as 1 transaction.
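The character-based counting above can be sketched as a pair of helper functions. This is an illustrative sketch only, not an official SDK API; it assumes 1 character = 1 transaction, as stated on the Pricing page.

```python
# Sketch: how on-demand transactions are counted.
# Illustrative helpers only; not part of any OCI SDK.

def chat_transactions(prompt: str, response: str) -> int:
    """Chat (and the retired text generation and summarization modes):
    prompt length + response length, in characters."""
    return len(prompt) + len(response)

def embedding_transactions(inputs: list[str]) -> int:
    """Text embeddings: total input length, in characters."""
    return sum(len(text) for text in inputs)
```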

If you're hosting foundational models or fine-tuning them on dedicated AI clusters, you're charged by the unit hour rather than by transaction. In this case, see Paying for Dedicated AI Clusters to learn how to calculate the dedicated AI cluster costs.

Matching Models to On-Demand Prices

See the following tables to match a foundational model to its product name on the pricing page. The pricing page lists the price for 10,000 on-demand transactions when using the playground, API, or CLI for inferencing. Then, review the examples in this section to learn how to calculate the cost based on the number of input and output characters.

Chat Models

Model Name                 OCI Model Name                          Pricing Page Product Name
Cohere Command R           cohere.command-r-16k                    Small Cohere
Command R 08-2024          cohere.command-r-08-2024                Small Cohere
Cohere Command R+          cohere.command-r-plus                   Large Cohere
Command R+ 08-2024         cohere.command-r-plus-08-2024           Large Cohere
Meta Llama 3               meta.llama-3-70b-instruct (deprecated)  Large Meta
Meta Llama 3.1 (70B)       meta.llama-3.1-70b-instruct             Large Meta
Meta Llama 3.1 (405B)      meta.llama-3.1-405b-instruct            Meta Llama 3.1 405B
Meta Llama 3.2 11B Vision  meta.llama-3.2-11b-vision-instruct      Large Meta
Meta Llama 3.2 90B Vision  meta.llama-3.2-90b-vision-instruct      Large Meta
Important

The summarization and text generation models supported for the on-demand mode are now retired. We recommend that you use the chat models instead.

Embedding Models

Model Name                          OCI Model Name                        Pricing Page Product Name
Cohere English Embed V3             cohere.embed-english-v3.0             Embed Cohere
Cohere Multilingual Embed V3        cohere.embed-multilingual-v3.0        Embed Cohere
Cohere English Light Embed V3       cohere.embed-english-light-v3.0       Embed Cohere
Cohere Multilingual Light Embed V3  cohere.embed-multilingual-light-v3.0  Embed Cohere

Chat Example

Paul calls the meta.llama-3.1-70b-instruct model with the following prompt, which is 220 characters long:

Generate a product pitch for a USB connected compact microphone that can record surround sound. The microphone is most useful in recording music or conversations. The microphone can also be useful for recording podcasts.

The response from the model is 2,205 characters long. Paul wants to know the cost for this call. Here are the steps to calculate the cost.

  1. Calculate the prompt + response length (in characters).

Let's add up the prompt length (220 characters) and the model response length (2,205 characters).

    prompt + response length = 220 + 2,205 = 2,425 characters
  2. Calculate the number of transactions.

    Prices are listed for 10,000 transactions.

    10,000 transactions = 10,000 characters, so 1 transaction = 1 character
    2,425 characters = 2,425 transactions
  3. Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Large Meta, find the <Large-Meta-unit-price>.
    Paul uses the meta.llama-3.1-70b-instruct model, which matches the product Oracle Cloud Infrastructure Generative AI - Large Meta on the AI Pricing page.
  4. Calculate the price for 2,425 characters.
    price = (2,425 transactions) / (10,000 transactions) x $<Large-Meta-unit-price>
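The four steps above can be reproduced in a few lines. This is an illustrative sketch, not an official SDK API; the unit-price constant is a placeholder to fill in from the AI Pricing page.

```python
# Sketch of the chat cost calculation above.
# LARGE_META_UNIT_PRICE is a placeholder: the price per
# 10,000 transactions listed for Large Meta on the AI Pricing page.

def chat_cost(prompt_chars: int, response_chars: int, unit_price: float) -> float:
    """Cost of one chat call: 1 character = 1 transaction,
    and prices are listed per 10,000 transactions."""
    transactions = prompt_chars + response_chars  # steps 1 and 2
    return transactions / 10_000 * unit_price     # step 4

# Paul's call: 220-character prompt, 2,205-character response.
LARGE_META_UNIT_PRICE = 1.0  # placeholder value for illustration
price = chat_cost(220, 2205, LARGE_META_UNIT_PRICE)
```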
Tip

In addition to calculating the price, you can estimate the cost by selecting the AI and Machine Learning category and loading the cost estimator for OCI Generative AI.

Text Embeddings Example

Gina is converting customer contracts into embeddings for a new semantic search application. On average, Gina ingests 16 documents every hour. Each document is about 1,000 characters long. Gina wants to get an estimate of the monthly bill for generating those embeddings. Here are the steps to calculate the cost.

  1. Calculate the input length (in characters).

    Let's add up the input character length for each hour.

    input character length for 16 documents = 16 x 1,000 = 16,000 characters per hour
  2. Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Embed Cohere, find the <Embed-Cohere-unit-price>.
Gina uses one of the Cohere Embed models, all of which match the product Oracle Cloud Infrastructure Generative AI - Embed Cohere on the AI Pricing page.
  3. Calculate the number of transactions per hour.

    Gina ingests 16,000 characters per hour. Prices are listed for 10,000 transactions.

    10,000 transactions = 10,000 characters, so 1 transaction = 1 character
    16,000 characters = 16,000 transactions
  4. Find the hourly price for the 16,000 characters that Gina ingests hourly.
    hourly price = (16,000 transactions) / (10,000 transactions) x $<Embed-Cohere-unit-price>
  5. Find the monthly price for the longest month of the year.
    One month = 31 x 24 hours = 744 hours
    monthly price = 744 hours x hourly price
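Gina's estimate can be sketched the same way. This is an illustrative sketch, not an official SDK API; the Embed Cohere unit price is a placeholder to fill in from the AI Pricing page.

```python
# Sketch of the monthly embedding cost estimate above.
# EMBED_COHERE_UNIT_PRICE is a placeholder: the price per
# 10,000 transactions listed for Embed Cohere on the AI Pricing page.

def embedding_monthly_cost(docs_per_hour: int, chars_per_doc: int,
                           unit_price: float, hours_per_month: int = 744) -> float:
    """Characters per hour -> transactions per hour -> hourly price
    (per 10,000 transactions) -> monthly price."""
    transactions_per_hour = docs_per_hour * chars_per_doc  # 16 x 1,000 = 16,000
    hourly_price = transactions_per_hour / 10_000 * unit_price
    return hourly_price * hours_per_month  # 744 hours in a 31-day month

EMBED_COHERE_UNIT_PRICE = 1.0  # placeholder value for illustration
monthly_price = embedding_monthly_cost(16, 1000, EMBED_COHERE_UNIT_PRICE)
```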