Paying for On Demand Inferencing

On demand inferencing in Generative AI gives you the following benefits:

  • Low barrier to start using Generative AI.
  • Access to all available Generative AI foundational models.
  • Great for experimenting with model capabilities.
  • Pay as you go for transactions. See the following note for details.

Note

With on demand inferencing, you pay as you go based on the following character lengths:

  • Chat: prompt length (in characters) + response length (in characters)
  • Text generation: prompt length (in characters) + response length (in characters)
  • Summarization: prompt length (in characters) + response length (in characters)
  • Text Embeddings: input length (in characters)
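The billing rule in the note can be sketched as a small helper. This is a minimal illustration only: the function name `billable_characters` and the capability strings are assumptions made for this example and are not part of any OCI SDK.

```python
# Capabilities billed on prompt + response characters, per the note above.
PROMPT_PLUS_RESPONSE = {"chat", "text generation", "summarization"}


def billable_characters(capability: str, input_chars: int, response_chars: int = 0) -> int:
    """Return the number of characters billed for one on demand call.

    Hypothetical helper for illustration; not an OCI API.
    """
    key = capability.strip().lower()
    if key in PROMPT_PLUS_RESPONSE:
        # Chat, text generation, and summarization bill prompt + response.
        return input_chars + response_chars
    if key == "text embeddings":
        # Embeddings bill the input length only.
        return input_chars
    raise ValueError(f"unknown capability: {capability!r}")


print(billable_characters("Text generation", 220, 1618))
print(billable_characters("Text Embeddings", 1000))
```

The response length is ignored for embeddings because only the input is billed.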

The following examples calculate on demand inferencing cost for text generation and text embeddings in OCI Generative AI. For calculating dedicated AI cluster cost, see Paying for Dedicated AI Clusters.

Matching a Foundational Model to a Product

To find the unit price for 10,000 transactions of on demand inferencing, match the foundational model that you use for inferencing to the product in the following table.

Capability      | Foundational Base Model | Product for On Demand Inferencing on Pricing Page
Text Generation | cohere.command          | Oracle Cloud Infrastructure Generative AI - Large Cohere
Text Generation | cohere.command-light    | Oracle Cloud Infrastructure Generative AI - Small Cohere
Text Generation | llama2_70b-chat         | Oracle Cloud Infrastructure Generative AI - Llama2-70
Summarization   | cohere.command          | Oracle Cloud Infrastructure Generative AI - Large Cohere
Embedding       | cohere.embed            | Oracle Cloud Infrastructure Generative AI - Embed Cohere
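If you estimate costs in a script, the table can be carried as a simple lookup. The following is a sketch under the assumption that the model name alone determines the product, which holds for the rows above; the names `PRODUCT_BY_MODEL` and `pricing_product` are made up for this example.

```python
# Model-to-product mapping copied from the table above.
PRODUCT_BY_MODEL = {
    "cohere.command": "Oracle Cloud Infrastructure Generative AI - Large Cohere",
    "cohere.command-light": "Oracle Cloud Infrastructure Generative AI - Small Cohere",
    "llama2_70b-chat": "Oracle Cloud Infrastructure Generative AI - Llama2-70",
    "cohere.embed": "Oracle Cloud Infrastructure Generative AI - Embed Cohere",
}


def pricing_product(model: str) -> str:
    """Return the pricing-page product name for a foundational model.

    Hypothetical helper for illustration; not an OCI API.
    """
    try:
        return PRODUCT_BY_MODEL[model]
    except KeyError:
        raise KeyError(f"no pricing product listed for model {model!r}") from None


print(pricing_product("cohere.command"))
```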

Text Generation Example

Paul calls the cohere.command model with the following prompt, which is 220 characters long:

Generate a product pitch for a USB connected compact microphone that can record surround sound. The microphone is most useful in recording music or conversations. The microphone can also be useful for recording podcasts.

The response from the model is 1,618 characters long. Paul wants to know the cost for this call. Here are the steps to calculate the cost.

  1. Calculate the prompt + response length (in characters).

    Let's add up the prompt length (220 characters) and the model response length (1,618 characters).

    prompt + response length = 220 + 1,618 = 1,838 characters
  2. Calculate the number of transactions.

    Prices are listed for 10,000 transactions.

    10,000 transactions = 10,000 characters, so 1 transaction = 1 character
    1,838 characters = 1,838 transactions
  3. Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Large Cohere, find the <Large-Cohere-unit-price> for 10,000 transactions.
    Paul uses the cohere.command model, which corresponds to the product Oracle Cloud Infrastructure Generative AI - Large Cohere on the AI Pricing page.
  4. Calculate the price for 1,838 characters.
    price = (1,838 transactions / 10,000 transactions) x $<Large-Cohere-unit-price>
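The four steps above can be sketched in a few lines. The unit price here is a placeholder value of 1, not a real price; substitute the <Large-Cohere-unit-price> from the AI Pricing page. The function name `on_demand_cost` is an assumption made for this example.

```python
from decimal import Decimal

# Prices are listed per 10,000 transactions, and 1 transaction = 1 character.
TRANSACTIONS_PER_PRICED_UNIT = 10_000


def on_demand_cost(prompt_chars: int, response_chars: int, unit_price: Decimal) -> Decimal:
    """Cost of one call billed on prompt + response characters."""
    transactions = prompt_chars + response_chars
    return Decimal(transactions) / TRANSACTIONS_PER_PRICED_UNIT * unit_price


# Placeholder only: replace with the unit price shown on the AI Pricing page.
PLACEHOLDER_UNIT_PRICE = Decimal("1")

print(on_demand_cost(220, 1618, PLACEHOLDER_UNIT_PRICE))
```

With a placeholder price of 1, the result is simply the fraction of a 10,000-transaction unit that Paul's call consumes (0.1838), so the actual cost is 0.1838 times the listed unit price.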

Text Embeddings Example

Gina is converting customer contracts into embeddings for a new semantic search application. On average, Gina ingests 16 documents every hour. Each document is about 1,000 characters long. Gina wants to get an estimate of the monthly bill for generating those embeddings. Here are the steps to calculate the cost.

  1. Calculate the input length (in characters).

    Let's add up the input character length for each hour.

    input character length for 16 documents = 16 x 1,000 = 16,000 characters per hour
  2. Go to AI Pricing and under OCI Generative AI, for Oracle Cloud Infrastructure Generative AI - Embed Cohere, find the <Embed-Cohere-unit-price> for 10,000 transactions.
    Gina uses the cohere.embed model, which corresponds to the product Oracle Cloud Infrastructure Generative AI - Embed Cohere on the AI Pricing page.
  3. Calculate the number of transactions per hour.

    Gina ingests 16,000 characters per hour. Prices are listed for 10,000 transactions.

    10,000 transactions = 10,000 characters, so 1 transaction = 1 character
    16,000 characters = 16,000 transactions
  4. Find the price for the 16,000 characters that Gina ingests each hour.
    hourly price = (16,000 transactions / 10,000 transactions) x $<Embed-Cohere-unit-price>
  5. Find the monthly price, using the longest month of the year (31 days) as an upper estimate.
    One month = 31 x 24 hours = 744 hours
    monthly price = 744 hours x hourly price
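Gina's estimate can be reproduced with a short script. This is a sketch under the same assumptions as the steps above: a steady 16 documents per hour, about 1,000 characters each, and a 31-day month. The unit price is a placeholder value of 1, not a real price, and the function name `monthly_embedding_cost` is made up for this example.

```python
from decimal import Decimal

# Prices are listed per 10,000 transactions, and 1 transaction = 1 character.
TRANSACTIONS_PER_PRICED_UNIT = 10_000


def monthly_embedding_cost(
    docs_per_hour: int,
    chars_per_doc: int,
    unit_price: Decimal,
    days_in_month: int = 31,
) -> Decimal:
    """Estimate a monthly embeddings bill from a steady hourly ingest rate."""
    chars_per_hour = docs_per_hour * chars_per_doc  # embeddings bill input only
    hourly_price = Decimal(chars_per_hour) / TRANSACTIONS_PER_PRICED_UNIT * unit_price
    hours = days_in_month * 24
    return hours * hourly_price


# Placeholder only: replace with the <Embed-Cohere-unit-price> from the AI Pricing page.
PLACEHOLDER_UNIT_PRICE = Decimal("1")

print(monthly_embedding_cost(16, 1000, PLACEHOLDER_UNIT_PRICE))
```

With a placeholder price of 1, the script prints 1190.4, which means Gina's 31-day bill is 1,190.4 times the listed unit price (11,904,000 characters in total).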