Supported Models
Generative AI supports its large language models based on each model's type and serving mode. The large language models serve user requests in either an on-demand serving mode or a dedicated serving mode. Review the following sections to learn about deprecation times and to decide what serving mode works best for you.
- On-demand Serving Mode
-
- User pays for each inference call based on the length of the prompt plus the length of the response for each request.
- Available for pretrained foundational models only.
- The models run on shared GPU resources.
- The price differs for small and large models.
- When the new version releases, user gets an overlapping period where both versions are supported, to gracefully migrate to the new version.
- In this overlapping time, the old version goes to a deprecating state. After the deprecating state ends, Generative AI no longer supports the old version.
Serving Mode Model Tag Duration in Deprecating State On Demand cohere.command
Default 1 month On Demand cohere.embed
Default 6 months On Demand meta.llama-2
Default 1 months - Dedicated Serving Mode
-
- User pays for dedicated clusters that run 24/7 during the user's committed time.
- Available for pretrained out-of-the-box models and custom models.
- The GPU resources are dedicated to the user and not shared.
- The price depends on the model and the number of instances that the user selects when they create or update their cluster.
- User can expect predictable latency and throughput for model performances.
- When the new version releases, user gets an overlapping period where both versions are supported to gracefully migrate to the new version.
- Because each hosting dedicated AI cluster can only host the same version of a model, users can decide the model version that their dedicated AI cluster hosts and not migrate within the overlapping time period. Instead, users can request a long-term support on select model versions.
- The following table displays the maximum time that users can keep a model version alive.
Serving Mode | Model | Tag | Model Keep Alive Time Window |
---|---|---|---|
Dedicated | cohere.command |
Default | 3 months |
Dedicated | cohere.command |
Long-term Support | Up to 18 months |
Dedicated | cohere.embed |
Default | 6 months |
Dedicated | cohere.embed |
Long-term Support | Up to 18 months |
Dedicated | meta.llama-2 |
Default | 3 months |
Dedicated | meta.llama-2 |
Long-term Support | Up to 18 months |
Note
Deprecation times might change in the future.
Deprecation times might change in the future.