Retiring the Models
OCI Generative AI retires its large language models (LLMs) based on each model's type and serving mode. The LLMs serve user requests in either an on-demand serving mode or a dedicated serving mode. Review the following sections to learn about deprecation and retirement timelines and to decide which serving mode works best for you.
About Retirement and Deprecation
- Retirement
- When a model is retired, it's no longer available for use in the Generative AI service.
- Deprecation
- When a model is deprecated, it remains available in the Generative AI service, but only for a defined period before it's retired.
- On-Demand Serving Mode
The on-demand serving mode is available only for ready-to-use pretrained foundational models and has the following characteristics:
- You pay as you go for each inference call when using the playground or the Chat API.
- When OCI Generative AI releases a new model version or family, there might be an overlap period during which both the older and the newer version or family are supported, until the older one is retired.
- Not all model families and versions are available in every supported OCI region. For the available models in each region, see the key features in Pretrained Foundational Models in Generative AI.
- Dedicated Serving Mode
The dedicated serving mode is available for custom and pretrained foundational models and has the following characteristics:
- You get a dedicated set of GPUs.
- You can fine-tune custom models on the dedicated AI clusters.
- You can host replicas of foundational and fine-tuned models on the dedicated AI clusters.
- You commit in advance to a certain number of hours of dedicated AI cluster usage. For prices, see the pricing page.
- Each hosting dedicated AI cluster can host only a single version of a given model. If you decide to keep using the version that the cluster already hosts instead of migrating during the overlap period, you can request long-term support for that version.
- Existing endpoints will continue to run.
Important
If you need a dedicated serving mode model to stay alive longer than the retirement date, create a support ticket.
All models supported by the on-demand serving mode that use the text generation and summarization APIs (including the playground) are now retired.
The following table shows the retirement dates for models supported for the on-demand serving mode.
Model | Release Date | Retirement Date | Suggested Replacement Options |
---|---|---|---|
meta.llama-3.1-405b-instruct | 2024-09-19 | At least one month after the release of the 1st replacement model. | Tentative |
meta.llama-3.1-70b-instruct | 2024-09-19 | No sooner than 2025-02-21 | Tentative |
cohere.command-r-plus | 2024-06-18 | At least one month after the release of the 1st replacement model. | Tentative |
cohere.command-r-16k | 2024-06-04 | At least one month after the release of the 1st replacement model. | Tentative |
cohere.embed-english-v3.0 | 2024-02-07 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.embed-multilingual-v3.0 | 2024-02-07 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.embed-english-light-v3.0 | 2024-02-07 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.embed-multilingual-light-v3.0 | 2024-02-07 | At least 6 months after the release of the 1st replacement model. | Tentative |
meta.llama-3-70b-instruct | 2024-06-04 | 2024-11-12 | |
cohere.command | 2024-02-07 | 2024-10-02 | |
cohere.command-light | 2024-02-07 | 2024-10-02 | |
meta.llama-2-70b-chat | 2024-01-22 | 2024-10-02 | |
Deprecation times might change in the future.
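As a rough illustration of how the fixed dates in the on-demand table might be used in a pre-flight check, the following Python sketch hard-codes the four rows that list a concrete retirement date. The names `ON_DEMAND_RETIREMENT` and `is_retired_on_demand` are hypothetical and not part of any OCI SDK. The sketch also assumes that a model counts as retired on its listed date and afterward, which this page doesn't state explicitly.

```python
from datetime import date

# Fixed on-demand retirement dates copied from the table above.
# Models whose retirement is still relative ("at least one month after...")
# or open-ended ("no sooner than...") are intentionally left out.
ON_DEMAND_RETIREMENT = {
    "meta.llama-3-70b-instruct": date(2024, 11, 12),
    "cohere.command": date(2024, 10, 2),
    "cohere.command-light": date(2024, 10, 2),
    "meta.llama-2-70b-chat": date(2024, 10, 2),
}


def is_retired_on_demand(model: str, on_date: date) -> bool:
    """Return True if `model` has a fixed retirement date on or before `on_date`.

    Assumption: the model is treated as retired starting on the listed date.
    A False result only means no fixed date is recorded in this snapshot.
    """
    retirement = ON_DEMAND_RETIREMENT.get(model)
    return retirement is not None and on_date >= retirement
```

Because deprecation times might change, a hard-coded snapshot like this can go stale. Treat it as a convenience check and confirm against the current table and the release notes.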
The following table shows the retirement dates for models supported for the dedicated serving mode.
Model | Release Date | Retirement Date | Suggested Replacement Options |
---|---|---|---|
meta.llama-3.1-405b-instruct | 2024-09-19 | At least 6 months after the release of the 1st replacement model. | Tentative |
meta.llama-3.1-70b-instruct | 2024-09-19 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.command-r-plus | 2024-06-18 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.command-r-16k | 2024-06-04 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.embed-english-v3.0 | 2024-02-07 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.embed-multilingual-v3.0 | 2024-02-07 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.embed-english-light-v3.0 | 2024-02-07 | At least 6 months after the release of the 1st replacement model. | Tentative |
cohere.embed-multilingual-light-v3.0 | 2024-02-07 | At least 6 months after the release of the 1st replacement model. | Tentative |
meta.llama-3-70b-instruct | 2024-06-04 | No sooner than 2025-03-19 | |
cohere.command | 2024-02-07 | No sooner than 2025-01-18 | |
cohere.command-light | 2024-02-07 | No sooner than 2025-01-04 | |
meta.llama-2-70b-chat | 2024-01-22 | No sooner than 2025-01-04 | |
Deprecation times might change in the future.
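Several rows give the retirement date only relative to a future event: "at least 6 months" (or "at least one month") after the first replacement model is released. The following Python sketch shows one way to turn that rule into an earliest possible date once a replacement release date is known. The function names and the example release date are hypothetical. Counting in calendar months and clamping to the last day of a shorter month are assumptions, because this page doesn't define how a month is measured.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add calendar months to a date, clamping to the last day of the target month."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_retirement(replacement_release: date, grace_months: int) -> date:
    """Earliest date a model could retire under an "at least N months" rule.

    This is a lower bound only; the actual retirement date can be later.
    """
    return add_months(replacement_release, grace_months)


# Hypothetical replacement release date, used only for illustration.
hypothetical_release = date(2025, 3, 15)
print(earliest_retirement(hypothetical_release, 6))  # dedicated-mode rule
print(earliest_retirement(hypothetical_release, 1))  # on-demand rule for chat models
```

The result is a planning floor, not a schedule. The published retirement date, once announced, takes precedence.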
The Generative AI service strives to quickly mitigate any security issues or bugs found in the supported pretrained foundational models. Check the release notes to learn whether you need to migrate to a different version.