Scenario 6: Lighter Embeddings Workload Benchmarks in Generative AI
The lighter embeddings scenario is similar to the text embeddings scenario 5, except that the size of each request is reduced to 16 documents, each with 512 tokens. This scenario suits workloads that embed smaller files with fewer words.
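The request shape described above (at most 16 documents per request, each capped at 512 tokens) can be sketched in Python. This is a minimal illustration, not an SDK call: the whitespace tokenizer is a stand-in, since real token counts depend on the model's tokenizer.

```python
# Sketch of the scenario 6 request shape: 16 documents per request,
# each document capped at 512 tokens.
MAX_DOCS_PER_REQUEST = 16
MAX_TOKENS_PER_DOC = 512

def truncate(doc: str, max_tokens: int = MAX_TOKENS_PER_DOC) -> str:
    # Approximate tokens as whitespace-separated words (an assumption;
    # the model's tokenizer will count differently).
    return " ".join(doc.split()[:max_tokens])

def batch_requests(docs: list[str]) -> list[list[str]]:
    # Split a corpus into request-sized batches of at most 16 documents.
    docs = [truncate(d) for d in docs]
    return [docs[i:i + MAX_DOCS_PER_REQUEST]
            for i in range(0, len(docs), MAX_DOCS_PER_REQUEST)]

corpus = [f"document {i} " + "word " * 600 for i in range(40)]
batches = batch_requests(corpus)
print(len(batches))     # 40 docs -> 3 requests (16 + 16 + 8)
print(len(batches[0]))  # 16
```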
Review the terms used in the hosting dedicated AI cluster benchmarks. For a list of scenarios and their descriptions, see Text Embedding Scenarios. The lighter embeddings scenario was benchmarked in the following regions.
Brazil East (Sao Paulo)
- Model: `cohere.embed-english-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.19 | 54 |
  | 8 | 1.41 | 348 |
  | 32 | 3.47 | 600 |
  | 128 | 12.08 | 558 |

- Model: `cohere.embed-multilingual-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.28 | 42 |
  | 8 | 1.38 | 288 |
  | 32 | 3.44 | 497 |
  | 128 | 11.94 | 702 |
Germany Central (Frankfurt)
- Model: `cohere.embed-english-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.19 | 54 |
  | 8 | 1.41 | 348 |
  | 32 | 3.47 | 600 |
  | 128 | 12.08 | 558 |

- Model: `cohere.embed-multilingual-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.28 | 42 |
  | 8 | 1.38 | 288 |
  | 32 | 3.44 | 497 |
  | 128 | 11.94 | 702 |
UK South (London)
- Model: `cohere.embed-english-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.19 | 54 |
  | 8 | 1.41 | 348 |
  | 32 | 3.47 | 600 |
  | 128 | 12.08 | 558 |

- Model: `cohere.embed-multilingual-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.28 | 42 |
  | 8 | 1.38 | 288 |
  | 32 | 3.44 | 497 |
  | 128 | 11.94 | 702 |
US Midwest (Chicago)
- Model: `cohere.embed-english-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.19 | 54 |
  | 8 | 1.41 | 348 |
  | 32 | 3.47 | 600 |
  | 128 | 12.08 | 558 |

- Model: `cohere.embed-english-light-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 0.85 | 48 |
  | 8 | 1.15 | 354 |
  | 32 | 3.15 | 594 |
  | 128 | 11.26 | 846 |

- Model: `cohere.embed-multilingual-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.28 | 42 |
  | 8 | 1.38 | 288 |
  | 32 | 3.44 | 497 |
  | 128 | 11.94 | 702 |

- Model: `cohere.embed-multilingual-light-v3.0` hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level Latency (seconds) | Request-level Throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.03 | 54 |
  | 8 | 1.35 | 300 |
  | 32 | 3.11 | 570 |
  | 128 | 11.50 | 888 |
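As a rough sanity check on how latency and throughput relate in these tables: if each of N concurrent streams issues a new request as soon as the previous one returns, throughput is approximately N × 60 / latency RPM. The sketch below applies this back-of-the-envelope estimate to the `cohere.embed-english-v3.0` rows; measured values are close at low concurrency and fall below the estimate at 128 concurrent requests, as queueing and saturation effects dominate.

```python
# Rough estimate: with N concurrent streams, each issuing a new request
# as soon as the previous one returns, RPM ~= N * 60 / latency_seconds.
def estimated_rpm(concurrency: int, latency_s: float) -> float:
    return concurrency * 60 / latency_s

# (concurrency, latency in seconds, measured RPM) for cohere.embed-english-v3.0
rows = [(1, 1.19, 54), (8, 1.41, 348), (32, 3.47, 600), (128, 12.08, 558)]
for conc, latency, measured in rows:
    print(f"concurrency={conc}: estimated {estimated_rpm(conc, latency):.0f} RPM, "
          f"measured {measured} RPM")
```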