Scenario 5: Text Embeddings Benchmarks in Generative AI
The text embedding scenario mimics embedding generation as part of the data-ingestion pipeline of a vector database.
This scenario applies only to embedding models. Every request has the same size: 96 documents of 512 tokens each. An example is a collection of large PDF files, each with 30,000+ words, that a user wants to ingest into a vector database.
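The request shape above (96 documents per request, 512 tokens per document) can be sketched as a simple batching step. This is an illustrative sketch, not the benchmark harness: token counting is approximated by whitespace word count, and the helper names are hypothetical; a real pipeline would use the model's tokenizer.

```python
# Sketch of the request shape used in this scenario: batches of 96
# documents, 512 tokens each. Tokens are approximated by whitespace
# words here; helper names are hypothetical.

DOCS_PER_REQUEST = 96   # fixed request size in the benchmark
TOKENS_PER_DOC = 512    # fixed document length in the benchmark

def chunk(items, size):
    """Split a list into fixed-size chunks (last chunk may be shorter)."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_requests(text):
    """Turn one large text (e.g. an extracted PDF) into embedding requests."""
    docs = [" ".join(words) for words in chunk(text.split(), TOKENS_PER_DOC)]
    # Group the documents into batches of 96, one batch per request.
    return chunk(docs, DOCS_PER_REQUEST)

# A 30,000-word file yields ceil(30000 / 512) = 59 documents -> 1 request.
requests = build_requests("word " * 30000)
print(len(requests), len(requests[0]))  # -> 1 59
```

A 30,000+ word file therefore fits in a single benchmark-sized request; larger collections simply produce more 96-document batches.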
Review the terms used in the dedicated AI cluster hosting benchmarks. For a list of scenarios and their descriptions, see Text Embedding Scenarios. The text embedding scenario was run in the following regions.
Brazil East (Sao Paulo)
- Model: cohere.embed-english-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 2.53 | 24 |
  | 8 | 4.35 | 108 |
  | 32 | 14.93 | 120 |
  | 128 | 47.66 | 150 |

- Model: cohere.embed-multilingual-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 2.25 | 24 |
  | 8 | 4.33 | 120 |
  | 32 | 14.94 | 144 |
  | 128 | 49.21 | 198 |
Germany Central (Frankfurt)
- Model: cohere.embed-english-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 2.53 | 24 |
  | 8 | 4.35 | 108 |
  | 32 | 14.93 | 120 |
  | 128 | 47.66 | 150 |

- Model: cohere.embed-multilingual-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 2.25 | 24 |
  | 8 | 4.33 | 120 |
  | 32 | 14.94 | 144 |
  | 128 | 49.21 | 198 |
UK South (London)
- Model: cohere.embed-english-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 2.53 | 24 |
  | 8 | 4.35 | 108 |
  | 32 | 14.93 | 120 |
  | 128 | 47.66 | 150 |

- Model: cohere.embed-multilingual-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 2.25 | 24 |
  | 8 | 4.33 | 120 |
  | 32 | 14.94 | 144 |
  | 128 | 49.21 | 198 |
US Midwest (Chicago)
- Model: cohere.embed-english-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 2.53 | 24 |
  | 8 | 4.35 | 108 |
  | 32 | 14.93 | 120 |
  | 128 | 47.66 | 150 |

- Model: cohere.embed-english-light-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.75 | 30 |
  | 8 | 3.93 | 108 |
  | 32 | 14.44 | 113 |
  | 128 | 48.00 | 120 |

- Model: cohere.embed-multilingual-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 2.25 | 24 |
  | 8 | 4.33 | 120 |
  | 32 | 14.94 | 144 |
  | 128 | 49.21 | 198 |

- Model: cohere.embed-multilingual-light-v3.0, hosted on one Embed Cohere unit of a dedicated AI cluster

  | Concurrency | Request-level latency (seconds) | Request-level throughput (requests per minute, RPM) |
  |---|---|---|
  | 1 | 1.69 | 42 |
  | 8 | 3.80 | 118 |
  | 32 | 14.26 | 126 |
  | 128 | 37.17 | 138 |
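A rough way to read these tables: with C concurrent clients each issuing back-to-back requests of latency L seconds, the ideal request-level throughput is about C × 60 / L requests per minute. The sketch below (not part of the source benchmarks) compares that ideal against the measured RPM for cohere.embed-english-v3.0; measured throughput sits near the ideal at low concurrency and falls somewhat below it at higher concurrency, as expected when requests queue server-side.

```python
# Compare ideal vs. measured request-level throughput for the
# cohere.embed-english-v3.0 rows above. The ideal assumes each of the
# C concurrent clients issues requests back-to-back with no queuing.

def ideal_rpm(concurrency, latency_s):
    """Upper-bound requests per minute at a given concurrency and latency."""
    return concurrency * 60 / latency_s

# (concurrency, request-level latency in seconds, measured RPM)
rows = [(1, 2.53, 24), (8, 4.35, 108), (32, 14.93, 120), (128, 47.66, 150)]

for c, lat, measured in rows:
    print(f"concurrency={c:3d}  ideal≈{ideal_rpm(c, lat):6.1f} RPM  measured={measured} RPM")
```

For example, at concurrency 1 the ideal is 60 / 2.53 ≈ 23.7 RPM, matching the measured 24 RPM; at concurrency 128 the ideal is about 161 RPM versus 150 RPM measured.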