Commit 13f0cb8

REST embedder guide (#3164)
--------- Co-authored-by: Louis Dureuil <[email protected]>
1 parent 4d011f3 commit 13f0cb8

2 files changed

+372
-0
lines changed

config/sidebar-learn.json

Lines changed: 5 additions & 0 deletions
@@ -49,6 +49,11 @@
4949
"label": "Use AI-powered search with user-provided embeddings",
5050
"slug": "search_with_user_provided_embeddings"
5151
},
52+
{
53+
"source": "learn/ai_powered_search/configure_rest_embedder.mdx",
54+
"label": "Configure a REST embedder",
55+
"slug": "configure_rest_embedder"
56+
},
5257
{
5358
"source": "learn/ai_powered_search/choose_an_embedder.mdx",
5459
"label": "Which embedder should I choose?",
Lines changed: 367 additions & 0 deletions
@@ -0,0 +1,367 @@
1+
---
2+
title: Configure a REST embedder
3+
description: Create Meilisearch embedders using any provider with a REST API
4+
---
5+
6+
# Configure a REST embedder
7+
8+
You can integrate any text embedding generator with Meilisearch if your chosen provider offers a public REST API.
9+
10+
The process of integrating a REST embedder with Meilisearch varies depending on the provider and the way it structures its data. This guide shows you where to find the information you need, then walks you through configuring your Meilisearch embedder based on the information you found.
11+
12+
## Find your embedder provider's documentation
13+
14+
Each provider requires queries to follow a specific structure.
15+
16+
Before beginning to create your embedder, locate your provider's documentation for embedding creation. This should contain the information you need regarding API requests, request headers, and responses.
17+
18+
For example, [Mistral's embeddings documentation](https://docs.mistral.ai/api/#tag/embeddings) is part of their API reference. In the case of [Cloudflare's Workers AI](https://developers.cloudflare.com/workers-ai/models/bge-base-en-v1.5/#Parameters), expected input and response are tied to your chosen model.
19+
20+
## Set up the REST source and URL
21+
22+
Open your text editor and create an embedder object. Give it a name and set its source to `"rest"`:
23+
24+
```json
25+
{
26+
"EMBEDDER_NAME": {
27+
"source": "rest"
28+
}
29+
}
30+
```
31+
32+
Next, configure the URL Meilisearch should use to contact the embedding provider:
33+
34+
```json
35+
{
36+
"EMBEDDER_NAME": {
37+
"source": "rest",
38+
"url": "PROVIDER_URL"
39+
}
40+
}
41+
```
42+
43+
Setting an embedder name, a `source`, and a `url` is mandatory for all REST embedders.
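
As an illustration, this is how the `url` might look for the two providers used in this guide. Both URLs are assumptions based on each provider's public API, the embedder names are arbitrary, and `ACCOUNT_ID` is a placeholder for your own Cloudflare account. Confirm the exact endpoint in your provider's documentation:

```json
{
  "mistral": {
    "source": "rest",
    "url": "https://api.mistral.ai/v1/embeddings"
  },
  "cloudflare": {
    "source": "rest",
    "url": "https://api.cloudflare.com/client/v4/accounts/ACCOUNT_ID/ai/run/@cf/baai/bge-base-en-v1.5"
  }
}
```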
44+
45+
## Configure the data Meilisearch sends to the provider
46+
47+
Meilisearch's `request` field defines the structure of the input it will send to the provider. The way you must fill this field changes for each provider.
48+
49+
For example, Mistral expects two mandatory parameters: `model` and `input`. It also accepts one optional parameter: `encoding_format`. Cloudflare instead only expects a single field, `text`.
50+
51+
### Choose a model
52+
53+
In many cases, your provider requires you to explicitly set which model you want to use to create your embeddings. For example, in Mistral, `model` must be a string specifying a valid Mistral model.
54+
55+
Update your embedder object by adding this field and its value:
56+
57+
```json
58+
{
59+
"EMBEDDER_NAME": {
60+
"source": "rest",
61+
"url": "PROVIDER_URL",
62+
"request": {
63+
"model": "MODEL_NAME"
64+
}
65+
}
66+
}
67+
```
68+
69+
In Cloudflare's case, the model is part of the API route itself and doesn't need to be specified in your `request`.
70+
71+
### The embedding prompt
72+
73+
The prompt corresponds to the data that the provider will use to generate your document embeddings. Its specific name changes depending on the provider you chose. In Mistral, this is the `input` field. In Cloudflare, it's called `text`.
74+
75+
Most providers accept either a string or an array of strings. A single string will generate one request per document in your database:
76+
77+
```json
78+
{
79+
"EMBEDDER_NAME": {
80+
"source": "rest",
81+
"url": "PROVIDER_URL",
82+
"request": {
83+
"model": "MODEL_NAME",
84+
"input": "{{text}}"
85+
}
86+
}
87+
}
88+
```
89+
90+
`{{text}}` is a placeholder. Meilisearch replaces it with the text it generates for each document, as defined in the embedder's [`documentTemplate`](/reference/api/settings#documenttemplate).
91+
92+
An array of strings allows Meilisearch to send up to 10 documents in one request, reducing the number of API calls to the provider:
93+
94+
```json
95+
{
96+
"EMBEDDER_NAME": {
97+
"source": "rest",
98+
"url": "PROVIDER_URL",
99+
"request": {
100+
"model": "MODEL_NAME",
101+
"input": [
102+
"{{text}}",
103+
"{{..}}"
104+
]
105+
}
106+
}
107+
}
108+
```
109+
110+
When using array prompts, the first item must be `{{text}}`. If you want to send multiple documents in a single request, the second array item must be `{{..}}`. When using `"{{..}}"`, it must be present in both `request` and `response`.
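
To make the placeholders concrete, the sketch below shows what the body of a batched request could look like once Meilisearch has filled in the array. The three strings are hypothetical stand-ins for rendered document templates. The actual text depends on your documents and your `documentTemplate`:

```json
{
  "model": "MODEL_NAME",
  "input": [
    "RENDERED_TEMPLATE_FOR_DOCUMENT_1",
    "RENDERED_TEMPLATE_FOR_DOCUMENT_2",
    "RENDERED_TEMPLATE_FOR_DOCUMENT_3"
  ]
}
```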
111+
112+
When using other embedding providers, `input` might be called something else, like `text` or `prompt`:
113+
114+
```json
115+
{
116+
"EMBEDDER_NAME": {
117+
"source": "rest",
118+
"url": "PROVIDER_URL",
119+
"request": {
120+
"model": "MODEL_NAME",
121+
"text": "{{text}}"
122+
}
123+
}
124+
}
125+
```
126+
127+
### Provide other request fields
128+
129+
You may add as many fields to the `request` object as you need. Meilisearch will include them when querying the embeddings provider.
130+
131+
For example, Mistral allows you to optionally configure an `encoding_format`. Set it by declaring this field in your embedder's `request`:
132+
133+
```json
134+
{
135+
"EMBEDDER_NAME": {
136+
"source": "rest",
137+
"url": "PROVIDER_URL",
138+
"request": {
139+
"model": "MODEL_NAME",
140+
"input": ["{{text}}", "{{..}}"],
141+
"encoding_format": "float"
142+
}
143+
}
144+
}
145+
```
146+
147+
## The embedding response
148+
149+
You must indicate where Meilisearch can find the document embeddings in the provider's response. Consult your provider's API documentation, paying attention to where it places the embeddings.
150+
151+
Cloudflare's embeddings are located in an array inside `response.result.data`. Describe the full path to the embedding array in your embedder's `response`. The first array item must be `"{{embedding}}"`:
152+
153+
```json
154+
{
155+
"EMBEDDER_NAME": {
156+
"source": "rest",
157+
"url": "PROVIDER_URL",
158+
"request": {
159+
"text": "{{text}}"
160+
},
161+
"response": {
162+
"result": {
163+
"data": ["{{embedding}}"]
164+
}
165+
}
166+
}
167+
}
168+
```
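
For reference, a Cloudflare Workers AI response has roughly the following shape. This is an abridged illustration: the embedding values are made up and truncated to three numbers, and the fields other than `result.data` are assumptions that may differ for your model:

```json
{
  "result": {
    "shape": [1, 3],
    "data": [[0.0123, -0.0456, 0.0789]]
  },
  "success": true,
  "errors": [],
  "messages": []
}
```

The `response` object only needs to reproduce the path leading to the embedding array, in this case `result.data`.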
169+
170+
If the response contains multiple embeddings, use `"{{..}}"` as its second value:
171+
172+
```json
173+
{
174+
"EMBEDDER_NAME": {
175+
"source": "rest",
176+
"url": "PROVIDER_URL",
177+
"request": {
178+
"model": "MODEL_NAME",
179+
"input": [
180+
"{{text}}",
181+
"{{..}}"
182+
]
183+
},
184+
"response": {
185+
"data": [
186+
{
187+
"embedding": "{{embedding}}"
188+
},
189+
"{{..}}"
190+
]
191+
}
192+
}
193+
}
194+
```
195+
196+
When using `"{{..}}"`, it must be present in both `request` and `response`.
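
This mapping fits provider responses shaped like the abridged example below, where each item in `data` holds one embedding. The values are invented and truncated, and every field other than `data` and `embedding` is an assumption about what a provider may return:

```json
{
  "model": "MODEL_NAME",
  "data": [
    { "index": 0, "embedding": [0.0123, -0.0456, 0.0789] },
    { "index": 1, "embedding": [0.0234, -0.0567, 0.0891] }
  ]
}
```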
197+
198+
It is possible the response contains a single embedding outside of an array. Use `"{{embedding}}"` as its value:
199+
200+
```json
201+
{
202+
"EMBEDDER_NAME": {
203+
"source": "rest",
204+
"url": "PROVIDER_URL",
205+
"request": {
206+
"model": "MODEL_NAME",
207+
"input": "{{text}}"
208+
},
209+
"response": {
210+
"data": {
211+
"text": "{{embedding}}"
212+
}
213+
}
214+
}
215+
}
216+
```
217+
218+
It is also possible the response is a single item or array not nested in an object:
219+
220+
```json
221+
{
222+
"EMBEDDER_NAME": {
223+
"source": "rest",
224+
"url": "PROVIDER_URL",
225+
"request": {
226+
"model": "MODEL_NAME",
227+
"input": [
228+
"{{text}}",
229+
"{{..}}"
230+
]
231+
},
232+
"response": [
233+
"{{embedding}}",
234+
"{{..}}"
235+
]
236+
}
237+
}
238+
```
239+
240+
The prompt data type does not necessarily match the response data type. For example, Cloudflare always returns an array of embeddings, even if the prompt in your request was a string.
241+
242+
Meilisearch silently ignores `response` fields not pointing to an `"{{embedding}}"` value.
243+
244+
## The embedding header
245+
246+
Your provider might also require you to add specific headers to your request. For example, Azure's AI services require an `api-key` header containing an API key.
247+
248+
Add the `headers` field to your embedder object:
249+
250+
```json
251+
{
252+
"EMBEDDER_NAME": {
253+
"source": "rest",
254+
"url": "PROVIDER_URL",
255+
"request": {
256+
"text": "{{text}}"
257+
},
258+
"response": {
259+
"result": {
260+
"data": ["{{embedding}}"]
261+
}
262+
},
263+
"headers": {
264+
"FIELD_NAME": "FIELD_VALUE"
265+
}
266+
}
267+
}
268+
```
269+
270+
By default, Meilisearch includes a `Content-Type` header. It may also include an authorization bearer token, if you have supplied an API key.
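
For instance, for a provider that authenticates with an `api-key` header, as Azure does, the `headers` object could look like the sketch below. `AZURE_API_KEY` is a placeholder for your own key, and the `request` and `response` shapes are assumptions reused from the earlier examples rather than a verified Azure configuration:

```json
{
  "EMBEDDER_NAME": {
    "source": "rest",
    "url": "PROVIDER_URL",
    "request": {
      "input": ["{{text}}", "{{..}}"]
    },
    "response": {
      "data": [
        {
          "embedding": "{{embedding}}"
        },
        "{{..}}"
      ]
    },
    "headers": {
      "api-key": "AZURE_API_KEY"
    }
  }
}
```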
271+
272+
## Configure the remainder of the embedder
273+
274+
`source`, `request`, `response`, and `headers` are the only fields specific to REST embedders.
275+
276+
Like other remote embedders, you're likely required to supply an `apiKey`:
277+
278+
```json
279+
{
280+
"EMBEDDER_NAME": {
281+
"source": "rest",
282+
"url": "PROVIDER_URL",
283+
"request": {
284+
"model": "MODEL_NAME",
285+
"input": ["{{text}}", "{{..}}"],
286+
"encoding_format": "float"
287+
},
288+
"response": {
289+
"data": [
290+
{
291+
"embedding": "{{embedding}}"
292+
},
293+
"{{..}}"
294+
]
295+
},
296+
"apiKey": "PROVIDER_API_KEY"
297+
}
298+
}
299+
```
300+
301+
You should also set a `documentTemplate`. Good templates are short and include only highly relevant document data:
302+
303+
```json
304+
{
305+
"EMBEDDER_NAME": {
306+
"source": "rest",
307+
"url": "PROVIDER_URL",
308+
"request": {
309+
"model": "MODEL_NAME",
310+
"input": ["{{text}}", "{{..}}"],
311+
"encoding_format": "float"
312+
},
313+
"response": {
314+
"data": [
315+
{
316+
"embedding": "{{embedding}}"
317+
},
318+
"{{..}}"
319+
]
320+
},
321+
"apiKey": "PROVIDER_API_KEY",
322+
"documentTemplate": "SHORT_AND_RELEVANT_DOCUMENT_TEMPLATE"
323+
}
324+
}
325+
```
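
As a sketch, a template for a hypothetical movie dataset with `title` and `overview` attributes could look like the following. Only the `documentTemplate` field is shown, and the attribute names are assumptions. Replace them with attributes from your own documents:

```json
{
  "EMBEDDER_NAME": {
    "documentTemplate": "A movie titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
  }
}
```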
326+
327+
## Update your index settings
328+
329+
Now that the embedder object is complete, update your index settings:
330+
331+
```sh
332+
curl \
333+
-X PATCH 'MEILISEARCH_URL/indexes/INDEX_NAME/settings/embedders' \
334+
-H 'Content-Type: application/json' \
335+
--data-binary '{
336+
"EMBEDDER_NAME": {
337+
"source": "rest",
338+
"url": "PROVIDER_URL",
339+
"request": {
340+
"model": "MODEL_NAME",
341+
"input": ["{{text}}", "{{..}}"]
342+
},
343+
"response": {
344+
"data": [
345+
{
346+
"embedding": "{{embedding}}"
347+
},
348+
"{{..}}"
349+
]
350+
},
351+
"apiKey": "PROVIDER_API_KEY",
352+
"documentTemplate": "SHORT_AND_RELEVANT_DOCUMENT_TEMPLATE"
353+
}
354+
}'
355+
```
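
Meilisearch processes settings updates asynchronously. A successful request should return a summarized task object similar to the sketch below. The `taskUid` and the timestamp are illustrative values:

```json
{
  "taskUid": 1,
  "indexUid": "INDEX_NAME",
  "status": "enqueued",
  "type": "settingsUpdate",
  "enqueuedAt": "2024-01-01T00:00:00.000000Z"
}
```

Use the returned `taskUid` with the tasks API to check whether the update succeeded or failed.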
356+
357+
## Conclusion
358+
359+
In this guide you have seen a few examples of how to configure a REST embedder in Meilisearch. Though it used Mistral and Cloudflare as examples, the general steps remain the same for all providers:
360+
361+
1. Find the provider's REST API documentation
362+
2. Identify the embedding creation request parameters
363+
3. Include parameters in your embedder's `request`
364+
4. Identify the embedding creation response
365+
5. Reproduce the path to the returned embeddings in your embedder's `response`
366+
6. Add any required HTTP headers to your embedder's `headers`
367+
7. Update your index settings with the new embedder
