|
| 1 | +--- |
| 2 | +title: Configure a REST embedder |
| 3 | +description: Create Meilisearch embedders using any provider with a REST API |
| 4 | +--- |
| 5 | + |
| 6 | +# Configure a REST embedder |
| 7 | + |
| 8 | +You can integrate any text embedding generator with Meilisearch if your chosen provider offers a public REST API. |
| 9 | + |
| 10 | +The process of integrating a REST embedder with Meilisearch varies depending on the provider and the way it structures its data. This guide shows you where to find the information you need, then walks you through configuring your Meilisearch embedder based on the information you found. |
| 11 | + |
| 12 | +## Find your embedder provider's documentation |
| 13 | + |
| 14 | +Each provider requires queries to follow a specific structure. |
| 15 | + |
| 16 | +Before beginning to create your embedder, locate your provider's documentation for embedding creation. This should contain the information you need regarding API requests, request headers, and responses. |
| 17 | + |
| 18 | +For example, [Mistral's embeddings documentation](https://docs.mistral.ai/api/#tag/embeddings) is part of their API reference. In the case of [Cloudflare's Workers AI](https://developers.cloudflare.com/workers-ai/models/bge-base-en-v1.5/#Parameters), expected input and response are tied to your chosen model. |
| 19 | + |
| 20 | +## Set up the REST source and URL |
| 21 | + |
| 22 | +Open your text editor and create an embedder object. Give it a name and set its source to `"rest"`: |
| 23 | + |
| 24 | +```json |
| 25 | +{ |
| 26 | + "EMBEDDER_NAME": { |
| 27 | + "source": "rest" |
| 28 | + } |
| 29 | +} |
| 30 | +``` |
| 31 | + |
| 32 | +Next, configure the URL Meilisearch should use to contact the embedding provider: |
| 33 | + |
| 34 | +```json |
| 35 | +{ |
| 36 | + "EMBEDDER_NAME": { |
| 37 | + "source": "rest", |
| 38 | + "url": "PROVIDER_URL" |
| 39 | + } |
| 40 | +} |
| 41 | +``` |
| 42 | + |
| 43 | +Setting an embedder name, a `source`, and a `url` is mandatory for all REST embedders. |
| 44 | + |
| 45 | +## Configure the data Meilisearch sends to the provider |
| 46 | + |
| 47 | +Meilisearch's `request` field defines the structure of the input it will send to the provider. The way you must fill this field changes for each provider. |
| 48 | + |
| 49 | +For example, Mistral expects two mandatory parameters: `model` and `input`. It also accepts one optional parameter: `encoding_format`. Cloudflare instead only expects a single field, `text`. |
| 50 | + |
| 51 | +### Choose a model |
| 52 | + |
| 53 | +In many cases, your provider requires you to explicitly set which model you want to use to create your embeddings. For example, in Mistral, `model` must be a string specifying a valid Mistral model. |
| 54 | + |
| 55 | +Update your embedder object by adding this field and its value:
| 56 | + |
| 57 | +```json |
| 58 | +{ |
| 59 | + "EMBEDDER_NAME": { |
| 60 | + "source": "rest", |
| 61 | + "url": "PROVIDER_URL", |
| 62 | + "request": { |
| 63 | + "model": "MODEL_NAME" |
| 64 | + } |
| 65 | + } |
| 66 | +} |
| 67 | +``` |
| 68 | + |
| 69 | +In Cloudflare's case, the model is part of the API route itself and doesn't need to be specified in your `request`. |
| 70 | + |
| 71 | +### The embedding prompt |
| 72 | + |
| 73 | +The prompt corresponds to the data that the provider will use to generate your document embeddings. Its specific name changes depending on the provider you chose. In Mistral, this is the `input` field. In Cloudflare, it's called `text`. |
| 74 | + |
| 75 | +Most providers accept either a string or an array of strings. A single string will generate one request per document in your database: |
| 76 | + |
| 77 | +```json |
| 78 | +{ |
| 79 | + "EMBEDDER_NAME": { |
| 80 | + "source": "rest", |
| 81 | + "url": "PROVIDER_URL", |
| 82 | + "request": { |
| 83 | + "model": "MODEL_NAME", |
| 84 | + "input": "{{text}}" |
| 85 | + } |
| 86 | + } |
| 87 | +} |
| 88 | +``` |
| 89 | + |
| 90 | +`{{text}}` is a placeholder. Meilisearch replaces it with your document data, rendered as indicated in the embedder's [`documentTemplate`](/reference/api/settings#documenttemplate).
| 91 | + |
| 92 | +An array of strings allows Meilisearch to send up to 10 documents in one request, reducing the number of API calls to the provider: |
| 93 | + |
| 94 | +```json |
| 95 | +{ |
| 96 | + "EMBEDDER_NAME": { |
| 97 | + "source": "rest", |
| 98 | + "url": "PROVIDER_URL", |
| 99 | + "request": { |
| 100 | + "model": "MODEL_NAME", |
| 101 | + "input": [ |
| 102 | + "{{text}}", |
| 103 | + "{{..}}" |
| 104 | + ] |
| 105 | + } |
| 106 | + } |
| 107 | +} |
| 108 | +``` |
| 109 | + |
| 110 | +When using array prompts, the first item must be `{{text}}`. If you want to send multiple documents in a single request, the second array item must be `{{..}}`. When using `"{{..}}"`, it must be present in both `request` and `response`. |
| 111 | + |
| 112 | +When using other embedding providers, `input` might be called something else, like `text` or `prompt`: |
| 113 | + |
| 114 | +```json |
| 115 | +{ |
| 116 | + "EMBEDDER_NAME": { |
| 117 | + "source": "rest", |
| 118 | + "url": "PROVIDER_URL", |
| 119 | + "request": { |
| 120 | + "model": "MODEL_NAME", |
| 121 | + "text": "{{text}}" |
| 122 | + } |
| 123 | + } |
| 124 | +} |
| 125 | +``` |
| 126 | + |
| 127 | +### Provide other request fields |
| 128 | + |
| 129 | +You may add as many fields to the `request` object as you need. Meilisearch will include them when querying the embeddings provider. |
| 130 | + |
| 131 | +For example, Mistral allows you to optionally configure an `encoding_format`. Set it by declaring this field in your embedder's `request`: |
| 132 | + |
| 133 | +```json |
| 134 | +{ |
| 135 | + "EMBEDDER_NAME": { |
| 136 | + "source": "rest", |
| 137 | + "url": "PROVIDER_URL", |
| 138 | + "request": { |
| 139 | + "model": "MODEL_NAME", |
| 140 | + "input": ["{{text}}", "{{..}}"], |
| 141 | + "encoding_format": "float" |
| 142 | + } |
| 143 | + } |
| 144 | +} |
| 145 | +``` |
| 146 | + |
| 147 | +## The embedding response |
| 148 | + |
| 149 | +You must indicate where Meilisearch can find the document embeddings in the provider's response. Consult your provider's API documentation, paying attention to where it places the embeddings. |
| 150 | + |
| 151 | +Cloudflare's embeddings are located in an array inside `response.result.data`. Describe the full path to the embedding array in your embedder's `response`. The first array item must be `"{{embedding}}"`: |
| 152 | + |
| 153 | +```json |
| 154 | +{ |
| 155 | + "EMBEDDER_NAME": { |
| 156 | + "source": "rest", |
| 157 | + "url": "PROVIDER_URL", |
| 158 | + "request": { |
| 159 | + "text": "{{text}}" |
| 160 | + }, |
| 161 | + "response": { |
| 162 | + "result": { |
| 163 | + "data": ["{{embedding}}"] |
| 164 | + } |
| 165 | + } |
| 166 | + } |
| 167 | +} |
| 168 | +``` |
| 169 | + |
| 170 | +If the response contains multiple embeddings, use `"{{..}}"` as its second value: |
| 171 | + |
| 172 | +```json |
| 173 | +{ |
| 174 | + "EMBEDDER_NAME": { |
| 175 | + "source": "rest", |
| 176 | + "url": "PROVIDER_URL", |
| 177 | + "request": { |
| 178 | + "model": "MODEL_NAME", |
| 179 | + "input": [ |
| 180 | + "{{text}}", |
| 181 | + "{{..}}" |
| 182 | + ] |
| 183 | + }, |
| 184 | + "response": { |
| 185 | + "data": [ |
| 186 | + { |
| 187 | + "embedding": "{{embedding}}" |
| 188 | + }, |
| 189 | + "{{..}}" |
| 190 | + ] |
| 191 | + } |
| 192 | + } |
| 193 | +} |
| 194 | +``` |
| 195 | + |
| 196 | +When using `"{{..}}"`, it must be present in both `request` and `response`. |
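The pairing rule is easy to overlook when you edit `request` and `response` separately. The Python sketch below checks an embedder object against the placeholder rules stated in this guide before you send it to Meilisearch. `contains` and `check_embedder` are hypothetical helper names, not part of any Meilisearch SDK, and Meilisearch runs its own validation, which may be stricter or differ in detail:

```python
TEXT = "{{text}}"
EMBEDDING = "{{embedding}}"
REPEAT = "{{..}}"


def contains(node, placeholder):
    """Return True if `placeholder` is a string value anywhere inside `node`."""
    if isinstance(node, str):
        return node == placeholder
    if isinstance(node, dict):
        return any(contains(value, placeholder) for value in node.values())
    if isinstance(node, list):
        return any(contains(item, placeholder) for item in node)
    return False


def check_embedder(embedder):
    """Return a list of problems found in a REST embedder object (empty if none).

    These checks only mirror the rules described in this guide.
    """
    problems = []
    request = embedder.get("request", {})
    response = embedder.get("response", {})
    if not contains(request, TEXT):
        problems.append("request has no {{text}} placeholder")
    if not contains(response, EMBEDDING):
        problems.append("response has no {{embedding}} placeholder")
    # "{{..}}" must be used on both sides or on neither.
    if contains(request, REPEAT) != contains(response, REPEAT):
        problems.append("{{..}} must appear in both request and response")
    return problems
```

Calling `check_embedder` on an object whose `request` uses `"{{..}}"` while its `response` does not returns a one-item list describing the mismatch. An empty list means none of the three checks failed.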
| 197 | + |
| 198 | +It is possible the response contains a single embedding outside of an array. Use `"{{embedding}}"` as its value: |
| 199 | + |
| 200 | +```json |
| 201 | +{ |
| 202 | + "EMBEDDER_NAME": { |
| 203 | + "source": "rest", |
| 204 | + "url": "PROVIDER_URL", |
| 205 | + "request": { |
| 206 | + "model": "MODEL_NAME", |
| 207 | + "input": "{{text}}" |
| 208 | + }, |
| 209 | + "response": { |
| 210 | + "data": { |
| 211 | + "text": "{{embedding}}" |
| 212 | + } |
| 213 | + } |
| 214 | + } |
| 215 | +} |
| 216 | +``` |
| 217 | + |
| 218 | +It is also possible the response is a single item or array not nested in an object: |
| 219 | + |
| 220 | +```json |
| 221 | +{ |
| 222 | + "EMBEDDER_NAME": { |
| 223 | + "source": "rest", |
| 224 | + "url": "PROVIDER_URL", |
| 225 | + "request": { |
| 226 | + "model": "MODEL_NAME", |
| 227 | + "input": [ |
| 228 | + "{{text}}", |
| 229 | + "{{..}}" |
| 230 | + ] |
| 231 | + }, |
| 232 | + "response": [ |
| 233 | + "{{embedding}}", |
| 234 | + "{{..}}" |
| 235 | + ] |
| 236 | + } |
| 237 | +} |
| 238 | +``` |
| 239 | + |
| 240 | +The prompt data type does not necessarily match the response data type. For example, Cloudflare always returns an array of embeddings, even if the prompt in your request was a string. |
| 241 | + |
| 242 | +Meilisearch silently ignores `response` fields not pointing to an `"{{embedding}}"` value. |
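To make the idea of "describing the path to the embedding" more concrete, here is a small Python sketch that walks a `response` template and a provider payload side by side and collects the values found where the template says `"{{embedding}}"`. It is only an illustration of how to read the template, not Meilisearch's actual implementation. `extract_embeddings` is a hypothetical name, and the numbers in the sample payload are made up:

```python
EMBEDDING = "{{embedding}}"
REPEAT = "{{..}}"


def extract_embeddings(template, payload):
    """Collect payload values located where `template` holds {{embedding}}."""
    if template == EMBEDDING:
        return [payload]
    if isinstance(template, dict) and isinstance(payload, dict):
        found = []
        for key, sub_template in template.items():
            # Payload keys absent from the template are simply never visited.
            if key in payload:
                found.extend(extract_embeddings(sub_template, payload[key]))
        return found
    if isinstance(template, list) and template and isinstance(payload, list):
        # With "{{..}}" as second item, the first item describes every element.
        repeated = len(template) > 1 and template[1] == REPEAT
        items = payload if repeated else payload[:1]
        found = []
        for item in items:
            found.extend(extract_embeddings(template[0], item))
        return found
    return []


# Template from this guide's Cloudflare example, with a made-up payload.
template = {"result": {"data": ["{{embedding}}"]}}
payload = {"result": {"data": [[0.1, 0.2, 0.3]]}}
embeddings = extract_embeddings(template, payload)
```

Here `embeddings` holds a single vector even though `data` is an array, which matches the remark above that the response type does not have to match the prompt type.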
| 243 | + |
| 244 | +## The embedding header |
| 245 | + |
| 246 | +Your provider might also require you to add specific headers to your request. For example, Azure's AI services require an `api-key` header containing an API key.
| 247 | + |
| 248 | +Add the `headers` field to your embedder object: |
| 249 | + |
| 250 | +```json |
| 251 | +{ |
| 252 | + "EMBEDDER_NAME": { |
| 253 | + "source": "rest", |
| 254 | + "url": "PROVIDER_URL", |
| 255 | + "request": { |
| 256 | + "text": "{{text}}" |
| 257 | + }, |
| 258 | + "response": { |
| 259 | + "result": { |
| 260 | + "data": ["{{embedding}}"] |
| 261 | + } |
| 262 | + }, |
| 263 | + "headers": { |
| 264 | + "FIELD_NAME": "FIELD_VALUE" |
| 265 | + } |
| 266 | + } |
| 267 | +} |
| 268 | +``` |
| 269 | + |
| 270 | +By default, Meilisearch includes a `Content-Type` header. It may also include an authorization bearer token, if you have supplied an API key. |
| 271 | + |
| 272 | +## Configure the remainder of the embedder
| 273 | + |
| 274 | +`source`, `request`, `response`, and `headers` are the only fields specific to REST embedders.
| 275 | + |
| 276 | +Like other remote embedders, you're likely required to supply an `apiKey`: |
| 277 | + |
| 278 | +```json |
| 279 | +{ |
| 280 | + "EMBEDDER_NAME": { |
| 281 | + "source": "rest", |
| 282 | + "url": "PROVIDER_URL", |
| 283 | + "request": { |
| 284 | + "model": "MODEL_NAME", |
| 285 | + "input": ["{{text}}", "{{..}}"], |
| 286 | + "encoding_format": "float" |
| 287 | + }, |
| 288 | + "response": { |
| 289 | + "data": [ |
| 290 | + { |
| 291 | + "embedding": "{{embedding}}" |
| 292 | + }, |
| 293 | + "{{..}}" |
| 294 | + ] |
| 295 | + }, |
| 296 | + "apiKey": "PROVIDER_API_KEY"
| 297 | + } |
| 298 | +} |
| 299 | +``` |
| 300 | + |
| 301 | +You should also set a `documentTemplate`. Good templates are short and include only highly relevant document data: |
| 302 | + |
| 303 | +```json |
| 304 | +{ |
| 305 | + "EMBEDDER_NAME": { |
| 306 | + "source": "rest", |
| 307 | + "url": "PROVIDER_URL", |
| 308 | + "request": { |
| 309 | + "model": "MODEL_NAME", |
| 310 | + "input": ["{{text}}", "{{..}}"], |
| 311 | + "encoding_format": "float" |
| 312 | + }, |
| 313 | + "response": { |
| 314 | + "data": [ |
| 315 | + { |
| 316 | + "embedding": "{{embedding}}" |
| 317 | + }, |
| 318 | + "{{..}}" |
| 319 | + ] |
| 320 | + }, |
| 321 | + "apiKey": "PROVIDER_API_KEY", |
| 322 | + "documentTemplate": "SHORT_AND_RELEVANT_DOCUMENT_TEMPLATE" |
| 323 | + } |
| 324 | +} |
| 325 | +``` |
| 326 | + |
| 327 | +## Update your index settings |
| 328 | + |
| 329 | +Now that the embedder object is complete, update your index settings:
| 330 | + |
| 331 | +```sh |
| 332 | +curl \ |
| 333 | + -X PATCH 'MEILISEARCH_URL/indexes/INDEX_NAME/settings/embedders' \ |
| 334 | + -H 'Content-Type: application/json' \ |
| 335 | + --data-binary '{ |
| 336 | + "EMBEDDER_NAME": { |
| 337 | + "source": "rest", |
| 338 | + "url": "PROVIDER_URL", |
| 339 | + "request": { |
| 340 | + "model": "MODEL_NAME", |
| 341 | + "input": ["{{text}}", "{{..}}"]
| 342 | + }, |
| 343 | + "response": { |
| 344 | + "data": [ |
| 345 | + { |
| 346 | + "embedding": "{{embedding}}" |
| 347 | + }, |
| 348 | + "{{..}}" |
| 349 | + ] |
| 350 | + }, |
| 351 | + "apiKey": "PROVIDER_API_KEY", |
| 352 | + "documentTemplate": "SHORT_AND_RELEVANT_DOCUMENT_TEMPLATE" |
| 353 | + } |
| 354 | + }' |
| 355 | +``` |
| 356 | + |
| 357 | +## Conclusion |
| 358 | + |
| 359 | +In this guide you have seen a few examples of how to configure a REST embedder in Meilisearch. Though the examples used Mistral and Cloudflare, the general steps remain the same for all providers:
| 360 | + |
| 361 | +1. Find the provider's REST API documentation |
| 362 | +2. Identify the embedding creation request parameters |
| 363 | +3. Include parameters in your embedder's `request` |
| 364 | +4. Identify the embedding creation response |
| 365 | +5. Reproduce the path to the returned embeddings in your embedder's `response` |
| 366 | +6. Add any required HTTP headers to your embedder's `headers`
| 367 | +7. Update your index settings with the new embedder |