miguelgrinberg
Contributor

@miguelgrinberg miguelgrinberg commented Sep 22, 2025

This change adds a few features that support the use of Pydantic models with the DSL module, instead of the standard models defined as subclasses of the AsyncDocument class.

As part of this work some additions have been made to the typing implementation of DSL documents.

  • Support for the Annotated syntax when defining document fields in the DSL module. Examples:
class TypedDocAnnotated(AsyncDocument):
    ip: Annotated[Optional[str], field.Ip()]
    k1: Annotated[str, field.Keyword(required=True)]
    k2: Annotated[M[str], field.Keyword()]
    k3: Annotated[str, mapped_field(field.Keyword(), default="foo")]
  • Option to exclude a class variable from the list of attributes used to create the ES mapping:
class Doc(AsyncDocument):
    some_var: str = mapped_field(exclude=True)
  • New BaseESModel and AsyncBaseESModel classes that inherit from Pydantic's BaseModel and add Elasticsearch superpowers. In particular, any model defined with one of these as its base class will have meta and _doc private attributes and to_doc() and from_doc() methods. The meta attribute includes metadata for each document, things such as id or score. The _doc attribute is a dynamically generated Document or AsyncDocument instance that can be used whenever access to the Elasticsearch index is needed. The methods convert between Pydantic models and ES documents.

    Aside from the extra attributes, this class works exactly like BaseModel and can be used to define data attributes and their validation rules, and the ES document is derived from them automatically. In particular, this class can be used in FastAPI routes, as shown in the quotes example included in this PR. Any annotations intended for the DSL module can be included in the Annotated[] type hint of the respective fields. The Index inner class can be included as well.

class Quote(BaseESModel):
    quote: str
    author: Annotated[str, dsl.Keyword()]
    tags: Annotated[list[str], dsl.Keyword()]
    embedding: Annotated[list[float], dsl.DenseVector()] = Field(init=False, default=[])

    class Index:
        name = 'quotes'

@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch 5 times, most recently from 1f3f66c to b9ada0f Compare September 24, 2025 11:59
@miguelgrinberg miguelgrinberg changed the title from "Support Annotated typing hint" to "Pydantic integration" Sep 24, 2025
@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch 7 times, most recently from fe23bb4 to 8902d66 Compare September 26, 2025 11:29
github-actions bot commented Sep 26, 2025

🔍 Preview links for changed docs

@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch 4 times, most recently from c15e7b5 to 4884de8 Compare September 26, 2025 13:24
@miguelgrinberg miguelgrinberg marked this pull request as ready for review September 26, 2025 13:24
@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch from 4884de8 to b19fa58 Compare September 26, 2025 15:04
Member

@pquentin pquentin left a comment

Thanks! LGTM. I only have comments on the example app. I only skimmed the frontend code.

Comment on lines +56 to +63
    doc = None
    try:
        doc = await Quote._doc.get(id)
    except NotFoundError:
        pass
    if not doc:
        raise HTTPException(status_code=404, detail="Item not found")
    return Quote.from_doc(doc)
Member

Isn't this code the same?

Suggested change
-    doc = None
-    try:
-        doc = await Quote._doc.get(id)
-    except NotFoundError:
-        pass
-    if not doc:
-        raise HTTPException(status_code=404, detail="Item not found")
-    return Quote.from_doc(doc)
+    try:
+        doc = await Quote._doc.get(id)
+        return Quote.from_doc(doc)
+    except NotFoundError:
+        raise HTTPException(status_code=404, detail="Item not found")

This also applies to get_quote and delete_quote.

Contributor Author

No, this is close, but not the same: my version doesn't raise any typing errors, while yours does. The problem is that our AsyncDocument.get() method can return None if the document is not found and AsyncElasticsearch.get() is not configured to raise the not-found exception. That makes every use of the get() method awkward, because you always have to account for a possible None.
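
The typing difference described above can be reproduced without Elasticsearch. The sketch below is only an illustration under stated assumptions: Doc, NotFoundError, STORE, get(), and describe() are hypothetical stand-ins, with get() declared to return Optional[Doc] as the comment says AsyncDocument.get() can.

```python
import asyncio
from typing import Optional


class NotFoundError(Exception):
    """Stand-in for the client's not-found exception (hypothetical)."""


class Doc:
    def __init__(self, id: str) -> None:
        self.id = id


STORE = {"1": Doc("1")}


async def get(id: str, raise_on_missing: bool = True) -> Optional[Doc]:
    """Getter whose declared return type allows None, as described above."""
    doc = STORE.get(id)
    if doc is None and raise_on_missing:
        raise NotFoundError(id)
    return doc


def describe(doc: Doc) -> str:
    # Accepts a real Doc only; None is not part of the signature.
    return f"doc {doc.id}"


async def fetch_checked(id: str) -> str:
    doc = None
    try:
        doc = await get(id)
    except NotFoundError:
        pass
    if not doc:
        raise LookupError("Item not found")
    return describe(doc)  # the check above narrows Optional[Doc] to Doc


async def fetch_unchecked(id: str) -> str:
    try:
        doc = await get(id)
        # Runs fine here, but a static type checker would be expected to
        # complain: Optional[Doc] is passed where Doc is required.
        return describe(doc)
    except NotFoundError:
        raise LookupError("Item not found")


found = asyncio.run(fetch_checked("1"))
```

Both functions behave the same at runtime when the getter raises on a missing document; only the first one satisfies a checker that takes the Optional return type at face value.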

async def create_quote(req: Quote) -> Quote:
    embed_quotes([req])
    doc = req.to_doc()
    doc.meta.id = ""
Member

Why is this needed?

Contributor Author

This is for security. We do not want the client to decide what the id should be, so by blanking it before calling save() we make sure that any id they provided is discarded.
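
A small sketch of the risk being avoided, using an in-memory stand-in rather than Elasticsearch. Everything here (Meta, Doc, InMemoryIndex, create()) is hypothetical, and the store is assumed to generate an id whenever the document has none.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Meta:
    id: str = ""


@dataclass
class Doc:
    quote: str
    meta: Meta = field(default_factory=Meta)


class InMemoryIndex:
    """Hypothetical store that generates an id when the document has none."""

    def __init__(self) -> None:
        self.docs: dict[str, Doc] = {}

    def save(self, doc: Doc) -> str:
        if not doc.meta.id:
            doc.meta.id = uuid.uuid4().hex
        # A caller-chosen id would silently overwrite an existing entry.
        self.docs[doc.meta.id] = doc
        return doc.meta.id


def create(index: InMemoryIndex, doc: Doc) -> str:
    doc.meta.id = ""  # discard any id supplied by the client
    return index.save(doc)


index = InMemoryIndex()
index.save(Doc("original", Meta("fixed")))
new_id = create(index, Doc("from client", Meta("fixed")))
```

Because create() blanks the id first, the client's document is stored under a fresh id and the existing "fixed" entry is left untouched.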

Quotes database example, which demonstrates the Elasticsearch integration with
Pydantic models. This example features a React frontend and a FastAPI back end.

![Quotes app screenshot](screenshot.png)
Member

Great quotes! Do you maybe have an example that does rely on embeddings?

Contributor Author

I don't understand what you mean here. This example uses embeddings, both on their own and combined with BM25 search.

Member

I mean, the results would have been the same with BM25 only

Member

So maybe there is something other than "dogs and books" that we can use. That's just a nitpick.

Contributor Author

Ah, I see, you are talking about the screenshot. In fact, "dogs and books" does not match the Groucho Marx quote at the top when using BM25, because that quote has "dog" and "book" in it in the singular. I have to test this, but maybe using "canine" instead of "dog" makes the example clearer and returns the same results.
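
The singular/plural point can be illustrated with plain token overlap. This is a rough sketch, not BM25 and not the analyzer used by the example app: it assumes the quote in question is the well-known Groucho Marx line reproduced below, and that the lexical side applies no stemming. The tokens() and naive_stem() helpers are made up for this illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase and split on non-letters; no stemming is applied."""
    return set(re.findall(r"[a-z]+", text.lower()))


def naive_stem(token: str) -> str:
    """Very rough plural stripping, for illustration only."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token


quote = (
    "Outside of a dog, a book is man's best friend. "
    "Inside of a dog it's too dark to read."
)
query = "dogs and books"

# Exact tokens: "dogs" and "books" never appear in the quote.
exact_overlap = tokens(query) & tokens(quote)

# With plural stripping on both sides, the two key terms line up.
stemmed_overlap = {naive_stem(t) for t in tokens(query)} & {
    naive_stem(t) for t in tokens(quote)
}
```

With exact tokens the query and the quote share nothing, so a purely lexical match has no terms to score, while an embedding-based search can still rank the quote highly.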

Contributor Author

The "books and pets" search term returns a few good ones, including the one from Groucho Marx at the top of the list.

@miguelgrinberg miguelgrinberg force-pushed the dsl-support-annotated-syntax branch from b19fa58 to d2638e6 Compare October 17, 2025 16:21
@miguelgrinberg miguelgrinberg merged commit f842d9b into elastic:main Oct 20, 2025
15 checks passed
@miguelgrinberg miguelgrinberg deleted the dsl-support-annotated-syntax branch October 20, 2025 13:29
github-actions bot pushed a commit that referenced this pull request Oct 20, 2025
* Support `Annotated` typing hint

* Add option to exclude DSL class field from mapping

* Pydantic integration with the BaseESModel class

* object and nested fields

* complete CRUD example

* Use a smaller dataset

* documentation

* Update examples/quotes/backend/pyproject.toml

Co-authored-by: Quentin Pradet <[email protected]>

* Update examples/quotes/backend/quotes.py

Co-authored-by: Quentin Pradet <[email protected]>

* Use a better screenshot

---------

Co-authored-by: Quentin Pradet <[email protected]>
(cherry picked from commit f842d9b)