Performance in general is on my radar as the next thing to tackle as this library gains traction, and it sits at the top of the 1.0 release checklist.
After some thought, I don't think caching / memoization is the right way to tackle this, for a few reasons:
- it requires careful thought about how it behaves under concurrency, specifically with respect to memory visibility
- it could have a big memory footprint on large codebases with many composite dataclasses -- and the cache could potentially be duplicated across threads!
- immutability -- should the cached object be mutable, and if not, how do we protect it from changes?
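To make the mutability concern concrete, here is a hypothetical sketch (not this library's actual code) of a naive schema cache: because the cached object is shared and mutable, one careless caller corrupts the schema for everyone else.

```python
from dataclasses import dataclass, fields

# Naive module-level schema cache (illustrative only).
_schema_cache = {}

def naive_schema(cls):
    # Build the schema once, then hand out the *same* mutable dict.
    if cls not in _schema_cache:
        _schema_cache[cls] = {f.name: f.type for f in fields(cls)}
    return _schema_cache[cls]

@dataclass
class Point:
    x: int
    y: int

schema = naive_schema(Point)
schema["x"] = "corrupted"  # a careless caller mutates the shared dict...
print(naive_schema(Point)["x"])  # ...and every later lookup sees the damage
```

Guarding against this means returning defensive copies or immutable views, which adds even more overhead on top of the concurrency questions above.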
Instead, I think an approach involving code generation is the way to go -- similar to how the dataclasses core module itself is implemented. When you think about it, a schema is generated only once and is fully known at "module-load time" -- in other languages we might call this "compile time". The code-generation approach shows up in codec/schema libraries in other languages too, whether for JSON or other data-interchange formats like protobuf.
Going this route, the schema is loaded as just more code, so to speak, instead of living in memory as cached data.
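For illustration, here is a minimal sketch of what this could look like (the function names are hypothetical, not this library's API): build the source of a serializer once, at class-definition time, and compile it with `exec()` -- much like the dataclasses module generates `__init__`.

```python
from dataclasses import dataclass, fields

def make_to_dict(cls):
    # Generate the source of a specialized to_dict for this class,
    # then compile it once with exec() -- the same trick the
    # dataclasses module uses to generate __init__.
    lines = ["def to_dict(obj):", "    return {"]
    for f in fields(cls):
        lines.append(f"        {f.name!r}: obj.{f.name},")
    lines.append("    }")
    namespace = {}
    exec("\n".join(lines), {}, namespace)
    return namespace["to_dict"]

@dataclass
class Point:
    x: int
    y: int

# The generated function is plain bytecode; nothing schema-shaped
# survives at runtime beyond the function itself.
to_dict = make_to_dict(Point)
print(to_dict(Point(x=1, y=2)))  # {'x': 1, 'y': 2}
```

The generated function pays no per-call introspection cost, and there is no shared mutable cache object to protect.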