Allowing alternative InferenceData/Dataset storage #16

@sethaxen

Currently all Datasets wrap a DimStack, which in turn wraps a set of NamedTuples. Likewise, InferenceData wraps a NamedTuple of Datasets. As noted in #15, this makes everything but the array values themselves immutable.

An alternative is to decouple the interface for Dataset and InferenceData from the storage. For example, we could define NamedTupleStore, DictStore, NCDatasetStore, etc., and implement the InferenceData/Dataset interfaces for each of these. Users who want everything strongly typed could use a NamedTupleStore, while those who want more dynamic access could use a DictStore. An NCDatasetStore would allow users to open a NetCDF file as an InferenceData and even write to it incrementally, with the NetCDF file updated automatically.
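To make the idea concrete, here is a minimal sketch of what such a store interface might look like. Everything in it is an assumption for illustration: the function names getvar, setvar!, and varnames, the AbstractStore supertype, and the SketchDataset wrapper are hypothetical and are not taken from the prototype or the existing code.

```julia
# Hypothetical store interface; none of these names exist in the package.
abstract type AbstractStore end

# Immutable, strongly typed storage backed by a NamedTuple.
struct NamedTupleStore{T<:NamedTuple} <: AbstractStore
    data::T
end

# Mutable, dynamically typed storage backed by a Dict.
struct DictStore <: AbstractStore
    data::Dict{Symbol,Any}
end
DictStore() = DictStore(Dict{Symbol,Any}())

# Minimal read interface that every store implements.
getvar(s::NamedTupleStore, name::Symbol) = getfield(s.data, name)
getvar(s::DictStore, name::Symbol) = s.data[name]

varnames(s::NamedTupleStore) = keys(s.data)
varnames(s::DictStore) = Tuple(sort!(collect(keys(s.data))))

# Write interface: only mutable stores can add or replace variables.
setvar!(s::DictStore, name::Symbol, value) = (s.data[name] = value; s)
setvar!(s::NamedTupleStore, name::Symbol, value) =
    throw(ArgumentError("NamedTupleStore is immutable; convert to a DictStore first"))

# A Dataset-like wrapper whose behavior is defined only through the store interface.
struct SketchDataset{S<:AbstractStore}
    store::S
end
Base.getindex(d::SketchDataset, name::Symbol) = getvar(d.store, name)
Base.setindex!(d::SketchDataset, value, name::Symbol) = (setvar!(d.store, name, value); value)
Base.keys(d::SketchDataset) = varnames(d.store)
```

With this layout, SketchDataset(DictStore()) accepts d[:mu] = [1.0, 2.0], while the same assignment on a NamedTupleStore-backed dataset throws. An NCDatasetStore would presumably implement the same three functions against an open NetCDF file, but that is not sketched here.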

With a well-designed API for stores, this would actually look a lot like the InferenceData API proposed in arviz-devs/ArviZ.jl#154. Namely, the store API could be implemented for a type like MCMCChains.Chains, and then one could call InferenceData(chains) to (inefficiently) view it as an InferenceData. When efficiency is needed, one just converts to one of the native stores.
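As a rough illustration of viewing a foreign type through the store API, the sketch below uses a stand-in ToyChains struct rather than the real MCMCChains.Chains, whose internals are not assumed here. The names getvar, varnames, SketchView, and to_namedtuple are hypothetical.

```julia
# Stand-in for a foreign container such as MCMCChains.Chains (hypothetical layout).
struct ToyChains
    names::Vector{Symbol}
    values::Array{Float64,3}   # draws x parameters x chains
end

# Hypothetical store API, implemented by looking into the foreign object on demand.
varnames(c::ToyChains) = Tuple(c.names)
function getvar(c::ToyChains, name::Symbol)
    i = findfirst(==(name), c.names)
    i === nothing && throw(KeyError(name))
    return view(c.values, :, i, :)   # no copy; linear name lookup on every access
end

# An InferenceData-like view over anything that implements the store API.
struct SketchView{S}
    store::S
end
Base.keys(v::SketchView) = varnames(v.store)
Base.getindex(v::SketchView, name::Symbol) = getvar(v.store, name)

# When efficiency is needed, materialize into a native NamedTuple-backed form.
to_namedtuple(v::SketchView) = NamedTuple{keys(v)}(map(n -> copy(v[n]), keys(v)))
```

Here SketchView(chains)[:mu] searches the name list on each access, which is the inefficient path, while to_namedtuple copies each variable once into a NamedTuple whose field access is resolved by the compiler.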

I started work on a small prototype of this. It requires a fair amount of surgery to the existing code and increases the code complexity. Although it will be more work to do this later than now, I think it's better to hold off for the time being.

Labels: enhancement
