Skip to content
This repository was archived by the owner on Jul 22, 2019. It is now read-only.

#88 Support the addition of new fields to an existing schema. #275

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

JonathanWylie
Copy link

  • Especially ListFields and DictFields

@djc
Copy link
Owner

djc commented Mar 15, 2016

I'm just not sure of the behavior where the data set in the Mapping object is updated into the original to-be-wrapped object. Can you explain to me why that makes sense? Conceptually, it seems like wrapping should be a one-way street: the new object gets data from the mapping that's passed in.

@djc
Copy link
Owner

djc commented Mar 15, 2016

(Also, it looks like your changes made the tests fail. Can you look into that?)

@JonathanWylie
Copy link
Author

OK so I have been writing loads of stuff to explain to you, and in the process I think I have got to the real crux of the problem.

The architecture handles defaults in two different ways.

  • Virtual default not written to the db
  • Static default which is written to the db

When a document is initialized, then all the defaults are hard written into the document, so you can see them in document._data, and those defaults will be written to the db when the document is stored.

When a document is loaded from the database, then any missing defaults from the document are not hard written into the document. (My suggested fix changes that)

It also handles defaults in Field.__get__, if the field is not in the doc, then it just returns the default.
It seems to be trying to give you a default without writing it to the document._data. This is a different kind of default behavior, where it just gives the caller the default value, but the default value never get's written to the DB.
This is in principle a nicer way to handle defaults, because you can just change the default in code if you want, and everything which has not got a value set will just work with the new default.

Unfortunately it is hard to make that kind of defaulting work with this architecture.
It means you only ever change instance._data on __set__, which is fine until you have submappings like a DictField. Setting something in a DictField mapping, will only change it's _data, and if the DictField is not in it's parents _data (because it was just defaults before) then you loose the change. So you would need to find a way to ripple the set up so that the DictField mappings _data gets attached to it's parents _data once it is not just defaults anymore.
That goes for Lists as well, you need to find a way to ripple changes made through the ListProxy, to update the parent fields _data.

I had a go at bubbling up the sets which does work, but is a horrible hack,
but it does prove that an entirely "virtual" default behavior is possible,

So I recommend removing the default handling in __get__, and just set the defaults as we do on initializing of the Mapping, and as we do now in the wraps.

Incidentally there is a bug in Field.__get__ anyway, it should return self._to_python(default). So that you get the ListProxy or MappingProxy, instead of a real [] or {}.

I am going to go have a look at the tests see what's going on. I didn't realise we have a mix of doc tests and stand alone tests.

@JonathanWylie
Copy link
Author

To answer about your question about wrap itself.
If you want all new schema defaults to be added in initialization in the same as as they are when we make a new document. Then it needs to add something.
eg

  • I have a hand with 4 fingers
  • I wear a five fingered glove
  • I want to use the fifth finger
  • So the glove attaches the fifth finger to my hand

…setting the default values for fields in the schema when wrapping data loaded from the DB.

 - Especially ListFields and DictFields
…ead of overloading the descriptor __get__ method.

 - The _get_ method should return values in python types, eg a date time is a datetime.datetime instance
 - The default values need to be in basic json types, eg [1,2,3] , {}, str, int which can be serialized to the doc
 - We don't need to get the default values in __get__ anymore because the defaults are set on initialisation of the mapping, or on wrap in a mapping.
@JonathanWylie
Copy link
Author

I know understand the bug in

Field.__get__
because it was overloading get to

  • Get the python object for a field
  • Also get the json values when it needed the default.

So it looks like the "virtual" default style was never really intended, it just looked like it because it would return you something, even though it was the wrong type. You mentioned that in the initial issue you were getting [] instead of a ListProxy.
This was a side effect of a strange implementation to get the default values on initializing a mapping.

I have fixed the strange implementation to something that doesn't have weird side effects in the branch too.

…ms. These aren't massive so it doesn't really matter.
 - Especially ListFields and DictFields
 - The _to_json need to be idempotent to support string values coming from wraps, and python type values eg DateTime coming from user code.
@JonathanWylie
Copy link
Author

OK so even though the tests pass, I came across another case where this solution doesn't work.
In particular, if you have a ListField(DictField(Mapping.build(......schema....))), then change the schema. Any instances of the old DictField in the ListField were not being updated with a new schema.

So I went back to the old way of getting the defaults set.
It means some changes to the date and time field _to_json methods.
If the data comes from user code, we expect the field to be in a date,datetime,time python type.
But if it comes from wraps, then it is not in the correct type, because they are just strings.
So just make sure they are the right type in _to_json before converting them.
The other basic fields are fine, because bool(bool(True)) doesn't fail for example.

Now the tests pass, using the original solution.

@djc
Copy link
Owner

djc commented Mar 16, 2016

Wow, that's incredibly thorough! Thank you for really diving into this. In part, this is generating some more questions for me, so I hope you'll bear with me.

  • How is this class method actually being used? To me, it looks like being used in Document.load() and Document._wrap_row(), which seems sensible enough, and the _to_python methods.
  • As for what you call virtual and static (which I guess you could also call lazy and eager, maybe?), so it seems you're saying that the code was designed to be more static, materializing defaults eagerly? But at some point maybe some code was added which tries to be virtual?
  • It kind of makes sense to me that we materialize defaults eagerly on document creation, but that we try to be lazy if we're loading something from the database: it feels bad to mutate the retrieved document kind of implicitly. On the other hand, we still have similar problems where the defaults could shift out from under us as they are changed in the code (in a way, this could still happen if the materialized defaults don't get saved, but there's nothing we can do about that, I guess).

Does this make sense to you? Does it match what your changes do?

As for the actual commits, I'd like to have a clean commit series with small commits moving the behavior to what it should be. Also, having extensive commit messages is great, but please keep the first line of the commit message at about ~70 characters, then add an empty line, then continue the story.

I hope you still have time/energy to work on this!

@JonathanWylie
Copy link
Author

So yes you have understood pretty well what's going on.
Except my changes have not implemented lazy defaults if we're loading something from the database. In both cases when loading and when creating anew document defaults are handled eagerly.

It seems bad to handle defaults in an inconsistent way. I think they should either always be lazy or always eager.
I don't really understand the difference between loading a document {some document} from the db, and initializing {some document} from code. Why is it OK for one of them to implicitly change the document, and one not?
If you don't like implicit changes then best to go with the lazy defaults.

When I was considering the pros and cons between them I think that the eager solution is better. This is because there are cases when you want to change a default only for new records, not for old one's. Which is almost impossible to do with a lazy default system. However with a eager default system, if you change the default, it's not that hard to update old records if you do want to change the defaults which have been written to old documents.
So the eager defaults gives more flexibility.

Also:
It's considerably easier to implement eager defaults.
I did come up with a possible way of implementing lazy defaults, but it is a horrible hack.

@djc
Copy link
Owner

djc commented Mar 18, 2016

Okay, I agree that the eager behavior is better, I was just working through the pros and cons before.

Could you clean up the commits a bit along the lines that I suggested? If not, I can do that.

@JonathanWylie
Copy link
Author

Yes I can clean it up. Just didn't want to do it before we had agreed the behavior you want.
I will do it sometime this week if that's OK.

@djc
Copy link
Owner

djc commented Mar 22, 2016

Sounds great!

@djc
Copy link
Owner

djc commented Mar 26, 2016

Hey, did you get a chance to work on this? If not, I'll just clean it up. :)

@JonathanWylie
Copy link
Author

Sorry I was really busy last week.

@djc
Copy link
Owner

djc commented Apr 27, 2016

Hey @JonathanWylie, do you think you'll have time in the near future to clean this up a little? If not, that's fine too, just let me know and I'll try to dig into it myself.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants