-
Notifications
You must be signed in to change notification settings - Fork 63
feat: raise NoDefaultIndexError from read_gbq on clustered/partitioned tables with no index_col or filters set
#631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: raise NoDefaultIndexError from read_gbq on clustered/partitioned tables with no index_col or filters set
#631
Conversation
…rtitioned-default-index-error
…ed-default-index-error
NoDefaultIndexError from read_gbq on clustered/partitioned tables with no index_col set. Set force=True to allow default sequential indexNoDefaultIndexError from read_gbq on clustered/partitioned tables with no index_col set. Set bigframes.options.compute.allow_default_index_on_clustered_partitioned_tables = True to allow default sequential index
…or-partitioned-default-index-error
This ensures the cached `primary_keys` is more likely to be correct, in case the user called ALTER TABLE after we originally cached the snapshot time.
…tered-or-partitioned-default-index-error
NoDefaultIndexError from read_gbq on clustered/partitioned tables with no index_col set. Set bigframes.options.compute.allow_default_index_on_clustered_partitioned_tables = True to allow default sequential indexNoDefaultIndexError from read_gbq on clustered/partitioned tables with no index_col set
…or-partitioned-default-index-error
…ve to separate module add todos
NoDefaultIndexError from read_gbq on clustered/partitioned tables with no index_col setNoDefaultIndexError from read_gbq on clustered/partitioned tables with no index_col or filters set
| **New in bigframes version 1.4.0**: Support | ||
| :class:`bigframes.enums.TypeKind` to override default index | ||
| behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TypeKind or DefaultIndexKind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultIndexKind. Good catch. I had tried a few names for this but the refactor option didn't catch docstrings I think.
| rows via pandas-like outer join behavior. Operations like | ||
| ``cumsum()`` that window across a non-unique index can have some | ||
| unpredictability due to ambiguous ordering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part about non-determinism I don't think is correct? My understanding is if the index is non-unique, we fall back to hidden hashes to ensure total ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped this.
| "BigQueryOptions", | ||
| "get_global_session", | ||
| "close_session", | ||
| "enums", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is enums an intuitive module, or would a domain-related term be better, eg indexing.IndexType or directly putting the enum in the main module, bigframes.pandas.IndexType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to find some guidance on this, but Python community doesn't seem particularly prescriptive about module names.
PEP-8 has this to say:
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability.
https://peps.python.org/pep-0008/#package-and-module-names
Google Python style guide has a bit more to say:
Place related classes and top-level functions together in a module. Unlike Java, there is no need to limit yourself to one class per module.
Use CapWords for class names, but lower_with_under.py for module names.
https://google.github.io/styleguide/pyguide.html#3162-naming-conventions
I tried a few of these options out locally (bigframes.indexes.DefaultIndexKind and bigframes.pandas.DefaultIndexKind), but it feels strange to have something not really mimicking pandas in the pandas sub-package and bigframes.indexes.DefaultIndexKind would imply that we should move the Index and MultiIndex classes there, which is kinda the opposite of what we want to do.
The other option we could try is bigframes.pandas.core.indexes, but in pandas "core" is how they signify that an API is private an not to be relied on.
IMO, determining if classes are "related" by type for the basic types (e.g. exceptions, enums, ...) will be less effort for us long-term than having to figure out which public package to place these things if it doesn't fit in an existing API.
| Can also take wildcard table name, such as `project.dataset.table_prefix*`. | ||
| In tha case, will read all the matched table as one DataFrame. | ||
| index_col (Iterable[str] or str): | ||
| index_col (Iterable[str], str, bigframes.enums.IndexKind): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DefaultIndexKind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Done.
|
Looks like |
…or-partitioned-default-index-error
…ned-default-index-error' into b335727141-clustered-or-partitioned-default-index-error
Done! Looks like from https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html |
…or-partitioned-default-index-error
Please review #636 first, as this PR builds on that.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes internal issue 335727141
🦕