-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
bpo-32424: Synchronize xml.etree.ElementTree.Element copy methods. #5046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Lib/test/test_xml_etree.py
Outdated
self.assertEqual(element_foo2.tail, self.element_foo.tail) | ||
|
||
for (child1, child2) in zip(self.element_foo, element_foo2): | ||
self.assertIs(child1, child2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dunno if this is a problem with the code coverage software or what, but it doesn't seem like this statement is getting hit by the coverage checker. Is it possible that one of these two has length zero?
I'll note that I personally prefer to use itertools.zip_longest
for tests like this, since zip
will just silently fail if one of the two iterators is a truncated version of the other - e.g. all(x == y for x, y in zip((1, 2, 3), (1, 2, 3, 4, 5)))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason the statement wasn't getting hit was because I forgot to actually check it.
And while the lists are expected to be of identical length, I agree that failing silently is not a good idea.
I've created a new commit that addresses these two issues, plus others.
dd7421c
to
d072962
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__deepcopy__
is not needed in Python implementation. Many tests duplicate already existing tests or expose implementation details.
@@ -102,6 +103,11 @@ def newtest(*args, **kwargs): | |||
return decorator | |||
|
|||
|
|||
#----------------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is redundant.
@@ -115,6 +121,143 @@ def test_all(self): | |||
support.check__all__(self, ET, names, blacklist=("HTML_EMPTY",)) | |||
|
|||
|
|||
class ElementTree_Element_UnitTest(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name of this class is not in line with other names.
def test___init__(self): | ||
tag = "foo" | ||
attrib = { "zix": "wyp" } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too much empty lines. Remove unneeded empty lines.
|
||
element_foo = ET.Element(tag, attrib) | ||
|
||
with self.subTest("traits of an element"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed. It doesn't match the style of other tests.
|
||
with self.subTest("traits of an element"): | ||
self.assertIsInstance(element_foo, ET.Element) | ||
self.assertIn("tag", dir(element_foo)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already tested in test_interface.
attrib["bar"] = "baz" | ||
|
||
self.assertIsNot(element_foo.attrib, attrib) | ||
self.assertNotEqual(element_foo.attrib, attrib) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already tested in test_attrib.
self.assertIsNot(element_foo2.attrib, element_foo.attrib) | ||
self.assertNotEqual(element_foo2.attrib, element_foo.attrib) | ||
|
||
def test___copy__(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an existing test for copying: test_copy.
del e1, e2, e3 | ||
gc_collect() | ||
self.assertIsNone(wref()) | ||
with self.subTest("Test the shortest cycle: d->element->d"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes helped me to debug the changes I made to the C implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then please remove them after finishing debugging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point was that they were not wholly unrelated. What is the harm in including them here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @GPHemsley that it's better to have subtests than multiple tests in a single test function. Although having multiple test functions also wouldn't be all too bad. :)
Yes, they are unrelated to this specific ticket, but they don't really seem big enough to merit their own ticket and PR. They somewhat match the general refactoring of lengthy tests into subtests that this change does in other places. It's at least an edge case.
elem = self.makeelement(self.tag, self.attrib) | ||
elem.text = self.text | ||
elem.tail = self.tail | ||
elem[:] = self | ||
return elem | ||
|
||
def __deepcopy__(self, memo): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is not needed in Python implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer you have this discussion on the bpo ticket, as PEP 399 suggests otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't contradict PEP 399.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention of PEP-399 is that both implementations behave the same. It's not that they use exactly the same mechanisms to reach that goal. Some things work in Python that don't easily work in C, and vice-versa. That's why there are two implementations in the first place.
If we can avoid adding Python (or C) code without sacrificing that goal, then that is a good thing. Code that does not exist is entirely free of bugs, for example. :)
@@ -283,10 +283,12 @@ create_new_element(PyObject* tag, PyObject* attrib) | |||
PyObject_GC_Track(self); | |||
|
|||
if (attrib != Py_None && !is_empty_dict(attrib)) { | |||
attrib = PyDict_Copy(attrib); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change has been added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the results of the tests in this commit uncovered a discrepancy between the C and Python implementations whereby the attrib list is not copied in the C version beyond the __init__
function. (The Python implementation always uses __init__
whenever it creates a new element, but the C implementation uses create_new_element
in all locations except the original __init__
function.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In many case this makes a redundant copying since create_new_element
is called after copying a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be moved out of this function into the caller(s) that need it?
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Generally: I have added the unit tests in the way that I have because the existing checks are generally integration tests, and most of the test coverage for this module comes as a side effect of |
_elementtree_Element_copy_impl(ElementObject *self) | ||
/*[clinic end generated code: output=84660e8524276b22 input=1f8134305a7719a3]*/ | ||
{ | ||
if (PyErr_WarnEx(PyExc_DeprecationWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm personally ok with deprecating this right away (lxml never provided such a .copy()
method, AFAICT), but still, the usual process would be to issue a PendingDeprecationWarning
first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @serhiy-storchaka that some cleanup is needed to get this merged.
The Python implementation had Element.copy() while the C implementation had Element.__copy__() and Element.__deepcopy__(). This synchronizes the two implementations and deprecates elem.copy() in favor of copy.copy(elem).
@scoder My latest comment on bpo-32424 explains some of my decisions. Would you mind providing feedback on that? (In the meantime, I've rebased this against master.) |
Upon further review, it's clear I'm trying to do too many (4) different things in this one commit, so I'm going to break it up and submit the changes separately. |
The Python implementation had
Element.copy()
while the Cimplementation had
Element.__copy__()
andElement.__deepcopy__()
.This synchronizes the two implementations and deprecates
elem.copy()
in favor ofcopy.copy(elem)
.https://bugs.python.org/issue32424