Skip to content

bpo-32424: Synchronize xml.etree.ElementTree.Element copy methods. #5046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

GPHemsley
Copy link
Contributor

@GPHemsley GPHemsley commented Dec 29, 2017

The Python implementation had Element.copy() while the C
implementation had Element.__copy__() and Element.__deepcopy__().

This synchronizes the two implementations and deprecates
elem.copy() in favor of copy.copy(elem).

https://bugs.python.org/issue32424

self.assertEqual(element_foo2.tail, self.element_foo.tail)

for (child1, child2) in zip(self.element_foo, element_foo2):
self.assertIs(child1, child2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno if this is a problem with the code coverage software or what, but it doesn't seem like this statement is getting hit by the coverage checker. Is it possible that one of these two has length zero?

I'll note that I personally prefer to use itertools.zip_longest for tests like this, since zip will just silently fail if one of the two iterators is a truncated version of the other - e.g. all(x == y for x, y in zip((1, 2, 3), (1, 2, 3, 4, 5))).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason the statement wasn't getting hit was because I forgot to actually check it.

And while the lists are expected to be of identical length, I agree that failing silently is not a good idea.

I've created a new commit that addresses these two issues, plus others.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__deepcopy__ is not needed in Python implementation. Many tests duplicate already existing tests or expose implementation details.

@@ -102,6 +103,11 @@ def newtest(*args, **kwargs):
return decorator


#-----------------------------------------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is redundant.

@@ -115,6 +121,143 @@ def test_all(self):
support.check__all__(self, ET, names, blacklist=("HTML_EMPTY",))


class ElementTree_Element_UnitTest(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this class is not in line with other names.

def test___init__(self):
tag = "foo"
attrib = { "zix": "wyp" }

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too much empty lines. Remove unneeded empty lines.


element_foo = ET.Element(tag, attrib)

with self.subTest("traits of an element"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed. It doesn't match the style of other tests.


with self.subTest("traits of an element"):
self.assertIsInstance(element_foo, ET.Element)
self.assertIn("tag", dir(element_foo))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already tested in test_interface.

attrib["bar"] = "baz"

self.assertIsNot(element_foo.attrib, attrib)
self.assertNotEqual(element_foo.attrib, attrib)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already tested in test_attrib.

self.assertIsNot(element_foo2.attrib, element_foo.attrib)
self.assertNotEqual(element_foo2.attrib, element_foo.attrib)

def test___copy__(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an existing test for copying: test_copy.

del e1, e2, e3
gc_collect()
self.assertIsNone(wref())
with self.subTest("Test the shortest cycle: d->element->d"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes helped me to debug the changes I made to the C implementation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please remove them after finishing debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that they were not wholly unrelated. What is the harm in including them here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @GPHemsley that it's better to have subtests than multiple tests in a single test function. Although having multiple test functions also wouldn't be all too bad. :)
Yes, they are unrelated to this specific ticket, but they don't really seem big enough to merit their own ticket and PR. They somewhat match the general refactoring of lengthy tests into subtests that this change does in other places. It's at least an edge case.

elem = self.makeelement(self.tag, self.attrib)
elem.text = self.text
elem.tail = self.tail
elem[:] = self
return elem

def __deepcopy__(self, memo):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is not needed in Python implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer you have this discussion on the bpo ticket, as PEP 399 suggests otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't contradict PEP 399.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention of PEP-399 is that both implementations behave the same. It's not that they use exactly the same mechanisms to reach that goal. Some things work in Python that don't easily work in C, and vice-versa. That's why there are two implementations in the first place.

If we can avoid adding Python (or C) code without sacrificing that goal, then that is a good thing. Code that does not exist is entirely free of bugs, for example. :)

@@ -283,10 +283,12 @@ create_new_element(PyObject* tag, PyObject* attrib)
PyObject_GC_Track(self);

if (attrib != Py_None && !is_empty_dict(attrib)) {
attrib = PyDict_Copy(attrib);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change has been added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the results of the tests in this commit uncovered a discrepancy between the C and Python implementations whereby the attrib list is not copied in the C version beyond the __init__ function. (The Python implementation always uses __init__ whenever it creates a new element, but the C implementation uses create_new_element in all locations except the original __init__ function.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In many case this makes a redundant copying since create_new_element is called after copying a dict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be moved out of this function into the caller(s) that need it?

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@GPHemsley
Copy link
Contributor Author

Generally: I have added the unit tests in the way that I have because the existing checks are generally integration tests, and most of the test coverage for this module comes as a side effect of pickle tests. It is my goal to expand the unit tests later to test each method individually, so starting that work here makes it easier.

_elementtree_Element_copy_impl(ElementObject *self)
/*[clinic end generated code: output=84660e8524276b22 input=1f8134305a7719a3]*/
{
if (PyErr_WarnEx(PyExc_DeprecationWarning,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm personally ok with deprecating this right away (lxml never provided such a .copy() method, AFAICT), but still, the usual process would be to issue a PendingDeprecationWarning first.

Copy link
Contributor

@scoder scoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @serhiy-storchaka that some cleanup is needed to get this merged.

The Python implementation had Element.copy() while the C
implementation had Element.__copy__() and Element.__deepcopy__().

This synchronizes the two implementations and deprecates
elem.copy() in favor of copy.copy(elem).
@GPHemsley
Copy link
Contributor Author

GPHemsley commented Apr 14, 2019

@scoder My latest comment on bpo-32424 explains some of my decisions. Would you mind providing feedback on that?

(In the meantime, I've rebased this against master.)

@GPHemsley
Copy link
Contributor Author

Upon further review, it's clear I'm trying to do too many (4) different things in this one commit, so I'm going to break it up and submit the changes separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants