WIP: Fix precision of DataFrame.combine and DataFrame.combine_first #62691

angela-tarantula · 2025-10-14T12:01:20Z

EDIT: This was my first attempt. According to maintainers, converting to EA dtypes and back just to prevent lossy conversion is discouraged and is not an approach taken in this codebase.

[] closes BUG: Series.combine_first loss of precision #60128
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

jbrockmendel · 2025-10-30T22:50:38Z

pandas/core/frame.py

-            if y.name not in self.columns:
-                return y_values
-
-            return expressions.where(mask, y_values, x_values)


does not using expressions.where have a perf impact?

jbrockmendel · 2025-10-30T22:51:59Z

pandas/tests/frame/methods/test_combine_first.py

+            (-1666880195890293744, "int64"),
+        ),
+    )
+    def test_combine_first_preserve_precision(self, wide_val, dtype):


looks like you made changes to both combine and combine_first, but only a test for combine_first. can you test the fixed bug in combine?

jbrockmendel · 2025-10-30T22:52:34Z

pandas/core/frame.py

+            """Resolve the combined dtypes according to the original dtypes."""
+            cast_map: dict[IndexLabel, DtypeObj] = {}
+            for col in combined_df.columns:
+                ser = combined_df[col]


this is not going to be robust to non-unique columns. can you use .iloc[:, i] instead?
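To illustrate the concern about non-unique columns, here is a minimal sketch on a made-up frame (the column names are assumptions, not taken from the PR). With duplicate labels, label-based selection returns a DataFrame rather than a Series, while positional selection always returns one column.

```python
import pandas as pd

# Hypothetical frame with a duplicated column label "a".
df = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])

# Label-based access returns every column named "a", so the result is a
# DataFrame, and code expecting a Series (e.g. reading `.dtype`) breaks.
by_label = df["a"]
print(type(by_label).__name__)

# Positional access yields exactly one Series per column, duplicates or not.
by_position = [df.iloc[:, i] for i in range(df.shape[1])]
print([type(ser).__name__ for ser in by_position])
```

Iterating over `range(df.shape[1])` also keeps the per-column dtype lookup positional, which is one way to follow the reviewer's suggestion.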

jbrockmendel · 2025-10-30T22:53:06Z

pandas/core/frame.py

+                orig_dt_self = self_orig.dtypes.get(col)
+                orig_dt_other = other_orig.dtypes.get(col)
+
+                was_promoted = (orig_dt_self in [np.int64, np.uint64]) or (


no risk for smaller int widths?
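As a rough check on this question, the sketch below uses plain Python (whose `float` is an IEEE 754 double, the same representation as float64) to test whether an integer survives a round trip through float64. The helper name `survives_float64` is made up for illustration. It only speaks to value preservation, not to which dtype the result should have.

```python
def survives_float64(n: int) -> bool:
    """Return True if n converts to a float64 and back without changing."""
    return int(float(n)) == n


# Extremes of the 32-bit integer types all fit in float64's 53-bit significand.
narrow_extremes = [-(2**31), 2**31 - 1, 2**32 - 1]
print(all(survives_float64(n) for n in narrow_extremes))

# 2**53 is the last point up to which every integer is exactly representable.
print(survives_float64(2**53))
print(survives_float64(2**53 + 1))
```

Under this check, values of 32-bit and narrower integer types keep their value when upcast to float64, so the precision loss is specific to 64-bit integers with magnitude above 2**53.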

jbrockmendel · 2025-10-30T22:53:58Z

pandas/core/frame.py

        2  NaN  3.0 1.0
        """
+
+        # GH#60128 Integers n where |n| > 2**53 would lose precision after align


align only upcasts to float when the frames have different indexes right? would it be simpler to just warn users and tell them to do alignment before calling .combine?
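The behavior under discussion can be reproduced with a small sketch. The frames and index values below are made up for illustration; the wide integer is the one from the PR's test parametrization.

```python
import pandas as pd

WIDE = -1666880195890293744  # |WIDE| > 2**53, value taken from the PR's test

left = pd.DataFrame({"a": [WIDE]}, index=[0])
right = pd.DataFrame({"a": [1]}, index=[1])

# Identical indexes: nothing needs filling, so the column stays int64.
same_left, _ = left.align(left.copy())
print(same_left["a"].dtype)

# Different indexes: the default outer join adds missing rows, which are
# filled with NaN, and that forces the int64 column to float64.
diff_left, diff_right = left.align(right)
print(diff_left["a"].dtype)

# The wide value no longer round-trips once it has passed through float64.
print(int(diff_left.loc[0, "a"]) == WIDE)
```

Note that when the indexes genuinely differ, aligning ahead of time introduces the same NaN fill, so pre-alignment alone would not avoid the upcast for NumPy-backed int64 columns.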

angela-tarantula · 2025-10-30T23:10:59Z

My bad! I meant to close this draft PR.

angela-tarantula · 2025-10-30T23:17:00Z

This is not the right solution to the problem. A maintainer told me that casting to EA dtypes and back is a controversial approach and is discouraged in this codebase.

I have already merged this PR, which partially addresses the bug report. Previously the combiner did not preserve EA dtypes as it should, leaving users no way to avoid the lossy float64 conversion even when using EA dtypes. I'm currently thinking of a stopgap that can close the bug report without using EA dtypes.
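For context on why EA dtypes sidestep the problem, here is a minimal sketch showing alignment only (made-up frames, same wide value as the PR's test). It does not exercise `combine` or `combine_first`, whose results depend on the pandas version and on the merged change mentioned above.

```python
import pandas as pd

WIDE = -1666880195890293744  # |WIDE| > 2**53, value taken from the PR's test

# Nullable Int64 columns instead of NumPy-backed int64.
left = pd.DataFrame({"a": pd.array([WIDE], dtype="Int64")}, index=[0])
right = pd.DataFrame({"a": pd.array([1], dtype="Int64")}, index=[1])

aligned_left, aligned_right = left.align(right)

# Missing rows are filled with pd.NA, so no cast to float64 is needed.
print(aligned_left["a"].dtype)
print(pd.isna(aligned_left.loc[1, "a"]))

# The wide integer is stored exactly.
print(aligned_left.loc[0, "a"] == WIDE)
```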

angela-tarantula added 7 commits October 14, 2025 07:59

add targeted casting to combine_first

c8ba339

use difference not union to avoid sorting

f786763

fix typo, fewer comments

ae99b3d

refactor

6e77fc9

add news

ea7d2ba

always use nullable Int for 64-bit ints

301d85f

always upcast, for predictability

a6b461c

angela-tarantula changed the title ~~WIP: add targeted casting to combine_first~~ WIP: Fix precision of DataFrame.combine_first Oct 19, 2025

angela-tarantula added 2 commits October 19, 2025 13:33

make wide ints nullable before align and restore after combining

e15bde9

update test expectations (don't convert to float64 when Int64 or UInt64 does the trick)

451621b

angela-tarantula force-pushed the fix-issue-60128 branch from 3ec10fc to 451621b Compare October 19, 2025 17:36

angela-tarantula added 13 commits October 19, 2025 13:51

add type hint

4fdc459

small refactor

ef662a0

combine_first's combiner must preserve EA dtypes

fefadcb

clearer comments

f80917d

create new test for issue

bf69fad

clean up test

016c64e

don't break any other tests, but comment why it may be worth it

2928cee

preserve old test

1a53d48

thinner comment

7e6837f

clearer comments

444deaa

follow contributing guidelines

49ff1a5

move news from reshaping to numeric

747e8bc

use correct typing

c527bc0

angela-tarantula changed the title ~~WIP: Fix precision of DataFrame.combine_first~~ WIP: Fix precision of DataFrame.combine and DataFrame.combine_first Oct 24, 2025

angela-tarantula mentioned this pull request Oct 24, 2025

BUG: Series.combine_first loss of precision #60128

Open

3 tasks

jbrockmendel reviewed Oct 30, 2025

angela-tarantula closed this Oct 30, 2025
