Feature or enhancement

Description:
A question on StackOverflow got me thinking about "the other end" of `list.sort()`: the start, with `count_run()`.
https://stackoverflow.com/questions/78108792/
That descending runs have to be strictly descending was always annoying, but since we only have `__lt__` to work with, it seemed too expensive to check for equality. As comments in the cited issue say, if we could tell when adjacent elements are equal, then runs of equal elements encountered during a descending run could be reversed in place on the fly, and their original order would then be restored by the reverse-the-whole-run at the end.
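A minimal Python sketch of that idea, assuming (as `list.sort()` does) that only `<` is available, so `not (b < a)` stands in for `a <= b`. This is a hypothetical prototype, not CPython's actual `count_run()`: during a "descending" scan, each block of equal elements is reversed in place the moment it ends, and the final whole-run reversal then restores each block's original order, preserving stability.

```python
def reverse_slice(a, lo, hi):
    """Reverse a[lo:hi] in place."""
    hi -= 1
    while lo < hi:
        a[lo], a[hi] = a[hi], a[lo]
        lo += 1
        hi -= 1

def count_run(a):
    """Return n such that a[:n] is a maximal initial run.

    Hypothetical prototype of the idea in the text, using only `<`.
    On return, a[:n] has been left ascending (a descending run is
    reversed in place, with equal blocks' original order preserved).
    """
    n = len(a)
    if n < 2:
        return n
    i = 1
    if a[i] < a[i - 1]:          # "descending" (non-increasing) run
        eq_start = 0             # start of the current block of equals
        while i < n:
            if a[i] < a[i - 1]:      # strictly smaller: equal block ends
                if i - eq_start > 1:
                    reverse_slice(a, eq_start, i)
                eq_start = i
            elif a[i - 1] < a[i]:    # strictly larger: run ends
                break
            # else equal: stay in the current equal block
            i += 1
        if i - eq_start > 1:          # close out a trailing equal block
            reverse_slice(a, eq_start, i)
        reverse_slice(a, 0, i)        # whole-run reversal restores the
                                      # equal blocks' original order
    else:                        # ascending (non-decreasing) run
        while i < n and not (a[i] < a[i - 1]):
            i += 1
    return i
```

For example, `count_run([5, 3, 3, 1, 4])` returns 4 and leaves the list as `[1, 3, 3, 5, 4]`: the two 3's were reversed once when their block ended and again by the whole-run reversal, so they keep their original relative order.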
Another minor annoyance is that, e.g., `[3, 2, 1, 3, 4, 5]` gives up after finding the `[3, 2, 1]` prefix. But after a descending run is reversed, it's all but free to go on to see whether it can be extended by a luckily ascending run immediately following.
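That extension step can be sketched as follows (a hypothetical helper, again using only `<`): reverse a strictly descending prefix in place, then keep consuming any ascending stretch that happens to follow.

```python
def extended_run_length(a):
    """Length of an initial sorted run of `a`, after reversing a strictly
    descending prefix in place. Hypothetical sketch, using only `<`.
    """
    n = len(a)
    if n < 2:
        return n
    i = 1
    if a[i] < a[i - 1]:
        # consume the strictly descending prefix
        while i < n and a[i] < a[i - 1]:
            i += 1
        a[:i] = a[i - 1::-1]   # reverse the prefix in place
    # the (possibly reversed) prefix now ascends; extending it is all
    # but free: just keep going while the next element is not smaller
    while i < n and not (a[i] < a[i - 1]):
        i += 1
    return i
```

On `[3, 2, 1, 3, 4, 5]` this returns 6, having turned the list into `[1, 2, 3, 3, 4, 5]`, rather than stopping at the length-3 prefix.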
So the idea is to make `count_run()` smarter, and have it do all reversing by itself.
A Python prototype convinced me this can be done without slowing the processing of ascending runs (*), and even on randomized data it creates runs about 20% longer on average.
Which doesn't much matter on its own: on randomized data, all runs are almost certain to be artificially boosted to length MINRUN anyway. But starting with longer short runs would still have a beneficial effect, by cutting the number of searches the binary insertion sort has to do.
Not a major thing regardless. The wins would be large on various new cases of high partial order, and in the brain relief of not having to explain the difference between the definitions of "ascending" and "descending" by appeal to implementation details.
(*) Consider `[100] * 1_000_000 + [2]`. The current code does a million compares to detect that this starts with a million-element ascending run, followed by a drop. The whole thing would be a "descending run" under the new definition, but it just wouldn't be worth it if detecting the long run of equalities required 2 million compares.
But, happily, the current code for that doesn't need to change at all. After the million compares, it knows

`a[0] <= a[1] <= ... <= a[-2] > a[-1]`
At that point, it only needs one more compare: is `a[0] < a[-2]`? If so, we're done: at some point (& we don't care where) the sequence increased, so this cannot be a descending run. But if not, the prefix both starts and ends with 100, so all elements in the prefix must be equal: this is a descending run. We reverse the first million elements in-place, extend the right end to include the trailing 2, then finally reverse the whole thing in-place.
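The one-extra-compare trick can be sketched like so (a hypothetical helper, not CPython code): given that `a[0] <= a[1] <= ... <= a[k-1] > a[k]` is already known from the ascending scan, a single compare of `a[0]` against `a[k-1]` decides whether the prefix is all-equal and so extendable into a descending run.

```python
def classify_prefix(a, k):
    """Given that a[0] <= ... <= a[k-1] > a[k] is already known (from an
    ascending scan using only `<`), decide with ONE more compare whether
    a[:k+1] can be treated as one descending run; if so, reverse it in
    place. Hypothetical sketch of the trick described in the text.
    """
    if a[0] < a[k - 1]:
        # the prefix increased somewhere (we don't care where), so
        # a[:k+1] cannot be a descending run; the run is just a[:k]
        return k
    # the prefix starts and ends with equal values, so it is all-equal:
    # a[:k+1] is a descending run under the new definition
    a[:k] = a[k - 1::-1]   # reverse the equal prefix in place (order-wise
                           # a no-op for equal keys; restores stability
                           # once the whole run is reversed below)
    a[:k + 1] = a[k::-1]   # then reverse the whole run in place
    return k + 1
```

So `classify_prefix([100]*5 + [2], 5)` returns 6 and leaves `[2, 100, 100, 100, 100, 100]`, while `classify_prefix([1, 2, 3, 2], 3)` returns 3 and leaves the list untouched.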
Yes, that final result could be gotten with about half the stores using a suitable in-place array-rotation algorithm instead, but that's out of scope for this issue report. The approach sketched works no matter how many distinct blocks of all-equal runs are encountered.
CPython versions tested on:
3.12
Operating systems tested on:
No response