Faster categorical tick formatter.

anntzer · anntzer · commit 59c53c2ab752 · 2019-04-10T07:03:39.000+02:00
Having thousands of categories is most likely a sign that the user
forgot to convert strings to floats or dates, but we may as well not
take forever to generate the incorrect plot so that they can observe the
failure faster.

Right now StrCategoryFormatter rebuilds the value-to-label dict on
every call to `__call__`, which leads to quadratic complexity when
iterating over the ticks.  Instead, build it once in `format_ticks`
(and let `__call__` use that implementation too), for linear complexity.
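
As a rough illustration of the two strategies (a standalone sketch, not
the matplotlib code itself; `labels_rebuilding`, `labels_build_once`,
`units` and `ticks` are hypothetical names, and plain category strings
stand in for the formatted labels):

```python
def labels_rebuilding(units, values):
    """Old behavior: invert the mapping once per tick, O(n) work each time."""
    out = []
    for val in values:
        # The inverse dict is rebuilt from scratch for every single tick.
        r_mapping = {v: k for k, v in units.items()}
        out.append(r_mapping.get(round(val), ''))
    return out


def labels_build_once(units, values):
    """New behavior: invert the mapping once, then look up every tick."""
    r_mapping = {v: k for k, v in units.items()}
    return [r_mapping.get(round(val), '') for val in values]


# Hypothetical category-to-position mapping and tick positions.
units = {'a': 0, 'b': 1, 'c': 2}
ticks = [0.0, 1.2, 1.8, 5.0]
print(labels_rebuilding(units, ticks))
print(labels_build_once(units, ticks))
```

Both functions return the same labels; only the number of times the
inverse dict is built differs (once per tick versus once in total).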

This speeds up

    from pylab import *
    cats = [str(x) for x in np.random.rand(4000)]  # Bunch of labels.
    plt.plot(cats)
    plt.gcf().canvas.draw()

from ~25s to ~11s.  The difference gets bigger for more ticks, as
we're comparing O(n^2) to O(n) (modulo dict lookup terms in log(n),
probably).

The other option was to make UnitData maintain both a forward and a
backward mapping in sync, but this would require passing the UnitData
instance rather than the mapping to the StrCategoryFormatter
constructor, and the API break is just not worth it.
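
For reference, the rejected alternative could look roughly like the
sketch below.  This is an assumption about its shape, not code from
matplotlib: `TwoWayUnits`, `forward` and `backward` are hypothetical
names, and the real UnitData class is not reproduced here.

```python
class TwoWayUnits:
    """Hypothetical container keeping both mapping directions in sync."""

    def __init__(self):
        self.forward = {}   # category string -> integer position
        self.backward = {}  # integer position -> category string

    def update(self, categories):
        """Register unseen categories, updating both dicts together."""
        for cat in categories:
            if cat not in self.forward:
                pos = len(self.forward)
                self.forward[cat] = pos
                self.backward[pos] = cat


two_way = TwoWayUnits()
two_way.update(['x', 'y', 'x'])
print(two_way.forward, two_way.backward)
```

A formatter would then need the whole object (to read `backward`)
instead of just the forward mapping, which is the constructor change
this commit avoids.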
diff --git a/lib/matplotlib/category.py b/lib/matplotlib/category.py
@@ -148,11 +148,11 @@ def __init__(self, units_mapping):
         self._units = units_mapping
 
     def __call__(self, x, pos=None):
-        if pos is None:
-            return ""
-        r_mapping = {v: StrCategoryFormatter._text(k)
-                     for k, v in self._units.items()}
-        return r_mapping.get(int(np.round(x)), '')
+        return '' if pos is None else self.format_ticks([x])[0]
+
+    def format_ticks(self, values):
+        r_mapping = {v: self._text(k) for k, v in self._units.items()}
+        return [r_mapping.get(round(val), '') for val in values]
 
     @staticmethod
     def _text(value):