LRU Dictionaries
=================

    >>> from darts.lib.utils.lru import LRUDict

An `LRUDict` is essentially a simple dictionary with a defined maximum
capacity, which may be supplied at construction time or modified at
run-time via the `capacity` property::

    >>> cache = LRUDict(1)
    >>> cache.capacity
    1

The minimum capacity value is 1, and LRU dicts will complain if someone
attempts to use a smaller value::

    >>> cache.capacity = -1 #doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: -1 is not a valid capacity
    >>> LRUDict(-1) #doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: -1 is not a valid capacity

LRU dictionaries can never contain more elements than their capacity value
indicates, so::

    >>> cache[1] = "First"
    >>> cache[2] = "Second"
    >>> len(cache)
    1

In order to ensure this behaviour, the dictionary will evict entries if
it needs to make room for new ones. So::

    >>> 1 in cache
    False
    >>> 2 in cache
    True

The capacity can be adjusted at run-time. Growing the capacity does not
affect the number of elements present in an LRU dictionary::

    >>> cache.capacity = 3
    >>> len(cache)
    1
    >>> cache[1] = "First"
    >>> cache[3] = "Third"
    >>> len(cache)
    3

but shrinking does::

    >>> cache.capacity = 2
    >>> len(cache)
    2
    >>> sorted(list(cache.iterkeys()))
    [1, 3]

Note that the entry with key `2` was evicted, because it was the oldest
entry at the time `capacity` was modified. The new oldest entry is the
one with key `1`, as can be seen when we try to add another entry to
the dict::

    >>> cache[4] = "Fourth"
    >>> sorted(list(cache.iterkeys()))
    [3, 4]

The following operations affect an entry's priority:

- `get`
- `__getitem__`
- `__setitem__`
- `__contains__`

Calling any of these operations on an existing key will boost the key's
priority, making it less likely to be evicted when the dictionary needs
to make room for new entries. There is a special `peek` operation, which
returns the value currently associated with a key without boosting the
priority of the entry::

    >>> cache.peek(3)
    'Third'
    >>> cache[5] = "Fifth"
    >>> sorted(list(cache.iterkeys()))
    [4, 5]

As you can see, even though the entry with key `3` was the last one we
accessed, it is now gone, because the call to `peek` did not give it a
priority boost.
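
The boost also applies to plain lookups. As a small illustration (using a
fresh dict `d` here, so the `cache` from above stays untouched), a
membership test on key `1` saves that entry from eviction::

    >>> d = LRUDict(2)
    >>> d[1] = "One"
    >>> d[2] = "Two"
    >>> 1 in d
    True
    >>> d[3] = "Three"
    >>> sorted(list(d.iterkeys()))
    [1, 3]

Without the membership test, key `1` would have been the oldest entry and
would have been evicted instead of key `2`.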

The class `LRUDict` supports a subset of the standard Python `dict`
interface. In particular, we can iterate over the keys, values, and
items of an LRU dict::

    >>> sorted([k for k in cache.iterkeys()])
    [4, 5]
    >>> sorted([v for v in cache.itervalues()])
    ['Fifth', 'Fourth']
    >>> sorted([p for p in cache.iteritems()])
    [(4, 'Fourth'), (5, 'Fifth')]
    >>> sorted(list(cache))
    [4, 5]

Note that there is no guaranteed iteration order; in particular, the
elements are not generated in priority order. As with regular `dict`s,
an LRU dict's `__iter__` is actually an alias for `iterkeys`.

Furthermore, we can remove all elements from the dict::

    >>> cache.clear()
    >>> sorted(list(cache.iterkeys()))
    []


Thread-safety
--------------

Instances of class `LRUDict` are not thread-safe. Worse: even concurrent
read-only access is not thread-safe and has to be synchronized by the
client application.
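
If you must share a plain `LRUDict` between threads anyway, one option is
to guard every access with a lock of your own. A minimal sketch, assuming
a single `threading.Lock` is acceptable for your access pattern (the names
`shared` and `cached_lookup` are made up for the illustration)::

    import threading
    from darts.lib.utils.lru import LRUDict

    lock = threading.Lock()
    shared = LRUDict(100)

    def cached_lookup(key):
        # Even a read-only lookup must hold the lock, since a plain
        # LRUDict updates its internal LRU ordering on every access.
        with lock:
            return shared.get(key)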

There is, however, the class `SynchronizedLRUDict`, which exposes the
same interface as the plain `LRUDict` but is fully thread-safe. The
following session repeats exactly the steps we already tried with a
plain `LRUDict`, this time using the synchronized version::

    >>> from darts.lib.utils.lru import SynchronizedLRUDict
    >>> cache = SynchronizedLRUDict(1)
    >>> cache.capacity
    1
    >>> cache.capacity = -1 #doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: -1 is not a valid capacity
    >>> SynchronizedLRUDict(-1) #doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: -1 is not a valid capacity
    >>> cache[1] = "First"
    >>> cache[2] = "Second"
    >>> len(cache)
    1
    >>> 1 in cache
    False
    >>> 2 in cache
    True
    >>> cache.capacity = 3
    >>> len(cache)
    1
    >>> cache[1] = "First"
    >>> cache[3] = "Third"
    >>> len(cache)
    3
    >>> cache.capacity = 2
    >>> len(cache)
    2
    >>> sorted(list(cache.iterkeys()))
    [1, 3]
    >>> cache[4] = "Fourth"
    >>> sorted(list(cache.iterkeys()))
    [3, 4]
    >>> cache.peek(3)
    'Third'
    >>> cache[5] = "Fifth"
    >>> sorted(list(cache.iterkeys()))
    [4, 5]
    >>> sorted([k for k in cache.iterkeys()])
    [4, 5]
    >>> sorted([v for v in cache.itervalues()])
    ['Fifth', 'Fourth']
    >>> sorted([p for p in cache.iteritems()])
    [(4, 'Fourth'), (5, 'Fifth')]
    >>> sorted(list(cache))
    [4, 5]
    >>> cache.clear()
    >>> sorted(list(cache.iterkeys()))
    []


Auto-loading Caches
====================

Having some kind of dictionary which is capable of cleaning itself up
is nice, but in order to implement caching, one thing is still missing:
the mechanism which actually loads values into our dict. This part of
the story is implemented by the `AutoLRUCache`::

    >>> from darts.lib.utils.lru import AutoLRUCache

Let's first define a load function::

    >>> def load_resource(key):
    ...     if key < 10:
    ...         print "Loading %r" % (key,)
    ...         return "R(%s)" % (key,)

and a cache::

    >>> cache = AutoLRUCache(load_resource, capacity=3)
    >>> cache.load(1)
    Loading 1
    'R(1)'
    >>> cache.load(1)
    'R(1)'

As you can see, the first time an element is loaded, the load function
provided to the constructor is called in order to produce the actual
resource value. On subsequent calls to `load`, the cached value is
returned.

Internally, the `AutoLRUCache` class uses an `LRUDict` to cache values,
so::

    >>> cache.load(2)
    Loading 2
    'R(2)'
    >>> cache.load(3)
    Loading 3
    'R(3)'
    >>> cache.load(4)
    Loading 4
    'R(4)'
    >>> cache.load(1)
    Loading 1
    'R(1)'

Note the "Loading 1" line in the last example. The cache was initialized
with a capacity of 3, so the value for key `1` had to be evicted when the
one for key `4` was loaded. When we asked for key `1` again, the cache had
to reload it, calling the loader function once more.

If there is actually no resource for a given key, the loader function must
return `None`. It follows that `None` is never a valid resource value to be
associated with a key in an `AutoLRUCache`. For keys that cannot be loaded,
`load` falls back to its optional second argument::

    >>> cache.load(11, 'Oops')
    'Oops'
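
Presumably the default for that second argument is `None` (an assumption
based on the behaviour above, not something stated explicitly), so omitting
it should simply give `None` back for keys that cannot be loaded::

    >>> print cache.load(11)
    None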


Thread-safety
--------------

Instances of class `AutoLRUCache` are fully thread-safe. Be warned, though,
that the loader function is called outside of any synchronization scope the
class may use internally, and has to provide its own synchronization if
required.
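
For example, a loader that touches shared state can carry its own lock; a
minimal sketch (the `_backend_lock` and `fetch_from_backend` names are made
up for this illustration)::

    import threading

    _backend_lock = threading.Lock()

    def load_shared_resource(key):
        # AutoLRUCache calls this outside of its own locks, so we
        # serialize access to the shared backend ourselves.
        with _backend_lock:
            return fetch_from_backend(key)  # hypothetical helper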

The cache class actually tries to minimize the number of invocations of
the loader by making sure that no two concurrent threads will try to load
the same key (though any number of concurrent threads may be busy loading
the resources associated with different keys).
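
The effect of this guarantee can be sketched as follows (not a doctest,
since thread scheduling is non-deterministic; `load_resource` is the
loader defined earlier)::

    import threading

    shared_cache = AutoLRUCache(load_resource, capacity=3)

    def worker():
        shared_cache.load(7)

    threads = [threading.Thread(target=worker) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # "Loading 7" appears only once: a single thread performs the load,
    # while the remaining threads reuse its result.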


Change Log
==========

Version 0.3
------------

Added class `SynchronizedLRUDict` as a thread-safe counterpart for
`LRUDict`.