|
| 1 | +.. _design-patterns-slowly-changing-dimensions: |
| 2 | + |
| 3 | +========================== |
| 4 | +Slowly Changing Dimensions |
| 5 | +========================== |
| 6 | + |
| 7 | +.. meta:: |
| 8 | + :description: Implement the Slowly Changing Dimensions framework to track and query changes and different versions of fields in your documents over time. |
| 9 | + |
| 10 | +.. facet:: |
| 11 | + :name: genre |
| 12 | + :values: tutorial |
| 13 | + |
| 14 | +.. contents:: On this page |
| 15 | + :local: |
| 16 | + :backlinks: none |
| 17 | + :depth: 2 |
| 18 | + :class: singlecol |
| 19 | + |
| 20 | +Slowly changing dimensions (SCDs) is a framework for managing and |
| 21 | +tracking changes to dimension data in a data warehouse over time. |
| 22 | +The framework calls these dimensions "slowly changing" because |
| 23 | +it assumes that the data they hold changes infrequently |
| 24 | +and at unpredictable intervals. Use SCDs when the |
| 25 | +data warehouse must track historical states of data and |
| 26 | +reproduce outputs based on those states. |
| 27 | + |
| 28 | +A common use case for SCDs is reporting. For example, |
| 29 | +in financial reporting systems, you need to explain the differences |
| 30 | +between the aggregated values in a report produced last month and |
| 31 | +those in the current version of the report from the data warehouse. |
| 32 | + |
| 33 | +The different implementations of SCDs in SQL are referred to as |
| 34 | +"types." Types 0 and 1, the most basic types, only keep track of |
| 35 | +the original state of data or the current state of data, respectively. |
| 36 | +Type 2, the most commonly applied implementation, adds three |
| 37 | +fields: ``validFrom``, ``validTo``, and an optional flag that |
| 38 | +marks the latest version of the data, often called ``isValid`` or ``isEffective``. |
| 39 | + |
| 40 | +SCD Types |
| 41 | +--------- |
| 42 | + |
| 43 | +.. list-table:: |
| 44 | + :header-rows: 1 |
| 45 | + :stub-columns: 1 |
| 46 | + :widths: 20 80 |
| 47 | + |
| 48 | + * - SCD Type |
| 49 | + - Description |
| 50 | + |
| 51 | + * - Type 0 |
| 52 | + - Keep only the original state. Data cannot be changed. |
| 53 | + |
| 54 | + * - Type 1 |
| 55 | + - Keep only the updated state. History is not stored. |
| 56 | + |
| 57 | + * - Type 2 |
| 58 | + - Keep history in a new document. |
| 59 | + |
| 60 | + * - Type 3 |
| 61 | + - Keep history in new fields in the same document. |
| 62 | + |
| 63 | + * - Type 4 |
| 64 | + - Keep history in a separate collection. |
| 65 | + |
| 66 | + * - Type 6 |
| 67 | + - Combination of Type 2 and Type 3. |
| 68 | + |
| 69 | +SCDs in MongoDB |
| 70 | +--------------- |
| 71 | + |
| 72 | +You can apply the SCD framework to MongoDB in the same way you apply it to |
| 73 | +a relational database. The concept of slowly changing dimensions applies on a |
| 74 | +per-document basis within the data model you have chosen and optimized for your use case. |
| 75 | + |
| 76 | +Example |
| 77 | +~~~~~~~ |
| 78 | + |
| 79 | +Consider a collection called ``prices`` that stores the |
| 80 | +prices of a set of items. You need to keep track of the changes of the |
| 81 | +price of an item over time in order to be able to process returns of an |
| 82 | +item, as the money refunded must match the price of the item at the time of |
| 83 | +purchase. Each document in the collection has an ``item`` and ``price`` field: |
| 84 | + |
| 85 | +.. code-block:: javascript |
| 86 | + |
| 87 | + db.prices.insertMany( [ |
| 88 | + { 'item': 'shorts', 'price': 10 }, |
| 89 | + { 'item': 't-shirt', 'price': 2 }, |
| 90 | + { 'item': 'pants', 'price': 5 }, |
| 91 | + ] ) |
| 92 | + |
| 93 | +Suppose the price of pants changes from ``5`` to ``7``. To track this price change, |
| 94 | +assume default values for the fields that SCD Type 2 requires. |
| 95 | +The default value for ``validFrom`` is ``1900-01-01``, for ``validTo`` is ``9999-01-01``, |
| 96 | +and for ``isValid`` is ``true``. To change the ``price`` field in the document with |
| 97 | +``'item': 'pants'``, update the previously valid document so that it is no longer |
| 98 | +valid, and insert a new document to represent the current state of the pants: |
| 99 | + |
| 100 | +.. code-block:: javascript |
| 101 | + |
| 102 | + let now = new Date(); |
| 103 | + |
| 104 | + db.prices.updateOne( |
| 105 | + { |
| 106 | + 'item': 'pants', |
| 107 | + "$or": [ |
| 108 | + { "isValid": true }, // the currently valid document |
| 109 | + { "isValid": null } // no SCD fields yet: valid by default |
| 110 | + ] |
| 111 | + }, |
| 112 | + [ { "$set": |
| 113 | + { |
| 114 | + "validFrom": { "$ifNull": [ "$validFrom", new Date("1900-01-01") ] }, |
| 115 | + "validTo": now, |
| 116 | + "isValid": false |
| 117 | + } |
| 118 | + } ] |
| 119 | + ); |
| 120 | + |
| 121 | + db.prices.insertOne( |
| 122 | + { |
| 123 | + 'item': 'pants', |
| 124 | + 'price': 7, |
| 125 | + "validFrom": now, |
| 126 | + "validTo": new Date("9999-01-01"), |
| 127 | + "isValid": true |
| 128 | + } |
| 129 | + ); |
| 130 | + |
| 131 | +To avoid breaking the chain of validity, ensure that both of the above |
| 132 | +database operations use the same timestamp. Depending on the |
| 133 | +requirements of the application, you can wrap the two commands |
| 134 | +in a transaction to ensure MongoDB always applies both changes together. |
| 135 | +For more information, see :ref:`transactions`. |
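As a minimal sketch of the transactional approach, the following helper closes the currently valid document and inserts its successor in one transaction. The function name ``changePrice`` and the database name passed to it are hypothetical, and the code assumes the synchronous ``mongosh`` session methods on a deployment that supports transactions. It treats a document as currently valid when ``isValid`` is ``true`` or not set.

```javascript
// Hypothetical helper (not part of MongoDB): applies an SCD Type 2
// price change inside a single transaction, using mongosh syntax.
function changePrice(session, dbName, item, newPrice) {
   const prices = session.getDatabase(dbName).getCollection("prices");
   const now = new Date();   // one timestamp shared by both writes

   session.startTransaction();
   try {
      // Close the currently valid document. The pipeline form of the
      // update keeps an existing validFrom and only fills in the default.
      prices.updateOne(
         { item: item, $or: [ { isValid: true }, { isValid: null } ] },
         [ { $set: {
            validFrom: { $ifNull: [ "$validFrom", new Date("1900-01-01") ] },
            validTo: now,
            isValid: false
         } } ]
      );
      // Insert the new current state, valid from the same timestamp.
      prices.insertOne( {
         item: item,
         price: newPrice,
         validFrom: now,
         validTo: new Date("9999-01-01"),
         isValid: true
      } );
   } catch (error) {
      session.abortTransaction();   // neither write is applied
      throw error;
   }
   session.commitTransaction();
}
```

In ``mongosh``, you could call the helper with a session from ``db.getMongo().startSession()``, for example ``changePrice(session, "test", "pants", 7)``, where ``"test"`` stands in for your database name.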
| 136 | + |
| 137 | +The following operation demonstrates how to query the latest |
| 138 | +``price`` of the document containing the ``pants`` item: |
| 139 | + |
| 140 | +.. code-block:: javascript |
| 141 | + |
| 142 | + db.prices.find( { 'item': 'pants', 'isValid': true } ); |
| 143 | + |
| 144 | +To query for the ``price`` of the document containing the ``pants`` |
| 145 | +item at a specific point in time, use the following operation: |
| 146 | + |
| 147 | +.. code-block:: javascript |
| 148 | + |
| 149 | + let time = new Date("2022-11-16T13:00:00"); |
| 150 | + db.prices.find( { |
| 151 | + 'item': 'pants', |
| 152 | + 'validFrom': { '$lte': time }, |
| 153 | + 'validTo': { '$gt': time } |
| 154 | + } ); |
| 155 | + |
| 156 | +Tracking Changes in Few Fields |
| 157 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 158 | + |
| 159 | +If you only need to track changes over time to a few fields |
| 160 | +in a document, you can use SCD Type 3 by embedding the |
| 161 | +history of a field as an array in the same document. |
| 162 | + |
| 163 | +For example, the following aggregation pipeline updates the ``price`` |
| 164 | +in the document representing ``pants`` to ``7`` and stores the |
| 165 | +previous value of the ``price`` with a timestamp of when the |
| 166 | +previous ``price`` became invalid in an array called ``priceHistory``: |
| 167 | + |
| 168 | +.. code-block:: javascript |
| 169 | + |
| 170 | + db.prices.aggregate( [ |
| 171 | + { $match: { 'item': 'pants' } }, |
| 172 | + { $addFields: |
| 173 | + { price: 7, priceHistory: |
| 174 | + { $concatArrays: |
| 175 | + [ |
| 176 | + { $ifNull: [ '$priceHistory', [] ] }, |
| 177 | + [ { price: "$price", time: now } ] |
| 178 | + ] |
| 179 | + } |
| 180 | + } |
| 181 | + }, |
| 182 | + { $merge: |
| 183 | + { |
| 184 | + into: "prices", |
| 185 | + on: "_id", |
| 186 | + whenMatched: "merge", |
| 187 | + whenNotMatched: "fail" |
| 188 | + } |
| 189 | + } |
| 190 | + ] ) |
| 191 | + |
| 192 | +This solution can become slow or inefficient if your array size gets too large. |
| 193 | +To avoid large arrays, you can use the :ref:`outlier <group-data-outlier-pattern>` |
| 194 | +or the :ref:`bucket <group-data-bucket-pattern>` patterns to design your schema. |
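To read a historical value back out of the embedded array, you can resolve it in application code. The following is a minimal sketch in plain JavaScript. The helper name ``priceAt`` is hypothetical, and it assumes the document shape produced above, where each ``priceHistory`` entry stores a price and the time at which that price stopped being valid.

```javascript
// Hypothetical helper: returns the price that was valid at `time` for a
// document shaped like { price, priceHistory: [ { price, time } ] }.
function priceAt(doc, time) {
   // Sort a copy of the history by the time each price became invalid.
   const history = [ ...(doc.priceHistory ?? []) ].sort((a, b) => a.time - b.time);
   // The first entry invalidated after `time` was still in effect then.
   const entry = history.find((h) => h.time > time);
   // With no such entry, the current price applies.
   return entry ? entry.price : doc.price;
}
```

A time exactly equal to a history entry's ``time`` resolves to the newer price, which matches the ``validFrom``/``validTo`` comparison used in the Type 2 query.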
| 195 | + |
| 196 | +Outlook Data Federation |
| 197 | +----------------------- |
| 198 | + |
| 199 | +The above examples focus on a strict and accurate representation of |
| 200 | +document field changes. Sometimes, you might have less strict requirements |
| 201 | +on showing historical data. For example, you might have an application that |
| 202 | +only requires access to the current state of the data most of the time, |
| 203 | +but you must run some analytical queries on the full history of data. |
| 204 | + |
| 205 | +In this case, you can store the current version of the data in one collection |
| 206 | +and the historical changes in another collection. You can then remove the |
| 207 | +historical collection from the active MongoDB cluster by using |
| 208 | +:ref:`MongoDB Atlas Federated Database <atlas-data-federation>` functionality |
| 209 | +or, in the fully managed version, |
| 210 | +:atlas:`Online Archive </online-archive/manage-online-archive/>`. |
| 211 | + |
| 212 | +Other Use Cases |
| 213 | +--------------- |
| 214 | + |
| 215 | +While slowly changing dimensions is helpful for data warehousing, you |
| 216 | +can also use the SCD framework in event-driven applications. If you have |
| 217 | +infrequent events in different categories, it is expensive to |
| 218 | +find the latest event per category, as the process could require |
| 219 | +grouping or sorting your data in order to find the current state. |
| 220 | + |
| 221 | +In the case of infrequent events, you can amend the data model by |
| 222 | +adding a field to store the time of the next event, in addition |
| 223 | +to the event time per document. The new date field ensures that |
| 224 | +if you execute a search for a specific point in time, you can easily |
| 225 | +and efficiently retrieve the respective event you are searching for. |
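The amended event model can be sketched in plain JavaScript as follows. The field names ``category``, ``time``, and ``nextTime``, the helper names, and the ``events`` collection mentioned in the comments are all assumptions for illustration. The far-future sentinel mirrors the ``validTo`` default used in the Type 2 example.

```javascript
// Sentinel meaning "no later event yet" (assumed convention).
const OPEN_END = new Date("9999-01-01");

// Returns copies of the events with a nextTime field: the time of the
// following event in the same category, or OPEN_END for the latest one.
function withNextEventTime(events) {
   const sorted = [ ...events ].sort((a, b) => a.time - b.time);
   const lastSeen = new Map();   // category -> most recent document so far
   return sorted.map((event) => {
      const doc = { ...event, nextTime: OPEN_END };
      const previous = lastSeen.get(event.category);
      if (previous) previous.nextTime = event.time;
      lastSeen.set(event.category, doc);
      return doc;
   });
}

// Builds a filter for the event in effect at `time`, suitable for a
// query such as db.events.find(filter) on a hypothetical collection.
function eventAtFilter(category, time) {
   return { category: category, time: { $lte: time }, nextTime: { $gt: time } };
}
```

With both fields stored on each document, a point-in-time lookup becomes a range match on two fields and does not need to group or sort the events of a category.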