Commit b2611ea

DOCSP-49409-slowly-changing-dimensions (#11989)
* DOCSP-49409-slowly-changing-dimensions
* fixes
* links
* typo
* feedback
* more edits
* code render fixes
* build
* reviewr feedback
* reviewer feedback
* feedback
1 parent b375b12 commit b2611ea

2 files changed: +226 -0 lines changed

content/manual/upcoming/source/data-modeling/design-patterns/data-versioning.txt

Lines changed: 1 addition & 0 deletions
@@ -87,3 +87,4 @@ Learn More
 
    Keep Document History </data-modeling/design-patterns/data-versioning/document-versioning>
    Maintain Versions </data-modeling/design-patterns/data-versioning/schema-versioning>
+   Slowly Changing Dimensions </data-modeling/design-patterns/data-versioning/slowly-changing-dimensions>
@@ -0,0 +1,225 @@
.. _design-patterns-slowly-changing-dimensions:

==========================
Slowly Changing Dimensions
==========================

.. meta::
   :description: Implement the Slowly Changing Dimensions framework to track and query changes and different versions of fields in your documents over time.

.. facet::
   :name: genre
   :values: tutorial

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Slowly changing dimensions (SCDs) is a framework for managing and
tracking changes to dimension data in a data warehouse over time.
The framework calls these dimensions “slowly changing” because it
assumes that the data changes infrequently and at no predictable
interval. Use SCDs when your data warehouse must track historical
states of data and reproduce outputs based on those states.

A common use case for SCDs is reporting. For example, in a financial
reporting system, you might need to explain the differences between
the aggregated values in a report produced last month and those in
the current version of the same report from the data warehouse.

The different implementations of SCDs in SQL are referred to as
“types.” Types 0 and 1, the most basic types, keep track of only
the original state of the data or only the current state of the data,
respectively. Type 2, the most commonly applied implementation, adds
three fields: ``validFrom``, ``validTo``, and an optional flag that marks
the latest set of data, often called ``isValid`` or ``isEffective``.

SCD Types
---------

.. list-table::
   :header-rows: 1
   :stub-columns: 1
   :widths: 20 80

   * - SCD Type
     - Description

   * - Type 0
     - Keeps only the original state. Data cannot be changed.

   * - Type 1
     - Keeps only the current state. History is not stored.

   * - Type 2
     - Keeps history in a new document.

   * - Type 3
     - Keeps history in new fields in the same document.

   * - Type 4
     - Keeps history in a separate collection.

   * - Type 6
     - Combines Type 2 and Type 3.

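As a rough illustration of how Types 1, 2, and 3 differ, the following
sketch applies the same price change to plain JavaScript objects instead
of a collection. The helper names (``applyType1``, ``applyType2``,
``applyType3``) and the ``previousPrice`` field are hypothetical and are
not part of any MongoDB API:

.. code-block:: javascript

   // Type 1: overwrite the value in place; no history is kept.
   function applyType1(doc, newPrice) {
      return { ...doc, price: newPrice };
   }

   // Type 2: close out the old version and add the new version
   // as a separate document.
   function applyType2(doc, newPrice, now) {
      const closed = { ...doc, validTo: now, isValid: false };
      const current = {
         item: doc.item,
         price: newPrice,
         validFrom: now,
         validTo: new Date("9999-01-01"),
         isValid: true
      };
      return [ closed, current ];
   }

   // Type 3: keep the previous value in an extra field of the same document.
   function applyType3(doc, newPrice) {
      return { ...doc, price: newPrice, previousPrice: doc.price };
   }

   const pants = { item: 'pants', price: 5 };
   const changeTime = new Date();

   const type1 = applyType1(pants, 7);
   const type2 = applyType2(pants, 7, changeTime);
   const type3 = applyType3(pants, 7);

Type 1 yields one document with no trace of the old price, Type 2 yields
two documents linked by a shared timestamp, and Type 3 yields one document
that carries both values.
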
SCDs in MongoDB
---------------

You can apply the SCD framework to MongoDB in the same way you apply it to
a relational database. The concept of slowly changing dimensions applies on a
per-document basis, within whatever data model you have chosen and optimized
for your use case.

Example
~~~~~~~

Consider a collection called ``prices`` that stores the prices of a set of
items. To process returns, you need to track how the price of each item
changes over time, because the money refunded must match the price of the
item at the time of purchase. Each document in the collection has an ``item``
field and a ``price`` field:

.. code-block:: javascript

   db.prices.insertMany( [
      { 'item': 'shorts', 'price': 10 },
      { 'item': 't-shirt', 'price': 2 },
      { 'item': 'pants', 'price': 5 },
   ] )

Suppose the price of pants changes from 5 to 7. To track this price change
with SCD Type 2, assume default values for the validity fields of any
document that does not have them yet: ``validFrom`` defaults to
January 1, 1900, ``validTo`` defaults to January 1, 9999, and ``isValid``
defaults to ``true``. To change the ``price`` field in the document with
``'item': 'pants'``, update the previously valid document so that it is no
longer valid, and insert a new document to represent the current state
of the pants:

.. code-block:: javascript

   let now = new Date();

   // Close out the currently valid document. Documents that predate
   // SCD tracking have no isValid field, which { isValid: null } matches.
   db.prices.updateOne(
      {
         'item': 'pants',
         "$or": [
            { "isValid": true },
            { "isValid": null }
         ]
      },
      // Pipeline form (MongoDB 4.2+): keep an existing validFrom and
      // fall back to the default only when the field is missing.
      [
         { "$set":
            {
               "validFrom": { "$ifNull": [ "$validFrom", new Date("1900-01-01") ] },
               "validTo": now,
               "isValid": false
            }
         }
      ]
   );

   // Insert the new current state, starting at the same timestamp.
   db.prices.insertOne(
      {
         'item': 'pants',
         'price': 7,
         "validFrom": now,
         "validTo": new Date("9999-01-01"),
         "isValid": true
      }
   );

To avoid breaking the chain of validity, ensure that both of the preceding
database operations use the same timestamp, so that the ``validTo`` value of
the old document equals the ``validFrom`` value of the new document.
Depending on the requirements of your application, you can wrap the two
operations in a transaction to ensure that MongoDB always applies both
changes together. For more information, see :ref:`transactions`.

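One way to apply both writes together is a multi-document transaction in
``mongosh``. The following is a minimal sketch under stated assumptions, not
a definitive implementation: the helper name ``changePriceInTransaction``
and the database name passed to it are hypothetical, and transactions
require a replica set or sharded cluster. The update keeps an existing
``validFrom`` value and falls back to the default only when the field is
missing:

.. code-block:: javascript

   // Hypothetical helper: applies an SCD Type 2 price change in one transaction.
   // `session` is a mongosh session, `dbName` is the database that holds
   // the prices collection (an assumption for this sketch).
   function changePriceInTransaction(session, dbName, item, newPrice) {
      const prices = session.getDatabase(dbName).getCollection("prices");
      const now = new Date();   // one timestamp shared by both writes

      session.startTransaction();
      try {
         // Close out the currently valid document.
         prices.updateOne(
            { item: item, $or: [ { isValid: true }, { isValid: null } ] },
            [ { $set: {
               validFrom: { $ifNull: [ "$validFrom", new Date("1900-01-01") ] },
               validTo: now,
               isValid: false
            } } ]
         );
         // Insert the new current state.
         prices.insertOne( {
            item: item,
            price: newPrice,
            validFrom: now,
            validTo: new Date("9999-01-01"),
            isValid: true
         } );
         session.commitTransaction();
      } catch (error) {
         // Roll back both writes if either one fails.
         session.abortTransaction();
         throw error;
      }
      return now;
   }

   // Usage in mongosh (the database name "test" is an assumption):
   //   const session = db.getMongo().startSession();
   //   changePriceInTransaction( session, "test", "pants", 7 );
   //   session.endSession();
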
The following operation demonstrates how to query the latest
``price`` of the document containing the ``pants`` item:

.. code-block:: javascript

   db.prices.find( { 'item': 'pants', 'isValid': true } );

To query for the ``price`` of the document containing the ``pants``
item at a specific point in time, use the following operation:

.. code-block:: javascript

   let time = new Date("2022-11-16T13:00:00");
   db.prices.find( {
      'item': 'pants',
      'validFrom': { '$lte': time },
      'validTo': { '$gt': time }
   } );

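The point-in-time filter treats each version as valid on the half-open
interval from ``validFrom`` (inclusive) to ``validTo`` (exclusive), so a
time that falls exactly on a price change matches only the newer version.
The following sketch reproduces that comparison on an in-memory array. The
``versionAt`` helper and the sample dates are illustrative assumptions:

.. code-block:: javascript

   // Hypothetical helper that mirrors the filter
   // { validFrom: { $lte: time }, validTo: { $gt: time } }.
   function versionAt(versions, item, time) {
      return versions.filter( v =>
         v.item === item && v.validFrom <= time && v.validTo > time
      );
   }

   const changedAt = new Date("2022-11-01T00:00:00Z");
   const versions = [
      { item: 'pants', price: 5, validFrom: new Date("1900-01-01"),
        validTo: changedAt, isValid: false },
      { item: 'pants', price: 7, validFrom: changedAt,
        validTo: new Date("9999-01-01"), isValid: true }
   ];

   // A time before the change matches only the old version.
   const before = versionAt(versions, 'pants', new Date("2022-10-31T00:00:00Z"));

   // The exact time of the change matches only the new version.
   const atChange = versionAt(versions, 'pants', changedAt);
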
Tracking Changes in Few Fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you only need to track changes over time to a few fields
in a document, you can use SCD Type 3 by embedding the
history of a field as an array in the same document.

For example, the following aggregation pipeline updates the ``price``
in the document representing ``pants`` to ``7``. It also appends the
previous value of the ``price``, together with the time at which that
``price`` became invalid, to an array called ``priceHistory``:

.. code-block:: javascript

   // Uses the `now` timestamp defined in the previous example.
   db.prices.aggregate( [
      { $match: { 'item': 'pants' } },
      { $addFields:
         { price: 7, priceHistory:
            { $concatArrays:
               [
                  // Start from an empty array if there is no history yet.
                  { $ifNull: [ '$priceHistory', [] ] },
                  [ { price: "$price", time: now } ]
               ]
            }
         }
      },
      // Write the modified document back to the same collection.
      { $merge:
         {
            into: "prices",
            on: "_id",
            whenMatched: "merge",
            whenNotMatched: "fail"
         }
      }
   ] )

This solution can become slow or inefficient if the array grows too large.
To avoid large arrays, you can use the :ref:`outlier <group-data-outlier-pattern>`
or the :ref:`bucket <group-data-bucket-pattern>` pattern to design your schema.

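Reading a historical value back out of the embedded array takes a little
logic, because each entry records when a price stopped being valid, not
when it started. The following is a minimal sketch of that lookup in plain
JavaScript, assuming that entries are appended in chronological order. The
``priceAt`` helper and the sample values are hypothetical:

.. code-block:: javascript

   // Hypothetical helper: returns the price that was in effect at `time`.
   // The first history entry that expired after `time` holds that price.
   // If no entry expired after `time`, the current price applies.
   function priceAt(doc, time) {
      const history = doc.priceHistory ?? [];
      const entry = history.find( h => time < h.time );
      return entry ? entry.price : doc.price;
   }

   const pantsDoc = {
      item: 'pants',
      price: 7,
      priceHistory: [
         { price: 4, time: new Date("2021-06-01T00:00:00Z") },
         { price: 5, time: new Date("2022-11-01T00:00:00Z") }
      ]
   };

   const priceIn2021 = priceAt(pantsDoc, new Date("2021-01-01T00:00:00Z"));
   const priceIn2022 = priceAt(pantsDoc, new Date("2022-01-01T00:00:00Z"));
   const priceIn2023 = priceAt(pantsDoc, new Date("2023-01-01T00:00:00Z"));
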
Outlook Data Federation
-----------------------

The preceding examples focus on a strict and accurate representation of
document field changes. Sometimes, you might have less strict requirements
for historical data. For example, you might have an application that
only requires access to the current state of the data most of the time,
but that must occasionally run analytical queries on the full history of
the data.

In this case, you can store the current version of the data in one collection
and the historical changes in another collection. You can then move the
historical collection off the active MongoDB cluster by using
:ref:`MongoDB Atlas Federated Database <atlas-data-federation>` functionality
or, for a fully managed option, by using
:atlas:`Online Archive </online-archive/manage-online-archive/>`.

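The split into a current collection and a history collection can be
sketched as follows. This is only an illustration of the write path and
says nothing about how to configure Data Federation or Online Archive. The
``archiveAndUpdate`` helper and the ``pricesHistory`` collection name are
assumptions:

.. code-block:: javascript

   // Hypothetical helper: copies the current version of an item into a
   // history collection, then overwrites the price in the current collection.
   // `current` and `history` are collection handles, for example
   // db.prices and db.pricesHistory (the second name is an assumption).
   function archiveAndUpdate(current, history, item, newPrice, now) {
      const old = current.findOne( { item: item } );
      if (old === null) {
         throw new Error("no current document for item: " + item);
      }
      // Store the outgoing version with its validity interval.
      history.insertOne( {
         item: old.item,
         price: old.price,
         validFrom: old.validFrom ?? new Date("1900-01-01"),
         validTo: now
      } );
      // Keep only the latest state in the current collection.
      current.updateOne(
         { _id: old._id },
         { $set: { price: newPrice, validFrom: now } }
      );
   }

   // Usage in mongosh:
   //   archiveAndUpdate( db.prices, db.pricesHistory, 'pants', 7, new Date() );

As with the Type 2 example, the two writes can be wrapped in a transaction
if the application needs them to succeed or fail together.
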
Other Use Cases
---------------

While slowly changing dimensions are helpful for data warehousing, you
can also use the SCD framework in event-driven applications. If you have
infrequent events in several categories, it is expensive to
find the latest event per category, because the query might need to
group or sort your data to find the current state.

In the case of infrequent events, you can amend the data model by
adding a field to each document that stores the time of the next event,
in addition to the time of the event itself. With both fields, a search
for a specific point in time can efficiently retrieve the event that was
current at that time.
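
The following in-memory sketch shows the idea. The field names
``eventTime`` and ``nextEventTime``, the far-future placeholder for the
latest event (borrowed from the ``validTo`` default in the Type 2 example),
and both helper functions are assumptions made for illustration:

.. code-block:: javascript

   // Hypothetical event shape: { category, eventTime, nextEventTime, payload }.
   const FAR_FUTURE = new Date("9999-01-01");

   // Records a new event and fills in nextEventTime on the previous
   // event of the same category.
   function appendEvent(events, category, eventTime, payload) {
      const previous = events.find( e =>
         e.category === category &&
         e.nextEventTime.getTime() === FAR_FUTURE.getTime()
      );
      if (previous) {
         previous.nextEventTime = eventTime;
      }
      events.push( { category, eventTime, nextEventTime: FAR_FUTURE, payload } );
   }

   // Returns the event that was current for a category at `time`, if any.
   function eventAt(events, category, time) {
      return events.find( e =>
         e.category === category && e.eventTime <= time && time < e.nextEventTime
      );
   }

   const events = [];
   appendEvent(events, 'thermostat', new Date("2023-01-01T00:00:00Z"), { target: 19 });
   appendEvent(events, 'thermostat', new Date("2023-03-01T00:00:00Z"), { target: 21 });

   const inFebruary = eventAt(events, 'thermostat', new Date("2023-02-01T00:00:00Z"));
   const inApril = eventAt(events, 'thermostat', new Date("2023-04-01T00:00:00Z"));

Against a collection, the same lookup corresponds to a filter of the form
``{ category: ..., eventTime: { $lte: time }, nextEventTime: { $gt: time } }``,
which needs no grouping or sorting.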
