
Commit 6819773

DOCSP-49409-slowly-changing-dimensions-main-backport (11989) (#12348)
* DOCSP-49409-slowly-changing-dimensions (#11989)
* DOCSP-49409-slowly-changing-dimensions
* fixes
* links
* typo
* feedback
* more edits
* code render fixes
* build
* reviewr feedback
* reviewer feedback
* feedback
* reviewer feedback
1 parent 7eee38c commit 6819773


5 files changed (+456, -4 lines)


content/manual/manual/source/data-modeling/design-patterns/data-versioning.txt

Lines changed: 1 addition & 0 deletions
@@ -84,3 +84,4 @@ Learn More
 
 Keep Document History </data-modeling/design-patterns/data-versioning/document-versioning>
 Maintain Versions </data-modeling/design-patterns/data-versioning/schema-versioning>
+Slowly Changing Dimensions </data-modeling/design-patterns/data-versioning/slowly-changing-dimensions>
Lines changed: 225 additions & 0 deletions
@@ -0,0 +1,225 @@
.. _design-patterns-slowly-changing-dimensions:

==========================
Slowly Changing Dimensions
==========================

.. meta::
   :description: Implement the Slowly Changing Dimensions framework to track and query changes and different versions of fields in your documents over time.

.. facet::
   :name: genre
   :values: tutorial

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Slowly changing dimensions (SCDs) is a framework for managing and
tracking changes to dimension data in a data warehouse over time.
This framework refers to the dimensions as "slowly changing" because
it assumes that the data changes infrequently and without any
regular pattern in time. Use SCDs when your data warehouse must track
historical states of data and reproduce outputs based on those states.

A common use case for SCDs is reporting. For example, in financial
reporting systems, you need to explain the differences between the
aggregated values in a report produced last month and those in the
current version of the report from the data warehouse.

The different implementations of SCDs in SQL are referred to as
"types." Types 0 and 1, the most basic types, only keep track of
the original state of data or the current state of data, respectively.
Type 2, the most commonly applied implementation, stores each version
of the data as a separate record and adds three fields to it:
``validFrom``, ``validTo``, and an optional flag on the latest version,
often called ``isValid`` or ``isEffective``.

SCD Types
---------

.. list-table::
   :header-rows: 1
   :stub-columns: 1
   :widths: 20 80

   * - SCD Type
     - Description

   * - Type 0
     - Keep only the original state. The data cannot be changed.

   * - Type 1
     - Keep only the current state. History is not stored.

   * - Type 2
     - Keep history by storing each version in a new document.

   * - Type 3
     - Keep history in new fields in the same document.

   * - Type 4
     - Keep history in a separate collection.

   * - Type 6
     - Combination of Types 1, 2, and 3.

SCDs in MongoDB
---------------

You can apply the SCD framework to MongoDB in the same way you apply it to
a relational database. SCDs apply on a per-document basis, within the data
model that you choose and optimize for your specific use case.

Example
~~~~~~~

Consider a collection called ``prices`` that stores the prices of a set of
items. To process returns, you need to track how the price of each item
changes over time, because the amount refunded must match the price of the
item at the time of purchase. Each document in the collection has an
``item`` and a ``price`` field:

.. code-block:: javascript

   db.prices.insertMany( [
      { 'item': 'shorts', 'price': 10 },
      { 'item': 't-shirt', 'price': 2 },
      { 'item': 'pants', 'price': 5 },
   ] )

Suppose the price of pants changes from ``5`` to ``7``. To track this price
change with SCD Type 2, treat documents that do not yet have the Type 2
fields as having default values: ``validFrom`` is January 1, 1900,
``validTo`` is January 1, 9999, and ``isValid`` is ``true``. To change the
``price`` of the document with ``'item': 'pants'``, update the currently
valid document so that it is no longer valid, and insert a new document
that represents the current state of the pants:

.. code-block:: javascript

   let now = new Date();

   // Close the currently valid version of the item. Documents that
   // predate versioning have no isValid field, and { "isValid": null }
   // matches a missing field.
   db.prices.updateOne(
      {
         'item': 'pants',
         "$or": [
            { "isValid": true },
            { "isValid": null }
         ]
      },
      // An update pipeline lets $ifNull keep an existing validFrom and
      // apply the default only when the field is missing.
      [
         { "$set":
            {
               "validFrom": { "$ifNull": [ "$validFrom", new Date("1900-01-01") ] },
               "validTo": now,
               "isValid": false
            }
         }
      ]
   );

   // Insert the new version, valid from the same timestamp onward.
   db.prices.insertOne(
      {
         'item': 'pants',
         'price': 7,
         "validFrom": now,
         "validTo": new Date("9999-01-01"),
         "isValid": true
      }
   );

To avoid breaking the chain of validity, both operations use the same
timestamp, ``now``, so that the ``validTo`` of the old document equals the
``validFrom`` of the new document. Depending on the requirements of the
application, you can wrap the two commands in a transaction to ensure that
MongoDB applies both changes together or neither of them. For more
information, see :ref:`transactions`.

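To check the Type 2 rules without a database, the following sketch restates
them in plain JavaScript (Node.js). It is a minimal in-memory model under
stated assumptions, not a MongoDB API: the function name
``applyType2Change``, the sample timestamp, and the array standing in for
the collection are all hypothetical. It treats a document without an
``isValid`` field as the current version and closes whichever version is
currently valid.

.. code-block:: javascript

   // Hypothetical in-memory model of an SCD Type 2 change (not a MongoDB API).
   const MIN_DATE = new Date("1900-01-01"); // default validFrom
   const MAX_DATE = new Date("9999-01-01"); // default validTo

   function applyType2Change(docs, item, newPrice, now) {
     // The current version is flagged isValid: true, or has no isValid
     // field yet because it predates versioning.
     const current = docs.find(
       (d) => d.item === item && (d.isValid === true || d.isValid == null)
     );
     if (current) {
       // Keep an existing validFrom; fall back to the default otherwise.
       current.validFrom = current.validFrom ?? MIN_DATE;
       current.validTo = now; // closes exactly where the new version opens
       current.isValid = false;
     }
     docs.push({ item, price: newPrice, validFrom: now, validTo: MAX_DATE, isValid: true });
     return docs;
   }

   const prices = [
     { item: "shorts", price: 10 },
     { item: "t-shirt", price: 2 },
     { item: "pants", price: 5 },
   ];
   const now = new Date("2022-11-16T12:00:00Z");
   applyType2Change(prices, "pants", 7, now);

Calling the function again for the same item closes the version that has
``isValid: true`` and keeps its ``validFrom``, so the versions of an item
form an unbroken chain of intervals.
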
The following operation returns the currently valid ``price`` of the
``pants`` item:

.. code-block:: javascript

   db.prices.find( { 'item': 'pants', 'isValid': true } );

To query the ``price`` of the ``pants`` item at a specific point in time,
use the following operation:

.. code-block:: javascript

   // Find the version that was valid at the given time:
   // validFrom <= time < validTo
   let time = new Date("2022-11-16T13:00:00");
   db.prices.find( {
      'item': 'pants',
      'validFrom': { '$lte': time },
      'validTo': { '$gt': time }
   } );

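The point-in-time query uses a half-open interval: a version is valid from
``validFrom`` (inclusive) up to ``validTo`` (exclusive). Because the old
version's ``validTo`` equals the new version's ``validFrom``, exactly one
version of an item matches at any time in the chain, even at the moment of
the change. The following plain JavaScript sketch applies the same
predicate to in-memory documents; the helper name ``findVersionsAt`` and
the sample data are hypothetical.

.. code-block:: javascript

   // Hypothetical sample versions of the pants item after one price change.
   const changedAt = new Date("2022-11-16T12:00:00Z");
   const prices = [
     { item: "pants", price: 5, validFrom: new Date("1900-01-01"), validTo: changedAt, isValid: false },
     { item: "pants", price: 7, validFrom: changedAt, validTo: new Date("9999-01-01"), isValid: true },
   ];

   // Same predicate as the find() filter: validFrom <= time < validTo.
   // Documents without validFrom/validTo never match, as with MongoDB
   // range operators on a missing field.
   function findVersionsAt(docs, item, time) {
     return docs.filter(
       (d) =>
         d.item === item &&
         d.validFrom instanceof Date &&
         d.validTo instanceof Date &&
         d.validFrom <= time &&
         time < d.validTo
     );
   }

   const before = findVersionsAt(prices, "pants", new Date("2022-11-16T11:59:59Z"));
   const atChange = findVersionsAt(prices, "pants", changedAt);

Here ``before`` holds only the version priced ``5``, and ``atChange`` holds
only the version priced ``7``.
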
Tracking Changes in a Few Fields
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you only need to track changes over time to a few fields in a
document, you can use SCD Type 3 by embedding the history of those
fields as an array in the same document.

For example, the following aggregation pipeline updates the ``price``
in the document representing ``pants`` to ``7``. It appends the previous
``price``, together with a timestamp of when that ``price`` became
invalid, to an array called ``priceHistory``:

.. code-block:: javascript

   db.prices.aggregate( [
      { $match: { 'item': 'pants' } },
      { $addFields:
         {
            // Expressions in $addFields read the input document, so
            // "$price" below is still the old price.
            price: 7,
            priceHistory:
               { $concatArrays:
                  [
                     { $ifNull: [ '$priceHistory', [] ] },
                     [ { price: "$price", time: new Date() } ]
                  ]
               }
         }
      },
      { $merge:
         {
            into: "prices",
            on: "_id",
            whenMatched: "merge",
            whenNotMatched: "fail"
         }
      }
   ] )

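Within a single ``$addFields`` stage, every expression reads the fields of
the input document, so ``"$price"`` refers to the old price even though the
same stage sets ``price`` to ``7``. The following plain JavaScript sketch
mirrors that stage on one in-memory document. The function name
``setPriceWithHistory`` is hypothetical, and the sketch does not perform the
``$merge`` write.

.. code-block:: javascript

   // Hypothetical in-memory equivalent of the $addFields stage above.
   function setPriceWithHistory(doc, newPrice, time) {
     // { $ifNull: [ '$priceHistory', [] ] }
     const history = doc.priceHistory ?? [];
     return {
       ...doc,
       price: newPrice,
       // $concatArrays appends the old price, read from the input document
       priceHistory: [...history, { price: doc.price, time }],
     };
   }

   const original = { item: "pants", price: 5 };
   const first = setPriceWithHistory(original, 7, new Date("2022-11-16T12:00:00Z"));
   const second = setPriceWithHistory(first, 8, new Date("2023-01-01T00:00:00Z"));

Each call adds one entry, so ``second.priceHistory`` holds the prices ``5``
and ``7``. This unbounded growth is the concern addressed in the next
paragraph.
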
This solution can become slow or inefficient if the array grows too large.
To avoid large arrays, you can use the :ref:`outlier <group-data-outlier-pattern>`
or the :ref:`bucket <group-data-bucket-pattern>` pattern to design your schema.

Outlook Data Federation
-----------------------

The preceding examples focus on a strict and accurate representation of
document field changes. Sometimes, you might have less strict requirements
for showing historical data. For example, your application might only need
the current state of the data most of the time, while you still need to
run some analytical queries on the full history of the data.

In this case, you can store the current version of the data in one
collection and the historical changes in another collection. You can then
move the historical collection off the active MongoDB cluster by using
:ref:`MongoDB Atlas Data Federation <atlas-data-federation>` or, in the
fully managed version,
:atlas:`Online Archive </online-archive/manage-online-archive/>`.

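A minimal sketch of that split, assuming one document per item in the
current store and an append-only history store. Both stores are modeled
here as in-memory JavaScript structures rather than MongoDB collections,
and the names ``current``, ``history``, and ``changePrice`` are
hypothetical.

.. code-block:: javascript

   // Hypothetical in-memory stand-ins for a current and a history collection.
   const current = new Map([
     ["pants", { item: "pants", price: 5, validFrom: new Date("1900-01-01") }],
   ]);
   const history = [];

   function changePrice(current, history, item, newPrice, now) {
     const previous = current.get(item);
     if (previous) {
       // Move the superseded version to the history store and close its interval.
       history.push({ ...previous, validTo: now });
     }
     // The current store only ever holds the latest version of each item.
     current.set(item, { item, price: newPrice, validFrom: now });
   }

   changePrice(current, history, "pants", 7, new Date("2022-11-16T12:00:00Z"));

Reads that only need the latest state touch the current store, while
analytical queries over the full history read the history store, which is
the part you could move off the active cluster.
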
Other Use Cases
---------------

While the SCD framework is helpful for data warehousing, you can also use
it in event-driven applications. If you have infrequent events across
different categories, finding the latest event per category can be
expensive, because the process might require grouping or sorting your
data to find the current state.

In the case of infrequent events, you can amend the data model by adding
a field that stores the time of the next event, in addition to the event
time stored in each document. With this additional date field, a search
for a specific point in time can retrieve the matching event directly and
efficiently.
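
A minimal sketch of this model, assuming hypothetical event documents with
``category``, ``state``, ``time``, and ``nextTime`` fields, where
``nextTime`` is ``null`` for the latest event in a category. The helper
``eventAt`` is plain JavaScript over an in-memory array, written for
illustration only; it checks one predicate per document instead of sorting
or grouping.

.. code-block:: javascript

   // Hypothetical events; nextTime is the time of the next event in the
   // same category, or null if this is the latest event.
   const events = [
     { category: "a", state: "idle", time: new Date("2024-01-01T00:00:00Z"), nextTime: new Date("2024-01-05T00:00:00Z") },
     { category: "a", state: "active", time: new Date("2024-01-05T00:00:00Z"), nextTime: null },
     { category: "b", state: "idle", time: new Date("2024-01-02T00:00:00Z"), nextTime: null },
   ];

   // The event in effect at `at` satisfies time <= at < nextTime
   // (or nextTime is null), so no sorting or grouping is needed.
   function eventAt(events, category, at) {
     return events.find(
       (e) =>
         e.category === category &&
         e.time <= at &&
         (e.nextTime === null || at < e.nextTime)
     );
   }

   // The latest event per category is simply the one with nextTime: null.
   const latest = events.filter((e) => e.nextTime === null);

In MongoDB, the same predicate could be written as a query filter such as
``{ category: "a", time: { $lte: at }, $or: [ { nextTime: null }, { nextTime: { $gt: at } } ] }``.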

content/manual/upcoming/source/data-modeling/design-patterns/data-versioning/slowly-changing-dimensions.txt

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@ Slowly Changing Dimensions
 
 Slowly changing dimensions (SCDs) is a framework for managing and
 tracking changes to dimension data in a data warehouse over time.
-This framework refers to the dimensions as slowly changing because
+This framework refers to the dimensions as "slowly changing" because
 it assumes that the data SCDs cover changes with a low frequency,
 but without any apparent pattern in time. Use SCDs when the
 requirements for the data warehouse cover functionality to track
@@ -31,7 +31,7 @@ between the aggregated values in a report produced last month and
 those in the current version of the report from the data warehouse.
 
 The different implementations of SCDs in SQL are referred to as
-types. Types 0 and 1, the most basic types, only keep track of
+"types." Types 0 and 1, the most basic types, only keep track of
 the original state of data or the current state of data, respectively.
 Type 2, the most commonly applied implementation, creates three
 new fields: ``validFrom``, ``validTo``, and an optional flag
@@ -90,9 +90,9 @@ purchase. Each document in the collection has an ``item`` and ``price`` field:
 { 'item': 'pants', 'price': 5 },
 ] )
 
-Suppose the price of pants changes from 5 to 7. To track this price change,
+Suppose the price of pants changes from ``5`` to ``7``. To track this price change,
 assume the default values for the necessary data fields for SCD Type 2.
-The default value for ``validFrom`` is 01.01.1900, ``validTo`` is 01.01.9999,
+The default value for ``validFrom`` is ``01.01.1900``, ``validTo`` is ``01.01.9999``,
 and ``isValid`` is ``true``. To change the ``price`` field in the object with
 ``'item': 'pants'``, insert a new document to represent the current state
 of the pants, and update the previously valid document to no longer be valid:

content/manual/v7.0/source/data-modeling/design-patterns/data-versioning.txt

Lines changed: 1 addition & 0 deletions
@@ -84,3 +84,4 @@ Learn More
 
 Keep Document History </data-modeling/design-patterns/data-versioning/document-versioning>
 Maintain Versions </data-modeling/design-patterns/data-versioning/schema-versioning>
+Slowly Changing Dimensions </data-modeling/design-patterns/data-versioning/slowly-changing-dimensions>
