Commit b375b12

DOCSP-49410-subset-pattern (#12001) (#12346)
* DOCSP-49410-subset-pattern
* fixes
* test
* feedback
* edit code examples
1 parent 56e73f0 commit b375b12


4 files changed: +342 -0 lines changed

content/manual/manual/source/data-modeling/design-patterns/group-data.txt

Lines changed: 1 addition & 0 deletions
@@ -68,3 +68,4 @@ Learn More
    Bucket Pattern </data-modeling/design-patterns/group-data/bucket-pattern>
    Outlier Pattern </data-modeling/design-patterns/group-data/outlier-pattern>
    Attribute Pattern </data-modeling/design-patterns/group-data/attribute-pattern>
+   Subset Pattern </data-modeling/design-patterns/group-data/subset-pattern>
Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
.. _group-data-subset-pattern:

==================================
Group Data with the Subset Pattern
==================================

.. meta::
   :description: Improve query performance and minimize the working set by using the subset pattern to group frequently accessed data into one collection and less frequently accessed data into another collection.

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 1
   :class: singlecol

MongoDB keeps frequently accessed data, referred to as the
:term:`working set`, in RAM. When the working set of data
and indexes grows beyond the physical RAM allotted, performance
degrades because MongoDB must read data from disk instead of from RAM.

To solve this problem, you can shard your collection. However,
sharding can create additional costs and complexities that your
application may not be ready for. Rather than sharding your collection,
you can reduce the size of your working set by using the subset pattern.

The subset pattern is a data modeling technique for scenarios where
a document contains a large array of items, but you frequently need
to access only a small subset of those items. In this case, the
document size can cause the working set to exceed the computer's RAM
capacity. The subset pattern helps optimize performance by reducing
the amount of data that needs to be read from the database for common queries.

About this Task
---------------

Consider an e-commerce site that has a list of reviews for a product, stored in a
collection called ``products``. The e-commerce site inserts
documents with the following schema into the ``products`` collection:

.. code-block:: javascript

   db.products.insertOne(
      {
         _id: ObjectId("507f1f77bcf86cd99338452"),
         name: "Super Widget",
         description: "This is the most useful item in your toolbox.",
         price: { value: NumberDecimal("119.99"), currency: "USD" },
         // Every review for the product is embedded in a single array.
         reviews: [
            {
               review_id: 786,
               review_author: "Kristina",
               review_text: "This is indeed an amazing widget.",
               published_date: ISODate("2019-02-18")
            },
            {
               review_id: 785,
               review_author: "Trina",
               review_text: "Very nice product, slow shipping.",
               published_date: ISODate("2019-02-17")
            },
            // ...
            {
               review_id: 1,
               review_author: "Hans",
               review_text: "Meh, it's ok.",
               published_date: ISODate("2017-12-06")
            }
         ]
      }
   )

When accessing a product's data, you likely need only the most recent reviews.
The following procedure demonstrates how to apply the subset pattern to the
schema above.

Steps
-----

.. procedure::
   :style: normal

   .. step:: Identify the subset of frequently accessed data.

      In an array field that contains information about a document, determine
      the subset of information that you access most often. For example, in the
      ``products`` collection, you might only need to access the ten most
      recent reviews.

   .. step:: Split the data into two collections.

      Instead of storing all the reviews with the product, split your collection
      into two collections: one for your most frequently accessed data, and one
      for your less frequently accessed data. This allows for quick access to
      the most relevant data without having to load the entire array.

      The first collection, the ``products`` collection, contains the
      most frequently used data, such as current reviews:

      .. code-block:: javascript

         db.products.insertOne(
            {
               _id: ObjectId("507f1f77bcf86cd99338452"),
               name: "Super Widget",
               description: "This is the most useful item in your toolbox.",
               price: { value: NumberDecimal("119.99"), currency: "USD" },
               // Only the ten most recent reviews stay embedded.
               reviews: [
                  {
                     review_id: 786,
                     review_author: "Kristina",
                     review_text: "This is indeed an amazing widget.",
                     published_date: ISODate("2019-02-18")
                  },
                  // ...
                  {
                     review_id: 776,
                     review_author: "Pablo",
                     review_text: "Amazing!",
                     published_date: ISODate("2019-02-15")
                  }
               ]
            }
         )

      The ``products`` collection only contains the ten most recent reviews.
      This reduces the working set because each product document carries only
      a portion, or subset, of the overall data.

      The second collection, the ``reviews`` collection, contains less
      frequently used data, such as old reviews:

      .. code-block:: javascript

         // Each review is its own document and points back to its product.
         db.reviews.insertMany( [
            {
               review_id: 786,
               review_author: "Kristina",
               review_text: "This is indeed an amazing widget.",
               product_id: ObjectId("507f1f77bcf86cd99338452"),
               published_date: ISODate("2019-02-18")
            },
            {
               review_id: 785,
               review_author: "Trina",
               review_text: "Very nice product, slow shipping.",
               product_id: ObjectId("507f1f77bcf86cd99338452"),
               published_date: ISODate("2019-02-17")
            },
            // ...
            {
               review_id: 1,
               review_author: "Hans",
               review_text: "Meh, it's ok.",
               product_id: ObjectId("507f1f77bcf86cd99338452"),
               published_date: ISODate("2017-12-06")
            }
         ] )

      You can access the ``reviews`` collection whenever you need to see
      additional reviews. When considering where to split your data, store the
      most used fields of your documents in your main collection and the less
      frequently used data in a new collection.

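The split described in the steps above can be sketched in plain JavaScript, without a database connection. This is only an illustration under stated assumptions: ``splitProduct`` and its ``subsetSize`` parameter are hypothetical names, the sample data is made up, and dates are plain ``Date`` objects rather than ``ISODate`` values.

```javascript
// Hypothetical helper (not part of any MongoDB API): split one full product
// document into the two shapes used by the subset pattern.
function splitProduct(product, subsetSize = 10) {
  // Sort a copy, newest first, so the caller's array is left untouched.
  const sorted = [...product.reviews].sort(
    (a, b) => b.published_date - a.published_date
  );

  // Every review becomes a document for the reviews collection,
  // with a product_id field that points back to the product.
  const reviewDocs = sorted.map((review) => ({
    ...review,
    product_id: product._id,
  }));

  // The product document keeps only the most recent reviews embedded.
  const productDoc = { ...product, reviews: sorted.slice(0, subsetSize) };

  return { productDoc, reviewDocs };
}

// Made-up sample data: 25 reviews, one per day in January 2019.
const fullProduct = {
  _id: "product-1",
  name: "Super Widget",
  reviews: Array.from({ length: 25 }, (_, i) => ({
    review_id: i + 1,
    review_author: `author-${i + 1}`,
    review_text: "sample text",
    published_date: new Date(Date.UTC(2019, 0, i + 1)),
  })),
};

const { productDoc, reviewDocs } = splitProduct(fullProduct);
console.log(productDoc.reviews.length); // 10
console.log(reviewDocs.length); // 25
```

In this sketch the recent reviews appear in both outputs, which matches the example documents above, where review 786 is stored in ``products`` and in ``reviews``.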
Results
-------

By using smaller documents that hold only the most frequently accessed data,
you reduce the overall size of the working set. Because more of the working set
fits in RAM, your application needs fewer disk reads to retrieve the
information it uses most often.

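To get a feel for the size difference, the following sketch compares a full document with its subset. It is a rough estimate under stated assumptions: ``approxBytes`` is a hypothetical helper that measures the length of the JSON text, which is not the BSON size MongoDB actually stores, and the 1,000 reviews are made-up data.

```javascript
// Hypothetical helper: UTF-8 byte length of the JSON text of a document.
// This is only a stand-in for the real BSON size.
function approxBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Made-up data: 1,000 reviews with 200 characters of text each.
const allReviews = Array.from({ length: 1000 }, (_, i) => ({
  review_id: i + 1,
  review_author: "author",
  review_text: "x".repeat(200),
  published_date: "2019-02-18",
}));

const base = { _id: "product-1", name: "Super Widget" };

// Full document: every review embedded.
const fullDoc = { ...base, reviews: allReviews };

// Subset document: only the last ten reviews embedded.
const subsetDoc = { ...base, reviews: allReviews.slice(-10) };

const fullBytes = approxBytes(fullDoc);
const subsetBytes = approxBytes(subsetDoc);

console.log(fullBytes > subsetBytes); // true
console.log(`full: ${fullBytes} bytes, subset: ${subsetBytes} bytes`);
```

The exact numbers depend on the data, but the ratio shows why keeping only ten embedded reviews shrinks each product document.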
.. note::

   The subset pattern requires you to manage two collections rather than one.
   You must also query both collections when you need comprehensive information
   about a document rather than only the subset.
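
One way to handle the two-collection bookkeeping mentioned in the note is to build both writes for a new review in one place. The sketch below is an assumption about how an application might do this, not a prescribed method: ``buildAddReviewOps`` is a hypothetical helper, and it only constructs plain objects. The ``$push`` modifiers ``$each``, ``$sort``, and ``$slice`` are standard MongoDB update operators.

```javascript
// Hypothetical helper: build the writes needed when a new review arrives.
// It returns plain objects and does not talk to a database.
function buildAddReviewOps(productId, review, subsetSize = 10) {
  return {
    // Document to insert into the reviews collection.
    reviewDoc: { ...review, product_id: productId },

    // Filter and update for the products collection.
    productFilter: { _id: productId },
    productUpdate: {
      $push: {
        reviews: {
          $each: [review], // append the new review
          $sort: { published_date: -1 }, // newest first
          $slice: subsetSize, // keep only the first subsetSize entries
        },
      },
    },
  };
}

const ops = buildAddReviewOps("product-1", {
  review_id: 787,
  review_author: "Sample Author",
  review_text: "Sample review text.",
  published_date: new Date(Date.UTC(2019, 1, 19)),
});

console.log(JSON.stringify(ops.productUpdate, null, 2));
```

An application could pass ``reviewDoc`` to an insert on ``reviews`` and the filter and update to an update on ``products``. The two writes are separate operations, so they are not atomic unless the application wraps them in a transaction.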

content/manual/v7.0/source/data-modeling/design-patterns/group-data.txt

Lines changed: 1 addition & 0 deletions
@@ -68,3 +68,4 @@ Learn More
    Bucket Pattern </data-modeling/design-patterns/group-data/bucket-pattern>
    Outlier Pattern </data-modeling/design-patterns/group-data/outlier-pattern>
    Attribute Pattern </data-modeling/design-patterns/group-data/attribute-pattern>
+   Subset Pattern </data-modeling/design-patterns/group-data/subset-pattern>