
Commit cc7a9db

DOCSP-50213 csfle right to erasure (#12294)
* Initial draft * Added What is CSFLE * More draft content * Moooooore draft content * Added article to TOC * Removing DS_Store * Added right sidebar TOC * Added default domain just in case * ref cleanup * TOC title edit * First section copy edits * More copy edits * More copy edits * Wording * Wording * Wording * Wording * Wording change * Added missing space to copyable option
1 parent 4e73a8f commit cc7a9db

13 files changed (+378, -0 lines changed)

content/.DS_Store

-6 KB

content/manual/manual/source/core/csfle/tutorials.txt

Lines changed: 6 additions & 0 deletions
@@ -136,10 +136,16 @@ following table for language-specific quick start guides.
    * - `Ruby <https://docs.mongodb.com/ruby-driver/current/>`__
      - `Ruby Driver {+csfle-abbrev+} Quick Start <https://www.mongodb.com/docs/ruby-driver/current/reference/in-use-encryption/client-side-encryption/>`__

+Other Tutorials
+---------------
+
+For an example of implementing Right to Erasure with {+csfle-abbrev+}, see :ref:`csfle-right-to-erasure`.
+
 .. toctree::
    :titlesonly:

    Use AWS </core/csfle/tutorials/aws/aws-automatic>
    Use Azure </core/csfle/tutorials/azure/azure-automatic>
    Use GCP </core/csfle/tutorials/gcp/gcp-automatic>
    Use KMIP </core/csfle/tutorials/kmip/kmip-automatic>
+   Sample App: Right to Erasure </core/csfle/tutorials/right-to-erasure>
Lines changed: 372 additions & 0 deletions
@@ -0,0 +1,372 @@
.. facet::
   :name: genre
   :values: tutorial

.. facet::
   :name: programming_language
   :values: python

.. meta::
   :keywords: client-side field level encryption, encryption, gdpr

.. _csfle-right-to-erasure:
.. _csfle-gdpr:

===============================================================================
Implementing Right to Erasure with {+csfle-abbrev+}
===============================================================================

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

The Crypto Shredding sample app demonstrates how you can make use of MongoDB's
:ref:`{+csfle+} ({+csfle-abbrev+}) <manual-csfle-feature>` to strengthen
procedures for removing sensitive data.

About the Sample Application
----------------------------

The right to erasure, also known as the right to be forgotten, is a right
granted to individuals under laws and regulations such as GDPR. It means that
companies storing an individual's personal data must be able to delete it on
request. Because data can be spread across several systems, it can be
technically challenging for these companies to identify and remove it from all
places. Even when deletion is carried out properly, the deleted data might
later be restored from backups, potentially creating legal and financial risks.

.. warning::

   MongoDB provides no guarantees that the solution and techniques described
   in this article fulfill all regulatory requirements around the right to
   erasure. Your organization must determine appropriate, sufficient measures
   to comply with regulatory requirements such as GDPR.

The Crypto Shredding sample application demonstrates one way to implement the
right to erasure. The demo application is a Python (Flask) web application with
a front end for adding users, logging in, and entering data. It also includes
an "Admin" page to showcase crypto shredding functionality.

You can install and run the application by following the instructions in the
`GitHub repository <https://github.com/mongodb-developer/mongodb-flask-cryptoshredding-example>`__.

What is Crypto-Shredding?
-------------------------

Crypto-shredding, also called cryptographic erasure, is a data destruction
technique where, instead of destroying encrypted data, you destroy the
encryption keys necessary to decrypt it. This makes the data indecipherable.

For example, imagine you are storing data for multiple users. You start by
giving each user their own unique {+dek-long+} (DEK), and mapping it to that
user.

In the diagram, "User A" and "User B" each have their own unique DEK in the key
store. Each key is used to encrypt or decrypt data for its respective user:

.. image:: /images/devcenter_csfle_gdpr_deks.png
   :alt: A Data Encryption Key store with two users

Suppose that you want to remove all data for User B. If you delete User B's
DEK, you can no longer decrypt their data, and all of User B's encrypted data
in the datastore becomes indecipherable ciphertext. User A's data is
unaffected, since their DEK still exists:

.. image:: /images/devcenter_csfle_gdpr_delete_user.png
   :alt: Deleting User B's data encryption key
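
The following pure-Python sketch illustrates the idea. It is a toy, not how {+csfle-abbrev+} works internally: it assumes an in-memory dictionary in place of a key vault, and it swaps in a standard-library stream cipher built from HMAC-SHA256 (with an HMAC tag for integrity) instead of the AES-based algorithms that {+csfle-abbrev+} uses. ``ToyKeyStore``, ``encrypt``, and ``decrypt`` are hypothetical names, and the cipher is not suitable for real data.

```python
import hashlib
import hmac
import secrets

# TOY CIPHER FOR ILLUSTRATION ONLY: HMAC-SHA256 keystream plus an HMAC tag.
# CSFLE itself uses AEAD_AES_256_CBC_HMAC_SHA_512; do not use this for real data.


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Concatenate HMAC-SHA256(key, nonce || counter) blocks until long enough.
    blocks = (
        hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range(length // 32 + 1)
    )
    return b"".join(blocks)[:length]


def encrypt(dek: bytes, plaintext: str) -> bytes:
    enc_key, mac_key = dek[:32], dek[32:]  # split the 64-byte toy DEK
    nonce = secrets.token_bytes(16)
    data = plaintext.encode()
    body = bytes(a ^ b for a, b in zip(data, _keystream(enc_key, nonce, len(data))))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def decrypt(dek: bytes, blob: bytes) -> str:
    enc_key, mac_key = dek[:32], dek[32:]
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext failed authentication")
    return bytes(a ^ b for a, b in zip(body, _keystream(enc_key, nonce, len(body)))).decode()


class ToyKeyStore:
    """Hypothetical key store: one random 64-byte DEK per username."""

    def __init__(self) -> None:
        self._keys = {}

    def create_key(self, username: str) -> None:
        self._keys[username] = secrets.token_bytes(64)

    def get_key(self, username: str) -> bytes:
        if username not in self._keys:
            raise LookupError(f"no DEK for user {username!r}")
        return self._keys[username]

    def delete_key(self, username: str) -> None:
        # Crypto-shredding: discard the key, leave the ciphertext alone.
        del self._keys[username]


store = ToyKeyStore()
datastore = {}
for user, secret in [("user_a", "shoe size 10"), ("user_b", "shoe size 9")]:
    store.create_key(user)
    datastore[user] = encrypt(store.get_key(user), secret)

store.delete_key("user_b")  # "remove all data for User B"

print(decrypt(store.get_key("user_a"), datastore["user_a"]))  # shoe size 10
try:
    store.get_key("user_b")
except LookupError as err:
    print(err)  # no DEK for user 'user_b'
```

Deleting User B's key leaves ``datastore["user_b"]`` in place, but there is no longer a key to decrypt it with, while User A's data still decrypts normally.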

What is {+csfle-abbrev+}?
-------------------------

With :ref:`{+csfle-abbrev+} <manual-csfle-feature>`, applications can encrypt
sensitive fields in documents before transmitting data to the server. Even
while the database holds the data in memory, the encrypted fields are never in
plaintext. The database stores and transmits encrypted data that only the
client can decrypt.

{+csfle-abbrev+} uses :term:`envelope encryption`, which is the practice of
encrypting plaintext data with a data key that is itself encrypted by a
top-level envelope key (also known as a "{+cmk-long+}", or CMK).

.. image:: /images/devcenter_csfle_gdpr_dek_diagram.webp
   :alt: Envelope encryption diagram
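
Envelope encryption can be sketched the same way. In this toy model, which is an assumption for illustration rather than how a KMS or the driver behaves, the key vault holds only the *wrapped* (encrypted) DEK, and the data is encrypted with the unwrapped DEK. In a real deployment, the KMS performs the wrapping and unwrapping, so the CMK never leaves it. ``wrap``, ``unwrap``, and ``xor_stream`` are hypothetical helpers, and the unauthenticated XOR keystream is not secure.

```python
import hashlib
import secrets


def xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # TOY, unauthenticated stream cipher: SHA-256(key || nonce || counter) keystream.
    stream = b"".join(
        hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
        for i in range(len(data) // 32 + 1)
    )
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(cmk: bytes, dek: bytes) -> bytes:
    # Stand-in for the KMS "encrypt" call: returns nonce || wrapped DEK.
    nonce = secrets.token_bytes(16)
    return nonce + xor_stream(cmk, nonce, dek)


def unwrap(cmk: bytes, wrapped: bytes) -> bytes:
    # Stand-in for the KMS "decrypt" call.
    nonce, body = wrapped[:16], wrapped[16:]
    return xor_stream(cmk, nonce, body)


cmk = secrets.token_bytes(32)            # in reality, lives only inside the KMS
dek = secrets.token_bytes(32)            # one per user
key_vault = {"user_a": wrap(cmk, dek)}   # only the wrapped DEK is stored

# Data is encrypted with the DEK, never with the CMK directly.
nonce = secrets.token_bytes(16)
ciphertext = xor_stream(dek, nonce, b"shoe size 10")

# To decrypt, first unwrap the DEK with the CMK, then decrypt the data.
recovered_dek = unwrap(cmk, key_vault["user_a"])
print(xor_stream(recovered_dek, nonce, ciphertext))  # b'shoe size 10'
```

Because only wrapped DEKs are stored, deleting one user's wrapped DEK is enough to shred that user's data, while the single CMK keeps protecting every other user's DEK.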

Encryption Key Management
~~~~~~~~~~~~~~~~~~~~~~~~~

CMKs are usually managed by a Key Management Service (KMS).
{+csfle-abbrev+} :ref:`supports multiple KMSs <qe-fundamentals-kms-providers>`,
including Amazon Web Services (AWS) KMS, Azure Key Vault, Google Cloud
Platform (GCP) KMS, and key stores that support the KMIP standard, such as
HashiCorp Vault. The sample app uses AWS KMS.
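
The ``create_key`` function in the walkthrough uses ``kms_provider`` and ``master_key`` without showing how they are defined. For AWS, PyMongo expects a ``kms_providers`` mapping with an ``"aws"`` entry holding ``accessKeyId`` and ``secretAccessKey`` (plus an optional ``sessionToken``), and ``create_data_key`` accepts a ``master_key`` document with the CMK's ``region`` and ``key`` (its ARN). The following sketch builds those dictionaries from environment variables. The function name and the variable names, such as ``CMK_ARN``, are assumptions, not names the sample app is known to use.

```python
import os
from typing import Mapping


def build_aws_kms_settings(env: Mapping[str, str] = os.environ):
    """Return (kms_provider, kms_providers, master_key) for AWS KMS.

    Hypothetical helper; the environment variable names are assumptions.
    """
    credentials = {
        "accessKeyId": env["AWS_ACCESS_KEY_ID"],
        "secretAccessKey": env["AWS_SECRET_ACCESS_KEY"],
    }
    # Temporary credentials (for example, from an assumed role) also need a token.
    if env.get("AWS_SESSION_TOKEN"):
        credentials["sessionToken"] = env["AWS_SESSION_TOKEN"]

    kms_provider = "aws"
    kms_providers = {kms_provider: credentials}
    master_key = {
        "region": env["AWS_REGION"],
        "key": env["CMK_ARN"],  # ARN of the CMK in AWS KMS
    }
    return kms_provider, kms_providers, master_key
```

You would pass ``kms_providers`` when constructing ``ClientEncryption``, and pass ``kms_provider`` and ``master_key`` to ``create_data_key(kms_provider, master_key, key_alt_names=[...])``.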

Automatic and {+manual-enc-title+}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

{+csfle-abbrev+} can be used in :ref:`automatic
<csfle-fundamentals-automatic-encryption>` or :ref:`explicit
<csfle-fundamentals-manual-encryption>` mode, or a combination of both. The
sample app uses {+manual-enc+}.

- With automatic encryption, you perform encrypted read and
  write operations based on a defined :ref:`encryption schema
  <csfle-fundamentals-create-schema>`, so you don't need application code to
  specify how to encrypt or decrypt fields.

- With {+manual-enc+}, you use the MongoDB driver's encryption library to
  manually encrypt or decrypt fields in your application.

Sample App Walkthrough
----------------------

The sample app uses {+csfle-abbrev+} with {+manual-enc+}, and Amazon Web
Services as the KMS:

.. image:: /images/devcenter_csfle_gdpr_crypto_shredding_example.png
   :alt: An example of the crypto shredding UI

Adding Users
~~~~~~~~~~~~

The app creates an instance of the ``ClientEncryption`` class and stores it as
``app.mongodb_encryption_client``. This encryption client generates DEKs and
uses AWS KMS to encrypt them with a CMK.

When a user signs up, the application's ``create_key`` function generates a
unique DEK for them by calling the ``create_data_key`` method, then returns the
key's ID (``data_key_id``). The username is stored as the key's alternate name
(``key_alt_names``), which is how the app later finds each user's DEK:

.. code-block:: python
   :copyable: true

   # flaskapp/db_queries.py

   # kms_provider and master_key are defined elsewhere in the module (not shown).
   @aws_credential_handler
   def create_key(userId):
       data_key_id = app.mongodb_encryption_client.create_data_key(
           kms_provider, master_key, key_alt_names=[userId]
       )
       return data_key_id

The app then calls this function when saving user information:

.. code-block:: python
   :copyable: true

   # flaskapp/user.py

   def save(self):
       dek_id = db_queries.create_key(self.username)
       result = app.mongodb[db_name].user.insert_one(
           {
               "username": self.username,
               "password_hash": self.password_hash,
               "dek_id": dek_id,
               "createdAt": datetime.now(),
           }
       )
       if result:
           self.id = result.inserted_id
           return True
       else:
           return False

Adding and Encrypting Data
~~~~~~~~~~~~~~~~~~~~~~~~~~

Once registered, a user can log in and enter data as key-value pairs through
an input form:

.. image:: /images/devcenter_csfle_gdpr_data_input_form.png
   :alt: A sample UI for adding data

The database stores this data in a MongoDB collection named ``data``, where
each document includes the username and the key-value pair:

.. code-block:: json
   :copyable: true

   {
     "name": "shoe size",
     "value": "10",
     "username": "tom"
   }

The sample app encrypts the ``value`` and ``username`` fields, but not the
``name`` field. It encrypts each field with the user's DEK and the algorithm
specified for that field:

.. code-block:: python
   :copyable: true

   # flaskapp/db_queries.py

   from pymongo.encryption import Algorithm

   # Fields to encrypt, and the algorithm to encrypt them with
   ENCRYPTED_FIELDS = {
       # Deterministic encryption for username, because we need to search on it
       "username": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
       # Random encryption for value, as we don't need to search on it
       "value": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random,
   }
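
The comments in ``ENCRYPTED_FIELDS`` hinge on one property: deterministic encryption always turns the same plaintext and key into the same ciphertext, so the app can match a user's documents by comparing encrypted ``username`` values, whereas randomized encryption produces different ciphertext every time. The toy sketch below demonstrates the property with a standard-library construction that derives the nonce from an HMAC of the plaintext (in the spirit of SIV modes). It is an assumption for illustration, not the ``AEAD_AES_256_CBC_HMAC_SHA_512`` algorithms themselves, and ``toy_encrypt`` is a hypothetical name.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # TOY keystream: HMAC-SHA256(key, nonce || counter) blocks.
    blocks = (
        hmac.new(key, nonce + i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range(length // 32 + 1)
    )
    return b"".join(blocks)[:length]


def toy_encrypt(key: bytes, plaintext: str, deterministic: bool) -> bytes:
    data = plaintext.encode()
    if deterministic:
        # Nonce derived from the plaintext: equal inputs give equal ciphertext.
        nonce = hmac.new(key, b"nonce" + data, hashlib.sha256).digest()[:16]
    else:
        # Fresh random nonce: equal inputs give different ciphertext.
        nonce = secrets.token_bytes(16)
    body = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return nonce + body


key = secrets.token_bytes(32)
det_1 = toy_encrypt(key, "tom", deterministic=True)
det_2 = toy_encrypt(key, "tom", deterministic=True)
rnd_1 = toy_encrypt(key, "10", deterministic=False)
rnd_2 = toy_encrypt(key, "10", deterministic=False)

print(det_1 == det_2)  # True: an equality filter on the ciphertext can match
print(rnd_1 == rnd_2)  # False: the value can't be found by an equality filter
```

The same property is why deterministic encryption reveals more: anyone who can read the stored ciphertext can tell which documents share a ``username``. That trade-off is reasonable for a field the app must query, but not for ``value``.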

The ``insert_data`` function takes an unencrypted document, then loops over
``ENCRYPTED_FIELDS`` to encrypt each listed field that the document contains:

.. code-block:: python
   :copyable: true

   # flaskapp/db_queries.py

   def insert_data(document):
       document["username"] = current_user.username
       # Loop over the field names (and associated algorithm) we want to encrypt
       for field, algo in ENCRYPTED_FIELDS.items():
           # if the field exists in the document, encrypt it
           if document.get(field):
               document[field] = encrypt_field(document[field], algo)
       # Insert document (now with encrypted fields) to the data collection
       app.data_collection.insert_one(document)

If the specified field exists in the document, the function calls
``encrypt_field`` to encrypt it using the specified algorithm:

.. code-block:: python
   :copyable: true

   # flaskapp/db_queries.py

   # Encrypt a single field with the given algorithm
   @aws_credential_handler
   def encrypt_field(field, algorithm):
       try:
           field = app.mongodb_encryption_client.encrypt(
               field,
               algorithm,
               key_alt_name=current_user.username,
           )
           return field
       except pymongo.errors.EncryptionError as ex:
           # Catch this error in case the DEK doesn't exist. Log a warning and
           # re-raise the exception
           if "not all keys requested were satisfied" in ex._message:
               app.logger.warning(
                   f"Encryption failed: could not find data encryption key for user: {current_user.username}"
               )
           raise

After adding data, you can see it in the web app:

.. image:: /images/devcenter_csfle_gdpr_demo_data_entered.png
   :alt: Sample data in the UI

Deleting an Encryption Key
~~~~~~~~~~~~~~~~~~~~~~~~~~

Now let's see what happens when you delete the DEK. The sample app does this
from an admin page, which should be restricted to individuals who are
authorized to manage keys:

.. image:: /images/devcenter_csfle_gdpr_demo_admin_page.png
   :alt: The sample app admin page

The "Delete data encryption key" option removes the DEK but leaves the user's
encrypted data in place. After that, the application can no longer decrypt the
data, and trying to retrieve the data for the logged-in user throws an error:

.. image:: /images/devcenter_csfle_gdpr_demo_error_message.png
   :alt: An error message when trying to retrieve encrypted data without a key

.. note::

   After you delete the DEK, the application might still be able to decrypt
   and show data until its cached copy of the DEK expires, up to 60 seconds
   later.
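
The walkthrough doesn't show the code behind the "Delete data encryption key" option. One way to do it with PyMongo 4.2 or later, which provides ``get_key_by_alt_name`` and ``delete_key`` on ``ClientEncryption``, is to look up the DEK by its key alternate name (the username) and delete it by ``_id``. Because the app also stores ``dek_id`` in each user document, passing that value to ``delete_key`` directly would work too. The function name ``shred_user_key`` is hypothetical, and the sample app's own implementation may differ. The function only calls those two methods, so the usage example exercises it with an in-memory stand-in.

```python
from typing import Any, Protocol


class KeyManager(Protocol):
    # The subset of pymongo.encryption.ClientEncryption used below.
    def get_key_by_alt_name(self, key_alt_name: str) -> Any: ...
    def delete_key(self, id: Any) -> Any: ...


def shred_user_key(client_encryption: KeyManager, username: str) -> bool:
    """Delete the DEK whose keyAltNames contains `username`.

    Returns False if no such key exists. The encrypted data is left in place
    and becomes undecryptable once cached copies of the key expire.
    """
    key_doc = client_encryption.get_key_by_alt_name(username)
    if key_doc is None:
        return False
    client_encryption.delete_key(key_doc["_id"])
    return True


# Usage with an in-memory stand-in for ClientEncryption (illustration only).
class FakeClientEncryption:
    def __init__(self):
        self.keys = {"k1": {"_id": "k1", "keyAltNames": ["tom"]}}

    def get_key_by_alt_name(self, key_alt_name):
        for doc in self.keys.values():
            if key_alt_name in doc["keyAltNames"]:
                return doc
        return None

    def delete_key(self, id):
        self.keys.pop(id)


fake = FakeClientEncryption()
print(shred_user_key(fake, "tom"))  # True
print(shred_user_key(fake, "tom"))  # False
```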

But what is actually left in the database? You can review the information by
returning to the Admin page and clicking :guilabel:`Fetch data for all
users`. This view doesn't throw an exception if the application can't decrypt
the data. Instead, it shows exactly what is stored in the database.

Although the user's data hasn't actually been deleted, the DEK no longer
exists, so the application can show only the ciphertext for the encrypted
``username`` and ``value`` fields:

.. image:: /images/devcenter_csfle_gdpr_raw_ciphertext.png
   :alt: Raw ciphertext from the demo app database

The following code fetches this data. It uses logic similar to the
``insert_data`` function shown earlier: the application runs a ``find``
operation without any filters to retrieve all data, then loops over the
``ENCRYPTED_FIELDS`` dictionary to decrypt fields:

.. code-block:: python
   :copyable: true

   # flaskapp/db_queries.py

   def fetch_all_data_unencrypted(decrypt=False):
       results = list(app.data_collection.find())

       if decrypt:
           for field in ENCRYPTED_FIELDS.keys():
               for result in results:
                   if result.get(field):
                       result[field], result["encryption_succeeded"] = decrypt_field(result[field])
       return results

The ``decrypt_field`` function is called for each field to be decrypted.
Unlike ``encrypt_field``, it doesn't re-raise the error when a missing DEK
prevents decryption. Instead, it returns the raw ciphertext along with a
failure flag:

.. code-block:: python
   :copyable: true

   # flaskapp/db_queries.py

   # Try to decrypt a field, returning a tuple of (value, status). This will be
   # either (decrypted_value, True), or (raw_cipher_text, False) if we
   # couldn't decrypt
   def decrypt_field(field):
       try:
           # We don't need to pass the DEK or algorithm to decrypt a field: the
           # ciphertext records the ID of the DEK and the algorithm used.
           field = app.mongodb_encryption_client.decrypt(field)
           return field, True
       # Catch this error in case the DEK doesn't exist.
       except pymongo.errors.EncryptionError as ex:
           if "not all keys requested were satisfied" in ex._message:
               app.logger.warning(
                   "Decryption failed: could not find data encryption key to decrypt the record."
               )
               # If we can't decrypt due to missing DEK, return the "raw" value.
               return field, False
           raise
345+
346+
You can also use the mongosh shell to check directly in the database, to
347+
prove that there's nothing readable:
348+
349+
.. image:: /images/devcenter_csfle_gdpr_mongosh.png
350+
:alt: mongosh shell output when querying the database
351+
352+
At this point, the user's encrypted data is still present. Someone could gain
353+
access to it by restoring their encryption key, such as from a database backup.
354+
355+
To prevent this, the sample application uses two separate database clusters:
356+
one for storing data, and one for storing DEKs (the "key vault"). Using
357+
separate clusters decouples the restoration of backups for application data and
358+
the key vault. Restoring the data cluster from a backup doesn't restore any
359+
DEKs that were deleted from the key vault cluster.

Conclusion
----------

{+csfle+} can simplify the task of "forgetting" certain data. By deleting a
user's DEK, you can make that user's encrypted data unreadable wherever copies
of it exist, including across databases, collections, backups, and logs.

In a production application, you might also delete the encrypted data itself,
in addition to removing the encryption key. This "defense in depth" approach
helps ensure that the data is really gone. Implementing crypto shredding on top
of data deletion minimizes the impact if a delete operation fails or misses
data that should have been wiped.
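
A defense-in-depth erasure routine might delete the user's documents first and then shred their DEK. The order matters when the filter is on a deterministically encrypted field such as ``username``: computing the encrypted filter value requires the DEK, so the key has to outlive the delete. The sketch below assumes PyMongo-style objects (a collection with ``delete_many`` and a ``ClientEncryption`` with ``encrypt``, ``get_key_by_alt_name``, and ``delete_key``). ``erase_user`` and its ``deterministic_algorithm`` parameter are hypothetical, and a production version would also need to cover other collections, such as the ``user`` collection.

```python
from typing import Any


def erase_user(data_collection: Any, client_encryption: Any,
               username: str, deterministic_algorithm: str) -> int:
    """Hypothetical defense-in-depth erasure: delete documents, then shred the DEK.

    Assumes PyMongo-style objects: data_collection.delete_many(), plus
    client_encryption.encrypt(), get_key_by_alt_name(), and delete_key().
    Returns the number of documents deleted.
    """
    # Step 1: compute the encrypted filter value. This needs the user's DEK,
    # so it must happen before the key is deleted.
    encrypted_username = client_encryption.encrypt(
        username, deterministic_algorithm, key_alt_name=username
    )
    result = data_collection.delete_many({"username": encrypted_username})

    # Step 2: crypto-shred. Copies the delete missed (backups, logs, other
    # collections) become undecryptable once the DEK is gone.
    key_doc = client_encryption.get_key_by_alt_name(username)
    if key_doc is not None:
        client_encryption.delete_key(key_doc["_id"])
    return result.deleted_count
```

With PyMongo, ``deterministic_algorithm`` would be ``Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic``, matching ``ENCRYPTED_FIELDS``. If the delete raises an exception, the key is left in place, so the routine can be retried.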