| 1 | +.. facet:: |
| 2 | +   :name: genre |
| 3 | +   :values: tutorial |
| 4 | + |
| 5 | +.. facet:: |
| 6 | +   :name: programming_language |
| 7 | +   :values: python |
| 8 | + |
| 9 | +.. meta:: |
| 10 | +   :keywords: client-side field level encryption, encryption, gdpr |
| 11 | + |
| 12 | +.. _csfle-right-to-erasure: |
| 13 | +.. _csfle-gdpr: |
| 14 | + |
| 15 | +=============================================================================== |
| 16 | +Implementing Right to Erasure with {+csfle-abbrev+} |
| 17 | +=============================================================================== |
| 18 | + |
| 19 | +.. default-domain:: mongodb |
| 20 | + |
| 21 | +.. contents:: On this page |
| 22 | +   :local: |
| 23 | +   :backlinks: none |
| 24 | +   :depth: 2 |
| 25 | +   :class: singlecol |
| 26 | + |
| 27 | +The Crypto Shredding sample app demonstrates how you can make use of MongoDB's |
| 28 | +:ref:`{+csfle+} ({+csfle-abbrev+}) <manual-csfle-feature>` to strengthen |
| 29 | +procedures for removing sensitive data. |
| 30 | + |
| 31 | +About the Sample Application |
| 32 | +---------------------------- |
| 33 | + |
| 34 | +The right to erasure, also known as the right to be forgotten, is a right |
| 35 | +granted to individuals under laws and regulations such as GDPR. This means that |
| 36 | +companies storing an individual's personal data must be able to delete it on |
| 37 | +request. Because data can be spread across several systems, it can be |
| 38 | +technically challenging for these companies to identify and remove it from all |
| 39 | +places. Even when properly executed, there is the risk that deleted data can be |
| 40 | +restored from backups in the future, potentially creating legal and financial |
| 41 | +risks. |
| 42 | + |
| 43 | +.. warning:: |
| 44 | + |
| 45 | +   MongoDB provides no guarantees that the solution and techniques described |
| 46 | +   in this article fulfill all regulatory requirements around the right to |
| 47 | +   erasure. Your organization must determine appropriate, sufficient measures |
| 48 | +   to comply with regulatory requirements such as GDPR. |
| 49 | + |
| 50 | +The Crypto Shredding sample application demonstrates one way to implement a |
| 51 | +right to erasure. The demo application is a Python (Flask) Web application with |
| 52 | +a front end for adding users, logging in, and entering data. It also includes |
| 53 | +an "Admin" page to showcase crypto shredding functionality. |
| 54 | + |
| 55 | +You can install and run the application by following the instructions in the |
| 56 | +`GitHub repository <https://github.com/mongodb-developer/mongodb-flask-cryptoshredding-example>`__. |
| 57 | + |
| 58 | +What is Crypto-Shredding? |
| 59 | +------------------------- |
| 60 | + |
| 61 | +Crypto-shredding, also called cryptographic erasure, is a data destruction |
| 62 | +technique where, instead of destroying encrypted data, you destroy the |
| 63 | +encryption keys necessary to decrypt it. This makes the data indecipherable. |
| 64 | + |
| 65 | +For example, imagine you are storing data for multiple users. You start by |
| 66 | +giving each user their own unique {+dek-long+} (DEK), and mapping it to that |
| 67 | +user. |
| 68 | + |
| 69 | +In the diagram, "User A" and "User B" each have their own unique DEK in the key |
| 70 | +store. Each key is used to encrypt or decrypt data for its respective |
| 71 | +user: |
| 72 | + |
| 73 | +.. image:: /images/devcenter_csfle_gdpr_deks.png |
| 74 | +   :alt: A Data Encryption Key store with two users |
| 75 | + |
| 76 | +Let's assume that you want to remove all data for User B. If you remove User |
| 77 | +B's DEK, you can no longer decrypt their data. Everything in the datastore |
| 78 | +becomes indecipherable ciphertext. User A's data is unaffected, since their |
| 79 | +DEK still exists: |
| 80 | + |
| 81 | +.. image:: /images/devcenter_csfle_gdpr_delete_user.png |
| 82 | +   :alt: Deleting User B's data encryption key |
| 83 | + |
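The idea can be sketched in a few lines of Python. The following is a minimal, self-contained illustration, not code from the sample app: ``key_store``, ``records``, ``toy_encrypt``, ``toy_decrypt``, and ``read_record`` are hypothetical names, and the XOR keystream built from SHA-256 is a toy stand-in for real encryption that should never be used to protect real data.

```python
import hashlib
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: str) -> bytes:
    """Toy cipher for illustration only: XOR the data with a keystream."""
    nonce = secrets.token_bytes(16)
    data = plaintext.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def toy_decrypt(key: bytes, ciphertext: bytes) -> str:
    """Reverse toy_encrypt; only works with the key used to encrypt."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


# One DEK per user, kept apart from the data it protects.
key_store = {
    "user_a": secrets.token_bytes(32),
    "user_b": secrets.token_bytes(32),
}

# The "datastore" only ever holds ciphertext.
records = {
    "user_a": toy_encrypt(key_store["user_a"], "shoe size: 10"),
    "user_b": toy_encrypt(key_store["user_b"], "shoe size: 8"),
}


def read_record(user: str):
    """Return the decrypted record, or None if the user's DEK is gone."""
    key = key_store.get(user)
    if key is None:
        return None
    return toy_decrypt(key, records[user])


# Crypto-shred User B: delete the key and leave the ciphertext in place.
del key_store["user_b"]
```

After the last line, ``read_record("user_a")`` still returns the plaintext, while ``read_record("user_b")`` returns ``None`` even though User B's ciphertext remains in ``records``.
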
| 84 | +What is {+csfle-abbrev+}? |
| 85 | +------------------------- |
| 86 | + |
| 87 | +With :ref:`{+csfle-abbrev+} <manual-csfle-feature>`, applications can encrypt |
| 88 | +sensitive fields in documents prior to transmitting data to the server. Even |
| 89 | +when data is being used by the database in memory, it is never in plain text. |
| 90 | +The database stores and transmits encrypted data that is only deciphered by the client. |
| 91 | + |
| 92 | +{+csfle-abbrev+} uses :term:`envelope encryption`, which is the practice of |
| 93 | +encrypting plaintext data with a data key that is itself encrypted by |
| 94 | +a top-level envelope key (also known as a "{+cmk-long+}", or CMK). |
| 95 | + |
| 96 | +.. image:: /images/devcenter_csfle_gdpr_dek_diagram.webp |
| 97 | +   :alt: Envelope encryption diagram |
| 98 | + |
| 99 | +Encryption Key Management |
| 100 | +~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 101 | + |
| 102 | +CMKs are usually managed by a Key Management Service (KMS). |
| 103 | +{+csfle-abbrev+} :ref:`supports multiple KMSs <qe-fundamentals-kms-providers>`, |
| 104 | +including Amazon Web Services (AWS), Azure Key Vault, Google Cloud Platform |
| 105 | +(GCP), and keystores that support the KMIP standard, such as HashiCorp |
| 106 | +Vault. The sample app uses Amazon Web Services as the KMS. |
| 107 | + |
| 108 | +Automatic and {+manual-enc-title+} |
| 109 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 110 | + |
| 111 | +{+csfle-abbrev+} can be used in :ref:`automatic |
| 112 | +<csfle-fundamentals-automatic-encryption>` or :ref:`explicit <csfle-fundamentals-manual-encryption>` mode, or a |
| 113 | +combination of both. The sample app uses {+manual-enc+}. |
| 114 | + |
| 115 | +- With automatic encryption, you perform encrypted read and |
| 116 | +  write operations based on a defined :ref:`encryption schema |
| 117 | +  <csfle-fundamentals-create-schema>`, so you don't need application code to |
| 118 | +  specify how to encrypt or decrypt fields. |
| 119 | + |
| 120 | +- With {+manual-enc+}, you use the MongoDB driver's encryption library to |
| 121 | +  manually encrypt or decrypt fields in your application. |
| 122 | + |
| 123 | + |
| 124 | +Sample App Walkthrough |
| 125 | +---------------------- |
| 126 | + |
| 127 | +The sample app uses {+csfle-abbrev+} with {+manual-enc+}, and Amazon Web |
| 128 | +Services as the KMS: |
| 129 | + |
| 130 | +.. image:: /images/devcenter_csfle_gdpr_crypto_shredding_example.png |
| 131 | +   :alt: An example of the crypto shredding UI |
| 132 | + |
| 133 | +Adding Users |
| 134 | +~~~~~~~~~~~~ |
| 135 | + |
| 136 | +The app instantiates the ``ClientEncryption`` class by initializing an |
| 137 | +``app.mongodb_encryption_client`` object. This encryption client is responsible |
| 138 | +for generating DEKs, and then encrypting them using a CMK from the AWS KMS. |
| 139 | + |
| 140 | +When a user signs up, the application generates a unique DEK for them using the |
| 141 | +``create_data_key`` method, then returns the ``data_key_id``: |
| 142 | + |
| 143 | +.. code-block:: python |
| 144 | +   :copyable: true |
| 145 | + |
| 146 | +   # flaskapp/db_queries.py |
| 147 | + |
| 148 | +   @aws_credential_handler |
| 149 | +   def create_key(userId): |
| 150 | +       data_key_id = app.mongodb_encryption_client.create_data_key( |
| 151 | +           kms_provider, master_key, key_alt_names=[userId]) |
| 152 | +       return data_key_id |
| 153 | + |
| 154 | +The app then uses this method when saving user information: |
| 155 | + |
| 156 | +.. code-block:: python |
| 157 | +   :copyable: true |
| 158 | + |
| 159 | +   # flaskapp/user.py |
| 160 | + |
| 161 | +   def save(self): |
| 162 | +       dek_id = db_queries.create_key(self.username) |
| 163 | +       result = app.mongodb[db_name].user.insert_one( |
| 164 | +           { |
| 165 | +               "username": self.username, |
| 166 | +               "password_hash": self.password_hash, |
| 167 | +               "dek_id": dek_id, |
| 168 | +               "createdAt": datetime.now(), |
| 169 | +           } |
| 170 | +       ) |
| 171 | +       if result: |
| 172 | +           self.id = result.inserted_id |
| 173 | +           return True |
| 174 | +       else: |
| 175 | +           return False |
| 176 | + |
| 177 | +Adding and Encrypting Data |
| 178 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 179 | + |
| 180 | +Once registered, a user can log in and enter data as key-value pairs via an |
| 181 | +input form: |
| 182 | + |
| 183 | +.. image:: /images/devcenter_csfle_gdpr_data_input_form.png |
| 184 | +   :alt: A sample UI for adding data |
| 185 | + |
| 186 | +The database stores this data in a MongoDB collection named ``data``, |
| 187 | +where each document includes the username and the key-value pair: |
| 188 | + |
| 189 | +.. code-block:: json |
| 190 | +   :copyable: true |
| 191 | + |
| 192 | +   { |
| 193 | +     "name": "shoe size", |
| 194 | +     "value": "10", |
| 195 | +     "username": "tom" |
| 196 | +   } |
| 197 | + |
| 198 | +The sample app encrypts the ``value`` and ``username`` fields, but not |
| 199 | +the ``name``. The app encrypts fields with the user's DEK and a specified |
| 200 | +encryption algorithm: |
| 201 | + |
| 202 | +.. code-block:: python |
| 203 | +   :copyable: true |
| 204 | + |
| 205 | +   # flaskapp/db_queries.py |
| 206 | + |
| 207 | +   # Fields to encrypt, and the algorithm to encrypt them with |
| 208 | +   ENCRYPTED_FIELDS = { |
| 209 | +       # Deterministic encryption for username, because we need to search on it |
| 210 | +       "username": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic, |
| 211 | +       # Random encryption for value, as we don't need to search on it |
| 212 | +       "value": Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Random, |
| 213 | +   } |
| 214 | + |
| 215 | +The ``insert_data`` function takes an unencrypted document and loops over the |
| 216 | +``ENCRYPTED_FIELDS`` to encrypt them: |
| 217 | + |
| 218 | +.. code-block:: python |
| 219 | +   :copyable: true |
| 220 | + |
| 221 | +   # flaskapp/db_queries.py |
| 222 | + |
| 223 | +   def insert_data(document): |
| 224 | +       document["username"] = current_user.username |
| 225 | +       # Loop over the field names (and associated algorithm) we want to encrypt |
| 226 | +       for field, algo in ENCRYPTED_FIELDS.items(): |
| 227 | +           # if the field exists in the document, encrypt it |
| 228 | +           if document.get(field): |
| 229 | +               document[field] = encrypt_field(document[field], algo) |
| 230 | +       # Insert document (now with encrypted fields) to the data collection |
| 231 | +       app.data_collection.insert_one(document) |
| 232 | + |
| 233 | +If the specified field exists in the document, the function calls |
| 234 | +``encrypt_field`` to encrypt it using the specified algorithm: |
| 235 | + |
| 236 | +.. code-block:: python |
| 237 | +   :copyable: true |
| 238 | + |
| 239 | +   # flaskapp/db_queries.py |
| 240 | + |
| 241 | +   # Encrypt a single field with the given algorithm |
| 242 | +   @aws_credential_handler |
| 243 | +   def encrypt_field(field, algorithm): |
| 244 | +       try: |
| 245 | +           field = app.mongodb_encryption_client.encrypt( |
| 246 | +               field, |
| 247 | +               algorithm, |
| 248 | +               key_alt_name=current_user.username, |
| 249 | +           ) |
| 250 | +           return field |
| 251 | +       except pymongo.errors.EncryptionError as ex: |
| 252 | +           # Catch this error in case the DEK doesn't exist. Log a warning and |
| 253 | +           # re-raise the exception |
| 254 | +           if "not all keys requested were satisfied" in str(ex): |
| 255 | +               app.logger.warning( |
| 256 | +                   f"Encryption failed: could not find data encryption key for user: {current_user.username}" |
| 257 | +               ) |
| 258 | +           raise |
| 259 | + |
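The comment on ``username`` in ``ENCRYPTED_FIELDS`` is worth unpacking: deterministic encryption produces the same ciphertext for the same plaintext and key, so an equality match on the encrypted value can find a user's documents. The helper below is a hypothetical sketch of such a lookup, not code from the sample app. ``find_user_documents`` and its parameters are assumptions; ``client_encryption`` is expected to behave like PyMongo's ``ClientEncryption`` and ``collection`` like a PyMongo ``Collection``.

```python
# String value of pymongo.encryption.Algorithm.AEAD_AES_256_CBC_HMAC_SHA_512_Deterministic,
# spelled out here so the sketch has no import-time dependency on PyMongo.
DETERMINISTIC = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic"


def find_user_documents(client_encryption, collection, username):
    """Hypothetical helper: return the stored documents for one user.

    Assumes the username field was written with deterministic encryption
    under a DEK whose alternate name is the username itself.
    """
    # Encrypt the search value exactly as it was encrypted on insert.
    encrypted_username = client_encryption.encrypt(
        username,
        DETERMINISTIC,
        key_alt_name=username,
    )
    # Equal plaintexts yield equal ciphertexts, so a plain equality
    # filter on the encrypted field matches this user's documents.
    return list(collection.find({"username": encrypted_username}))
```

The same lookup would not work on the ``value`` field, because random encryption yields a different ciphertext each time the same plaintext is encrypted.
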
| 260 | +After adding data, you can see it in the Web app: |
| 261 | + |
| 262 | +.. image:: /images/devcenter_csfle_gdpr_demo_data_entered.png |
| 263 | +   :alt: Sample data in the UI |
| 264 | + |
| 265 | +Deleting an Encryption Key |
| 266 | +~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 267 | + |
| 268 | +Now let's see what happens when you delete the DEK. The sample app does this |
| 269 | +from an admin page, which should be restricted to only those individuals with |
| 270 | +authorization to manage keys: |
| 271 | + |
| 272 | +.. image:: /images/devcenter_csfle_gdpr_demo_admin_page.png |
| 273 | +   :alt: The sample app admin page |
| 274 | + |
| 275 | +The "Delete data encryption key" option removes the DEK, but leaves the user's |
| 276 | +encrypted data in place. After that, the application can no longer decrypt the |
| 277 | +data. Trying to retrieve the data for the logged-in user throws an error: |
| 278 | + |
| 279 | +.. image:: /images/devcenter_csfle_gdpr_demo_error_message.png |
| 280 | +   :alt: An error message when trying to retrieve encrypted data without a key |
| 281 | + |
| 282 | +.. note:: |
| 283 | + |
| 284 | +   After deleting the DEK, the application may still be able to decrypt and |
| 285 | +   show data until its cache expires, up to 60 seconds later. |
| 286 | + |
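The walkthrough does not show the code behind the "Delete data encryption key" option. As a rough sketch only, and not necessarily how the sample app implements it, PyMongo 4.2 and later expose ``get_key_by_alt_name`` and ``delete_key`` on ``ClientEncryption``, which could be combined as follows. ``delete_user_key`` is a hypothetical helper, and it assumes the DEK was created with the username as its alternate name, as in ``create_key`` earlier.

```python
def delete_user_key(client_encryption, username):
    """Hypothetical helper: delete the DEK registered under `username`.

    `client_encryption` is assumed to behave like a PyMongo
    ClientEncryption instance. Returns True if a key was deleted and
    False if no key with that alternate name exists.
    """
    # Look up the key document by its alternate name.
    key_document = client_encryption.get_key_by_alt_name(username)
    if key_document is None:
        return False

    # Remove the key from the key vault. The user's encrypted data is
    # left untouched, but can no longer be decrypted.
    result = client_encryption.delete_key(key_document["_id"])
    return result.deleted_count == 1
```

Returning a boolean lets the caller distinguish "key removed" from "nothing to remove", which may be useful when recording that an erasure request was carried out.
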
| 287 | +But what is actually left in the database? You can review the information by |
| 288 | +returning to the Admin page and clicking :guilabel:`Fetch data for all |
| 289 | +users`. This view doesn't throw an exception if the application can't decrypt |
| 290 | +the data. Instead, it shows exactly what is stored in the database. |
| 291 | + |
| 292 | +Even though you haven't actually deleted the user's data, because the data |
| 293 | +encryption key no longer exists, the application can only show the ciphertext |
| 294 | +for the encrypted fields "username" and "value". |
| 295 | + |
| 296 | +.. image:: /images/devcenter_csfle_gdpr_raw_ciphertext.png |
| 297 | +   :alt: Raw ciphertext from the demo app database |
| 298 | + |
| 299 | +Here is the code used to fetch this data. It uses similar logic to the |
| 300 | +``insert_data`` function shown earlier. The application runs a ``find`` operation |
| 301 | +without any filters to retrieve all data, then loops over the |
| 302 | +``ENCRYPTED_FIELDS`` dictionary to decrypt fields: |
| 303 | + |
| 304 | +.. code-block:: python |
| 305 | +   :copyable: true |
| 306 | + |
| 307 | +   # flaskapp/db_queries.py |
| 308 | + |
| 309 | +   def fetch_all_data_unencrypted(decrypt=False): |
| 310 | +       results = list(app.data_collection.find()) |
| 311 | + |
| 312 | +       if decrypt: |
| 313 | +           for field in ENCRYPTED_FIELDS.keys(): |
| 314 | +               for result in results: |
| 315 | +                   if result.get(field): |
| 316 | +                       result[field], result["encryption_succeeded"] = decrypt_field(result[field]) |
| 317 | +       return results |
| 318 | + |
| 319 | +The ``decrypt_field`` function is called for each field to be decrypted, but in |
| 320 | +this case the application catches the error if it can't successfully decrypt |
| 321 | +it due to a missing DEK: |
| 322 | + |
| 323 | +.. code-block:: python |
| 324 | +   :copyable: true |
| 325 | + |
| 326 | +   # flaskapp/db_queries.py |
| 327 | + |
| 328 | +   # Try to decrypt a field, returning a tuple of (value, status). This will be |
| 329 | +   # either (decrypted_value, True), or (raw_cipher_text, False) if we |
| 330 | +   # couldn't decrypt |
| 331 | +   def decrypt_field(field): |
| 332 | +       try: |
| 333 | +           # We don't need to pass the DEK or algorithm to decrypt a field |
| 334 | +           field = app.mongodb_encryption_client.decrypt(field) |
| 335 | +           return field, True |
| 336 | +       # Catch this error in case the DEK doesn't exist. |
| 337 | +       except pymongo.errors.EncryptionError as ex: |
| 338 | +           if "not all keys requested were satisfied" in str(ex): |
| 339 | +               app.logger.warning( |
| 340 | +                   "Decryption failed: could not find data encryption key to decrypt the record." |
| 341 | +               ) |
| 342 | +               # If we can't decrypt due to missing DEK, return the "raw" value. |
| 343 | +               return field, False |
| 344 | +           raise |
| 345 | + |
| 346 | +You can also use ``mongosh`` to query the database directly and confirm |
| 347 | +that nothing is readable: |
| 348 | + |
| 349 | +.. image:: /images/devcenter_csfle_gdpr_mongosh.png |
| 350 | +   :alt: mongosh shell output when querying the database |
| 351 | + |
| 352 | +At this point, the user's encrypted data is still present. Someone could gain |
| 353 | +access to it by restoring their encryption key, such as from a database backup. |
| 354 | + |
| 355 | +To prevent this, the sample application uses two separate database clusters: |
| 356 | +one for storing data, and one for storing DEKs (the "key vault"). Using |
| 357 | +separate clusters decouples the restoration of backups for application data and |
| 358 | +the key vault. Restoring the data cluster from a backup doesn't restore any |
| 359 | +DEKs that were deleted from the key vault cluster. |
| 360 | + |
| 361 | +Conclusion |
| 362 | +---------- |
| 363 | + |
| 364 | +{+csfle+} can simplify the task of "forgetting" certain data. By deleting data |
| 365 | +keys, you can effectively forget data that exists across different databases, |
| 366 | +collections, backups, and logs. |
| 367 | + |
| 368 | +In a production application, you might also delete the encrypted data itself, |
| 369 | +on top of removing the encryption key. This "defense in depth" approach helps |
| 370 | +ensure that data is really gone. Implementing crypto shredding on top of data |
| 371 | +deletion minimizes the impact if a delete operation fails, or doesn't include |
| 372 | +data that should have been wiped. |