`_posts/2021-05-04-backing-up-data-warehouse.md` (4 additions, 4 deletions)
```diff
@@ -25,7 +25,7 @@ parquet file and the corresponding `DeltaLog`.
 
 When the task of having a workable backup of all those delta lake files fell
 into my lap, I decided to look some of the age old concepts of backup in a new
-perspective. THe concerns I consdiered were:
+perspective. The concerns I consdiered were:
 
 1. What am I protecting against? How much I need to protect?
 1. Can I survive with loosing some data during restore and do I have the option of rebuilding them again from that point of time recovery?
```
```diff
@@ -61,7 +61,7 @@ Once we decided to use [AWS S3 batch operation](https://docs.aws.amazon.com/Ama
 **Pros**:
 
 * Simple setup, we can terraform it easily
-* Much efficient operation compare to generating our list as that list object APIonly returns 1000 rows per call that means we have to keep iterating till we get the full list.
+* Much more efficient operation compare to generating our list as that list object API only returns 1000 rows per call that means we have to keep iterating till we get the full list.
 
 **Cons**:
 
```
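The bullet above is about ListObjectsV2 returning at most 1,000 keys per response, which forces a continuation-token loop when you build the object list yourself instead of using a batch-operations manifest. A minimal sketch of that loop, run against a stubbed client so it needs no AWS credentials — `FakeS3Client` and the key names are made up, but the response shape (`IsTruncated`, `NextContinuationToken`) mirrors the real `list_objects_v2` API:

```python
# Sketch of the manual listing loop that an S3 Batch Operations manifest
# lets us skip. FakeS3Client stands in for a boto3 S3 client; only the
# fields used here (Contents, IsTruncated, NextContinuationToken) are modeled.

class FakeS3Client:
    def __init__(self, keys, page_size=1000):
        self._keys = keys
        self._page_size = page_size  # real API caps pages at 1000 keys

    def list_objects_v2(self, Bucket, ContinuationToken=None):
        start = int(ContinuationToken or 0)
        page = self._keys[start : start + self._page_size]
        result = {"Contents": [{"Key": k} for k in page]}
        if start + self._page_size < len(self._keys):
            result["IsTruncated"] = True
            result["NextContinuationToken"] = str(start + self._page_size)
        else:
            result["IsTruncated"] = False
        return result

def list_all_keys(client, bucket):
    """Keep calling list_objects_v2 until IsTruncated is False."""
    keys, token = [], None
    while True:
        kwargs = {"Bucket": bucket}
        if token:
            kwargs["ContinuationToken"] = token
        resp = client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in resp.get("Contents", []))
        if not resp["IsTruncated"]:
            return keys
        token = resp["NextContinuationToken"]

client = FakeS3Client([f"part-{i:05d}.parquet" for i in range(2500)])
all_keys = list_all_keys(client, "my-data-lake")  # takes 3 calls for 2500 keys
```

For millions of delta lake files this loop means thousands of sequential API calls, which is the inefficiency the post's comparison points at.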
```diff
@@ -71,7 +71,7 @@ Once we decided to use [AWS S3 batch operation](https://docs.aws.amazon.com/Ama
 To overcome the downsides, we decided to run the backup at a later date, e.g. for a backup of March 31st we based that off a manifest generated on April 2nd. This manifest would certainly have all data up until March 31st and some of April 1st's files as well.
 
 Once we have settled on this model, the rest of the work was similar to any
-other backup process. We also set up the Source and the Destination to have
+other backup process. We also set up the Source and the Destination to have
 protective boundaries so that we don't accidentally propogate any deletes to
 the backups.
 
```
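The hunk above describes backing a March 31st backup off a manifest generated on April 2nd, which guarantees completeness at the cost of some April 1st spill-over. The post simply tolerates the extra files, but if you wanted to trim the manifest back to the backup date the logic is a date cutoff over a last-modified column — a sketch with hypothetical rows (a real S3 Inventory manifest has more columns and comes as CSV/ORC/Parquet):

```python
from datetime import datetime, timezone

# Hypothetical manifest rows as (key, last_modified) pairs; illustrative only.
manifest = [
    ("events/part-0001.parquet", datetime(2021, 3, 30, tzinfo=timezone.utc)),
    ("events/part-0002.parquet", datetime(2021, 3, 31, 23, 59, tzinfo=timezone.utc)),
    ("events/part-0003.parquet", datetime(2021, 4, 1, 8, 0, tzinfo=timezone.utc)),
]

# Backup "as of" March 31st: keep everything modified before April 1st.
cutoff = datetime(2021, 4, 1, tzinfo=timezone.utc)

backup_set = [key for key, modified in manifest if modified < cutoff]
```

Since the manifest was generated after the cutoff, every object written through March 31st is guaranteed to be present before the filter runs — that is the point of the two-day delay.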
```diff
@@ -82,7 +82,7 @@ backed up data set in completely separate bucket in a different AWS account
 with stringent access controls in place. With the new account it was much easier to
 control the access level from the beginning rather than controlling access in
 an already existing account where people already have certain degree of access
-and hard to modify that access levels. In the new account we ensured only a few handful of people nothing will actually have
+and hard to modify that access levels. In the new account we ensured only a few handful of people will actually have
 access to backed up data, further reducing chances of any manual error.
```
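One common way to enforce the kind of protective boundary the post describes — no deletes ever propagating to the backup bucket — is a bucket policy with an explicit `Deny` on delete actions. The post does not say this is what was deployed, so the following is only a sketch: the bucket name is a placeholder, while the statement grammar (`Version`, `Effect`, `Principal`, `Action`, `Resource`) is the standard IAM policy language:

```python
import json

# Sketch of a deny-deletes bucket policy for a backup bucket.
# BACKUP_BUCKET is hypothetical; an explicit Deny overrides any Allow,
# so even administrators in the account cannot delete backed-up objects
# without first editing the bucket policy itself.
BACKUP_BUCKET = "example-dw-backups"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyObjectDeletion",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
            "Resource": f"arn:aws:s3:::{BACKUP_BUCKET}/*",
        }
    ],
}

policy_json = json.dumps(policy, indent=2)  # what you'd attach to the bucket
```

Keeping the bucket in a separate account, as the post does, adds a second boundary on top: cross-account principals get no access at all unless both the bucket policy and their own account's IAM policies grant it.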