
Commit 34d9e61

minherz, gcf-owl-bot[bot], and m-strzelczyk authored
fix: import log to destination project (GoogleCloudPlatform#11005)
* fix: import log to destination project
  * patch logName to have the destination project
  * store the original logName as a user label
  * refactor code to replace double quotes with single quotes in all literals
  * refactor method comments
  * refactor handling of partial errors of type WriteLogEntriesPartialErrors
* fix: update README to address code changes
  * add clarifications about the changes to the log's logName field
  * update references to the lines in code with the new commit hashes
* fix: update mock object
  * add a .project property to the mocked logging client to support the code changes
* 🦉 Updates from OwlBot post-processor
  * see https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md
* chore(fix): cosmetic changes
  * fix English in the code comment
  * fix line numbers in README to reflect recent code refactoring
* chore(fix): convert quote to double quote
* Update logging/import-logs/main.py

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
Co-authored-by: Maciej Strzelczyk <[email protected]>
1 parent 9ff5277 commit 34d9e61

File tree

3 files changed: +54 −52 lines changed


logging/import-logs/README.md

Lines changed: 23 additions & 17 deletions
@@ -9,6 +9,19 @@ solution in [documentation].
 
 [documentation]: (LINK-TO-REFERENCE-ARCHITECTURE-ARTICLE)-->
 
+All logs will be imported to the project that runs the Cloud Run job.
+Alternatively, you can explicitly configure the project ID of the project to which you want the logs to be imported.
+The logs will be imported with the log name `imported_logs`.
+This means that the `logName` field of the imported logs will be:
+
+```text
+projects/PROJECT_ID/logs/imported_logs
+```
+
+where `PROJECT_ID` is the ID of the project to which you import the logs.
+
+The original log name is stored in the user labels under the key `original_logName` (mind the casing).
+
 ## Requirements
 
 You have to grant the following permissions to a service account that you will
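
For illustration, a patched entry might look like the following minimal sketch (the project and source log names here are hypothetical):

```python
# Shape of an entry after import; only the affected fields are shown.
imported_entry = {
    "logName": "projects/my-project/logs/imported_logs",
    "labels": {
        # full logName from the source project, kept for querying
        "original_logName": "projects/source-project/logs/my-app-log",
    },
}
```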
@@ -47,6 +60,7 @@ have to be configured when a new Cloud Run job for the solution is created:
 | END_DATE | A string in format `MM/DD/YYYY` that defines the last day of the import range |
 | LOG_ID | A string that identifies the particular log to be imported. See [documentation][logid] for more details. |
 | STORAGE_BUCKET_NAME | A name of the storage bucket where the exported logs are stored. |
+| PROJECT_ID | (Optional) The ID of the destination project, if it differs from the project where the import job is deployed |
 
 <!--Read [documentation] for more information about Cloud Run job setup.-->
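
A minimal sketch of how such an optional override can be consumed when the logging client is created (the fallback behavior is an assumption; this diff only documents the variable):

```python
import os

from google.cloud import logging_v2

# With PROJECT_ID unset, the client falls back to its default project,
# i.e. the project where the Cloud Run job runs.
logging_client = logging_v2.Client(project=os.getenv("PROJECT_ID") or None)
print(logging_client.project)  # destination project used in the new logName
```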

@@ -94,33 +108,25 @@ nox -s py-3.11
 
 ## Importing log entries with timestamps older than 30 days
 
-The import logs implementation does not currently support logs with the timestamp older than 30 days
-because the incoming log entries must not be older than default retention period (see [documentation][retention]).
-To prevent such logs from being ignored the implementation returns errors if the timestamp is older than 29 days.
+Incoming log entries that are older than the default retention period (i.e. 30 days) are not ingested.
+See the [documentation][retention] for more details. The code takes a safety margin of an extra day
+to prevent importing old logs that cannot be ingested.
 To import older logs you have to modify the [current] code as follows:
 
-* remove the fencing condition [lines][code1]:
-
-<https://github.com/GoogleCloudPlatform/python-docs-samples/blob/95dd4f53ff96470a1f842d3134d56b017a85ac27/logging/import-logs/main.py#L91-L93>
+* comment out the fencing condition [lines][code1]:
 
-* in [`import_logs`][code2] add the following block after the call to [`_patch_reserved_log_ids`][code3]:
-
-```python
-log.labels['original_timestamp'] = log.timestamp
-log.timestamp = None
-```
+<https://github.com/GoogleCloudPlatform/python-docs-samples/blob/86f12a752a4171e137adaa855c7247be9d5d39a2/logging/import-logs/main.py#L81-L83>
 
-to keep the original timestamp as a user metadata labeled `original_timestamp` and to ingest the log entry using _current_ timestamp.
+* uncomment [2 lines][code2] to keep the original timestamp as a user label and to reset the `timestamp` field of the imported log entries.
 
 > [!IMPORTANT]
 >
-> 1. After this change the logs should be queried using the `timestamp` field.
+> 1. To query the imported logs by timestamp, you will have to use the label `original_timestamp` instead of the `timestamp` field.
 > 1. If the same log entry is imported multiple times, the query response may include more than one line.
 
 After applying the changes, [build](#build) a custom container image and use it when creating an import job.
 
 [retention]: https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#FIELDS.timestamp
 [current]: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/e2709a218072c86ec1a9b9101db45057ebfdbff0/logging/import-logs/main.py
-[code1]: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/95dd4f53ff96470a1f842d3134d56b017a85ac27/logging/import-logs/main.py#L91-L93
-[code2]: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/95dd4f53ff96470a1f842d3134d56b017a85ac27/logging/import-logs/main.py#L196
-[code3]: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/95dd4f53ff96470a1f842d3134d56b017a85ac27/logging/import-logs/main.py#L206
+[code1]: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/86f12a752a4171e137adaa855c7247be9d5d39a2/logging/import-logs/main.py#L81-L83
+[code2]: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/86f12a752a4171e137adaa855c7247be9d5d39a2/logging/import-logs/main.py#L186-L187
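
Put together, a sketch of `_patch_entry` with those two lines uncommented (names follow main.py; the `setdefault` form slightly condenses the label handling):

```python
def _patch_entry(log: dict, project_id: str) -> None:
    """Patch an entry so logs older than the retention window can be ingested."""
    labels = log.setdefault("labels", {})
    labels["original_logName"] = log.get("logName")
    log["logName"] = f"projects/{project_id}/logs/imported_logs"
    # keep the original timestamp as a user label ...
    labels["original_timestamp"] = log["timestamp"]
    # ... and clear the field so Cloud Logging assigns the receive time
    log["timestamp"] = None
```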

logging/import-logs/main.py

Lines changed: 29 additions & 35 deletions
@@ -19,7 +19,6 @@
 import json
 import math
 import os
-import re
 import sys
 
 from typing import List, Tuple, TypedDict
@@ -30,19 +29,6 @@
 # Logging limits (https://cloud.google.com/logging/quotas#api-limits)
 _LOGS_MAX_SIZE_BYTES = 9 * 1024 * 1024  # < 10MB
 
-_RESERVED_LOG_IDS = ["cloudaudit.googleapis.com"]
-_LOGGER_NAME_TEMPLATE = re.compile(
-    r"""
-    (projects/)      # static prefix group (1)
-    ([^/]+)          # initial letter, wordchars group (2) for project ID
-    (/logs/)         # static midfix group (3)
-    (?P<name>[^/]+)  # initial letter, wordchars group for LOG_ID
-    """,
-    re.VERBOSE,
-)
-
-_LOG_ID_PREFIX = "_"  # allowed characters are r"[a-zA-Z_\-\.\/\\]"
-
 # Read Cloud Run environment variables
 TASK_INDEX = int(os.getenv("CLOUD_RUN_TASK_INDEX", "0"))
 TASK_COUNT = int(os.getenv("CLOUD_RUN_TASK_COUNT", "1"))
@@ -70,24 +56,28 @@ def eprint(*objects: str, **kwargs: TypedDict) -> None:
 
 
 def _day(blob_name: str) -> int:
-    """Parse day number from Blob's name
-    using the following Blob name convention:
-    <LOG_ID>/YYYY/MM/DD/<OBJECT_NAME>
+    """Parse day number from Blob's path
+
+    Use the known Blob path convention to parse the day part from the path.
+    The path convention is <LOG_ID>/YYYY/MM/DD/<OBJECT_NAME>
     """
     # calculated in function to allow test to set LOG_ID
     offset = len(LOG_ID) + 1 + 4 + 1 + 2 + 1
     return int(blob_name[offset : offset + 2])
 
 
 def _is_valid_import_range() -> bool:
-    """Check the import range dates to ensure that
+    """Validate the import date range
+
+    Checks the import range dates to ensure that
     - start date is earlier than end date
-    - no dates in the range is older than 29 days
-    (for reason see https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#FIELDS.timestamp)
+    - no dates in the range are older than 29 days
+    due to https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#FIELDS.timestamp
     """
     if START_DATE > END_DATE:
         eprint("Start date of the import time range should be earlier than end date")
         return False
+    # comment out the following 3 lines if the import range includes dates older than 29 days from now
     if (date.today() - START_DATE).days > 29:
         eprint("Import range includes dates older than 29 days from today.")
         return False
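
To make the offset arithmetic in `_day` concrete, a small worked example under an assumed `LOG_ID` of `my-log` (any blob following the `<LOG_ID>/YYYY/MM/DD/<OBJECT_NAME>` convention works the same way):

```python
blob_name = "my-log/2024/01/15/000000000000.json"  # hypothetical blob path
# skip "<LOG_ID>/" + "YYYY/" + "MM/" = 7 + 5 + 3 = 15 characters
offset = len("my-log") + 1 + 4 + 1 + 2 + 1
assert blob_name[offset : offset + 2] == "15"  # the DD component
```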
@@ -170,27 +160,31 @@ def _write_logs(logs: List[dict], client: logging_v2.Client) -> None:
     try:
         client.logging_api.write_entries(logs)
     except exceptions.PermissionDenied as err2:
-        partialerrors = logging_v2.types.WriteLogEntriesPartialErrors()
         for detail in err2.details:
-            if detail.Unpack(partialerrors):
+            if isinstance(detail, logging_v2.types.WriteLogEntriesPartialErrors):
                 # partialerrors.log_entry_errors is a dictionary
                 # keyed by the logs' zero-based index in the logs.
                 # consider implementing custom error handling
-                eprint(json.dumps(partialerrors.log_entry_errors))
+                eprint(f"{detail}")
         raise
 
 
-def _patch_reserved_log_ids(log: dict) -> None:
-    """Replaces first character in LOG_ID with underscore for reserved LOG_ID prefixes"""
+def _patch_entry(log: dict, project_id: str) -> None:
+    """Modify entry fields to allow importing the entry into the destination project.
+
+    Save the original logName as a user label.
+    Replace logName with the fixed value "projects/PROJECT_ID/logs/imported_logs".
+    """
     log_name = log.get("logName")
-    if log_name:
-        match = _LOGGER_NAME_TEMPLATE.match(log_name)
-        log_id = match.group("name")
-        if log_id and log_id.startswith(tuple(_RESERVED_LOG_IDS)):
-            log_name = _LOGGER_NAME_TEMPLATE.sub(
-                f"\\g<1>\\g<2>\\g<3>{_LOG_ID_PREFIX + log_id[1:]}", log_name
-            )
-        log["logName"] = log_name
+    labels = log.get("labels")
+    log["logName"] = f"projects/{project_id}/logs/imported_logs"
+    if not labels:
+        labels = dict()
+        log["labels"] = labels
+    labels["original_logName"] = log_name
+    # uncomment the following 2 lines if the import range includes dates older than 29 days from now
+    # labels["original_timestamp"] = log["timestamp"]
+    # log["timestamp"] = None
 
 
 def import_logs(
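
Since `err2.details` yields already-parsed detail messages, the `isinstance` check above is enough to pick out the partial errors. A hedged sketch of more granular handling (the helper name is an assumption; only the types come from main.py):

```python
import sys

from google.cloud import logging_v2


def _report_partial_errors(err: Exception) -> None:
    """Hypothetical helper: print each failed entry's index and error."""
    for detail in getattr(err, "details", []):
        if isinstance(detail, logging_v2.types.WriteLogEntriesPartialErrors):
            # log_entry_errors maps an entry's zero-based index in the
            # submitted batch to its google.rpc.Status error
            for index, status in detail.log_entry_errors.items():
                print(f"entry {index} failed: {status.message}", file=sys.stderr)
```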
@@ -203,7 +197,7 @@ def import_logs(
     data = _read_logs(file_path, bucket)
     for entry in data:
         log = json.loads(entry)
-        _patch_reserved_log_ids(log)
+        _patch_entry(log, logging_client.project)
         size = sys.getsizeof(log)
         if total_size + size >= _LOGS_MAX_SIZE_BYTES:
             _write_logs(logs, logging_client)
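
The surrounding loop batches entries by approximate size before writing. A sketch of the pattern, simplified from the fragment above (`parsed_entries` is a stand-in for the decoded blob contents; the final flush is an assumption):

```python
import sys

logs, total_size = [], 0
for log in parsed_entries:  # stand-in for json.loads over blob lines
    size = sys.getsizeof(log)  # rough in-memory size, not the wire size
    if total_size + size >= _LOGS_MAX_SIZE_BYTES:
        _write_logs(logs, logging_client)  # flush the current batch
        logs, total_size = [], 0
    logs.append(log)
    total_size += size
if logs:
    _write_logs(logs, logging_client)  # flush the remainder (assumed)
```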
@@ -244,5 +238,5 @@ def main() -> None:
     try:
         main()
     except Exception as err:
-        eprint(f"Task #{TASK_INDEX}, failed: {str(err)}")
+        eprint(f"Task #{TASK_INDEX+1}, failed: {err}")
         sys.exit(1)

logging/import-logs/main_test.py

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,7 @@
 TEST_LOG_ID = "test-log"
 TEST_BUCKET = "test-bucket"
 TEST_BUCKET_NAME = f"gs://{TEST_BUCKET}"
+TEST_PROJECT_ID = "test-project-id"
 
 
 def _setup_environment(
@@ -339,6 +340,7 @@ def test_import_logs(
     mocked_bucket.blob = MagicMock(side_effect=_args_based_blob_return)
     mocked_logging_client = MagicMock(spec=logging_v2.Client)
     mocked_logging_client.logging_api = MagicMock()
+    mocked_logging_client.project = TEST_PROJECT_ID
     mocked_write_entries = mocked_logging_client.logging_api.write_entries = MagicMock()
 
     main.import_logs(log_files, mocked_storage_client, mocked_logging_client)
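
With `.project` mocked, every entry handed to `write_entries` should carry the destination `logName`; a hypothetical spot-check for the test might look like:

```python
written_entries = mocked_write_entries.call_args[0][0]
assert all(
    entry["logName"] == f"projects/{TEST_PROJECT_ID}/logs/imported_logs"
    for entry in written_entries
)
```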

Comments (0)