Commit bddbbcb

Skip raw_glosses identical to a gloss in glosses
If an entry in `raw_glosses` is identical to an entry in `glosses` in the data of a sense in `senses`, remove it from `raw_glosses`. If `raw_glosses` ends up empty, remove the field altogether.
1 parent 81312b2 commit bddbbcb

1 file changed: +15 -0 lines changed
wiktextract/page.py

Lines changed: 15 additions & 0 deletions
@@ -3480,6 +3480,21 @@ def parse_page(ctx: Wtp, word: str, text: str, config: WiktionaryConfig) -> list
                 if field in sense:
                     sense[field] = list(sorted(set(sense[field])))
 
+    # If a raw_gloss is identical to something in glosses, remove it
+    for data in ret:
+        for s in data.get("senses", []):
+            new_raw_glosses = []
+            skipped = False
+            for rg in s.get("raw_glosses", []):
+                if rg not in s.get("glosses", []):
+                    new_raw_glosses.append(rg)
+                else:
+                    skipped = True
+            if not new_raw_glosses:
+                s.pop("raw_glosses", None)
+            elif skipped:
+                s["raw_glosses"] = new_raw_glosses
+
     # Return the resulting words
     return ret
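
The filtering step can be tried outside `parse_page`. The following is a minimal sketch, not the project's code: the function name `drop_duplicate_raw_glosses` and the sample data are hypothetical, and it assumes each entry is a plain dict with an optional `senses` list, as the diff suggests. Senses that have no `raw_glosses` key are left untouched.

```python
def drop_duplicate_raw_glosses(entries):
    """Remove raw_glosses items that also appear in glosses (in place).

    Hypothetical helper for illustration; it mirrors the idea of the
    post-processing loop but is not part of wiktextract.
    """
    for data in entries:
        for sense in data.get("senses", []):
            raw = sense.get("raw_glosses")
            if raw is None:
                continue  # this sense has no raw_glosses field to filter
            glosses = sense.get("glosses", [])
            kept = [rg for rg in raw if rg not in glosses]
            if kept:
                sense["raw_glosses"] = kept
            else:
                # Every raw gloss duplicated a gloss, so drop the field.
                del sense["raw_glosses"]
    return entries


# Made-up sample data covering the three cases.
sample = [
    {
        "word": "example",
        "senses": [
            {"glosses": ["a thing"], "raw_glosses": ["a thing"]},
            {
                "glosses": ["to run"],
                "raw_glosses": ["(intransitive) to run", "to run"],
            },
            {"glosses": ["no raw field"]},
        ],
    }
]
result = drop_duplicate_raw_glosses(sample)
print(result[0]["senses"])
```

In the first sense the field disappears, in the second only the non-duplicate raw gloss remains, and the third sense is unchanged.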