Commit a181c13

New "dummy-section-header" trigger in inflmap
The new Swahili not-so-mega tables have a problem: they contain too many <table> elements, which break the flow of our parser somewhat needlessly. When "Infinitives" is in a separate <table>...Infinitives</table> from the data it should be heading (<table>...positive: blah...</table>), the header can't propagate, even with the old header-saving subsystem devised for the original Swahili megatable (see those commits).

To fix this, we introduce a new context variable, TableContext.section_header, which holds the tag data of the current "section" being parsed within a table with subtables. When inflmap finds the dummy tag "dummy-section-header" in a set of tag alternatives, it saves the tag data of the first alternative containing it into TableContext.section_header and later merges that data into the tag data of the cells. The next time it encounters dummy-section-header, it replaces the stored tag data with the new section's data.
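
The store-then-merge behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the wiktextract code: the `TableContext` class here is a stripped-down stand-in, and the helper names `note_section_header` and `cell_tags` are hypothetical, introduced only for illustration. The sketch assumes tag alternatives are tuples of tag strings.

```python
DUMMY = "dummy-section-header"


class TableContext:
    """Simplified stand-in for the real TableContext (assumption)."""

    __slots__ = ("section_header",)

    def __init__(self):
        self.section_header = []


def note_section_header(tblctx, tag_alternatives):
    """Store the first alternative that carries the dummy tag (hypothetical helper)."""
    for tt in tag_alternatives:
        if DUMMY in tt:
            tblctx.section_header = tt
            break


def cell_tags(tblctx, own_tags):
    """Merge the stored section tags into a cell's tags (hypothetical helper)."""
    tags = set(own_tags)
    tags.update(tblctx.section_header)
    tags.discard(DUMMY)  # dummy tags are control markers, not output
    return tags


ctx = TableContext()

# A header cell in its own <table> marks the start of a section.
note_section_header(ctx, [("infinitive", DUMMY)])
first = cell_tags(ctx, ["positive"])

# A header without the dummy tag leaves the stored section untouched.
note_section_header(ctx, [("plural",)])
unchanged = cell_tags(ctx, ["negative"])

# The next section header replaces the stored data.
note_section_header(ctx, [("imperative", DUMMY)])
second = cell_tags(ctx, ["singular"])
```

Here the dummy tag is discarded immediately for brevity; in the commit it is instead added to the list of dummy tags that are stripped later in `merge_row_and_column_tags`.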
1 parent 579e5bd commit a181c13

3 files changed: +128 -51 lines changed

wiktextract/inflection.py

Lines changed: 8 additions & 0 deletions
@@ -1699,6 +1699,7 @@ def merge_row_and_column_tags(form, some_has_covered_text):
 tags.update(extra_tags)
 tags.update(rt)
 tags.update(refs_tags)
+tags.update(tblctx.section_header)
 # Merge tags from column. For certain kinds of tags,
 # those coming from row take precedence.
 old_tags = set(tags)
@@ -1810,6 +1811,7 @@ def merge_row_and_column_tags(form, some_has_covered_text):
 "dummy-store-hdrspan",
 "dummy-load-stored-hdrspans",
 "dummy-reset-stored-hdrspans",
+"dummy-section-header",
 ])
 
 # Perform language-specific tag replacements according
@@ -2034,6 +2036,10 @@ def merge_row_and_column_tags(form, some_has_covered_text):
 new_hdrspans.append(hdrspan)
 hdrspans = new_hdrspans
 
+for tt in v:
+    if "dummy-section-header" in tt:
+        tblctx.section_header = tt
+        break
 # Text between headers on a row causes earlier headers to
 # be reset
 if have_text:
@@ -2581,10 +2587,12 @@ class TableContext(object):
     """Saved context used when parsing a table and its subtables."""
     __slot__ = (
         "stored_hdrspans",
+        "section_header",
         "template_name",
     )
     def __init__(self, template_name=None):
         self.stored_hdrspans = []
+        self.section_header = []
         if not template_name:
             self.template_name = ""
         else:
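
One detail in the hunk above: the class attribute is spelled `__slot__`, whereas the name Python treats specially is `__slots__`. Under the singular spelling the tuple is an ordinary class attribute and does not restrict instance attributes, so the new `"section_header"` entry there is informational only. The sketch below illustrates the difference with two hypothetical classes (`WithTypo`, `WithSlots`); it is not taken from wiktextract, and whether the spelling in the source is intentional is not stated in the commit.

```python
class WithTypo:
    # Plain class attribute: Python attaches no special meaning to "__slot__".
    __slot__ = ("section_header",)

    def __init__(self):
        self.section_header = []


class WithSlots:
    # Real slots declaration: instances get no __dict__.
    __slots__ = ("section_header",)

    def __init__(self):
        self.section_header = []


def can_add_attribute(obj):
    """Return True if an undeclared attribute can be set on obj."""
    try:
        obj.unexpected = 1
    except AttributeError:
        return False
    return True


typo_allows = can_add_attribute(WithTypo())
slots_allows = can_add_attribute(WithSlots())
```

Either way, `section_header` works as an instance attribute because `__init__` assigns it.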
