Commit 579e5bd

Split megatables with subtables into linear order
Because of megatables with subtables, we've previously had to do a lot of "flattening" so that table parsing happens in a reasonable, human-readable order: parsing should not be depth-first, or any other tree traversal that handles pieces of the whole table out of order. The last attempt stopped a bit early because that was sufficient at the time: first parse the higher-level tables, then the subtables, so that data from the higher-level tables can propagate down.

The trouble (not relevant then, but relevant now with the new and improved Swahili tables, after the megatable monstrosity was pared down to something more manageable on Wiktionary) is that the top-level table is parsed completely first, so you could get garbage data from portions of it that come after a subtable in reading order.

+-------------------+
| Main Data 1       |
+--+-------------+--|
|  | Subtable 1  |--|
+--+-------------+--|
| Main Data 2       |
+--+-------------+--|
|  | Subtable 2  |--|
+--+-------------+--+

Main Data 1 and Main Data 2 would both be parsed before Subtable 1 and Subtable 2, so if something in Main Data 2 affects subtables (like the new "dummy-section-header" inflmap trigger), then Subtable 1 would be affected by Main Data 2 when it shouldn't be.

This has now been changed so that the higher-level table is split into separate pieces:

[Main Data 1, Main Data 2], [Subtable 1], [Subtable 2]
=====>
[Main Data 1], [Subtable 1], [Main Data 2], [Subtable 2]

The split pieces of the higher-level table are parsed as new "tables", so they are separated in the data with table-separator entries etc.
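The reordering described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the code in `handle_table1`: a table is modelled as a Python list in reading order, where a string is a main-table row and a nested list is a subtable, and the function names `split_old` and `split_new` are hypothetical. Each returned piece is reduced to `(rows, depth)` instead of the real `(rows, titles, after, depth)` tuple.

```python
def split_old(table, depth=0):
    """Old order: the whole higher-level table first, then every subtable."""
    rows = []
    sub_ret = []
    for item in table:
        if isinstance(item, list):
            # Subtable: collect its pieces, but keep accumulating main rows.
            sub_ret.extend(split_old(item, depth + 1))
        else:
            rows.append(item)
    main_ret = [(rows, depth)]
    if sub_ret:
        main_ret.extend(sub_ret)
    return main_ret


def split_new(table, depth=0):
    """New order: flush the rows seen so far before each subtable."""
    rows = []
    sub_ret = []
    for item in table:
        if isinstance(item, list):
            subtbl = split_new(item, depth + 1)
            if subtbl:
                # Emit the main rows collected so far as their own piece,
                # then start a fresh piece for whatever follows the subtable.
                sub_ret.append((rows, depth))
                rows = []
                sub_ret.extend(subtbl)
        else:
            rows.append(item)
    if sub_ret:
        main_ret = sub_ret
        # Remaining rows after the last subtable (possibly none).
        main_ret.append((rows, depth))
    else:
        main_ret = [(rows, depth)]
    return main_ret


if __name__ == "__main__":
    table = ["Main Data 1", ["Subtable 1"], "Main Data 2", ["Subtable 2"]]
    print(split_old(table))
    print(split_new(table))
```

In this toy model the new ordering ends with an empty trailing piece when the table finishes on a subtable, because the leftover rows are always appended. How the real parser treats such an empty piece is not shown in this commit.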
1 parent 4a89c21 commit 579e5bd

1 file changed: +8 -2 lines changed
wiktextract/inflection.py

Lines changed: 8 additions & 2 deletions
@@ -2748,6 +2748,10 @@ def handle_table1(config, ctx, tblctx, word, lang, pos,
         pos, data, tbl, new_titles,
         source, "", depth + 1)
     if subtbl:
+        sub_ret.append((rows, titles, after, depth))
+        rows = []
+        titles = []
+        after = ""
         sub_ret.extend(subtbl)
 
 # This magic value is used as part of header detection
@@ -2839,9 +2843,11 @@ def handle_table1(config, ctx, tblctx, word, lang, pos,
         # print(" TOP-LEVEL CELL", node)
         pass
 
-    main_ret = [(rows, titles, after, depth)]
     if sub_ret:
-        main_ret.extend(sub_ret)
+        main_ret = sub_ret
+        main_ret.append((rows, titles, after, depth))
+    else:
+        main_ret = [(rows, titles, after, depth)]
     return main_ret