Commit 8ca3bf2 (1 parent: 6f16f57)

Fixed a bug in classify_desc()

Previously, when looking up unicodedata properties for the text,
the text was first decomposed into its component pieces
(like an ellipsis "…" -> "...") to make it easier to figure
out what each character is (especially with COMBINING characters),
but the resulting data was then zip()ed together with the original
text. Decomposition can change the number of characters, so the
indexes were off and "character -> unicodedata" did not hold true.

The description is now normalized into normalized_desc,
which is used where appropriate to analyze characters.
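
The mismatch described above can be reproduced with a small sketch. This is not the code from classify_desc(); the function names are hypothetical, and the NFKD normalization form is an assumption (the commit message does not say which form is used). Only the name normalized_desc is taken from the message.

```python
import unicodedata


def misaligned_pairs(text):
    """Buggy pattern: zip the ORIGINAL text with data from the decomposed text."""
    # Assumption: NFKD is the form that turns "…" into "...".
    decomposed = unicodedata.normalize("NFKD", text)
    # zip() pairs by index, but the two strings no longer have the same length.
    return [(ch, unicodedata.category(d)) for ch, d in zip(text, decomposed)]


def aligned_pairs(text):
    """Fixed pattern: normalize first, then analyze the normalized string itself."""
    normalized_desc = unicodedata.normalize("NFKD", text)
    return [(ch, unicodedata.category(ch)) for ch in normalized_desc]


sample = "a…b"
print(misaligned_pairs(sample))  # "b" is paired with the category of "."
print(aligned_pairs(sample))     # every character carries its own category
```

In the buggy variant, "b" ends up labelled as punctuation because it sits at index 2 of the original text, while index 2 of the decomposed text is a period. Iterating over the normalized string keeps each character and its data together.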
2 files changed, +20 -4 lines changed

First file: 4 lines added after line 15.
Second file: 2 lines added after line 2399; lines 2431-2432 replaced by 4 lines; line 2491 replaced by 1 line; line 2495 replaced by 5 lines; 1 line added before and 3 lines added after line 2507.