# Lookahead and lookbehind

Sometimes we need to match a pattern only if it is followed by another pattern. For instance, we'd like to get the price from a string like `subject:1 turkey costs 30€`.

We need a number (let's say a price has no decimal point) followed by the `subject:€` sign.

That's what lookahead is for.

## Lookahead

The syntax is `pattern:x(?=y)`, which means "match `pattern:x` only if it is followed by `pattern:y`".

The euro sign is often written after the amount, so the regexp will be `pattern:\d+(?=€)` (assuming the price has no decimal point):

```js run
let str = "1 turkey costs 30€";

alert( str.match(/\d+(?=€)/) ); // 30 (correctly skipped the sole number 1)
```
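
A lookahead only checks what comes next without consuming it, so the same pattern also works with the `g` flag to collect every price. This is a small sketch on a made-up string with two prices; `console.log` is used instead of `alert` so that it also runs outside the browser:

```javascript
// A made-up string with two prices, both followed by €
const str = "1 turkey costs 30€, 2 geese cost 45€";

// With the g flag, match returns every \d+ that is immediately followed by €;
// the € itself is only checked, so it is not included in the results
const prices = str.match(/\d+(?=€)/g);

console.log(prices); // ["30", "45"]
```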

If we want the quantity instead, we can use a negative lookahead.

The syntax is `pattern:x(?!y)`, which means "match `pattern:x` only if it is not followed by `pattern:y`".

```js run
let str = "2 turkeys cost 60€";

alert( str.match(/\d+(?!€)/) ); // 2 (correctly skipped the price)
```
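
Note that `2` is found here partly because the quantity is the first number in the string. On a string that holds only a price, `pattern:\d+(?!€)` still finds something: the engine backtracks, `pattern:\d+` gives up its last digit, and `6` is not followed by `€`. The sketch below shows the problem. It also shows one possible fix, `pattern:\d+(?![\d€])`, which forbids a digit as well as `€` after the number. That fix is just one option, not the only way to write it:

```javascript
const price = "60€";

// \d+ first grabs "60", but that is followed by €, so the engine backtracks
// to "6", which is followed by "0" (not €), and the match succeeds
const naive = price.match(/\d+(?!€)/);
console.log(naive[0]); // 6

// Also forbidding a digit after the number rules out such partial matches
const fixed = price.match(/\d+(?![\d€])/);
console.log(fixed); // null

// The quantity from the example above is still found
const quantity = "2 turkeys cost 60€".match(/\d+(?![\d€])/);
console.log(quantity[0]); // 2
```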

## Lookbehind

Lookbehind allows us to match a pattern only if there's something specific before it.

The syntax is:
- Positive lookbehind: `pattern:(?<=y)x` matches `pattern:x`, but only if it comes right after `pattern:y`.
- Negative lookbehind: `pattern:(?<!y)x` matches `pattern:x`, but only if there's no `pattern:y` right before it.

For example, let's change the price to US dollars. The dollar sign usually goes before the number, so to look for `$30` we'll use `pattern:(?<=\$)\d+`:

```js run
let str = "1 turkey costs $30";

alert( str.match(/(?<=\$)\d+/) ); // 30 (correctly skipped the sole number 1)
```
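
Because the `$` is only checked and is not part of the match, it stays untouched when the match is replaced. Here is a small sketch that doubles the dollar amount. The doubling is just an arbitrary transformation for illustration:

```javascript
const str = "1 turkey costs $30";

// Only the digits are matched, so only they are passed to the callback;
// the "$" checked by the lookbehind stays in place
const doubled = str.replace(/(?<=\$)\d+/, amount => String(Number(amount) * 2));

console.log(doubled); // 1 turkey costs $60
```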

And for the quantity, let's use a negative lookbehind `pattern:(?<!\$)\d+`:

```js run
let str = "2 turkeys cost $60";

alert( str.match(/(?<!\$)\d+/) ); // 2 (correctly skipped the price)
```
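
Here, too, `2` is found partly because it is the first number in the string. On a string with only a price, `pattern:(?<!\$)\d+` matches `0`, since `0` is preceded by `6` and not by `$`. The sketch below shows this, along with one possible fix that forbids a digit as well as `$` before the number, `pattern:(?<![\d$])\d+`:

```javascript
const price = "$60";

// "6" is preceded by "$", so it can't start a match,
// but "0" is preceded by "6", so the negative lookbehind lets it through
const naive = price.match(/(?<!\$)\d+/);
console.log(naive[0]); // 0

// Also forbidding a digit before the number requires the whole number
const fixed = price.match(/(?<![\d$])\d+/);
console.log(fixed); // null

// The quantity from the example above is still found
const quantity = "2 turkeys cost $60".match(/(?<![\d$])\d+/);
console.log(quantity[0]); // 2
```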

## Capture groups

Generally, the contents of lookaround parentheses (lookaround is a common name for both lookahead and lookbehind) do not become part of the match.

But if we want to capture something, that's doable: we just need to wrap it in additional parentheses.

For instance, here the currency `pattern:(€|kr)` is captured along with the amount:

```js run
let str = "1 turkey costs 30€";
let reg = /\d+(?=(€|kr))/;

alert( str.match(reg) ); // 30, €
```
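
The result is an ordinary match array: the amount is at index `0` and the captured currency at index `1`. This sketch reads both from two made-up inputs, one priced with `€` and one with `kr`:

```javascript
const reg = /\d+(?=(€|kr))/;

// Two made-up inputs with different currencies
const inputs = ["1 turkey costs 30€", "1 turkey costs 300kr"];

const results = inputs.map(str => {
  const match = str.match(reg);
  // match[0] is the amount (the whole match), match[1] is the group inside the lookahead
  return [match[0], match[1]];
});

console.log(results); // [["30", "€"], ["300", "kr"]]
```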

And here's the same for lookbehind:

```js run
let str = "1 turkey costs $30";
let reg = /(?<=(\$|£))\d+/;

alert( str.match(reg) ); // 30, $
```
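
Groups inside a lookaround can also be named, which saves counting parentheses. Below is a sketch that combines a named group inside the lookbehind with `str.matchAll` to collect every price. The group name `currency` and the input string are arbitrary choices for illustration:

```javascript
// A made-up string with two prices in different currencies
const str = "1 turkey costs $30, 1 goose costs £45";

// The named group "currency" sits inside the lookbehind; the g flag finds every price
const reg = /(?<=(?<currency>\$|£))\d+/g;

const prices = [...str.matchAll(reg)].map(match => ({
  amount: match[0],
  currency: match.groups.currency,
}));

console.log(prices); // [{ amount: "30", currency: "$" }, { amount: "45", currency: "£" }]
```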

Please note that the result has the same order as for lookahead (the amount first, then the currency), even though here the group's parentheses come before the main pattern.

This isn't a special rule for lookbehind. The element at index `0` is always the whole match, here the text matched by `pattern:\d+`, and the capturing groups follow it, numbered left-to-right by their opening parentheses as usual. So the only group, `pattern:(\$|£)`, is group 1 and comes second in the result.
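
To see how groups are numbered when a lookbehind contains one, this sketch also wraps the amount in its own group. The lookbehind's parenthesis opens first in the pattern, so it becomes group 1, and `(\d+)` becomes group 2:

```javascript
const match = "1 turkey costs $30".match(/(?<=(\$|£))(\d+)/);

console.log(match[0]); // 30 (the whole match)
console.log(match[1]); // $ (group 1: its parenthesis opens first, inside the lookbehind)
console.log(match[2]); // 30 (group 2: the amount)
```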

## Summary

Lookahead and lookbehind (commonly referred to as "lookaround") are useful when we want to match something depending on the context before or after it, without including that context in the match.

Sometimes we can do the same manually: match everything and then filter the matches by context in a loop. Remember, `str.matchAll` and `reg.exec` return matches with the `.index` property, so we know exactly where each one is in the text. But generally, regular expressions do it better.
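
For comparison, here is a sketch of the manual approach for the euro price. It finds every number with `matchAll`, then keeps only those whose next character, located via `.index`, is `€`:

```javascript
const str = "1 turkey costs 30€";
const prices = [];

for (const match of str.matchAll(/\d+/g)) {
  // match.index is where the number starts, so the character right after it
  // is at match.index + match[0].length
  const next = str[match.index + match[0].length];
  if (next === "€") prices.push(match[0]);
}

console.log(prices); // ["30"]
```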

Lookaround types:

| Pattern | Type | Matches |
|--------------------|------------------|---------|
| `pattern:x(?=y)` | Positive lookahead | `x` if followed by `y` |
| `pattern:x(?!y)` | Negative lookahead | `x` if not followed by `y` |
| `pattern:(?<=y)x` | Positive lookbehind | `x` if preceded by `y` |
| `pattern:(?<!y)x` | Negative lookbehind | `x` if not preceded by `y` |

Lookahead can also be used to disable backtracking. Why that may be needed is explained in the next chapter.