| 1 | +Daily Coding Problem #25 |
| 2 | +Problem |
| 3 | +This problem was asked by Facebook. |
| 4 | + |
| 5 | +Implement regular expression matching with the following special characters: |
| 6 | + |
| 7 | +. (period) which matches any single character |
| 8 | +* (asterisk) which matches zero or more of the preceding element |
| 9 | +That is, implement a function that takes in a string and a valid regular expression and returns whether or not the string matches the regular expression. |
| 10 | + |
| 11 | +For example, given the regular expression "ra." and the string "ray", your function should return true. The same regular expression on the string "raymond" should return false. |
| 12 | + |
| 13 | +Given the regular expression ".*at" and the string "chat", your function should return true. The same regular expression on the string "chats" should return false. |
| 14 | + |
| 15 | +Solution |
| 16 | +This problem should strike you as recursive. The string matches the regex if we can match the head of the string with the head of the regex and the rest of the string with the rest of the regex. The special characters . and * make implementing this a bit trickier, however, since a * means the preceding element can match zero or more characters at the start of the string. |
| 17 | + |
| 18 | +The basic idea then is to do the following. Let's call the string we want to match s and the regex r. |
| 19 | + |
| 20 | +Base case: if r is empty, then return whether s is empty or not. |
| 21 | +Otherwise, if the first thing in r is not followed by a *, then match the first character of both r and s, and if they match, return matches(s[1:], r[1:]). If they don't, then return false. |
| 22 | +If the first thing in r _is_ followed by a *, then try matching r[2:] against s itself and against every suffix of s obtained by stripping leading characters that match that first thing. Return true if any of these suffixes works, and false otherwise. |
| 23 | +The code should look something like this: |
| 24 | + |
| 25 | +def matches_first_char(s, r): |
| 26 | +    return len(s) > 0 and (s[0] == r[0] or r[0] == '.') |
| 27 | + |
| 28 | +def matches(s, r): |
| 29 | +    if r == '': |
| 30 | +        return s == '' |
| 31 | + |
| 32 | +    if len(r) == 1 or r[1] != '*': |
| 33 | +        # The first character in the regex is not followed by a *. |
| 34 | +        if matches_first_char(s, r): |
| 35 | +            return matches(s[1:], r[1:]) |
| 36 | +        else: |
| 37 | +            return False |
| 38 | +    else: |
| 39 | +        # The first character is followed by a *. |
| 40 | +        # First, try matching zero occurrences of it. |
| 41 | +        if matches(s, r[2:]): |
| 42 | +            return True |
| 43 | +        # Otherwise, consume one more matching character at a time. |
| 44 | +        i = 0 |
| 45 | +        while matches_first_char(s[i:], r): |
| 46 | +            if matches(s[i+1:], r[2:]): |
| 47 | +                return True |
| 48 | +            i += 1 |
| 49 | +        return False |
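| 50 | +As written, this backtracking solution can take exponential time in the worst case (for example, a pattern such as a*a*a*b against a long run of a's), since the same pairs of a suffix of s and a suffix of r get re-examined many times. There are only O(len(s) * len(r)) distinct pairs, so caching the result for each pair brings the number of subproblems down to O(len(s) * len(r)). |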
| 51 | + |
| 52 | +Fun fact: Stephen Kleene introduced the * operator in regular expressions and as such, it is sometimes referred to as the Kleene star. |
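The recursive matcher in the solution can revisit the same pair of a suffix of s and a suffix of r many times. The following is a minimal sketch of a memoized variant, not part of the original write-up. It assumes the same two special characters and a valid regex, and it works on index pairs instead of string slices so results can be cached with `functools.lru_cache`. The names `matches_memo` and `match_from` are hypothetical, chosen only for this illustration.

```python
from functools import lru_cache


def matches_memo(s: str, r: str) -> bool:
    """Return True if regex r (supporting '.' and '*') matches all of s."""

    @lru_cache(maxsize=None)
    def match_from(i: int, j: int) -> bool:
        # Answers the question: does s[i:] match r[j:]?
        if j == len(r):
            # An exhausted regex only matches an exhausted string.
            return i == len(s)

        # True when there is a character left in s and r[j] accepts it.
        first_matches = i < len(s) and r[j] in (s[i], '.')

        if j + 1 < len(r) and r[j + 1] == '*':
            # Either skip the starred element entirely (zero occurrences),
            # or consume one character of s and stay on the same element.
            return match_from(i, j + 2) or (first_matches and match_from(i + 1, j))

        # Plain element: it must match, then both sides advance by one.
        return first_matches and match_from(i + 1, j + 1)

    return match_from(0, 0)


if __name__ == "__main__":
    print(matches_memo("ray", "ra."))      # example from the problem statement
    print(matches_memo("chats", ".*at"))   # example from the problem statement
```

Each (i, j) pair is computed at most once and does constant work beyond its recursive calls, so this sketch evaluates O(len(s) * len(r)) subproblems. It still recurses, so very long inputs could hit Python's recursion limit; an iterative table would avoid that.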