Question: https://codility.com/demo/take-sample-test/find_three
Question Name: helium2013 or Find-Three or FindThree
For this question, we have two similar but different ways to solve it. The first solution uses the border array algorithm. And the other one is with the Z algorithm for the longest common prefix. After getting the border array or prefix array, the left work seems quite similar. We try every possible border, with length being equal to or less than one third of the original string’s length, from the longest to the shortest. If we find a non-overlapping occurrence of the trying border in the middle, we return the length of that border as the final result.
The solution with border array is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 | def borderArray(S): # Get the traditional border array of S. Did not consider # the overlapping issue here. borders = [0] * len(S) for index in xrange(1, len(S)): borderLen = borders[index-1] while borderLen != 0 and S[borderLen] != S[index]: borderLen = borders[borderLen-1] if S[borderLen] == S[index]: borders[index] = borderLen + 1 else: borders[index] = 0 return borders def countThreeBorder(borders): # Compute the longest border, which have at least three # non-overlapping occurrences. # # Input: border array of the original string, to say S. # borders[i] means the longest border of the sub-string # ending at i, that is S[0:i+1]. For example, for the # string "ababab", the border array would be: # [0, 0, 1, 2, 3, 4] # # Method: we will traverse all borders, to say currently # trying border, whose length is equal to or shorter than # one third of the string's length, from longest to # shortest. And we check whether they appear in the middle # of the original string. To distinguish its three # occurrences, we call them head-border, mid-border, # and tail-border. longest = borders[-1] # The longest border, which appears # in the head and tail. Did not consider # the overlapping issue here. # Because the borders must appear three time or more, and there is # no overlapping among them, the maximum length of qualified border # must be equal to or shorter than one third of the string's length. while longest * 3 > len(borders): longest = borders[longest-1] # These two bounds define the searched middle range. leftBound, rightBound = len(borders)/2, len(borders)/2 # The following two arrays store our temporary searching results. # existMid[i]: there is one or more non-overlapping borders in the # mid of the string (mid-border), with length of i. # triedMid[i]: we have met with one or more borders in the mid of # of the string, with length of i. These borders might # be qualified (non-overlapping) or not. existMid, triedMid = [False] * len(borders), [False] * len(borders) existMid[0], triedMid[0] = True, True while longest > 0: # Extend the searching range frpm rightmost to leftmost. After # extension, the searched range for mid-border is from longest*2-1 # to len(borders)-longest-1. # The left bound guarantees: the mid-borders, whose lengths are # equal to or less than that of the currently trying border, # will NEVER be overlapped with the head-border. # The right bound guarantees the case of the tail-border. for index in range(len(borders)-longest-1, rightBound,- 1) +range(leftBound, longest*2-2, -1): borderLen = borders[index] # KEY POINT: if triedMid[borderLen] is true, we met with # the mid-border with the same length before. So there are # two possible cases: # # existMid[borderLen] is true: it was a non-overlapping # mid-border. And we also tried all its shorter possibilities. # No need to re-scan this kind of borders. Because we just # need to know whether there are some non-overlapping in # the middle range, rather than the location. A re-scan # would not change the existMid records. # # existMid[borderLen] is false: it was an overlapping # mid-border. It MAY become a non-overlapping mid-border in # our new range. *BUT* at the time of being overlapped, its # length must be larger than the corresponding trying border's # length at that time [SEE the document before for loop]. # And the length of trying border is STRICTLY decreasing. # So the length of this mid-border must be longer than the # currently trying border. Even if it become a non-overlapping # mid-border, it could NEVER be one of the three occurrences # of the current and remaining trying borders, because of its # unmatched length. In other words, even though rescanning # it might change the existMid records, these changes are # USELESS in the checking "if existMid[longest]". # # In sum, in both cases, we do not need to rescan it. while not triedMid[borderLen]: # For any mid-border with un-tried length, we will # try it and all its shorter possibilities. triedMid[borderLen] = True if borderLen-1 < index - borderLen + 1: # Not overlapped with the head-border existMid[borderLen] = True borderLen = borders[borderLen-1] # Shorter border # Check whether current border is quanlified for this question. # That is, this border appears in both head and tail, AND # appears in the middle without overlapping. if existMid[longest]: break # Update the bounds and the trying border for the next round. leftBound, rightBound = longest*2-1, len(borders)-longest-1 longest = borders[longest-1] return longest def solution(S): if len(S) < 3: # If length of S is less than 3, it could never have # the wanted border. result = 0 else: borders = borderArray(S) result = countThreeBorder(borders) return result |
I made much less comments on the solution with Z algorithm. If you really fully understand the first solution, you could easily understand this one without any comment. But if you are still confused by the first one, the comments are unhelpful in the second solution. The solution with Z algorithm is:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 | def prefixArray(S): # Method: Z algorithm # prefixes[i] is the length of the longest common prefix of # string S and S[i:]. prefixes = [0] * len(S) left, right = -1, -1 for index in xrange(1, len(S)): if right > index: if prefixes[index-left] < right - index: # Already computed prefixes[index] = prefixes[index-left] else: # May extend offset = right - index while index+offset < len(S) and S[index+offset] == S[offset]: offset += 1 left, right = index, index + offset prefixes[index] = offset else: # Have to compute from scratch offset = 0 while index+offset < len(S) and S[offset] == S[index+offset]: offset += 1 left, right = index, index + offset prefixes[index] = offset return prefixes def countThreeBorder(prefixes): longest = len(prefixes)/3 longestMid = 0 leftBound, rightBound = len(prefixes)/2, len(prefixes)/2 while longest > 0: if prefixes[len(prefixes)-longest] != longest: # Not a border longest -= 1 continue # Find the longest prefixes in the middle range. for index in range(len(prefixes)-2*longest, rightBound-1,- 1) +range(leftBound, longest-1, -1): longestMid = max(longestMid, prefixes[index]) if longestMid >= longest: # The border appears three times, without overlapping break else: # Update the values for next trying leftBound = longest-1 rightBound = len(prefixes)-2*longest + 1 longest -= 1 return longest def solution(S): prefixes = prefixArray(S) result = countThreeBorder(prefixes) return result |