Solution to Perm-Missing-Elem by codility

16 Jan

Question: http://codility.com/demo/take-sample-test/perm_missing_elem

Question Name: PermMissingElem or PermMissingElement

The main challenge of this question is the XOR operations: X^X=0, and 0^X=X. Logically, the addition and subtraction operations also are able to do this work. But taking the overflow in computer into consideration, they become a very bad choice.

123 Replies to “Solution to Perm-Missing-Elem by codility

  1. Hi Sheng, would you please be so kind to explain in a few more details the main idea behind this solution? I clearly see how it works, but cannot imagine how to come to it (my own solution is pure arithmetic, though the overflow issues gave me some pains at the beginning). Thanks in advance!

    • It is the same as your arithmetic solution. You might use addition and subtraction, because x + a – a = x. Here I used the XOR operation, with x XOR a XOR a = x. One great advantage of XOR over addition and subtraction is that, XOR never lead to overflow. As a result, the XOR operation is widely used in encryption.

  2. Finally got how beautiful XOR is used here. I see it as subtraction and addition done in parallel with the idea that if there were no missing element in A then after the loop we would come up with the xor_sum value equal 0. And it does not matter in which order we add/subtract numbers, they all are there (except one, discovered in the last XOR application). Cool. Thanks again 🙂

    • Another 100/100 solution.
      But I do not support it. Float/Double number will lose some accuracy. In theory, if the sum is big enought, and the missing element is small enought, 0.5*(A.length+1)*(A.length+2) will be the same as sum after rounding. And it might give a wrong result in such cases.
      PS: if you want to post some code, please use <pre> and </pre>, instead of code tag.
      PPS: I removed your comment on Solution to Passing-Cars by codility, because its code is imcomplete and unreadable.

  3. It won’t…just won’t give a wrong result. It was just issue in java, so I used double instead. It should work perfectly.

    • It works well for the test. But passing the tests does not mean it is perfect. Bug is a bug, no matter you found it or not. For example, assuming significant digits in double have length of one, 0.9 E10 – 1 = 0.9 E10
      Using int will lead to overflow.
      Using float/double will lead to wrong answer in rare cases.
      Using BigInteger/BigDecimal works. But it should cost much more/much much more time.

      • Easy, man. Solution works perfectly for task conditions:
        – N is an integer within the range [0..100,000];
        – the elements of A are all distinct;
        – each element of array A is an integer within the range [1..(N + 1)].
        In general case, you are right. Though if it was an issue, I would simply develop another solution for that.

      • By the way, using double in this case is an equivalent of using big integer, as you can see in expression. And could you point me out on that rare cases, that could happen?

        • I did a quick test on Python like:
          >>> N = 200000000
          >>> sum = 0
          >>> for i in xrange(1, N+1): sum += i + 1
          >>> (N+1)*(N+2) // 2.0 – sum
          0.0
          The testing set is the integers from 2 (inclusive) to 200000001 (inclusive). And the integer 1 is missing.

  4. Yeah, you are right:) Sorry for my awkward and unprofessional answers.
    Thank you very much!
    I should study more of it:)

    • You are studying more, when you are doing these practice. Actually, years ago, when I faced this type of questions at the first time, I made the same mistake. So never mind. Solve it correctly once, and you are not going to make the same mistake again.
      Enjoy it!

  5. Here’s my solution in Python:

    • Hello @Dimitri ! Please include your code inside a <pre> … </pre> block, not <code> block.
      This works well with Python. Because Python will automatically change int to long, which could hold as large integer as you want. But for other languages, it may lead to overflow.

    • Is this a solution to oddOccurrences? Coz it does not seem to work. I’m using python 2.7.11 by the way (incase that is what is causing the problem)

  6. Here’s my solution, a bit different approach. It goes through every permutation cycle it can find in the array and sets the values to 0 along the way. If there is an element that is not zeroed, the solution is its index + 1 (zero-indexed). Otherwise the answer is N+1. It should not consider an element more than two times, so complexity is O(N). Code in C#:

  7. Here is my solution in Java…..

    • Thanks for sharing your solution! But there are two minor issues. On one hand, space complexity is not O(1). On the other hand, when the first missing is found, you could simply stop searching and return it.

  8. Hi! My solution is in C. The main idea is to compare the ideal sequence 1..N+1 with the given array. But with a whole sum, it is overflowed. So, I try accumulating the differences between ideal sequence and array for each position. In that way the number holds smaller and no overflows.

    • Overflow is still possible. Consider a big array, with the first three element as
      maxInt-1, maxInt-2, maxInt-3, ……, 3, 2, 1

      • jeje! Yes, you are right, I have also seen, but using the premises of the problem “N in range ([0..100,000])” and using a long int, the probability was low…Any way, is not a safe solution. Any idea without using XOR?
        Thank you Sheng

        • You are more than welcome 🙂 XOR is the best choice as far as I see. If you really hate XOR, BigInt in Java and integer in Python might be a choice.

      • There should be no overflow since the maximum N of the task is 100000 instead of maxInt. According to David’s code, if the array A with size 100000 is in reverse order (ie. [100001, 100000, …, 2]), the minimum value of missedNum can be reached should be -2500050000 ((100001+50002)/2*50000-(1+50000)/2*50000), which is larger than the minimum value of long.

        • Yes, you are right. But traditionally and pratically, XOR is better and more widely used. No mather how larger N is, XOR works perfectly.

  9. Sheng, your solution doesn’t work with codility. It compiles but it fails on some tests:
    Example test: [4, 1, 3, 2]
    WRONG ANSWER (got 5 expected 1)
    Example test: [4, 1, 3]
    WRONG ANSWER (got 2 expected 0)

    • I tired minutes ago, and it passed all the test.
      Are you running the solution in another challenge? “The array contains integers in the range [1..(N + 1)]”. How could the expected answer be 0?

  10. Stuck with the overflow, I saw this article and spent few hours studing about the XOR operations. For your convenience, here goes a short description of what I found:
    First, let’s take a look at the properties of XOR.
    1. XOR is commutative, which means
    X ^ Y = Y ^ X.
    2. XOR is assiciative, i.e.,
    X ^ (Y ^ Z) = (X ^ Y) ^ Z.
    3. For the truth values, as in the article,
    X ^ X = 0, 0 ^ X = X.
    With this properties, we can adapt a summation starategy similiar to the one of algebraic operations.
    Now we will assume a set [1, …, N + 1] (or an array), and evaluate the sum S of the elements by XOR operation. Then,
    S = 1 ^ 2 ^ … ^ N ^ (N +1) ^ 1 ^ 2 ^ … ^ N ^ (N +1).
    This might look wierd, but it has a useful charaterictics, that is;
    S = 1 ^ 2 ^ … ^ N ^ (N +1) ^ 1 ^ 2 ^ … ^ N ^ (N +1)
    = 1 ^ 1 ^ 2 ^ 2 ^ … N ^ N ^ (N + 1) ^ (N + 1)
    = 0.
    Back to our problem, we have an array A[] having N elements in [1, …, N + 1] and a missing elelent. Let the missing element M, then
    A[1] ^ A[2] ^ … ^ A[N] ^ M = 1 ^ 2 ^ … ^ N ^ (N +1),
    because any permutation in the array eventually holds all elements.
    Now the preperation for the answer is almost complete. To find M, we will modify the sum slightly.
    S = 1 ^ 2 ^ … ^ N ^ (N +1) ^ 1 ^ 2 ^ … ^ N ^ (N +1)
    = A[1] ^ A[2] ^ … ^ A[N] ^ M ^ 1 ^ 2 ^ … ^ N ^ (N +1)
    = (A[1] ^ A[2] ^ … ^ A[N] ^ 1 ^ 2 ^ … ^ N) ^ M ^(N +1)
    = (A[1] ^ 1 ^ A[2] ^ 2 ^ … ^ A[N] ^ N) ^ M ^ (N + 1).
    Since S = 0, we can get M as
    M = S ^ (A[1] ^ 1 ^ A[2] ^ 2 ^ … ^ A[N] ^ N) ^ (N + 1)
    = (A[1] ^ 1 ^ A[2] ^ 2 ^ … ^ A[N] ^ N) ^ (N + 1).
    And finally, this is my personal understanding of the solution and I do not garantee its correctness. Thanks for reminding the XOR operations.

  11. For those who prefer arithmetic solution to XOR (including myself :), I thought a little more on the behavior of XOR.
    Assume a set [1, …, N + 1], an array A[], and a missing element M, as previously.
    The arithmetic solution to the problem is very intuitive and seems much preferable to many of us. That is,
    M = (1 + 2 + … + N + (N + 1)) – (A[1] + A[2] + … + A[N]),
    which came from
    (1 + 2 + … + N + (N + 1)) – (A[1] + A[2] + … + A[N] + M) = 0.
    The reason I was uncomfortable with XOR is this. When I add or subtract some number, I can see certainly what I am doing, just like addition is DOing something and subtraction is UNDOing something. But in XOR operations, things didn’t look so clearly.
    However, I found that XOR solution is basically not different from arithmetic one. In fact, we can almost consider XOR as normal arithmetic operation like this:
    r = p ^ q <=> r = p + q (mod 2).
    What about the subtraction? In mod 2 system, subtraction is same as addition because
    p – q = p – q + 2q = p + q (mod 2).
    Don’t be confused by the modulo system (just like exactly what I did), because XOR operation is essentially bit-wise operation.
    After all, if we consider addition and subtraction as DOing-UNDOing something in arithmetic operations, the same happens in XOR operations. We do something by XORing(adding) and also undo something by XORing(subtracting at this time).
    Now we can see M in the XOR way. Like before, to get M, we will subtract the sum of the array A[] from the sum of the set [1, …, N + 1]. Then
    M = sum_of([1, …, N + 1]) minus sum_of(A[]),
    which becomes
    M = (1 ^ 2 ^ … ^ N ^ (N + 1)) ^ (A[1] ^ A[2] ^ … ^ A[N]).
    If you need M as the form described in the article, we can rearrange it like this as we did before;
    M = (A[1] ^ 1 ^ A[2] ^ 2 ^ … ^ A[N] ^ N) ^ (N + 1).
    PS: Sorry for my poor english. It’s not my mother tongue.

      • Hi Sheng!
        I can’t get the idea of using XOR in this task. If we have, let’s say, 5^4^7^6^5 what is DOing and what is UNDOing something. I know how XOR works on bits but don’t get the idea of summing something with XOR. We have 1^2 = 3 so something like adding then 3^3 = 0 so subtracting?

        • Sorry for spamming. I have already figured it out. The solution is hard to understand. I always come here to see your solutions after submiting mine. I can always learn something new.

        • Hello Julek,
          It is actuallly nothing about adding/subtracting. In this problem, in range 0 to N – 1, we have input[i] = i + 1, except one number (to say M) is missing. The M’s index should be M – 1.
          When we XOR every (index + 1) and its value, we are doing (the order does not change the XOR result):
          input[0] XOR 1 XOR input[1] XOR 2 XOR … XOR input[N-1] XOR N
          that is (with nput[i] = i + 1):
          1 XOR 1 XOR 2 XOR 2 XOR 3 XOR 3 XOR … XOR N XOR N
          Because any for any number X, X XOR X = 0, and X XOR 0 = X. The final result of previous equation is:
          0 XOR 0 XOR 0 … XOR M XOR … XOR 0 = M.
          Finally, we get the missing number M.

  12. Hello,
    Ca you please tell why my solution giving just 40% correctness though its 80% in Performance.

    Please suggest?

    • First of all, thanks for visiting my blog. But sorry I cannot help you to debug.
      Some suggestions are here:
      1. b = A[-1:][0] has a better form: b = A[-1], quick and simply.
      2. No need to sort the input in O(NlogN). Use b = max(A) instead.
      3. Your solution easily leads to overflow of A_sum and b_sum in most programming languages. Python could handle with big integers well. But the operations on big int is expensive.

  13. I solved it in C++ with another solution, using a logic vector instead. I think its an advantage, because no big numbers are possible. But it uses two for loops-one for filling the logic and one to analyse it.

  14. After I have come up with this version, I could totally understand how you came up with the XOR version as Kim explained it.
    (starting to write algo in Python. What a handy tool it is.)

    • Thanks for your sharing! Another 100/100 solution. But, as a matter of fact, your code is basically same as David said earlier, hence same comments on the overflow hold.

      • Python can manage the big integer automatically. So there is no overflow in this piece of code. However, if the integer is too big, Python needs lots of memory and it is time-consuming.
        It is strongly recommended NOT to use this solution.

  15. @Sheng Thanks.
    JavaScript

  16. This is my solution in Pascal (score: 100 of 100):

      • I am planning on posting more solutions in pascal. I just discovered your blog, and I like it :), so you’ll be seeing other posts from me. I am going to try to find the right pascal code for each of your python solutions

  17. 100/100 solution

  18. JavaScript 100%:

  19. As a programmer i never used XOR in my code. With your example i was able to get into the details of XOR, Thanks for that. But as i was trying to test your code i found that it seems to fail for following input:
    Your test case: [5, 6, 4, 8]
    Output (stderr):
    WARNING: producing output may seriously slow down your code!
    Output:
    Missing number: 14
    Returned value: 14
    I used Java to run your code.

    • Hello Ashwai, the problem says: “The array contains integers in the range [1..(N + 1)], which means that exactly one element is missing.”. Therefore your test case is invalid.

  20. Hi Sheng and the rest of you guys,
    I’m trying to port your XOR code to Java, I wrote this

    and I get array a java.lang.ArrayIndexOutOfBoundsException…
    Can you help? Thanks!

  21. My solution is not O(N) but O(N*log(N)) but it is still 100% on codility.

  22. so especially the last case, should it not return either 4 or 5 instead of 7?
    or it is designed for just a single missing number and not more than a single missing number? and if there is no missing number then it simply returns the next number in sequence?

    • The third testing case is wrong. Please double read the question. A possible testing case is [2, 3, 1, 4] or [2, 3, 1, 5]. In your testing case, there are two missing integers as 4 and 5.

      • Could someone explain to me why the test is ignoring real world approach to problem solving?I think it should return 4 and 5 if they are both missing in a list! Now i see why it takes time solving this problems. The problem is solved but not what codility wants!
        Below return 4 but cant return more than 1 missing number

        • You solution is wrong. Please try [1].
          I ignored the real-world approach, because this is the 0/1 world 🙂 The XOR solution is the most space- and time- efficient solution.

  23. Figured I’d throw out this solution. Performance is essentially identical to yours but it was a little easier to read, at least for me.

    • Thansk for sharing. However, addition/substration may not be a good idea, if we consider overflow.
      In addition:
      1. sum(A[:]) is euqal to sum(A)
      2. sum(range(len(A)+1)) could be optimized to sum(xrange(len(A)+1))
      3. your solution could be simplified to:

      Thanks!

  24. This gives you all the missing numbers. Thoughts please!

    def solution(A):
        #declare variables
        C=[]
        e=0
        #Assign Variables
        endindex = max(A) + 1
        #create my list that is correct pattern
        B = [x for x in range(1, endindex)]
        # sort the list
        A.sort()
        if len(A) == 0:
            return 0
        elif len(A) == 1 and A[0] == 1:
            return A[0]+1
        else:
            #iterate the two lists to find missing number.
            while e < len(B):
                #check of item is equal to item
                if A[e] != B[e]:
                    #insert number if missing to adjust the index
                    #for next iteration otherwise, result will have error
                    A.insert(e, B[e])
                    #collect missing number into a list
                    C.append(B[e])
                    print B[e]
                    #return B[e]
                e+=1
    
  25. My “solution” is definitely not as neat as yours. My question is: does “upward casting to long” have any flaw here? I am aware of two: 1. performance hit, it could be slower than bit operation; 2. it’s not portable since long could be 8 bytes on gcc, but 4 bytes on VS, although both of them are on 64-bit platform. Any other concern?
    Thanks!

    • accumulate(A.begin(),A.end(),0L) may cause overflow (result is too big to fit in an integer).
      upward casting usually is fine. But we cannot guarantee it is upward. It is possible that static_cast is a downward casting, when long is smaller than size_t.

      • Thank you for the timely reply!
        You may not notice that the third argument in accumulate is ‘0L’. According to C++ spec, accumulate, as a template function, takes the type on the third argument as the type for the returning value. So, I guess it’s just fine. If I miss anything, please let me know.
        Your concern on the cast between long and size_t is legit. size_t’s type is platform dependent for sure. if long is 4 bytes and size_t is 8 bytes (windows), the static_cast from size_t to long is downward. you may notice I checked the size of long in my source code, just in case. Well, codility probably not using Windows compiler.
        It makes me curious: is bitwise operation portable?

        • No problem. You are welcome!
          0L means signed long, on some platform (32bits for long), the maximum positive value is 2147483647. While the “N is an integer within the range [0..100,000];”, in the worst case, accumulate return 5000050000, which is too large to fit in a signed long integer.
          Bitwise operation is portable. And it is the expected solution here.

  26. Thanks! That’s what I pointed out as well. 🙂 Well, I guess if the upward casting is a must have, in this case, long long might be a better choice since it’s guaranteed to be 8 bytes on all platforms I am aware of.
    BTW, do you mind sharing how you learnt these bitwise tricks? I know “bitwise hacks”, but I will never be able to remember them or feel confident to apply them on any coding test.
    Thanks!

    • Long long works here. But bitwise solution is still preferred, because it is universal without overflow.
      In terms of bitwise operations, you have to use it many times to remember it. If you studied security or hardware, you have more chance to learn it.

  27. A little modernization for the primary code.

    No last XOR required, you can just initialize xor_sum with: length + 1

  28. my 100% C# solution

  29. My Java solution (100%)

  30. This solution give me 100/100 in c#

  31. My contribution in C#..

  32. I see how XOR would work but seems completely unreadable and over-engineered to me compared to the below 1-liner in Python (100% score):

  33. Hi Shang,
    I’m loving your blog and learning much, thank you!
    My arithmetic version equals or outperforms the XOR version in time performance for every test. I see that you specifically state “overflow” as being the catch on why XOR is better. Perhaps since in this test N is is limited to 100,000, the arithmetic version is better? And then maybe if N were allowed to go up to much higher limit (millions or billions), the XOR version would be safer to use? Thank you again!
    Javascript
    XOR version

    Arithmetic Version

  34. Ah, I see now.. the repetitive calling of array length was slowing down the XOR version from my last comment. This one now outperforms the arithmetic version.
    Javascript

    • Great to see that you got the solution by yourself 🙂 I would prefer the XOR solution, because:
      1. Both solutions are O(N) time complexity. [A.reduce((a,b)=>{return a + b}) is O(N)].
      2. XOR solution woks all the time, no matter what is the N’s limit.
      Thanks!

  35. This is my solution.
    Is there any advanatge using your XOR solution or mine is fine as well?

  36. I’ve solved it 100% in Java, but with an additional HashSet (i.e. O(N) space and O(N) time complexity) as follow:

    Can someone explains how XORing will resolve this?

    Many thanks 🙂

  37. I did this on sets, also works.

  38. 100%, Still think the XOR method better though.

  39. Here is code for PHP that I porting from OP
    100/100

  40. Hi Shen,
    I am using c# and Linq, however this is not giving me 100% result. Can any one explain what am I doing wrong. My understanding is that there’s is only one element that is not paired and rest of the elements are always paired. Shouldn’t the following give me accurate results?

    Appreciate your response on this

  41. Thank you for your solution with xor! Nice touch about the extra information that xor never overflows. That way we can do the “summing” without worrying about using long or things like that.

    BTW a solution with java and xor(scored 100%):

  42. This is my solution, would you please comment on this?

  43. My 100% java solution:

  44. Here’s a C# solution (100%) using a hashset to record the numbers that have been found. Once a matching pair is found the number is removed so by the end we’re left with a single entry containing the unmatched number:

  45. My Python implementation:

  46. Wow what an awesome place to get info. below is what i came up with (v slowly)
    I have tried to get my small brain around XOR and it just hurts! so this is my simple way in c#. trying to learn so very happy to be torn apart. it got 100%

    I tried to be smart:

    but that only gave me 80% which suprised me.

  47. I made it like this:

Leave a Reply to vi Cancel reply

Your email address will not be published. Required fields are marked *

Please put your code into a <pre>YOUR CODE</pre> section. Thanks and Happy Coding!