Solution to Distinct by codility

18 Mar

Question: https://codility.com/demo/take-sample-test/distinct

Question Name: Distinct

In this question, the expected worst-case time complexity is O(N*log(N)). Thus the designed, or says official, solution would be sort the input and travel them. This solution is practically good. And it is shown as following.

In theory, there is a solution with expected worst-case time complexity being O(N) and worst-case space complexity is O(1). But the coefficients of N and 1 are quite high. And this solution depends on the assumption: “each element of array A is an integer within the range [−1,000,000..1,000,000].” These reasons make it practically bad, while it could get 100/100 here.

 Remember not to blindly believe in the Big O.

49 Replies to “Solution to Distinct by codility

  1. I’m just using set. Not sure, that its O(N*log(N)) time but codility showed 100%

      • I think for a set shouldn’t it be Nlog(N)? As set generally uses BST, hence it stores the elements in a sorted order inside, We don’t care for the ordering so set wouldn’t be an ideal data structure to use..
        If you are aiming for worst case O(N) and average case O(1) you should use unordered_set instead of set. As unordered_set generally uses hashes hence worst case would be O(N) but would guarantee average or best case as O(1)

        • You are right. I should be more specific. I referred set to hash-implemented set, alos known as unordered_set.
          Thanks a lot for your complement!

    • First of all, thanks for your comment and any discussion is welcome!
      IMO, set is OK. We are not going to programming from scratch. Library, especially the built-in ones, are good for us. Even the assemble language will provide us some macro as high level API. As long as we can finish the task with provided resources, it should be good.
      In addition, I did not use set in my code. Set is typically based on hash table or red black tree. I used sort in the first solution, and bitmap in the second. Bitmap is similar with array, but far from set.

  2. Here is solution in PHP, please add it to the post

    • Hi,
      The .sort comparator function should return -1 if a is less than b, 0 if they are equal or 1 if a is greater than b. In your code, you are returning a boolean, turning the result array kind of… unpredictable I think.

    • I do not know C#. As far as I see, your solution’s complexity depends on the Distinct() and ToList() functions.
      Anyway, thanks for providing another way.

  3. There is another solution using VB.NET code that provides you 100% score in codility Environment.

  4. Hello Sheng,
    I love your explanations and proofs. I have been looking your proofs and I’ve been loving them. I addressed this problem by using a dictionary instead, which I believe the worst case complexity is O(N) as I tried to minimize the risk of colliision by checking if the key already exists. Let me know what you think.

    • In theory, the worst case of operations on disctionary is O(N), not O(1). Therefore, in worst case, the complexity of your solution is O(N^2).
      The good news is: the dictionary is nearly always O(1) in practice.

  5. Adding a Java O(N) in practice and in theory O(N^2).

  6. I came with a different idea after thinking about the hash collisions and having the boundaries of the problem in mind, I made the assumption of [-1,000,000 to 1,000,000]:

    it scores 100%
    https://codility.com/demo/results/demoMTWUSD-S9M/
    I got O(n) or possibly O(n log n) at codility, I think it’s O(n), what do you think?
    ( Btw the Programming Pearls inspired me this solution).
    EDIT by admin: this solution has a bug. See following comment.

    • After having some concerns of the misused space of my previous solutions I decide to try something different, and I found https://en.wikipedia.org/wiki/Hamming_weight#Language_support
      So using BitSet-cardinality() method I could use less space and time to compute the answer.

      100
      https://codility.com/demo/results/demoN57YUC-G9Z/
      I also used one BitSet since I was reading that BitSet works better for large arrays.
      EDIT by admin: this solution has a bug. See following comment.

      • Still is problematic. For the value and index in your solution:

        BUG~
        -1000000 and 0 have the same index in the bitset.

  7. Hi All,
    Thanks for your valuable comments.
    I implemented the following O(N) (100/100 ) solution in https://codility.com/demo/results/demoM8HJHM-FR5/

    , and an O(N log N) solution in https://codility.com/demo/results/demoDZS6YY-QK5/ that uses the array’s sorting strategy.
    The performance analysis in Codility shows how the O(N) solution is about 25 ms faster than the O(N log N) in the large tests.
    By the way, how can the source code be formatted when posting a comment?
    Cheers,
    Ernesto

    • Just 1 thing you need to add as per the question:-

  8. javascript solution 100%

    • Another solution for Javascript 100% both

    • Maybe newer python … not sure when Counter was added. Very fast.

      When I read the instructions I thought … it has to be harder than this … especially after that MinAvgTwoSlice nightmare.

      It does come back as O(N * Log(N)) worst case, O(N) expected. So maybe not fully optimized.

  9. I thought of doing something outside of the box of how to count the distinct values using dictionary. The time complexity of my solution was: O(N*log(N)) or O(N)

  10. In C++:

  11. Well this is my solution.

  12. My Python code using a Dictionary:

  13. Use a dictionary or just use a set.

Leave a Reply

Your email address will not be published. Required fields are marked *

Please put your code into a <pre>YOUR CODE</pre> section. Thanks and Happy Coding!