# Solution to Distinct by codility

18 Mar

Question Name: Distinct

In this question, the expected worst-case time complexity is O(N*log(N)). Thus the designed, or says official, solution would be sort the input and travel them. This solution is practically good. And it is shown as following.

In theory, there is a solution with expected worst-case time complexity being O(N) and worst-case space complexity is O(1). But the coefficients of N and 1 are quite high. And this solution depends on the assumption: “each element of array A is an integer within the range [−1,000,000..1,000,000].” These reasons make it practically bad, while it could get 100/100 here.

Remember not to blindly believe in the Big O.

### 48 Replies to “Solution to Distinct by codility”

1. van says:

I’m just using set. Not sure, that its O(N*log(N)) time but codility showed 100%

• Sheng says:

In practice, it is O(N). In theory, the worst case is O(N^2).

• Rajeev says:

I think for a set shouldn’t it be Nlog(N)? As set generally uses BST, hence it stores the elements in a sorted order inside, We don’t care for the ordering so set wouldn’t be an ideal data structure to use..
If you are aiming for worst case O(N) and average case O(1) you should use unordered_set instead of set. As unordered_set generally uses hashes hence worst case would be O(N) but would guarantee average or best case as O(1)

• Sheng says:

You are right. I should be more specific. I referred set to hash-implemented set, alos known as unordered_set.
Thanks a lot for your complement!

2. Pawan Parekh says:

len(list(set(A))) will also work….

3. We are not suppose to use high level API such as SET.
By solving this puzzle we are learn what is the actual code inside SET.
Just my 2 cents.

• Sheng says:

First of all, thanks for your comment and any discussion is welcome!
IMO, set is OK. We are not going to programming from scratch. Library, especially the built-in ones, are good for us. Even the assemble language will provide us some macro as high level API. As long as we can finish the task with provided resources, it should be good.
In addition, I did not use set in my code. Set is typically based on hash table or red black tree. I used sort in the first solution, and bitmap in the second. Bitmap is similar with array, but far from set.

4. nemo says:

Why so much code? Isn’t that enough?
len({x for x in A})

• Sheng says:

https://wiki.python.org/moin/TimeComplexity
You cannot guarantee the worst-case time complexity as required if set or dict is used.
And for some programmer, who is not using Python, this code is a little hard to understand. len(set(A)) is better.

5. Mithun says:

• Sheng says:

In practice, it is O(N). In theory, the worst case is O(N^2).

6. amresh says:

• Sheng says:

Thanks for sharing!

• Sheng says:

Show time for PHP! I do not know the performance of array_flip() here. But it is really short.
Thanks!

• Henrique says:

Hi,
The .sort comparator function should return -1 if a is less than b, 0 if they are equal or 1 if a is greater than b. In your code, you are returning a boolean, turning the result array kind of… unpredictable I think.

• Sheng says:

OMG!!! This is what I expected. The readers, not only me, help each other.

• Sheng says:

I do not know C#. As far as I see, your solution’s complexity depends on the Distinct() and ToList() functions.
Anyway, thanks for providing another way.

7. There is another solution using VB.NET code that provides you 100% score in codility Environment.

• Sheng says:

Solution seems good. While the variable name “index” is misleading.

8. M. Haid says:

Hello Sheng,
I love your explanations and proofs. I have been looking your proofs and I’ve been loving them. I addressed this problem by using a dictionary instead, which I believe the worst case complexity is O(N) as I tried to minimize the risk of colliision by checking if the key already exists. Let me know what you think.

• Sheng says:

In theory, the worst case of operations on disctionary is O(N), not O(1). Therefore, in worst case, the complexity of your solution is O(N^2).
The good news is: the dictionary is nearly always O(1) in practice.

9. Adding a Java O(N) in practice and in theory O(N^2).

10. I came with a different idea after thinking about the hash collisions and having the boundaries of the problem in mind, I made the assumption of [-1,000,000 to 1,000,000]:

it scores 100%
https://codility.com/demo/results/demoMTWUSD-S9M/
I got O(n) or possibly O(n log n) at codility, I think it’s O(n), what do you think?
( Btw the Programming Pearls inspired me this solution).
EDIT by admin: this solution has a bug. See following comment.

• After having some concerns of the misused space of my previous solutions I decide to try something different, and I found https://en.wikipedia.org/wiki/Hamming_weight#Language_support
So using BitSet-cardinality() method I could use less space and time to compute the answer.

100
https://codility.com/demo/results/demoN57YUC-G9Z/
I also used one BitSet since I was reading that BitSet works better for large arrays.
EDIT by admin: this solution has a bug. See following comment.

• Sheng says:

Still is problematic. For the value and index in your solution:

BUG~
-1000000 and 0 have the same index in the bitset.

• Sheng says:

It is O(N). But there IS a bug in your solution: inputLimit should be 1000001 (consider 0 and 1000000).

• Sheng says:

I appended some warning information. Thanks and enjoy!

11. Ernesto Cabrera says:

Hi All,
I implemented the following O(N) (100/100 ) solution in https://codility.com/demo/results/demoM8HJHM-FR5/

, and an O(N log N) solution in https://codility.com/demo/results/demoDZS6YY-QK5/ that uses the array’s sorting strategy.
The performance analysis in Codility shows how the O(N) solution is about 25 ms faster than the O(N log N) in the large tests.
By the way, how can the source code be formatted when posting a comment?
Cheers,
Ernesto

• Ernesto Cabrera says:

Hi, the following is just another O(N Log N) 100/100 variation based on the idea of sorting the array. https://codility.com/demo/results/demo3ESX6Y-AX7/.
I prefer the O(N) solution posted previously even if the code is not as self-documented as this one.
Cheers,
Ernesto

12. Yunus says:

Hi,
I prefer to use built-in functions. 🙂

• Nitesh Jaiswal says:

Just 1 thing you need to add as per the question:-

13. Manuel says:

javascript solution 100%

• Another solution for Javascript 100% both

14. Codejumper says:

15. Spenrose says:

In Python:

• Lob says:

Maybe newer python … not sure when Counter was added. Very fast.

When I read the instructions I thought … it has to be harder than this … especially after that MinAvgTwoSlice nightmare.

It does come back as O(N * Log(N)) worst case, O(N) expected. So maybe not fully optimized.

16. I thought of doing something outside of the box of how to count the distinct values using dictionary. The time complexity of my solution was: O(N*log(N)) or O(N)

17. zombie says:

In C++:

18. Suraj Barailee says:

Well this is my solution.

19. Anatolii says:

My Python code using a Dictionary: