Question: https://codility.com/demo/take-sample-test/distinct
Question Name: Distinct
In this question, the expected worst-case time complexity is O(N*log(N)). Thus the intended, or official, solution is to sort the input and traverse it. This solution is good in practice, and it is shown below.
```python
def solution(A):
    if len(A) == 0:
        distinct = 0
    else:
        distinct = 1
        A.sort()
        for index in xrange(1, len(A)):
            if A[index] == A[index-1]:
                # The same element as the previous one
                continue
            else:
                # A new element
                distinct += 1
    return distinct
```
In theory, there is a solution with worst-case time complexity O(N) and worst-case space complexity O(1). But the constant factors are quite high, and this solution depends on the assumption that "each element of array A is an integer within the range [−1,000,000..1,000,000]." These reasons make it bad in practice, although it gets 100/100 here.
```c
#include <stdlib.h>
#include <string.h>     /* for memset() */

// A better and portable way is to include <limits.h>.
// But I failed to include that header, so I have to hard-code
// CHAR_BIT here. Even if CHAR_BIT is larger than 8, our
// program should still work, except for wasting some space.
// Historically, there have been different standards for CHAR_BIT:
// http://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html
// http://pubs.opengroup.org/onlinepubs/7908799/xsh/limits.h.html
// But in all cases, CHAR_BIT is at least 8.
#define CHAR_BIT 8

int solution(int A[], int N) {
    unsigned char appeared[2000001/CHAR_BIT+1];
    // onesInByte is a pre-computed array. onesInByte[i] == j means there
    // are j 1s in the binary representation of the integer i.
    unsigned short int onesInByte[] = {
        0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
        4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8
    };
    int index = 0;
    int result = 0;
    memset(appeared, 0, 2000001/CHAR_BIT+1);
    // Record the appeared values
    for (index = 0; index < N; ++index) {
        appeared[(A[index]+1000000)/CHAR_BIT] |=
            (1 << (int)((A[index]+1000000)%CHAR_BIT));
    }
    // Compute the number of distinct values
    for (index = 0; index < 2000001/CHAR_BIT+1; ++index) {
        result += onesInByte[appeared[index]];
    }
    return result;
}
```
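For readers who prefer Python, the same bitmap idea can be sketched as follows. This is my own sketch, not the author's code: it assumes the stated range [−1,000,000..1,000,000] and trades the bit-packing for one byte per value to keep it short.

```python
def solution(A):
    # One byte per possible value in [-1000000, 1000000].
    OFFSET = 1000000
    seen = bytearray(2 * OFFSET + 1)  # all slots start at zero
    for value in A:
        seen[value + OFFSET] = 1      # mark the value as appeared
    return sum(seen)                  # number of marked slots = distinct count

print(solution([2, 1, 1, 2, 3, 1]))  # → 3
```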
Remember not to blindly believe in the Big O.
I’m just using set. Not sure that it’s O(N*log(N)) time, but Codility showed 100%.
In practice, it is O(N). In theory, the worst case is O(N^2).
I think for a set it should be N*log(N)? A set generally uses a BST, so it stores the elements in sorted order internally. We don’t care about the ordering, so set wouldn’t be an ideal data structure to use.
If you are aiming for worst case O(N) and average case O(1), you should use unordered_set instead of set. As unordered_set generally uses hashing, the worst case would be O(N), but the average or best case is O(1).
You are right. I should have been more specific. By set I meant a hash-based set, also known as unordered_set.
Thanks a lot for your addition!
len(list(set(A))) will also work….
Yeah! len(set(A)) works well! Great Python!!!
Just arrived at the same solution. return len(set(A))
This runtime would be guaranteed to be N*log(N) in C++, where set is a red-black tree. Python uses hash tables, which gets us into the hashing-function discussion…
Funny to get some knowledge about the low-level implementation!
FYI: here is an excellent article about the choice between red-black tree and hash table:
http://programmers.stackexchange.com/questions/234793/why-does-python-use-hash-table-to-implement-dict-but-not-red-black-tree
We are not supposed to use a high-level API such as set.
By solving this puzzle we learn what the actual code inside set is.
Just my 2 cents.
First of all, thanks for your comment and any discussion is welcome!
IMO, set is OK. We are not going to program everything from scratch. Libraries, especially the built-in ones, are good for us. Even assembly language provides us some macros as a high-level API. As long as we can finish the task with the provided resources, it should be fine.
In addition, I did not use set in my code. A set is typically based on a hash table or a red-black tree. I used sort in the first solution, and a bitmap in the second. A bitmap is similar to an array, but far from a set.
Why so much code? Isn’t that enough?
len({x for x in A})
https://wiki.python.org/moin/TimeComplexity
You cannot guarantee the worst-case time complexity as required if set or dict is used.
And for programmers who are not using Python, this code is a little hard to understand. len(set(A)) is better.
In practice, it is O(N). In theory, the worst case is O(N^2).
Here is a solution in PHP; please add it to the post.
Thanks for sharing!
One more PHP solution – it works in a similar way but with less code.
https://codility.com/demo/results/demoT9VNMS-CFK/
Show time for PHP! I do not know the performance of array_flip() here, but it is really short.
Thanks!
For the exact same solution but written in JavaScript I get 66%: https://codility.com/demo/results/demoMKN37U-GUW/
But in C# it gets 100%. What am I missing?…
Hi,
The .sort comparator function should return a negative number if a is less than b, 0 if they are equal, or a positive number if a is greater than b. In your code, you are returning a boolean, which makes the resulting array kind of… unpredictable, I think.
OMG!!! This is what I hoped for: the readers, not only me, helping each other.
Thanks for your awesome answer!
I got 100% in C#, and it’s really short.
But I am using a built-in function here, so I can say it is an O(N) solution.
https://codility.com/demo/results/demoGVEVE9-SY5/
I do not know C#. As far as I can see, your solution’s complexity depends on the Distinct() and ToList() functions.
Anyway, thanks for providing another way.
Here is another solution, using VB.NET, that gets a 100% score in the Codility environment.
The solution seems good, though the variable name “index” is misleading.
Hello Sheng,
I love your explanations and proofs. I have been looking at your proofs and loving them. I addressed this problem by using a dictionary instead, which I believe has worst-case complexity O(N), as I tried to minimize the risk of collision by checking if the key already exists. Let me know what you think.
In theory, the worst case of operations on a dictionary is O(N), not O(1). Therefore, in the worst case, the complexity of your solution is O(N^2).
The good news is: the dictionary is nearly always O(1) in practice.
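To make the discussion concrete, a dictionary-based solution might look like the following minimal sketch (my own, not the commenter's code): average O(N) overall, O(N^2) in the theoretical worst case, as noted above.

```python
def solution(A):
    appeared = {}
    for value in A:
        # Average O(1) per lookup/insert; O(N) per operation in the
        # worst case, when many hash collisions occur.
        if value not in appeared:
            appeared[value] = True
    return len(appeared)

print(solution([2, 1, 1, 2, 3, 1]))  # → 3
```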
Adding a Java solution: O(N) in practice, O(N^2) in theory.
I came up with a different idea after thinking about the hash collisions. With the boundaries of the problem in mind, I made the assumption of [−1,000,000 to 1,000,000]:
it scores 100%
https://codility.com/demo/results/demoMTWUSD-S9M/
Codility reported O(N) or possibly O(N*log(N)); I think it’s O(N). What do you think?
(Btw, Programming Pearls inspired this solution.)
EDIT by admin: this solution has a bug. See following comment.
After having some concerns about the wasted space of my previous solutions, I decided to try something different, and I found https://en.wikipedia.org/wiki/Hamming_weight#Language_support
So using BitSet’s cardinality() method I could use less space and time to compute the answer.
100
https://codility.com/demo/results/demoN57YUC-G9Z/
I also used one BitSet, since I read that BitSet works better for large arrays.
EDIT by admin: this solution has a bug. See following comment.
It is still problematic. For the value and index in your solution:
BUG~
-1000000 and 0 have the same index in the bitset.
It is O(N). But there IS a bug in your solution: inputLimit should be 1000001 (consider 0 and 1000000).
You’re right, I can’t edit my answer. Great catch!
I appended some warning information. Thanks and enjoy!
Hi All,
Thanks for your valuable comments.
I implemented the following O(N) (100/100) solution in https://codility.com/demo/results/demoM8HJHM-FR5/, and an O(N log N) solution in https://codility.com/demo/results/demoDZS6YY-QK5/ that uses the array-sorting strategy.
The performance analysis in Codility shows how the O(N) solution is about 25 ms faster than the O(N log N) in the large tests.
By the way, how can the source code be formatted when posting a comment?
Cheers,
Ernesto
Hi, the following is just another O(N Log N) 100/100 variation based on the idea of sorting the array. https://codility.com/demo/results/demo3ESX6Y-AX7/.
I prefer the O(N) solution posted previously even if the code is not as self-documented as this one.
Cheers,
Ernesto
Hi,
I prefer to use built-in functions. 🙂
Just one thing you need to add, as per the question:
JDK 8 for help:
https://codility.com/demo/results/demoQGMNC3-Q8X/
JavaScript solution, 100%.
Another JavaScript solution, also 100%.
In Python:
Maybe newer Python… not sure when Counter was added. Very fast.
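For reference, collections.Counter was added to the standard library in Python 2.7. A Counter-based version (my sketch, under that assumption) might look like:

```python
from collections import Counter

def solution(A):
    # Counter builds a value -> frequency map in one pass;
    # the distinct count is simply the number of keys.
    return len(Counter(A))

print(solution([2, 1, 1, 2, 3, 1]))  # → 3
```

Building the full frequency map is slightly more work than a plain set, but both are a single O(N) pass on average.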
When I read the instructions I thought … it has to be harder than this … especially after that MinAvgTwoSlice nightmare.
It does come back as O(N * Log(N)) worst case, O(N) expected. So maybe not fully optimized.
I thought outside the box about how to count the distinct values using a dictionary. The time complexity of my solution was O(N*log(N)) or O(N).
In C++:
Well this is my solution.
My Python code using a Dictionary:
Use a dictionary or just use a set.