Question: https://codility.com/demo/take-sample-test/missing_integer

Question Name: Missing-Integer or MissingInteger

```python
def solution(A):
    '''
    Solve it with the Pigeonhole principle.
    There are N integers in the input. So among the first
    N+1 positive integers, at least one of them must be missing.
    '''
    # We only care about the first N+1 positive integers.
    # occurrence[i] is for the integer i+1.
    occurrence = [False] * (len(A) + 1)
    for item in A:
        if 1 <= item <= len(A) + 1:
            occurrence[item - 1] = True

    # Find out the missing minimal positive integer.
    for index in range(len(A) + 1):
        if not occurrence[index]:
            return index + 1

    raise Exception("Should never be here.")
```

Hi!

Here is a shorter version:

It is not clear from the problem description whether the elements of the array must be <= N.

In case they can take any value within the range, it can be changed to:

Thanks for sharing your solutions!

Here, the set should work the same as frozenset, right?

Hi, I think this solution is wrong. If I pass A = [99999999], I get 100000000. The answer should be 1.

Yes, you are right. The solution has a bug. Please try to replace "start = 0 if smallest < len(a) else smallest" with "start = 0".

Thanks for your good catch!

Yep, set and frozenset work exactly the same way here; frozenset is just immutable.
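The shorter version discussed above is not reproduced on this page, so the following is only a sketch of the hash-set idea, not the commenter's code; the function name `solution_with_set` and its `container` parameter are hypothetical. It always starts probing at 1, which is what the "start = 0" fix achieves, and because it only does membership tests it behaves the same with a `set` or a `frozenset`.

```python
def solution_with_set(A, container=set):
    """Hash-set variant (hypothetical name): store the values, then probe 1, 2, 3, ...

    Probing always starts at 1, so an input such as [99999999] yields 1.
    """
    seen = container(A)  # set or frozenset; only `in` checks are used below
    candidate = 1
    while candidate in seen:
        candidate += 1
    return candidate


print(solution_with_set([99999999]))                     # 1
print(solution_with_set([1, 3, 6, 4, 1, 2], frozenset))  # 5
```

At most N distinct values can be in the container, so the `while` loop runs at most N + 1 times; with average O(1) hash lookups the whole function is O(N) on average.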

Great! THX!

Hi, here is my solution in PHP 100/100

The same solution ported to Python

Hi Ricardo, your solution is right. But the expected worst-case time complexity is O(N). In your solution, you sorted the input array first, which is O(NlogN).

PS: to post the code, please include your code inside a pre block, instead of code block, like <pre> code </pre>

Thanks for letting me know, Sheng.

When I started the exercise I was concerned about sorting the array (as a small developer, Big-O notation was never my biggest concern, until now), but then I was surprised by the result on Codility. It detected a time complexity of O(N).

Do you know why that is?

PHP’s sort() uses a variation of Quicksort, so shouldn’t it be O(NlogN), as you stated?

Thanks

For general-purpose (comparison-based) sorting algorithms used in practice (http://en.wikipedia.org/wiki/Sorting_algorithm), the best achievable worst-case complexity is O(NlogN).

logN stays small even for a big N, and Codility’s complexity detection is not exactly accurate, so it gave you a wrong estimate here.

Thanks for visiting my site! And enjoy coding and photography!

By the way, your photos are amazing! I love them!

Thanks, I love photography for as long as I can remember 😀

Here is my solution, following “Pigeonhole principle”:

Result:

https://codility.com/demo/results/demoMSXU28-G6F/

Hi,

Thanks for posting your solution. I learned something new from this one. Just wanted to share an opinion: I know that complexity is expressed based on the worst-case scenario, but statistically I think that a solution that sorts the array first will almost always be faster. That is, of course, assuming the input is an array of random (or seemingly random) numbers, not one deliberately crafted to trigger the worst case.

So if we do something like this:

1. sort the array,

2. go through the sorted array (ignoring negative values)

2.1. check if the first positive element is greater than 1, in which case just return 1, which leaves just the sort complexity O(logN), and that’s most cases when dealing with random input

2.2. otherwise look for the first gap (between positive numbers), which, in the case of random input will most probably be very early in the array, giving O(logN) + O(small fraction of N)
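The steps above can be sketched in Python as follows. This is a rendering of the description, not the commenter’s own code, and the name `solution_by_sorting` is hypothetical. Step 2.1 falls out naturally: if the first positive value is greater than 1, the loop stops right away and returns 1. The call to `sorted` alone costs O(N log N), which dominates the linear scan.

```python
def solution_by_sorting(A):
    """Sort-based approach (hypothetical name): sort, skip non-positives, find the first gap."""
    expected = 1  # smallest positive integer not seen so far
    for value in sorted(A):  # O(N log N)
        if value < expected:
            continue  # negatives, zero, and duplicates are skipped
        if value > expected:
            break  # gap found: `expected` does not occur in A
        expected += 1  # value == expected, so move on to the next candidate
    return expected


print(solution_by_sorting([1, 3, 6, 4, 1, 2]))  # 5
print(solution_by_sorting([2]))                 # 1 (step 2.1)
```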

So if we were to solve a real-world problem, we would probably inspect the input statistically (maybe even ad hoc, using intuition) and then choose the solution that works best for the specific use. The worst case O(logN)+O(N) would just be a very, very rare occurrence.

Best Regards

🙂

Thanks for your discussion! However, the average performance of the best practical sorting algorithms is O(NlogN), not O(logN). Therefore, your solution does NOT fit the requirements.

Bad solution (Pascal)

It is bad because of the time complexity:

O(N**2)

It got 100% for Correctness and 25% for Performance, so it’s a fail.

Another bad Pascal solution (same time complexity, O(N**2)), score: 100% and 50%

(now sorting the array before doing the calculations)

I sincerely appreciate your sharing and involvement. However, to keep the community focused, please only post good solutions. Sorry about that.

Ruby 100%

100% in JS https://codility.com/demo/results/demoWN6DHU-JVT/

No, you didn’t. I just tested your solution verbatim and it doesn’t work.

I also tried it a few minutes ago. It does work.

– add up the total of the array (s) and count the number of elements (n)

– for the full array 1..n+1 the sum would be (1+n+1)*(n+1)/2, i.e. there are (n+1)/2 pairs of numbers, each summing to 1 + (n+1)

– therefore the difference between the expected sum (including the missing element) and the actual calculated sum s is the missing value

– works in O(n)
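A minimal Python sketch of the sum idea (the name `missing_by_sum` is hypothetical). Python integers are arbitrary precision, so the overflow concern applies to fixed-width types such as 32-bit ints in C or Java. The stronger restriction is that the n values must be distinct and drawn from 1..n+1, so that exactly one value of that range is missing.

```python
def missing_by_sum(A):
    """Expected sum of 1..n+1 minus the actual sum (hypothetical name).

    Only valid when A holds n distinct values taken from 1..n+1.
    """
    n = len(A)
    expected_sum = (n + 1) * (n + 2) // 2  # 1 + 2 + ... + (n + 1); the product is always even
    return expected_sum - sum(A)


print(missing_by_sum([1, 2, 4]))  # 3
print(missing_by_sum([1, 1]))     # 4, wrong: the real answer is 2
```

The second call shows why repeated values break it: the duplicate 1 makes the actual sum too small, so the difference no longer names a single missing value.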

Yes, logically it is right. But it may lead to overflow, which makes the result unreliable.

Are you sure this is OK if the numbers in the array are totally random and can repeat?

It works if and only if the numbers are unique and in the range of 1 to N + 1. I was wrong.

This is my Python solution for 100%:

Assuming the hash function is perfect (which is close to reality), the solution is a good example of using a hash set.

Thanks!

Here is my solution in Java, 100%. It is a little bit verbose, but it works:

Here’s the link:

https://codility.com/demo/results/demoA3WP3K-ZCA/

How come every solution checks from the 1..N range?

Shouldn’t an array containing:

A = [3, 4, 6];

return 5 instead of 1?

Also in the problem statement it’s not clear what value to return if the array is complete. How did you guess what they required?

The problem statement: returns the minimal positive integer (greater than 0) that does not occur in A.

Another JS solution, this one works 100% 🙂

Awesome! However, the variable name “min” is a little bit misleading. It is actually the candidate being probed for the result.

Solution in C#:

I don’t understand limiting the size of the occurrence array to N elements.

According to the problem on Codility,

N is an integer within the range [1..100,000];

each element of array A is an integer within the range [−2,147,483,648..2,147,483,647].

So A might be [2147483647], for example, where N = 1, and then the A[item..] references would be out of range.

Have I misunderstood the question?

It’s the pigeonhole principle. With N integers, the minimal missing *positive* integer must be in the range of [1, N + 1] (both ends inclusive).
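A quick empirical check of that bound, on small random inputs rather than the Codility test data (the helper names `brute_force` and `pigeonhole_holds` are hypothetical). It also covers the earlier questions. For A = [2147483647] and A = [3, 4, 6], the answer is 1, because 1 itself does not occur. For a "complete" array such as [1, 2, 3], it is N + 1 = 4.

```python
import random
from itertools import count


def brute_force(A):
    """Smallest positive integer not in A, probing 1, 2, 3, ... with no upper bound assumed."""
    return next(k for k in count(1) if k not in A)


def pigeonhole_holds(trials=1000, seed=0):
    """Check on random inputs that the answer always lies in [1, N + 1].

    N values can fill at most N of the N + 1 slots 1..N+1, so one slot stays
    empty. Values outside 1..N+1 never fill a slot, which is why the
    occurrence array can simply ignore them instead of indexing with them.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 8)
        A = [rng.randint(-3, 12) for _ in range(n)]
        if not 1 <= brute_force(A) <= n + 1:
            return False
    return True


print(brute_force([2147483647]), brute_force([3, 4, 6]), brute_force([1, 2, 3]))  # 1 1 4
```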

Something I don’t get is, wouldn’t doing 2 for loops end up being O(2N)?

Big O is used to measure the order of complexity. O(2N) means exactly the same thing as O(N).

Here is my solution in Python, 100% in both correctness and performance

Expected worst-case time complexity is O(N), while yours is O(NlogN) because of the sorting.

What if I don’t know the pigeonhole principle? O(N) time with O(1) space:

Reading my blog 🙂 Just kidding.
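The O(1)-space code mentioned above is not reproduced here. One well-known way to get O(N) time with O(1) extra space is in-place index placement (sometimes called cyclic sort). This is a swapped-in technique, not necessarily the commenter’s, and the name `solution_constant_space` is hypothetical. It still relies on the [1, N + 1] bound, using A itself as the occurrence array, and it mutates the input.

```python
def solution_constant_space(A):
    """O(N) time, O(1) extra space (hypothetical name): put each value v in 1..N at index v-1.

    Mutates A in place; pass a copy if the input must be preserved.
    """
    n = len(A)
    for i in range(n):
        # Keep swapping while A[i] is in range and its home slot doesn't already hold it.
        while 1 <= A[i] <= n and A[A[i] - 1] != A[i]:
            home = A[i] - 1  # computed first so the tuple swap uses a fixed index
            A[i], A[home] = A[home], A[i]
    # The first slot not holding index+1 gives the minimal missing positive integer.
    for i in range(n):
        if A[i] != i + 1:
            return i + 1
    return n + 1  # every value 1..N is present


print(solution_constant_space([1, 3, 6, 4, 1, 2]))  # 5
```

Each swap moves one value into its home slot for good, so there are at most N swaps in total and the running time stays O(N).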