Question: https://codility.com/demo/take-sample-test/missing_integer

Question Name: Missing-Integer or MissingInteger

    def solution(A):
        ''' Solve it with Pigeonhole principle.
            There are N integers in the input. So for the
            first N+1 positive integers, at least one of
            them must be missing.
        '''
        # We only care about the first N+1 positive integers.
        # occurrence[i] is for the integer i+1.
        occurrence = [False] * (len(A) + 1)
        for item in A:
            if 1 <= item <= len(A) + 1:
                occurrence[item-1] = True

        # Find out the missing minimal positive integer.
        # (range instead of xrange, so this also runs on Python 3.)
        for index in range(len(A) + 1):
            if not occurrence[index]:
                return index + 1

        # Unreachable: the pigeonhole argument guarantees a return above.
        raise Exception("Should never be here.")

Hi!

Here is a shorter version:

It is not clear from the problem description whether the elements of the array must be <= N.

In case they can take any value within the range it can be changed to:

Thanks for sharing your solutions!

Here, the set should work the same as frozenset, right?

Hi, I think this solution is wrong. If I pass A = [99999999], I get 100000000. Answer should be 1

Yes, you are right. The solution has a bug. Please try replacing "start = 0 if smallest < len(a) else smallest" with "start = 0".

Thanks for your good catch!

Yep, set and frozenset work exactly the same way here; the only difference is that frozenset is immutable.
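As a small illustration of that answer (the sample list `a` and the variable names are made up for this sketch), both types give the same results for membership tests and set difference, and only the mutable `set` accepts `add()`:

```python
a = [1, 3, 6, 4, 1, 2]

mutable = set(a)
frozen = frozenset(a)

# Same members, same membership tests, same set algebra.
assert mutable == frozen
assert set(range(1, 8)) - mutable == set(range(1, 8)) - frozen == {5, 7}

# The difference: a set can be changed in place, a frozenset cannot.
mutable.add(5)
try:
    frozen.add(5)
    frozen_rejected_add = False
except AttributeError:
    # frozenset has no add() method at all
    frozen_rejected_add = True
```

For this problem the set is only built once and then read, so either type would do.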

Great! THX!

Hi, here is my solution in PHP 100/100

The same solution ported to Python

Hi Ricardo, your solution is right. But the expected worst-case time complexity is O(N). In your solution, you sorted the input array first, which is O(NlogN).

PS: to post the code, please include your code inside a pre block, instead of code block, like <pre> code </pre>

Thanks for letting me know, Sheng.

When I started the exercise I was concerned about sorting the array (as a small developer, Big-O notation never was my biggest concern, until now) but then I got surprised by the result on Codility. They detected a time complexity of O(N).

Do you know why that is?

For sort() PHP uses a variation of Quicksort, so shouldn’t it be O(NlogN) as you stated?

Thanks

For general-purpose sorting algorithms in practice (http://en.wikipedia.org/wiki/Sorting_algorithm), the best worst-case complexity is O(NlogN).

logN is small even with a big N. And Codility’s detection is not exactly accurate, and gave you a wrong estimation here.

Thanks for visiting my site! And enjoy coding and photography!

By the way, your photos are amazing! I love them!

Thanks, I love photography for as long as I can remember 😀

Here is my solution, following “Pigeonhole principle”:

Result:

https://codility.com/demo/results/demoMSXU28-G6F/

Hi,

Thanks for posting your solution. I learned something new from this one. Just wanted to share an opinion: I know that complexity is expressed based on the worst case scenario, but statistically I think that a solution that sorts the array first will almost always be faster. And that’s of course if we assume that the input is an array of random (or seemingly random) numbers, not someone deliberately creating worst case scenarios.

So if we do something like this:

1. sort the array,

2. go through the sorted array (ignoring negative values)

2.1. check if the first positive element is greater than 1, in which case just return 1, which leaves just the sort complexity O(logN), and that’s most cases when dealing with random input

2.2. otherwise look for the first gap (between positive numbers), which, in the case of random input will most probably be very early in the array, giving O(logN) + O(small fraction of N)

So if we were to solve a real world problem, we would probably inspect the input statistically (maybe even ad-hoc, using intuition) and then choose the solution that works the best for the specific use. The worst case O(logN)+O(N) would just be a very very rare occurrence.

Best Regards

🙂

Thanks for your discussion! However, the average performance of the best practical sorting algorithms is O(NlogN), not O(logN). Therefore, your solution does NOT fit the requirements.
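For reference, the sort-first steps proposed above could be sketched roughly as follows. The function name `solution_sorted` is made up for this illustration. Note that `sorted()` alone costs O(NlogN), so this shows the idea under discussion, not a solution that meets the O(N) requirement:

```python
def solution_sorted(A):
    """Sort-first variant: O(N log N) overall because of the sort."""
    candidate = 1  # smallest positive integer not seen so far
    for value in sorted(A):
        if value == candidate:
            candidate += 1  # found it, so look for the next one
        elif value > candidate:
            break           # first gap: candidate is missing
        # values below candidate (negatives, zero, duplicates) are skipped
    return candidate
```

With random input the loop usually stops very early, as the comment argues, but the sort has already touched every element by then.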

Bad solution (Pascal)

It is bad because of the time complexity:

O(N**2)

It got a 100% for Correctness and a 25% for Performance. So it’s a fail

Also a bad Pascal solution (same time complexity O(N**2)) score : 100% and 50%

(now sorting the array before doing the calculations)

I sincerely appreciate your sharing and involvement. However, to keep the community clear, please only post the good solutions. Sorry for that.

Ruby 100%

Ruby

Although your solution is valid (tested in ruby console):

It’ll not work on the `Codility` challenge because, as of Ruby 1.9.0, Strings are no longer Enumerable. You can’t simply iterate over a String or convert it to an Array.

100% in JS https://codility.com/demo/results/demoWN6DHU-JVT/

No you didn’t, I just tested your solution verbatim and it doesn’t work.

I also tried minutes ago. It does work.

– add up the total of the array (s) and count the number of elements. (n)

– for the full array 1..n+1 the sum would be (1+n+1)*(n+1)/2, i.e. there are (n+1)/2 pairs of numbers, each summing to 1+(n+1)

– therefore the difference between the expected sum (including the missing element) and the actual calculated sum s is the missing value

– works in O(n)

Yes, logically it is right. But it may lead to overflow, which makes the result unreliable.

Are you sure this is OK if the numbers in the array are totally random and can repeat?

It works if and only if the numbers are unique and in range of 1 to N + 1. I was wrong.
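A minimal sketch of the sum idea shows both where it works and where it breaks. The name `missing_by_sum` is hypothetical. Python integers are arbitrary precision, so the overflow concern raised above does not show up here, though it would with fixed-width integers in other languages:

```python
def missing_by_sum(A):
    """Only valid when A holds n distinct values drawn from 1..n+1."""
    n = len(A)
    expected = (n + 1) * (n + 2) // 2  # sum of 1..n+1
    return expected - sum(A)

# Distinct values from 1..5 with 4 left out: the formula finds it.
ok = missing_by_sum([2, 3, 1, 5])             # 4

# Duplicates and out-of-range values: the result is meaningless.
wrong = missing_by_sum([1, 3, 6, 4, 1, 2])    # 11, but the real answer is 5
```

So the formula fits a "one element missing from a permutation" problem, not this one.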

This is my python solution for 100%:

Assuming the hash function is perfect (which is close to true in practice), the solution is a good example of using a hash set.

Thanks!

I am learning Python. Can you please explain how this works?

set(range(1, len(a) + 2)) is the set of all elements plus one possible missing integer.

set(a) is the set of all candidates, after dedup.
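The commenter's code is not reproduced here, so the following is only a sketch consistent with the explanation above; the function name `solution_set` and the use of `min()` on the difference are assumptions:

```python
def solution_set(a):
    # Every possible answer: the integers 1..N+1.
    candidates = set(range(1, len(a) + 2))
    # Remove everything that actually occurs in the input;
    # the smallest survivor is the missing integer.
    return min(candidates - set(a))
```

The difference is never empty, because `candidates` has N+1 members and `set(a)` can remove at most N of them.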

You could Google and find all these built-in functions.

Hi Ben,

I’m new to python programming.

Can you explain to me how you came up with your answer?

Simply genius

Here is my solution in Java 100%, a little bit verbose but it works:

Here’s the link:

https://codility.com/demo/results/demoA3WP3K-ZCA/

How come every solution checks from the 1..N range?

Shouldn’t an array containing:

A = [3, 4, 6];

return 5 instead of 1?

Also in the problem statement it’s not clear what value to return if the array is complete. How did you guess what they required?

The problem statement: returns the minimal positive integer (greater than 0) that does not occur in A.

Hi, I came to this answer 5 years later!

I changed this line:

max = Math.max(max, length); // taking into account arrays with random elements

with this:

max = Math.max(max, positives.size());

Let me know your opinion.

Best Regards

Another JS solution, this one works 100% 🙂

Awesome! However, the variable name “min” is a little bit misleading. It is actually the candidate value being probed for the result.

Solution in C#:

I don’t understand limiting the size of the occurrence array to N elements.

According to the problem on Codility,

N is an integer within the range [1..100,000];

each element of array A is an integer within the range [−2,147,483,648..2,147,483,647].

So A might be [2147483647] for example where N=1 and then the A[item..] references would be out of range.

Have I misunderstood the question?

It’s pigeonhole principle. With N integers, the minimal missing *positive* integer must be in the range of [1, N + 1] (both ends inclusive).
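To make that bound concrete, here is a small brute-force check. The helper `smallest_missing` and the chosen value range are illustrative assumptions, not part of the original solution. It confirms that for every short array tried, the answer falls in [1, N + 1], and that a huge element such as 2147483647 is simply irrelevant:

```python
from itertools import product

def smallest_missing(A):
    """Reference answer by direct search, no size assumptions."""
    present = set(A)
    candidate = 1
    while candidate in present:
        candidate += 1
    return candidate

# Every array of length 1..4 with values in -2..6 stays within [1, N + 1].
for n in range(1, 5):
    for A in product(range(-2, 7), repeat=n):
        assert 1 <= smallest_missing(A) <= n + 1

# An out-of-range element never needs a slot in the occurrence array.
big_case = smallest_missing([2147483647])  # 1
```

That is why values outside [1, N + 1] can be skipped without ever indexing the occurrence array.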

Something I don’t get is, wouldn’t doing 2 for loops end up being O(2N)?

Big O is used to measure the order of complexity. O(2N) means exactly the same thing as O(N).

Here is my solution in Python, 100% in both correctness and performance

Expected worst-case time complexity is O(N), while yours is O(NlogN) because of the sorting.

What if I don’t know the pigeonhole principle? O(N) time with O(1) space:

reading my blog 🙂 just kidding

My solution in ruby

My solution in javascript

in case you need code in JS, ES6.

`A.sort((a,b) => Math.abs(a) - Math.abs(b)).reduce((res, val) => { val === res ? res++ : res; return res; }, 1)`

Here is my solution in Java. It works but the time complexity is: O(N) or O(N * log(N))

My solution using Google go language

My solution using C++ – 100%

solution in Ruby for 100%

but

Detected time complexity:

O(N) or O(N * log(N))

My Java Solution 100%

PHP 100%

Javascript 100% (however, both this and my previous PHP version, even though scoring 100%, show Detected time complexity: O(N) or O(N * log(N))). So next I’m going to study that pigeonhole principle.

If anyone is looking for a Java solution, you can try this one (100% in both correctness and performance)

This is one easy way based on the edge cases given

This post still comes up so I’ll put my JavaScript solution for keepsake 🙂

Scores 100% on Task score – Correctness – Performance

Hi, what does the performance of set() look like? Is it much faster compared to a single for loop with a list or dict as a helper?

I cannot see why it’s working. The problem asks for a single integer, while your code returns a dictionary.

100% on codility but time complexity O(N) or O(N*log(N))

Not heard of the pigeon hole but now going to look into it.

My solution. I thought the time complexity should be O(N). Why does it say it is O(N**2)?

Because of the `not in` operation (on a list, each such check is a linear scan).

This hasn’t been updated in a while… just FYI in case you were confused reading the solution:

The question has changed: the array is no longer restricted to integers between 1 and N; it’s now *any* N integers between 1 and 100000.

The pidginhole therefore won’t work anymore.

But I liked reading about your approach.

If you can provide a solution for the new problem NOT using sort (which I do) I’d be grateful to see it. 🙂

It’s still working well: https://app.codility.com/demo/results/training244KA2-MMX/

pidginhole 🙂 You said it, and you should know it.

100% solution using Swift – I do not fully understand the pigeonhole principle but it works!

My solution in Python:

Solution in Python 100% :-

This is my C++ solution with 100% performance and correctness

100% on correctness and performance. Detected Time Complexity – O(N) or O(NlogN). I’ve explained my code with comments.