Question: https://codility.com/demo/take-sample-test/count_distinct_slices
Question Name: Count-Distinct-Slices or CountDistinctSlices
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | def solution(M, A): accessed = [-1] * (M + 1) # -1: not accessed before # Non-negative: the previous occurrence position front, back = 0, 0 result = 0 for front in xrange(len(A)): if accessed[A[front]] == -1: # Met with a new unique item accessed[A[front]] = front else: # Met with a duplicate item # Compute the number of distinct slices between newBack-1 and back # position. newBack = accessed[A[front]] + 1 result += (newBack - back) * (front - back + front - newBack + 1) / 2 if result >= 1000000000: return 1000000000 # Restore and set the accessed array for index in xrange(back, newBack): accessed[A[index]] = -1 accessed[A[front]] = front back = newBack # Process the last slices result += (front - back + 1) * (front - back + 2) / 2 return min(result, 1000000000) |
Hi,
I was trying to solve this problem in Java, and couldn’t figure out where I went wrong. Perhaps, could you take a look at it and point out the mistake?
Your method to update the begin index is incorrect.
I will explain a little bit of code which was cryptic to me:
result += (newBack – back) * (front – back + front – newBack + 1) / 2
The above line of code is equal to the below ones:
result += (front – back) * (front – back + 1) / 2
result -= (front – newBack) * (front – newBack + 1) / 2
Thanks for improving the post with more details!
Hey Sheng ,
Can you please explain following line
‘ result += (newBack – back) * (front – back + front – newBack + 1) / 2 ‘
Almost every solution uses this but I havent found why this formula is used .
Have you used any theorem to derive this ?
Just like Mr. Botond Orban said, you can deduce the formula from his equation set:
(front – back) * (front – back + 1) / 2 – (front – newBack) * (front – newBack + 1) / 2
= (newBack – back) * (front – back + front – newBack + 1) / 2
PS: the formula: 1+2+3+4+…+n-1+n = n*(n+1)/2
Thanks for your explanation! Your first comment needs my approval. Otherwise, it will show automatically.
Please refer to comment from @Botond Orban:
result += (newBack – back) * (front – back + front – newBack + 1) / 2 is the same as:
result += (front – back) * (front – back + 1) / 2 – (front – newBack) * (front – newBack + 1) / 2.
And
(front – back) * (front – back + 1) / 2 is the number of slices in A[back : front + 1],
(front – newBack) * (front – newBack + 1) / 2. is for A[newBack : front + 1]
Ok, I got 90 and always failed the range check. I didn’t figure out the reason until 2016! 🙂
For whatever ever reason, I thought the int_max is something starting with 2 and with 11 digits, but it only has 10 digits. So, after this great discovery, I finally got my first successful submit in 2016!
BTW, the cleanup of the lookup table may not be necessary.
Well, the reason I came back here is because I fall into the same trap again 🙁
First of all, I overlooked the integer overflow here again. Then, when I tried to fix it, I put 1e10 instead of 1e9, which took me 4 hours to figure that out!
“got 705082704 expected 1000000000”
Sad, old enough to count 0s wrong…
I present here a slight different solution than above using array, which is almost as same as Sheng’s python solution.
Sometimes, I used len(“1000000000”) in Python to count the 0s 🙂
Well, I have tried this one with C#. Apparently, everything is correct, my code is correct and efficient, but for particularly one of the tests of performance, it says the following:
WRONG ANSWER
got 705082704 expected 1000000000
So, what have I done? I submitted a search on Google with the two following terms:
codility 705082704
Then I found your post. I don’t understand this, because I am 99.99% that my code is correct. I am testing millions (literally) of cases and my code seems to be correct. I would understand a mistake if the result 705,082,704 was bigger than 1,000,000,000 (because when the result is bigger than 1,000,000,000 it should return 1,000,000,000), but IT ISN’T, 705,082,704 is lower than 1,000,000,000. I don’t understand it…
another one that’s 100/100 but doesn’t use the n(n+1)/2 formula, as it instead adds up the new slices on each loop iteration, making the code longer but the math simpler
It becomes a bit hard to read code when you name your variables this way. Here is another implementation that doesn’t use n(n + 1)/2 formula, and uses a single loop:
Hi
Here is my solution… it was easier for me to understand… basically I am counting the eligible slice ending at a given index.
Codility 100%, no math, one plain cycle solution.
Hope code is tagged correctly now 🙂
It is correctly tagged now. Thanks very much for sharing!
this is my sample code
Thanks for sharing. However, it does not satisfy the requirement: expected worst-case time complexity is O(N).
I think the codility guideline is missing the slice (1,4) and not sure why they didn’t include
slice (2, 4) when in fact they did include slice (3, 4)
With input [3, 4, 5, 5, 2], slice (1, 4) is [4, 5, 5, 2] and slice (2, 4) is [5, 5, 2]. Both slices contain the duplicated number
5.
Glad to see that you are actively participating. Really thanks for taking out time to reply. I do hope to get done with these demo tests so that I can come back to discuss the new unsolved questions.
You are very welcome! I have much less time on this site. But I am trying to keep it alive 🙂
Simple but efficient
This is my C++ solution:
https://app.codility.com/demo/results/training4UV3KH-Q5M/