Question: https://codility.com/demo/take-sample-test/min_avg_two_slice
Question Name: MinAvgTwoSlice
The key to solve this task is these two patterns:
(1) There must be some slices, with length of two or three, having the minimal average value among all the slices.
(2) And all the longer slices with minimal average are built up with these 2-element and/or 3-element small slices.
Firstly we will prove the statement (1). In all the following discussion, we assume the slices have two or more element. In every array, there must be at lease one slice, to say S, having the Minimal Average (MA). And PLEASE PAY ATTENTION, the following proof is done with the S, which HAS the global minimal average.
- If the length of S is two or three, it follows our conclusion.
- If the length of S is odd, we could divide it into a 3-element sub-slice and some 2-element sub-slice. For example, for the slice [1, 2, 3, 4, 5], we could get a 3-element sub-slice [1, 2, 3] and a 2-element sub-slice [4, 5]. Or differently we could get [1, 2] and [3, 4, 5]. But the split does not matter in the following prove.
- If the sub-slices have different averages, some of them will have smaller average than MA. But it conflicts with the condition that, the MA is known as the global minimal average for all slices. By this contradictory, it’s proved that all the sub-slices MUST have the same average.
- If all the sub-slices have the same average, the average of them must be MA. So all these sub-slices have overall minimal average and length of two or three.
- If the length of S is even, we could divide it into some 2-element sub-slice. For the slice [1, 2, 3, 4], the only possible split is [1, 2] and [3, 4]. Then, similar with the previous case, all these 2-element sub-slices have the global minimal average.
And in the construction in step b, we have already proven the statement (2).
UPDATE 03-13-2014: We are NOT proving that, 4-or-more-element slices cannot have the global minimal average. We just proved that, there MUST be some 2-element and/or 3-element slices, having the global minimal average. In other words, we are NOT SURE whether there are some 4-or-more-element slices holding global minimal average. But we are ONE HUNDRED PERCENT SURE about the 2-element and/or 3-element slices.
UPDATE 03-02-2015: There is another excellent discussion in the comments by Kim. Thanks!
UPDATE 03-14-2016: If you had any question about the statement “If the sub-slices have different averages, some of them will have smaller average than MA”, please read this comment and this comment. Thanks!
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | def solution(A): min_avg_value = (A[0] + A[1])/2.0 # The mininal average min_avg_pos = 0 # The begin position of the first # slice with mininal average for index in xrange(0, len(A)-2): # Try the next 2-element slice if (A[index] + A[index+1]) / 2.0 < min_avg_value: min_avg_value = (A[index] + A[index+1]) / 2.0 min_avg_pos = index # Try the next 3-element slice if (A[index] + A[index+1] + A[index+2]) / 3.0 < min_avg_value: min_avg_value = (A[index] + A[index+1] + A[index+2]) / 3.0 min_avg_pos = index # Try the last 2-element slice if (A[-1]+A[-2])/2.0 < min_avg_value: min_avg_value = (A[-1]+A[-2])/2.0 min_avg_pos = len(A)-2 return min_avg_pos |
Hi, I wrote a C version of answer according to your python version, however, I only got score of 90 (one test failed), while yours got 100. I cannot see any difference between my C code and your python code. could you help me? here is my code: . Thanks.
It is because of the accuracy of float number. Change the “float minavg;” to “double minavg;”. And you will get 100. During the computation of float number and/or double number, the computer will use “double” type, and then convert it into float if necessary. So when you compared the “minavg>(A[i]+A[i+1])/2.0”, (A[i]+A[i+1])/2.0 is double, while minavg WAS converted from double into float and IS converting again into double for comparison. Some accuracy is lost in these conversions.
Otherwise, you could use a temporary float variable, to say min_temp, to store the value of (A[i]+A[i+1])/2.0 and (A[i]+A[i+1]+A[i+2])/3.0. Then you could compare the min_temp and minavg. You also will get everything right.
Hope it would be helpful! And your comment is great motivation for me to continue this blog!
Thanks for the insight!
I would never reached the conclusion that the array most have 2-or-3-sized slices with MA. Your proof nailed it, great thinking!
Thanks for your visit and comments!
The only thing that bothers me is that this exercise is connected with Prefix Sums…
You’re not using them at all and with your assumptions greatly simplifies the problem.
Yes, the solution does nothing with the prefix sums. And I have no idea, how it could be applied here. I am also bothered by this issue. If someday you find the prefix sums solution, please do tell me!
I guess the implication from the Codility question is to use a prefix sum to do fast average calculation. So, if you have an input vector of:
input = [ 1,2,3,4,5,6,7,8,9,10 ]
and a prefix sum vector:
prefix = [ 1,3,6,10,15,21,28,36,45,55 ]
You can calculate averages over arbitrary ranges very quickly. So, if you want to calculate the average of the slice from index i to j, its just a matter of:
(prefix[j]-prefix[i]) / (j-i)+1
However, your solution is much better 🙂 and doesn’t need to waste O(n) in space plus another iteration through the data. Great work Sheng!
Well, your method to calculate the average is brilliant. But what to do after that? There are still about O(N^2) slices to check. How could your reduce our trying count for these slices? Are you using the same idea as http://stackoverflow.com/questions/22137951/codility-min-average-slice? Thanks!
Exactly this method is nice. I also assume codility wants something with prefix sums but I stuck that I need inner loops for j index! And it causes O(N^2)
“If the sub-slices have different averages, some of them will have smaller average than MA. But it conflicts with the condition that, the MA is known as the global minimal average for all slices. By this contradictory, it’s proved that all the sub-slices MUST have the same average.”
I’m confused. Isn’t possible to have different averages for different sub-slices?
For instance:
[1, 2, 3, 4, 5]
[1, 2, 3] -> 2 Average
[4, 5] -> 4.5 Average
[1, 2] -> 1.5 Average
[3, 4, 5] -> 4 Average
These are all different. They don’t have the same average.
How it’s proven that isn’t needed to compare the sub-slices with four or five elements?
Can you try clarify a bit? (sorry for not understanding)
Hello, thanks for visiting my blog! When we consider “the sub-slices have different averages”, we are splitting the S, which HAS the global Minimal Average, into sub-slice. But in my bad example, [1, 2, 3, 4, 5] does not hold the global minimal average. A better example would be [1, 2, 1, 2, 1.5, 3, 4, 5]. The slice [1, 2, 1.5, 3, 4, 5] has sub-slices with different averages. But for [1, 2, 1, 2, 1.5], there must be AT LEAST one split, which makes every sub-slices with same average. In other words, it is possible to have different averages for different sub-slices of one slice. But for the slice with global minimal average, it MUST have AT LEAST one set of sub-slices with same average.
But for [1, 2, 1, 2, 1.5], there must be AT LEAST one split, which makes every sub-slices with same average.
Therefore “global minimal average” is 1.5?
But what about slice: 1, 2, 1 -> avg = 1.3
For [1, 2, 1, 2, 1.5], there is one and only one split [1 , 2 , 1], who HAS the global minimal average. And for THIS split [1, 2,, 1], it ONLY has one sub-slice [1, ,2, 1] (we cannot divide the [1, 2, 1] into smaller ones with the challenge restriction.).
So for [1, 2, 1], every sub-slices is with same average.
“there must be AT LEAST one split (to say SPLIT), which makes every sub-slices with same average.” is a necessary condition, NOT a sufficient condition on “SPLIT holds the global minimal average”.
So even for slice [1, 2, 1, 2], all of its sub-slices hold the same average. BUT it’s not enought to get the conclusion that [1, 2, 1, 2] hold the global minimal average.
Hi Sheng, I think I managed to find a solution using prefix sums in O(n), but I can’t seem to pass one test. I was wondering if I could consult you, and ask if you could teach me where I’m going wrong? http://stackoverflow.com/questions/22137951/codility-min-average-slice
Hello, Dan! I have posted my answer in stackoverflow.com in name of Sheng. Hope it be helpful!
Thank you so much, and this makes sense!
My pleasure! And enjoy coding!
Just stumbled on this blog looking for a solution.
Unfortunately, the approach based on Kanade’s algorithm doesn’t work in all cases. I have posted a reply adding the explanation.
Execute me. Who is Kanade?
I do not understand why there must be at least one set of sub-slices with the same average..
Could you explain this more? Why this is a must? Is there a proof that if such situation does not exist (there is split which makes all slices the same), then this bigger slice (the one that we want to split into 2-3 slices) does not have MA
Ok after writing this question I understood how silly it was. Right now I realized that this proof is about, that every slice which has MA, needs to be splitted to slices that are having same average, and also have to be size 2 or 3.
It’s hard to believe sometimes how blind you can be until you ask a stupid question.
Thank you for this blog, I truly envy you for finding this solution, and describing good way to prove it is correct. Big KUDOS for you.
That is fine. So happy to hear the blog be helpful. And I greatly appreciate your comments!
You can spare the use of floating point division also with a simple math trick:
Instead of comparing the averages (A[i]+A[i+1])/2 and (A[i]+A[i+1]+A[i+2])/3 you can multiply both expressions times 6 and get 3*(A[i]+A[i+1]) and 2*(A[i]+A[i+1]+A[i+2]). Then you would similarly get the minimum of this “average*6”.
Sorry for the unimportant comment, ;), it´s just that I am in electronic engineering and care about optimization, always faster to work with integers I think
No, your comments are quite valuable. In my own answer, I also tried your method. But I thought, it will make the solution harder to understand. Thus I excluded it from this post. But in my own coding, I am willing to learn any possible optimization. Thanks!
I think your proof is deeply flawed. The split does matter a lot when constructing sub-slices.
Consider the input [8, 0, 0, 8]. Clearly the total average is 4 and the average of both halfs [8, 0] and [0, 8] is 4 too. But the minimum is 0, obtained from the two-element slice in the middle.
Also consider [10, 0, 6, 0] where the average is 4 and the two-element averages are 5, 3 and 3. Yet the minimal average is 2 obtained by [0, 6, 0].
While I disagree with your proof I agree with statement (1): Checking pairs and triples suffices.
There is a pre-condition, when we consider the split: the to-split slice HAS the global minimal average. Please notice, all the proof is done under such condition.
Yes but it is the precondition of your demostration. It is not the condition of the problem.
With input input [8, 0, 0, 8].
Your code do an evaluation of the split [0,0 ] and [0,0,8]
in your code the variable index is increased one by one step , It is not two by two, or by tree steps. regards
Yes, it is not the condition of the problem, BECAUSE it is an inherent inevitability of the problem. Every array MUST have a slice, holding the global minimal average. And our proof is done with such slice.
Actually, my code tried every 2-or-3-element slices: [8, 0], [8, 0, 0], [0, 0], [0, 0, 8], and [0, 8].
Genius haha
Hi Sheng, I am sorry, but I am struggling to grasp the reasoning behind your solution: you only consider slices composed of 2 or 3 elements, why not bigger? How do you know that bigger slices won’t have a smaller average?
thank you, sorry if the question is silly.
That is fine. It is actually a tricky question. If any bigger slice has the GLOBAL mininal average, it MUST be able to be split into 2-element and/or 3-element sub-slices with the same and global mininal average. See the (b) and (c) parts for the proof.
Ok, now that I revised it, it makes, sense. It’s just I was not expecting to have to do mathematical hypothesis or such, for this test. What I was wondering (linked to my secon question) is: they usually expalain a topic and then have you performing tests about them, why can’t I find solutions about it that do that?
I mean, your solution is brilliant, the thing is…it’s too much brilliant! Clearly it is good, thinking outside the box and all, my problem is: what is the solution they had in mind when they posed the question?
The stackoverflow answer doesn’t seem like it, or my skills in JavaScript are too little to truly understand it?
Yes, the solution on stackoverflow did not use a raw prefix sum. But the similar idea, and from the end to the front. Actually I did not find a solution with exactly prefix sums for this question.
So, how do we know there isn’t a bigger slice with a lower index?
I can’t find an example, so I’m pretty sure it’s right but I’m curious as to why.
Basically, imagine a 4 element slice with the global MA. The code only works if the FIRST 2 or 3 elements of that 4 element slice is the global MA. If, for example the last 2 elements was the glocal MA. Then you wouldn’t be returning the lowest possible index. How do we know that’s always true?
It’s case #c in my proof.
If a 4 element slice (to say, [A, B, C, D]) with the global MA, we can split the slice into two sub-slices as [A, B] and [C, D]. And BOTH [A, B] and [C, D] MUST be with the global MA.
One more thing, sorry for double comment: this demo is inserted in the Prefix Sums chapter, though I cannot seem to find a solution with Prefix Sums in O(N) time complexity. Based on what I can see here, you do not make use of prefix sums, am I wrong?
Yes, you are right. I did not use the prefix sums. If you need a prefix sum solution. You could see it on stackoverflow.
According to the author, that solution uses dynamic programming, which is Lesson_9, and does not use prefix sums. It is too early to use it for Lesson_5, so the question remains open.
Hi, I’d like to suggest another proof that it’s enough to examine pairs and triplets. I’d be glad to hear if you find it correct and reasonable.
Claim: Any slice of four consecutive elements always contains a pair that has a lower (or equal) average, than the average of the whole (4 elements) slice..
Proof by contradiction:
let’s assume that there exists such a 4-elements-slice whose average is smaller than any of the averages of its constituent pairs (2-elements slices).
We then (falsely) assume that: “there exists a1,a2,a3,a4 such that (a1 + a2 + a3 + a4)/4 < (a{j} + a{j+1})/2, for every j=1,2,3".
Now lets examine the consequences.
taking j=1, we find (a1 + a2 + a3 + a4)/4 < (a1 + a2)/2 ==> a3 + a4 < a1 + a2.
and taking j=3, we find a1 + a2 < a3 + a4.
So that, we got (a1 + a2 + a3 + a4)/4 < (a{j} + a{j+1})/2, for *any* j=1,2,3″ is false. In other words – For any set of a1,a2,a3,a4 (i.e., any existing 4-elements slice), there exists a pair of elements (defined above by j) whose average is either smaller or equal to the average of the 4-elements slice”.
So any 4-elements slice can be decomposed to pairs of which one has a smaller (or equal) average. This also applies for any longer slice, since it can be seen as being composed of 4-elements sub-slices, which in turn contain some pairs with smaller averages.
Triple-elements slices can’t be similarly decomposed, as is seen for example in {1,9,1}.
This means we must work with pairs and triplets.
Thanks for your great proof! But I am not clear about this point “This also applies for any longer slice”. What if it is a 5-elements slice? How could we divide it?
Great inspiration Sheng. Very glad I have stumbled on your blog.
Unrelated to this problem may I ask, what would you recommend to do in order to improve one’s algorithmic problem solving skills? I’ve tried to read about popular algorithms but it doesn’t seem to really help on “codility like” problems. Any suggestions?
It is hard to answer this question. To be honest, I am not excellent at these challenges in Codility. And for this problem, I spent more than one week, and asked the help from my friends. It is not an easy job.
The knowledge from books is important. But it is far from enough. You have to keep pratice to gain more experience and improve your skills. After solving this problem, all can see the underlying issue is easy and basic: try all 2-elements and 3-elements slices. Most, if not all, of software developers can solve it in minitues with O(N). And the textbook could teach us the slices algorithms.
But why it takes us some much timet to get the right answer? Because we failed to abstract the challenges into the known data structure and algorithms, in most cases (Some challenges are really hard, and out of this discussion). The textbook will never be able to teach us how to do this job. In my opinion, practice experience is much more helpful in this field than books.
Hope it not be misleading! Enjoy coding!
Hello and thanks for the “2-element and/or 3-element sub-slices” idea.
Here is an implementation using prefix sums:
Codility demo
Thanks for sharing your solution!
You are welcome! And enjoy coding!
Your solution does not look any further than 3. You do not need prefix sums for that.
excellent blog…i like the approach even if it’s not in the method expected by Cdlty
Thanks for visiting my blog! And enjoy coding!
Hi Sheng,
Thanks for the info on using 2 or 3 slices. That was very helpful! Here’s one solution that would work (100% on Codility) and uses Prefix Sums:
Thanks for sharing your solution!
It does not, however, use the prefix sum in any meaningful way; maintaining the “sum of the latest 3” is easier and needs constant memory.
Agree. Prefix-sum solution is still essentially unavailable.
Does your solution work if all the values in the array are negative? I dont think so. Because the slice with least average is the whole array and not any 2 or 3 slice.
It should work well. For example, with the input [-1, -100, -100], the slice with least average is [-100, -100], not [-1, -100, -100].
I think you are missing something, too:
>each element of array A is an integer within the range [−10,000..10,000].
So, what if [4,-10,5,10,-5,8] ?
We are looking for the 4-slice [-10,5,10,-5] MA=0.
But [-10,5,10] MA = 1.67; [5,10,-5] MA=3.33; [5,10] MA=7.5 and so on.
(Used easy numbers for demonstration.)
So this contradicts your starting assumptions on 2/3-slices.
And I think there is no linear solution, you can never be sure on local criteria.
At least nlogn, but I have not thought about that.
I really don’t like this tests. They use it for recruitment here … “forget CVs, forget reports” … it somehow sucks.
In this example [4,-10,5,10,-5,8] , the two-length-slice [4, -10] hold the minimal avg. So my 2/3-slice conclusion still works
Consider this input: int A[] = {-1, -2, -3, -4, -5, -6, 11, -42, 200, 0, -10, -10, -10, -10, -10, -1000};
Is your algorithm still going to give correct answer? index 9, i.e. A[9] is correct answer for the original question.
how on the earth it is actually giving me correct answers !! Finding it hard to believe that it works fine on weired inputs as well 😛
Happy to see you getting the idea!
I interpreted the word ‘minimal’ to mean smallest or “closest to 0”, so that in my solution all my comparisons were made using absolute values. Using this interpretation, which I admit may not be the one Codility intended, means an average of 0 from the slice [-10,5,10,-5] is more minimal than an average of -3 from the slice [4,-10]. Hence I can understand Ron’s point that the answer is in a slice of size 4 rather is only one of size 2 or 3.
It seems like a word trap. Whatever, thanks for sharing your thinking!
I also had the same difficulty interpreting the word ‘minimal’. I think this point should be explained more clearly, or showing an example with both positive and negative values, where the most negative average value would be the ‘minimal’ value.
thank you sheng for your explanation…… i read all, Am a new programmer. So please explain the question, i didnt understand.. Will you.
The solution is kind of a math question, rather than a programming challenge. Have a paper and a pen, then try to follow my proof.
PS: read the content, and also please read the other comments. They are helpful.
Hi Sheng, Your Code is simply awesome, Thanks for the help. Also a small intro abt myself, I am a masters student doing MS in CS at Northeastern University. I have coding challenge at Codility for a Compny “Cloudera”. Do you have any advise or preparation materials !!!! It would be of great help!!!
Thanking you in advance.
Thanks for your visit. But to be honest, I do not know any special way to prepare the interview with Codility.com. The questions on Codility.com are so elegant. Just show your best!
PS: Introduction to Algorithms is an awesome book.
Leetcode.com is a great online practice platform as Codility.com.
Thank You Sheng, for the immediate reply and also for the other Practice Sites. I shall do my best and I wish you the best for your future endeavors :). A lot of people will definitely learn from your site. Kudos to you and your thought of sharing ideas 🙂 Great Work.
Thanks! Just remember one thing about Codility, do consider all corner cases.
Sure Will Do 🙂 Thanks
Hi Sritharan
Have you given the codility test?
No Harsh, I am still doing the codility demo practice, and Sheng’s site is the temple for my reference 🙂 . How abt u ? are u in the game ?
Hi Sritharan,
Can you share what questions were asked in cloudera codility
More likely, they signed the non-disclosure agreement.
without using floats this gives 100% in C (you can convert it in python 🙂 )
Thanks for sharing your C solution. Generally they have the same idea.
Unfortunately this corect and clever solution break the O(n) complexity condition. As you easily see the complexity is now O(n+m).
The problem is hard acordind all conditions and assumption.
Cheers
Sorry, what do you mean by M?
The trial length 2 and 3 is constant number. I cannot found another variable as M.
This is excellent. how long did it take to find this solution. I couldnt get the solution for more than 10 hrs
About one week. And I asked the help from my friends.
Despite all the effort, I failed to solve this problem. Only after I saw the article, I could finally realize that this was not a coding problem, but a mathematical one. I was deeply sad about this, but did not want to stop the challenge. So below is my own interpretation of the problem (or excuse for the failure I have made.)
So basically the problem is to prove next statement.
Let len(s) be the length of a slice s, sum(s) the sum of the all elements of the slice s, and ave(s) the average of the slice s. Then for an arbitrary array, every slice s having len(s) > 3 contains a sub-slice s’ such that ave(s) >= ave(s’).
Proof.
Suppose that s is a slice having len(s) > 3 and does NOT contain a sub-slice s’ such that ave(s) >= ave(s’). Since len(s) > 3, s can be divided into sub-slices t and u.
Then,
ave(t) = sum(t) / len(t), and ave(u) = sum(u) / len(u).
ave(s) = sum(s) / len(s)
= [sum(t) + sum(u)] / [len(t) + len(u)]
= [len(t) * ave(t) + len(u) * ave(u)] / [len(t) + len(u)].
If ave(u) >= ave(t) then let s’ be t
ave(s) = [len(t) * ave(t) + len(u) * ave(u)] / [len(t) + len(u)]
>= [[len(t) + len(u)] * ave(t)] / [len(t) + len(u)]
= ave(t)
= ave(s’).
Otherwise, if ave(t) >= ave(u) then let s’ be u
ave(s) = [len(t) * ave(t) + len(u) * ave(u)] / [len(t) + len(u)]
>= [[len(t) + len(u)] * ave(u)] / [len(t) + len(u)]
= ave(u)
= ave(s’).
This leads a contradiction, and completes the proof.
ps. After writing this note, I saw a counter example [8, 0, 0, 8] as Hunter2 indicated. A workaround of this is to permit a slice having length of one, although the problem description defines a slice to contain at least two elements. I guess that doing this does not harm the soundness of the proof. The point is to ignore all slices having the length of 4 or more, not to consider the slice having length of one. For example, you may argue that [0, 8, 1, 1] can be splitted into [0] and [8, 1, 1], but we can simply ignore [0] and take [1, 1] instead.
Updated:
Uncomfortable with the postscript I wrote, I thought a little more on the proof, and figured out that the postscript above was absolutely NOT necessary. As a matter of fact, at first time I was confused with [8, 0, 0, 8] as Hunter2 indicated and introduced a slice having length of one. But the real point was in a different place.
In the example slice [8, 0, 0, 8] what we want to claim IS that the average of [8, 0, 0, 8], which is 4, is greater than or equals the average of [8, 0] or [0, 8], which is also 4. Hence [8, 0, 0, 8] can be divided into sub-slices having the same or lower average value. It is important that we do NOT claim that the minimal average slice is [8, 0] or [0, 8]. Finding the minimal average slice is another job to do later and that job is to examine slices of size 2 or 3.
Finally thanks to Sheng for the insight. I could never imagine the idea behind the solution without his comment.
One more thing.
When I saw the claim that examining 2-3 slices is enough, what struck immediately in my mind was the mathematical induction. Probably most of us have the experience of proving next statement, that is;
every integer greater than 3 can be expressed as a linear combination of 2 and/or 3.
Then I realized that the same principal can be applied to this problem. How stupid I am! Rest of the problem was not so hard. In fact, I did not even try to code, because sometimes if once a solution is found, it is not much meaningful to code it.
However I feel that I’m fortunate since I still have the textbook of discrete mathematics. I should revisit it once more.
Lastly, I guess that I talked too much alone at this time. Sorry for your inconvenience. Enjoy happy coding!
Wow! I sincerely appreciate your comment. It make this post more complete! Thanks!
thanks for such a wonderful explanation
what happens if the slice is [2,1,1,1]? or [2,1,1,1,1]?
forget it , i was thinking something else
That is fine. It is a bit confusing.
Hi Shen,
One of the perf tests is failing:
WRONG ANSWER
got 79 expected 0
Can you tell me why?
Below is the ruby version of your code.
Please consider the test case [-1, -1, 0, 0, -1, -1, -1]. When there are two slice with the same average, the problem require us to output the first one. But your solution did not check the index of the same-average slices.
I never used the Ruby before. So the following solution only demonstrates how to fix the bug. You could optimize it latter.
Following is a C++ solution that doesn’t check for slice of 3.0 specifically but just keeps track of the previous values.
Essentially the same. But a different implement.
2 and 3 elem. C++ implementation gives wrong answer in medium_range test, so I like your algorithm, it works better than combinations of 2 and 3 elements, because you check 2 and 2,3,4,5… elements. Thanks!
Implementation with floats instead of doubles and 10001 instead of DBL_MAX would work fine in this example as well.
IMO, SA’s solution is good enough to pass all the test. Using 10001 instead of DBL_MAX is a not-so-good practice.
10001 can pass the test, but is essentially too small.
Thank you for sharing this about the 2 and 3 slices. Without that I was not able to do it on O(N) time complexity.
Ruby 100%
It is my great pleasure to see the blog be helpful! And thanks for all your shared Ruby solutions!
Hello a131a,
I am a newbie in ruby. What do you mean in the comment
#add a maximum value element to the array to skip the check of the last pair
#it is not nice, but I am lazy 😉
practically pls.
Hi shang,
I think your solution is just valid for your assumptions about slides of 2 and 3 element size. But I do not see any assumption regarding the size of the slice on the codility exercise. Even they take this slice formed with 4 elements:
A= [4, 2, 2, 5, 1, 5, 8]
A(1, 4), which have an average of (2 + 2 + 5 + 1) / 4 = 2.5
So if we have an array like this:
A=[0, 8, 0, 0, -8, 0]
the slice with the min avg is:
A(1,4)=0
and your code returns:3
How could this result be valid?
I enjoy reading your post solutions! They are well explained!
Thanks.
Cheers.
First of all, thanks for your visiting! My solution is solely based upon the assumptions about slides of 2 and 3 element size. But I also proved the correctness of my assumption in all cases.
For A= [4, 2, 2, 5, 1, 5, 8], the slice [2, 2], whose length is two, is smaller than yours.
For A=[0, 8, 0, 0, -8, 0], the min-avg-slice would be [0, -8] or [-8, 0], whose average is -4. My code will return 3, which is the start index of [0, -8].
PS: please notice that, the return value is the index of the min-avg-slice, NOT the average value.
PPS: enjoy it!
Hi!
I’m really impressed with your solution. I had really like 1.5 h of thinking about what you meant but finally I got the idea. This is really amazing blog and I am sure I will find much more useful content here. Thanks for sharing your solution ( i failed today with mine =) ) .
Julek
Hi Julek!
Thanks for visiting my blog! This challenge is really hard. I spent more than one week and found out the solution with my friends together.
Enjoy coding!
Sure. It is just another example of Codility cheating, i.e. posing mathematical problems as programming ones. There is no way you could solve it in 30 min unless you are utter genius. They should rename it to Mathility IMHO.
Mathility 🙂 🙂 🙂
Hi Sheng, brilliant deduction on the slice sizes.
Here’s a solution in Java using prefix-sums as expected by the cordiality test.
Guys! After all these long discussions, how do you get a job offer with codility? I know my question is irrelevant to the post. But codility is used by a lot of companies to hire people. Usually there is a exam with 4 questions and two hours and you know that if you do not solve all of them you won’t get hired. No matter anyone ever gets hired. The employer looks for somebody who can solve all the problems no matter if there is no one on earth who can solve all the problems. When I see that people are discussing here for months to find a solution for only one of the basic problems of this maniac website, I find it more and more unreasonable to spend time taking codility exams!
Do you agree we me? Do you think codility is resonable? What is your reaction when you want to get hired in a web development company developing for example tourism websites and you have to first fail on a two hour of bloody frogs and trees mathematical questions to proove that you are not eligible to be paid anywhere?
My two cents, the bottom line is: codility is fair to every single applicant. All applicants are trying to solve the same question. And your solutions show your ability, to some extend. The problem is how the employer use codility, rather than codility itself.
The questions in codility are for students that need to prov for employers that they really know how to go from requirements to an application using a programing language. In my case, I remember I was ok in solving these king of problems after finishing my university studies but today, after almost 10 years of working as developer in the “real world”, I’m useless.
It is actually the major disadvantage of Codility. It only focuses on the programming skills, while cannot cover all your experience and ability, especially the project experience.
Thanks for taking a read. I’m working on trying to condense this as much as possible but I’m getting bad values on the extreme values and large sequence. On the extreme I’m getting 99998 expected 0 and the large I’m getting 99988 expected 3. Any idea what is going on?
When i is len(A) – 2, the three_chunk is:
three_chunk = sum(A[i:i+3])/3.0 = sum(A[ len(A) – 2 : len(A) + 1])/3.0
Essentially it is:
three_chunk = sum(A[ len(A) – 2 : len(A) + 1])/3.0 = sum(A[ len(A) – 2 : len(A)])/3.0 = (A[len(A) -2] + A[len(A) – 1]) / 3.0
You are dividing the last TWO items with 3.0
Hope it helps.
Adding another Java solution 100/100 using your technique and prefix sums:
Hi Cheng and Noamo, wise solution and explanation respectively !
After all this resumes to prove the reduction to the absurdity, math. I’ve also tried to prove for an array of 3 elements analyzing the two pair, no conclusion, so no absurdity on that :p
More of the same, but now backwards 🙂
missing the triplet part. cut and past doesn’t work properly…
To post code, please read “Guideline for Comments” in the side column.
Amazing.
Have been a trying to get it not to timeout for complex cases for a while.
Next time I’ll try longer before googling it though.
thank you 🙂
No problem. Enjoy coding!
I think I am just too numb to recognize 2-slice and 3-slice are the only possible options for a global minimal.
So, I guess I went the hard way to solve it. First try only granted me 80%. The second run, well, would be useless in any real testing scenario, got 100%.
I know this is not the optimal solution in terms of the logic. But it proves that you still can get it done even without discovering the patterns. LOL
Any right solution, following the time/space requirement, is a good solution 🙂
Thank you. This is helpful for me to solve this problem.
My pleasure 🙂
Sheng, congrats for your solution as it really seems is the best. I only want to know how to demonstrate the following assertion which I think is the key for the entire mathematical problem solving:
“If the sub-slices have different averages, some of them will have smaller average than MA.”
How can it be demonstrated that if the sub-slices have different averages, some of them will have smaller average then the average of the hole parent slice?
Sorry if it’s a stupid question, but I’d like to understand it mathematically.
Hello Bogdan, sorry for this late response. Let S be the whole slice, and S1, S2, …, SN be the sub-slices such that S = S1 + S2 + S3 + … + SN. If the sub-slices have DIFFERENT averages, avg(S1), avg(S2), …, avg(SN) are not always the same. Not losing the generality, we assume that avg(S1) is:
1. for all i in [2, 3, …, N] such that, avg(S1) <= avg(Si); 2. Exist at least one i in [2, 3, …, N] such that, avg(S1) < avg(Si); In other words, S1 is the sub-slice, which has the smallest average value among all the sub-slices. avg(S) =sum(S) / len(S) = (sum(S1) + sum(S2) + … + sum(SN)) / (len(S1) + len(S2) + .. + len(SN)) = (avg(S1) * len(S1) + avg(S2) * len(S2) + … + avg(SN) * len(SN)) / (len(S1) + len(S2) + .. + len(SN)) > (avg(S1) * len(S1) + avg(S1) * len(S2) + … + avg(S1) * len(SN)) / (len(S1) + len(S2) + .. + len(SN))
= avg(S1) * (len(S1) + len(S2) + … + len(SN)) / (len(S1) + len(S2) + .. + len(SN))
= avg(S1)
That is, avg(S) > avg(S1). Proof done.
Simple and concise, thanks !
You are welcome 🙂
@Bogdan This is a general mathematical approach when demonstrating certain statements. Given an array A of N integers where N >= 2, we can slice it up in slices of 2 and 3 numbers. Let’s call these slices S1, S2, … Sk.
Now consider the average of all elements in A. Let’s call this average AvA. The sum of all numbers in A can be calculated as N * AvA.
Now let’s consider the average of all slices that we have created. Let’s call them AvS1, AvS2, … AvSk. What this problem is essentially saying is that AvA cannot be smaller than Min(AvS1, AvS2, … , AvSk).
Let’s assume for a moment that AvA is smaller than the smallest average of all the slices. Let’s call that smallest average MinAvS. Now think of the sum of each slice. Since the average of the slice is bigger or equal than MinAvS (we picked MinAvS because it was the smallest one of all, the sum of the numbers in the slice is bigger than 2 * MinAvS or 3 * MinAvS (depending on how many numbers in the slice). Now, if you add all slices together (and that would be our initial array A that has N integers), the result would be bigger or equal to N * MinAvS, right? But the result is N * AvA, as we have already established and since MinAvS is bigger that AvA we get to an impossible situation hence our assumption that AvA is smaller than the smallest average of all the slices is impossible.
Here is my C# solution that scores 100% on codility:
Hi Sheng, that’s one hell of a great take. I’m having doubts about your proof though.
I get the 1st part of the proof: If a slice S of any length holds the global minimum average MA, than no other slice can have an average less than that. Ergo every split of S would yield sub-slices with averages >= MA. So far so good.
But your further assumption is (quote) “If the sub-slices have different averages, some of them will have smaller average than MA” (why?), from which you conclude that all sub-slices of S must have the same average, MA.
And here’s what bothers me: Since S contains its every sub-slice, the correlation between S and it’s sub-slices is independent of array elements outside of S, and therefore independent of whether the average of S is or isn’t a global minimum within its broader context. Hence your postulate implies that the average of every sequence must be equal to that of any sub-sequence of it, which is clearly not the case.
On the other hand, your solution is clearly correct (it does score a perfect 100% on codility), so your postulate must be correct, too. I therefore assume that it’s not the postulate, but the proof/explanation that is flawed / misleading.
I would very much like to understand the solution completely though. Care to give the proof an alternate take?
Hello ZOC, your question is great! For the “If the sub-slices have different averages, some of them will have smaller average than MA”, please refer to the newly updated part of the post.
PS:
The correlation between S and it’s sub-slices is independent of array elements outside of S: right
The correlation between S and it’s sub-slices is independent of whether the average of S is or isn’t a global minimum within its broader context: agree
IF we ONLY consider “the correlation between S and it’s sub-slices”, we CANNOT get the statement. “average of every sequence must be equal to that of any sub-sequence of it”. Correct.
BUT, when we consider “the correlation between S and it’s sub-slices” UNDER the presumption “S is the slice with minimal average”, we CAN get the statement. “average of every sequence must be equal to that of any sub-sequence of it”. See the updated post for more details please.
But, one thing I can’t understand is, how to prove that this method can return the smallest starting point among all possible slices with minimal average?
If you can understand the logic to find the min-avg slices with 2+ elements, the “smallest starting point among all possible slices with minimal average” is exactly the same as: find the position of the first (if there are 2+) smallest element. In line #8 and #12, the comparison is <, rather than <=.
But how about this case: there are 4 elements totally, and the first two or first three elements don’t have the minimal average, but the first four can have minimal average, and may be the second and the third element together can result in minimal average. I mean, how to prove that above case I mentioned is not possible?
It’s impossible. Please refer the statement ” (2) And all the longer slices with minimal average are built up with these 2-element and/or 3-element small slices.” and its proof.
In your specific case, there are 4 elements totally, and the average of 4 elements is the minimal average. The we can divide the array into two subslices: [first, second], [third, fourth]. And it MUST be true: avg(4 elements) = avg([first, second]) = avg([third, fourth]) (Proof is available in section c).
Our algorithm will find the position correctly with your case.
Got it. Thank you so much!
No problem. Enjoy it!
This is my solution
I don’t really see why we need to check for slices with both 2 and 3 items? I think checking for slices of 2 items is enough. Can any one give me an example that require checking a slice of 3 items?
Try [-6, 0, -6].
My solution in Go: http://pastebin.com/eSkQKjLw
Java 100%
Please refer to “Guideline for Comments” (in the right column) to post code. Thanks!
Can somebody tell me why is this not 100%?
http://pastebin.com/Qe6tffka
You should update “score3” in each iteration.
Hi Sheng.. I have a small query.. just by looking at this array [2,1,0,1,-100,1,2,3,-50].. the answer should be 4.. but your code is returning 3… I hope the test case is valid.
Why should the answer be 4? I think my code is correct, and the answer is 3 ([1, -100]).
This post was such a great help — thanks Sheng! This was a really hard problem for me to solve – much harder than any of the previous ones I tried. Here’s my C++ solution:
You are welcome! Enjoy coding!
Can’t say the proof is perfect. Just personal thoughts and deductions. However, it satisfied the Codility. There’re 4 element slice such as [-5, 2, 3, -5], which can easily prove we need to go further. Maybe somebody can give a example of 10 elements slice. But that’s not a problem. we can consider all all the slices contains 2, 3, 4,5,6 … 10 elements. because the time complexity still is O(N) unless we consider about all N elements slice. So it’s not a perfect proved theory but we can handle most situations in the world and that’s how we do things everyday. 🙂
I actually considered the 4-element slice in my proof #c . And I did not see any solid statements in your comment, that can say my proof is not correct.
For example, with input [-5, 2, 3, -5], the result slice should be 2-element slice [-5, 2]. Please correct me if I was wrong.
I m a little confuse about this example.
-5, 2 —> -3/2 = -1.5
-5,2,3,-5 —> -5/4 = -1.25
so is the slice of 4 element not minimal for this example?
Been practising all these problems all day .. could be my brain burning out and not seeing the obvious.
thanks
hrm, -1.5 < -1.25. So no, the slice of 4 elements is not minimal.
Hi guys. I noticed that none of you care about making two divisions per iteration when you could avoid them and use more additions instead. I looked for the smallest sum over two consecutive elements (sum2 in my code) and for the smallest sum over three consecutive elements (sum3 in my code). It’s only at the very end that I compute the matching averages.
Your solution has the same big O complexity. However, it does improve the practical performance.
For those who can’t figure out the “average of 2 or average of 3” theorem, here goes another approach. Well, I didn’t figure out the “theorem” at the first time so there is no way I can discover it this time either. Here goes my second passed solution without using the “theorem”…
Again, I don’t believe most people could recognize this “theorem” in a 30 minutes interview… or, I am too dummy…
Update by micropentium6: The boundary condition is interesting here (Admin: line #13). Both > and >= will pass codility test. They both return the correct answer, however you can tell the actual min avg at certain given index could be wrong. Surprisingly, the returned indexes remain correct…
A little too tricky to be an interview question, IMO.
My solution is very similar, but I used a helper function to find the averages. That way I can just focus on the logic used to find all the windows.
Hi. Your solution returns 2 for this test case [-1, -2, -3, -4, 1, 2]. It is true that slice [2, 4] has the global Minimum Average but also slice [1,3] has the global minimum average. And the problem says that we must return the starting point of the slice with the global minimum average. If there are more than one slice, which is a case in this example, than theoretically even philosophically must not we return the leftest slice which has the global minimum average? Why we must return the index of the slice which has the global MA and the length of 2 or 3? Isn’t this mathematical “injustice” to not to return the slice which has the global average and is appeared before the one you are returning.
I would like to figure out a few errors in your comment:
1. For [-1, -2, -3, -4, 1, 2], [1, 3] is NOT holding the global minimum average.
2. My whole article is trying to prove that MA exists in 2- or 3-item slice. Please read before posting.
3. Please double read the problem description: If there is more than one slice with a minimal average, you should return the smallest starting position of such a slice.
Hi Sheng! Thank you very much for this blog and your explanations about Math and algorithms.. I’m learning a lot.
I figured some kind of suffix_sum for this exercise but it has some major flaws (60%).
It’s failing several inputs.
I am unable to see what those flaws are and codility wont tell me which are the inputs I fail.
While I understood the magic of your reasoning, do you think you can help me to spot some of the flaws of mine?
It’s in php but i guess it’s simple enough..
PS: So simple it’s plain wrong :/..
Hi Sourcecer! Actually you can get which cases did your solution failed. Simply add one line at the very beginning of the function:
Then in the result page, click the black solid triangle and you will see the input.
Thanks for a great write up. A quick thing to point out – I think line 18 is unnecessary in your code.
Yes, you are totally right. It was for debug, and I forgot to remove it 🙂
Hey, Sheng, I came to such a proof. Is it right?
As far as I see, it’s correct.
Thanks. I found your approach very useful, the way you’ve built the inequalities and debunked them in a very simple way was the last thing I needed to understand the solution.
Hi mate, good work congrats.
Not sure if shared by someone else but here is my approach without 2 or 3 items max assumption.
Javascript 100% solution
https://app.codility.com/demo/results/trainingWDTNV3-CDX/
https://github.com/jiqsaw/code-challenges/blob/master/src/min-avg-two-slice.ts
This way seemed to me more proper although I see you have less code lines however to prove and believe in that assumption takes longer than solving the problem.
Basically any set of numbers 4 and above has to have a slice either equal to or below the average of those 4 numbers. I don’t understand the whole MA speak. But if you have 4 numbers and the overall avg is 4 and the first 2 average to 4 the second 2 have to average to 4 as well or it would cause an overall average of something other then 4. With that being said…if one of the sub splices are higher then the average the other HAS to be lower to maintain the overall average. if the first 2 averages to say 8, then the second set of 2 has to be lower then the average of all 4 which was 4.
Lets do a slice of 3. If the first 3 average to 4 like the overall average the last number has to be 4 for the average to still be 4. But if the single number is lets say 5(one digit over avg) then the other 3 numbers have to equal 11 which when averaged is less then 4.
With that being said when you split things into subsplices you will ALWAYS end up with atleast one below or at the overall average. Just as you will always have one above or at the overall average. As you break things into subsplices and subsplices 2 and 3 item splices are the smallest subsplices you can achieve. Because of this you will never have a splice that would out do atleast one sub splice…the closest you get is even and at that point the full splice of 4+ is mute.
Atleast that’s how I understand it in laymans terms. Correct me if I am wrong.
Hi. From my point of view, you can solve this task if you note one thing: you must check slices of size<4, which means size can be 2 or 3. I came to that rule from a little different reasoning, maybe wrong:
1. Initial slice of minimal size consist of 2 elements.
Question: How many elements you can add to initial slice
a) achieving lesser value of new, created that way, slice;
b) ensuring that these added elements do not contain another slice of value lesser than of slice created by adding.
2. Adding one element.
From simple mathematics: from (v12)/2 v3 < v12/2
In words, if third element is lesser than value of original slice, than adding it create another slice with lesser value than value of original slice.
3. Adding two elements
Again from simple mathematics you have condition that in order to obtain a new slice (consists of elements 1234) with lesser value than slice of elements 12 you get: v34 < v12.
In words, if this operation is to be effective, means you will obtain slice of four elements with value lesser than original slice with two elements, than these additional two elements must have sum of their values lesser than sum of original two elements.
But this breaks condition 1b), means you have another slice of elements 3 and 4 with value lesser than slice consists of elements 1 and 2. So operation of adding two more elements to slice of two, can't fulfill condition of solution.
4. Adding more elements has no sense, because it is always case that slice value of added elements has to be lesser than slice value of original elements and (of course) slice value of sum original and added elements.
Conclusion?
In order to ensure that adding to slice of two elements will give us a new slice with lesser value and in the same moment, there will be no another slice of even lesser value among added elements, can be achieve only by adding ONE element of value lesser than value of original slice.
Hoѡdy, i red your bloɡ occasionally and і own a similar one aand i was just wondering if
you get a lot of sрam comments? If so how do you stop it, any plugin or anything you can advise?
I get so much lately it’s diving me insane so any help is verry much appreciated.
I would suggest reCaptcha. It blocks most of the spam comments.
Hi Sheng, thank you very much for writing this detailed explanation.
Lets consider the following array [0, 0, 0, -1].
If I understand the question correctly, the answer should be 0 with average of -0.25.
And if I understand your answer correctly, it’ll return 1 with average of -0.3333333.
Am I missing something? Thanks in advance.
my bad. just realized after writing the comment that -0.333333 < -0.25. your answer is perfect. thank you!
Hi,
I’m sorry if I didn’t understand something, but if I take the following :
[200,-50,-100,4,2,-100,100]
The smaller average will be for the group of 4 elements beginning at index 2 : [-100,4,2,-100] => -97
If I consider only 2 and 3 elements groups, I’ll keep the group at index 1 [-50, -100] => -75 that is incorrect.
Did I miss something?
Thanks for your help 🙂
oops, didn’t say nothing
My solution in C++:
So you’ve read all the way down this far into the comments? Then you’re probably still wondering why the slice with minimal average has 2 or 3 elements, as the “proof” given in this article isn’t very good.
This made it all clear to me
https://stackoverflow.com/a/63024500/2780179
I’m not sure if the Codility changed the question assumptions since this discussion took place, but as of now it has the following assumptions:
N is an integer within the range [2..100,000];
each element of array A is an integer within the range [−10,000..10,000].
This algorithm considering only slices of 2 and 3 elements may work if the array is composed of elements greater or equal 0, but it will easily fail on the following simple array:
[1, 2, 3, 4, -11, 1]
For which the MA is -0.2 on the slice (0,4), which is a 5 element slice. Am I missing something?
How about slice [-11, 1]? Its average is -5.
Hi Sheng, thanks so much for your help!
I have come up with similar ideas like you, however, I have just restricted the testing window to be two slices, and I am having difficulty understanding why we care about 3 elements slices at all.
I appreciate that any legal sub-slice of a MA slice must have the same average, and the minimum sub-slice can be either 2 or 3, depending on the data. But a 3 elements MA candidate can be further breakdown into (2,1) or (1,2), and the slice 2 and 1 will have at one greater or smaller than the average of 3.
i.e. in the example [1, 2, 1, 2, 1.5, 3, 4, 5]. we have abig MA slice [1, 2, 1, 2, 1.5] with 2 slice minimum candidate being [1,2] and three slice minimum candidate begin [1, 2, 1.5]. if we cut [1, 2, 1.5] we have (1.5, 1.5) or (1.5,1.5). Does this meaning that if just run a 2 sliced window search, that would makes our program sufficient. I am sorry I really cannot produce a secarino where not making 3 element wise window search produces miss counting.
Thank you so much and hope to hear back from you!
BW,
Rundong
By question description: “the slice contains at least two elements”. Therefore, we cannot do like: “a 3 elements MA candidate can be further breakdown into (2,1) or (1,2),”
Javascript 100%
My JavaScript 100% version. I noticed we only need a call to findMinAverage once, if we use pointers for left and right index.
Ok, I got the idea.
But is there a mathematical hypothesis behind it or it was an insight?
Python solution- 100%. Detected time complexity – O(N). Even before looking at the solution, the proof that global minimum average lies in slices with 2 or 3 elements, was found by pattern recognition. The questions wants to know the position of start of slice, where the average is minimum.
Let us assume a hypothetical situation, where we find a global minimum average in slice, S4 which contains 4 elements. In every such case, there must exist a slice S2 or S3 with 2 or 3 elements respectively, where S2_min/S3_min <= S4_min.
Instead of taking random numbers, let's start with variables, to prove this case. For an average to be present within a slice, S, it must contain integers which are decreasing in nature or constant nature. Example of constant, S = [X,X,X,X]. In this case it doesn't matter where you slice, the average is constant = X. Although, a minimum average cannot be present in case where there are more than two identical integers in row, because the function will return ambiguous values. Example of decreasing nature, S = [X,X-2,X-4,X-6]. In this case, with 4 elements average is X-3, whereas with just two elements average is X-5.
Inspired by Sheng’s great solution, I wrote the following proof:
Assume you found the slice with minimal average. Assume the first two consecutive elements of the slice have higher average. If you take out (any) elements with higher average from the (or any) slice, then your reduced slice now has lower average (This is intuitive but should be proven. See proof below). Thus, having first two consecutive elements with higher average (to take out) contradicts the initial assumption that you found the slice with minimal average.
Intermediate conclusion:
As long as you can take out two consecutive first elements, then either their average is lower or equal to the slice’s, or the reduced slice’s average is lower, which contradicts the initial assumption that you found the slice with minimal average.
It is given that a slice must contain at least 2 elements. Thus, a slice from which you can take out first two consecutive elements has more than three elements.
Conclusion:
For any slice with the minimal average, it is either of size 3, or there is a slice of size 2 with that minimal average.
S0 = (a0+a1+a2+…an)/n
S1 = (a2+…an)/(n-2)
S2 = (a0+a1)/2
==> S2 = ((n)S0-(n-2)S1)/2
S2 > S0 ==> ((n)S0-(n-2)S1)/2 > S0 ==> S1 < S0.