Solution to Min-Avg-Two-Slice by codility

6 Feb

Question: https://codility.com/demo/take-sample-test/min_avg_two_slice

Question Name: MinAvgTwoSlice

The key to solve this task is these two patterns:  

(1) There must be some slices, with length of two or three, having the minimal average value among all the slices.
(2) And all the longer slices with minimal average are built up with these 2-element and/or 3-element small slices.

Firstly we will prove the statement (1). In all the following discussion, we assume the slices have two or more element. In every array, there must be at lease one slice, to say S, having the Minimal Average (MA). And PLEASE PAY ATTENTION, the following proof is done with the S, which HAS the global minimal average.

  1. If the length of S is two or three, it follows our conclusion.
  2. If the length of S is odd, we could divide it into a 3-element sub-slice and some 2-element sub-slice. For example, for the slice [1, 2, 3, 4, 5], we could get a 3-element sub-slice [1, 2, 3] and a 2-element sub-slice [4, 5]. Or differently we could get [1, 2] and [3, 4, 5]. But the split does not matter in the following prove.
    • If the sub-slices have different averages, some of them will have smaller average than MA. But it conflicts with the condition that, the MA is known as the global minimal average for all slices.  By this contradictory, it’s proved that all the sub-slices MUST have the same average.
    • If all the sub-slices have the same average, the average of them must be MA. So all these sub-slices have overall minimal average and length of two or three.
  3. If the length of S is even, we could divide it into some 2-element sub-slice. For the slice [1, 2, 3, 4], the only possible split is [1, 2] and [3, 4]. Then, similar with the previous case, all these 2-element sub-slices have the global minimal average.

And in the construction in step b, we have already proven the statement (2).

UPDATE 03-13-2014: We are NOT proving that, 4-or-more-element slices cannot have the global minimal average. We just proved that, there MUST be some 2-element and/or 3-element slices, having the global minimal average. In other words, we are NOT SURE whether there are some 4-or-more-element slices holding global minimal average. But we are ONE HUNDRED PERCENT SURE about the 2-element and/or 3-element slices.

UPDATE 03-02-2015: There is another excellent discussion in the comments by Kim. Thanks!

UPDATE 03-14-2016: If you had any question about the statement “If the sub-slices have different averages, some of them will have smaller average than MA”, please read this comment and this comment. Thanks!

151 Replies to “Solution to Min-Avg-Two-Slice by codility

  1. Hi, I wrote a C version of answer according to your python version, however, I only got score of 90 (one test failed), while yours got 100. I cannot see any difference between my C code and your python code. could you help me? here is my code: . Thanks.

    • It is because of the accuracy of float number. Change the “float minavg;” to “double minavg;”. And you will get 100. During the computation of float number and/or double number, the computer will use “double” type, and then convert it into float if necessary. So when you compared the “minavg>(A[i]+A[i+1])/2.0”, (A[i]+A[i+1])/2.0 is double, while minavg WAS converted from double into float and IS converting again into double for comparison. Some accuracy is lost in these conversions.

      Otherwise, you could use a temporary float variable, to say min_temp, to store the value of (A[i]+A[i+1])/2.0 and (A[i]+A[i+1]+A[i+2])/3.0. Then you could compare the min_temp and minavg. You also will get everything right.

      Hope it would be helpful! And your comment is great motivation for me to continue this blog!

  2. Thanks for the insight!

    I would never reached the conclusion that the array most have 2-or-3-sized slices with MA. Your proof nailed it, great thinking!

  3. The only thing that bothers me is that this exercise is connected with Prefix Sums…
    You’re not using them at all and with your assumptions greatly simplifies the problem.

    • Yes, the solution does nothing with the prefix sums. And I have no idea, how it could be applied here. I am also bothered by this issue. If someday you find the prefix sums solution, please do tell me!

      • I guess the implication from the Codility question is to use a prefix sum to do fast average calculation. So, if you have an input vector of:

        input = [ 1,2,3,4,5,6,7,8,9,10 ]

        and a prefix sum vector:

        prefix = [ 1,3,6,10,15,21,28,36,45,55 ]

        You can calculate averages over arbitrary ranges very quickly. So, if you want to calculate the average of the slice from index i to j, its just a matter of:

        (prefix[j]-prefix[i]) / (j-i)+1

        However, your solution is much better 🙂 and doesn’t need to waste O(n) in space plus another iteration through the data. Great work Sheng!

  4. “If the sub-slices have different averages, some of them will have smaller average than MA. But it conflicts with the condition that, the MA is known as the global minimal average for all slices. By this contradictory, it’s proved that all the sub-slices MUST have the same average.”

    I’m confused. Isn’t possible to have different averages for different sub-slices?

    For instance:
    [1, 2, 3, 4, 5]

    [1, 2, 3] -> 2 Average
    [4, 5] -> 4.5 Average

    [1, 2] -> 1.5 Average
    [3, 4, 5] -> 4 Average

    These are all different. They don’t have the same average.

    How it’s proven that isn’t needed to compare the sub-slices with four or five elements?

    Can you try clarify a bit? (sorry for not understanding)

    • Hello, thanks for visiting my blog! When we consider “the sub-slices have different averages”, we are splitting the S, which HAS the global Minimal Average, into sub-slice. But in my bad example, [1, 2, 3, 4, 5] does not hold the global minimal average. A better example would be [1, 2, 1, 2, 1.5, 3, 4, 5]. The slice [1, 2, 1.5, 3, 4, 5] has sub-slices with different averages. But for [1, 2, 1, 2, 1.5], there must be AT LEAST one split, which makes every sub-slices with same average. In other words, it is possible to have different averages for different sub-slices of one slice. But for the slice with global minimal average, it MUST have AT LEAST one set of sub-slices with same average.

      • But for [1, 2, 1, 2, 1.5], there must be AT LEAST one split, which makes every sub-slices with same average.

        Therefore “global minimal average” is 1.5?

        But what about slice: 1, 2, 1 -> avg = 1.3

        • For [1, 2, 1, 2, 1.5], there is one and only one split [1 , 2 , 1], who HAS the global minimal average. And for THIS split [1, 2,, 1], it ONLY has one sub-slice [1, ,2, 1] (we cannot divide the [1, 2, 1] into smaller ones with the challenge restriction.).

          So for [1, 2, 1], every sub-slices is with same average.

          “there must be AT LEAST one split (to say SPLIT), which makes every sub-slices with same average.” is a necessary condition, NOT a sufficient condition on “SPLIT holds the global minimal average”.

          So even for slice [1, 2, 1, 2], all of its sub-slices hold the same average. BUT it’s not enought to get the conclusion that [1, 2, 1, 2] hold the global minimal average.

  5. I do not understand why there must be at least one set of sub-slices with the same average..

    Could you explain this more? Why this is a must? Is there a proof that if such situation does not exist (there is split which makes all slices the same), then this bigger slice (the one that we want to split into 2-3 slices) does not have MA

    • Ok after writing this question I understood how silly it was. Right now I realized that this proof is about, that every slice which has MA, needs to be splitted to slices that are having same average, and also have to be size 2 or 3.

      It’s hard to believe sometimes how blind you can be until you ask a stupid question.

      Thank you for this blog, I truly envy you for finding this solution, and describing good way to prove it is correct. Big KUDOS for you.

  6. You can spare the use of floating point division also with a simple math trick:
    Instead of comparing the averages (A[i]+A[i+1])/2 and (A[i]+A[i+1]+A[i+2])/3 you can multiply both expressions times 6 and get 3*(A[i]+A[i+1]) and 2*(A[i]+A[i+1]+A[i+2]). Then you would similarly get the minimum of this “average*6”.

    • Sorry for the unimportant comment, ;), it´s just that I am in electronic engineering and care about optimization, always faster to work with integers I think

      • No, your comments are quite valuable. In my own answer, I also tried your method. But I thought, it will make the solution harder to understand. Thus I excluded it from this post. But in my own coding, I am willing to learn any possible optimization. Thanks!

  7. I think your proof is deeply flawed. The split does matter a lot when constructing sub-slices.

    Consider the input [8, 0, 0, 8]. Clearly the total average is 4 and the average of both halfs [8, 0] and [0, 8] is 4 too. But the minimum is 0, obtained from the two-element slice in the middle.

    Also consider [10, 0, 6, 0] where the average is 4 and the two-element averages are 5, 3 and 3. Yet the minimal average is 2 obtained by [0, 6, 0].

    While I disagree with your proof I agree with statement (1): Checking pairs and triples suffices.

    • There is a pre-condition, when we consider the split: the to-split slice HAS the global minimal average. Please notice, all the proof is done under such condition.

      • Yes but it is the precondition of your demostration. It is not the condition of the problem.
        With input input [8, 0, 0, 8].
        Your code do an evaluation of the split [0,0 ] and [0,0,8]
        in your code the variable index is increased one by one step , It is not two by two, or by tree steps. regards

        • Yes, it is not the condition of the problem, BECAUSE it is an inherent inevitability of the problem. Every array MUST have a slice, holding the global minimal average. And our proof is done with such slice.

          Actually, my code tried every 2-or-3-element slices: [8, 0], [8, 0, 0], [0, 0], [0, 0, 8], and [0, 8].

  8. Hi Sheng, I am sorry, but I am struggling to grasp the reasoning behind your solution: you only consider slices composed of 2 or 3 elements, why not bigger? How do you know that bigger slices won’t have a smaller average?

    thank you, sorry if the question is silly.

    • That is fine. It is actually a tricky question. If any bigger slice has the GLOBAL mininal average, it MUST be able to be split into 2-element and/or 3-element sub-slices with the same and global mininal average. See the (b) and (c) parts for the proof.

      • Ok, now that I revised it, it makes, sense. It’s just I was not expecting to have to do mathematical hypothesis or such, for this test. What I was wondering (linked to my secon question) is: they usually expalain a topic and then have you performing tests about them, why can’t I find solutions about it that do that?
        I mean, your solution is brilliant, the thing is…it’s too much brilliant! Clearly it is good, thinking outside the box and all, my problem is: what is the solution they had in mind when they posed the question?
        The stackoverflow answer doesn’t seem like it, or my skills in JavaScript are too little to truly understand it?

        • Yes, the solution on stackoverflow did not use a raw prefix sum. But the similar idea, and from the end to the front. Actually I did not find a solution with exactly prefix sums for this question.

      • So, how do we know there isn’t a bigger slice with a lower index?
        I can’t find an example, so I’m pretty sure it’s right but I’m curious as to why.
        Basically, imagine a 4 element slice with the global MA. The code only works if the FIRST 2 or 3 elements of that 4 element slice is the global MA. If, for example the last 2 elements was the glocal MA. Then you wouldn’t be returning the lowest possible index. How do we know that’s always true?

        • It’s case #c in my proof.

          If a 4 element slice (to say, [A, B, C, D]) with the global MA, we can split the slice into two sub-slices as [A, B] and [C, D]. And BOTH [A, B] and [C, D] MUST be with the global MA.

  9. One more thing, sorry for double comment: this demo is inserted in the Prefix Sums chapter, though I cannot seem to find a solution with Prefix Sums in O(N) time complexity. Based on what I can see here, you do not make use of prefix sums, am I wrong?

      • According to the author, that solution uses dynamic programming, which is Lesson_9, and does not use prefix sums. It is too early to use it for Lesson_5, so the question remains open.

  10. Hi, I’d like to suggest another proof that it’s enough to examine pairs and triplets. I’d be glad to hear if you find it correct and reasonable.

    Claim: Any slice of four consecutive elements always contains a pair that has a lower (or equal) average, than the average of the whole (4 elements) slice..
    Proof by contradiction:
    let’s assume that there exists such a 4-elements-slice whose average is smaller than any of the averages of its constituent pairs (2-elements slices).
    We then (falsely) assume that: “there exists a1,a2,a3,a4 such that (a1 + a2 + a3 + a4)/4 < (a{j} + a{j+1})/2, for every j=1,2,3".
    Now lets examine the consequences.
    taking j=1, we find (a1 + a2 + a3 + a4)/4 < (a1 + a2)/2 ==> a3 + a4 < a1 + a2.
    and taking j=3, we find a1 + a2 < a3 + a4.
    So that, we got (a1 + a2 + a3 + a4)/4 < (a{j} + a{j+1})/2, for *any* j=1,2,3″ is false. In other words – For any set of a1,a2,a3,a4 (i.e., any existing 4-elements slice), there exists a pair of elements (defined above by j) whose average is either smaller or equal to the average of the 4-elements slice”.

    So any 4-elements slice can be decomposed to pairs of which one has a smaller (or equal) average. This also applies for any longer slice, since it can be seen as being composed of 4-elements sub-slices, which in turn contain some pairs with smaller averages.
    Triple-elements slices can’t be similarly decomposed, as is seen for example in {1,9,1}.
    This means we must work with pairs and triplets.

    • Thanks for your great proof! But I am not clear about this point “This also applies for any longer slice”. What if it is a 5-elements slice? How could we divide it?

  11. Great inspiration Sheng. Very glad I have stumbled on your blog.
    Unrelated to this problem may I ask, what would you recommend to do in order to improve one’s algorithmic problem solving skills? I’ve tried to read about popular algorithms but it doesn’t seem to really help on “codility like” problems. Any suggestions?

    • It is hard to answer this question. To be honest, I am not excellent at these challenges in Codility. And for this problem, I spent more than one week, and asked the help from my friends. It is not an easy job.

      The knowledge from books is important. But it is far from enough. You have to keep pratice to gain more experience and improve your skills. After solving this problem, all can see the underlying issue is easy and basic: try all 2-elements and 3-elements slices. Most, if not all, of software developers can solve it in minitues with O(N). And the textbook could teach us the slices algorithms.

      But why it takes us some much timet to get the right answer? Because we failed to abstract the challenges into the known data structure and algorithms, in most cases (Some challenges are really hard, and out of this discussion). The textbook will never be able to teach us how to do this job. In my opinion, practice experience is much more helpful in this field than books.

      Hope it not be misleading! Enjoy coding!

  12. Hi Sheng,

    Thanks for the info on using 2 or 3 slices. That was very helpful! Here’s one solution that would work (100% on Codility) and uses Prefix Sums:

  13. Does your solution work if all the values in the array are negative? I dont think so. Because the slice with least average is the whole array and not any 2 or 3 slice.

    • It should work well. For example, with the input [-1, -100, -100], the slice with least average is [-100, -100], not [-1, -100, -100].

  14. I think you are missing something, too:
    >each element of array A is an integer within the range [−10,000..10,000].

    So, what if [4,-10,5,10,-5,8] ?
    We are looking for the 4-slice [-10,5,10,-5] MA=0.
    But [-10,5,10] MA = 1.67; [5,10,-5] MA=3.33; [5,10] MA=7.5 and so on.
    (Used easy numbers for demonstration.)

    So this contradicts your starting assumptions on 2/3-slices.

    And I think there is no linear solution, you can never be sure on local criteria.
    At least nlogn, but I have not thought about that.

    I really don’t like this tests. They use it for recruitment here … “forget CVs, forget reports” … it somehow sucks.

    • In this example [4,-10,5,10,-5,8] , the two-length-slice [4, -10] hold the minimal avg. So my 2/3-slice conclusion still works

      • Consider this input: int A[] = {-1, -2, -3, -4, -5, -6, 11, -42, 200, 0, -10, -10, -10, -10, -10, -1000};
        Is your algorithm still going to give correct answer? index 9, i.e. A[9] is correct answer for the original question.

      • I interpreted the word ‘minimal’ to mean smallest or “closest to 0”, so that in my solution all my comparisons were made using absolute values. Using this interpretation, which I admit may not be the one Codility intended, means an average of 0 from the slice [-10,5,10,-5] is more minimal than an average of -3 from the slice [4,-10]. Hence I can understand Ron’s point that the answer is in a slice of size 4 rather is only one of size 2 or 3.

        • I also had the same difficulty interpreting the word ‘minimal’. I think this point should be explained more clearly, or showing an example with both positive and negative values, where the most negative average value would be the ‘minimal’ value.

  15. thank you sheng for your explanation…… i read all, Am a new programmer. So please explain the question, i didnt understand.. Will you.

    • The solution is kind of a math question, rather than a programming challenge. Have a paper and a pen, then try to follow my proof.

      PS: read the content, and also please read the other comments. They are helpful.

  16. Hi Sheng, Your Code is simply awesome, Thanks for the help. Also a small intro abt myself, I am a masters student doing MS in CS at Northeastern University. I have coding challenge at Codility for a Compny “Cloudera”. Do you have any advise or preparation materials !!!! It would be of great help!!!

    Thanking you in advance.

  17. without using floats this gives 100% in C (you can convert it in python 🙂 )

  18. Despite all the effort, I failed to solve this problem. Only after I saw the article, I could finally realize that this was not a coding problem, but a mathematical one. I was deeply sad about this, but did not want to stop the challenge. So below is my own interpretation of the problem (or excuse for the failure I have made.)

    So basically the problem is to prove next statement.

    Let len(s) be the length of a slice s, sum(s) the sum of the all elements of the slice s, and ave(s) the average of the slice s. Then for an arbitrary array, every slice s having len(s) > 3 contains a sub-slice s’ such that ave(s) >= ave(s’).

    Proof.

    Suppose that s is a slice having len(s) > 3 and does NOT contain a sub-slice s’ such that ave(s) >= ave(s’). Since len(s) > 3, s can be divided into sub-slices t and u.

    Then,

    ave(t) = sum(t) / len(t), and ave(u) = sum(u) / len(u).

    ave(s) = sum(s) / len(s)
    = [sum(t) + sum(u)] / [len(t) + len(u)]
    = [len(t) * ave(t) + len(u) * ave(u)] / [len(t) + len(u)].

    If ave(u) >= ave(t) then let s’ be t

    ave(s) = [len(t) * ave(t) + len(u) * ave(u)] / [len(t) + len(u)]
    >= [[len(t) + len(u)] * ave(t)] / [len(t) + len(u)]
    = ave(t)
    = ave(s’).

    Otherwise, if ave(t) >= ave(u) then let s’ be u

    ave(s) = [len(t) * ave(t) + len(u) * ave(u)] / [len(t) + len(u)]
    >= [[len(t) + len(u)] * ave(u)] / [len(t) + len(u)]
    = ave(u)
    = ave(s’).

    This leads a contradiction, and completes the proof.

    ps. After writing this note, I saw a counter example [8, 0, 0, 8] as Hunter2 indicated. A workaround of this is to permit a slice having length of one, although the problem description defines a slice to contain at least two elements. I guess that doing this does not harm the soundness of the proof. The point is to ignore all slices having the length of 4 or more, not to consider the slice having length of one. For example, you may argue that [0, 8, 1, 1] can be splitted into [0] and [8, 1, 1], but we can simply ignore [0] and take [1, 1] instead.

    • Updated:

      Uncomfortable with the postscript I wrote, I thought a little more on the proof, and figured out that the postscript above was absolutely NOT necessary. As a matter of fact, at first time I was confused with [8, 0, 0, 8] as Hunter2 indicated and introduced a slice having length of one. But the real point was in a different place.

      In the example slice [8, 0, 0, 8] what we want to claim IS that the average of [8, 0, 0, 8], which is 4, is greater than or equals the average of [8, 0] or [0, 8], which is also 4. Hence [8, 0, 0, 8] can be divided into sub-slices having the same or lower average value. It is important that we do NOT claim that the minimal average slice is [8, 0] or [0, 8]. Finding the minimal average slice is another job to do later and that job is to examine slices of size 2 or 3.

      Finally thanks to Sheng for the insight. I could never imagine the idea behind the solution without his comment.

      • One more thing.

        When I saw the claim that examining 2-3 slices is enough, what struck immediately in my mind was the mathematical induction. Probably most of us have the experience of proving next statement, that is;

        every integer greater than 3 can be expressed as a linear combination of 2 and/or 3.

        Then I realized that the same principal can be applied to this problem. How stupid I am! Rest of the problem was not so hard. In fact, I did not even try to code, because sometimes if once a solution is found, it is not much meaningful to code it.

        However I feel that I’m fortunate since I still have the textbook of discrete mathematics. I should revisit it once more.

        Lastly, I guess that I talked too much alone at this time. Sorry for your inconvenience. Enjoy happy coding!

  19. Hi Shen,

    One of the perf tests is failing:
    WRONG ANSWER
    got 79 expected 0

    Can you tell me why?
    Below is the ruby version of your code.

    • Please consider the test case [-1, -1, 0, 0, -1, -1, -1]. When there are two slice with the same average, the problem require us to output the first one. But your solution did not check the index of the same-average slices.

      I never used the Ruby before. So the following solution only demonstrates how to fix the bug. You could optimize it latter.

  20. Following is a C++ solution that doesn’t check for slice of 3.0 specifically but just keeps track of the previous values.

    • 2 and 3 elem. C++ implementation gives wrong answer in medium_range test, so I like your algorithm, it works better than combinations of 2 and 3 elements, because you check 2 and 2,3,4,5… elements. Thanks!
      Implementation with floats instead of doubles and 10001 instead of DBL_MAX would work fine in this example as well.

      • IMO, SA’s solution is good enough to pass all the test. Using 10001 instead of DBL_MAX is a not-so-good practice.

        10001 can pass the test, but is essentially too small.

  21. Thank you for sharing this about the 2 and 3 slices. Without that I was not able to do it on O(N) time complexity.

    Ruby 100%

    • Hello a131a,
      I am a newbie in ruby. What do you mean in the comment
      #add a maximum value element to the array to skip the check of the last pair
      #it is not nice, but I am lazy 😉
      practically pls.

  22. Hi shang,

    I think your solution is just valid for your assumptions about slides of 2 and 3 element size. But I do not see any assumption regarding the size of the slice on the codility exercise. Even they take this slice formed with 4 elements:
    A= [4, 2, 2, 5, 1, 5, 8]
    A(1, 4), which have an average of (2 + 2 + 5 + 1) / 4 = 2.5

    So if we have an array like this:
    A=[0, 8, 0, 0, -8, 0]
    the slice with the min avg is:
    A(1,4)=0
    and your code returns:3
    How could this result be valid?

    I enjoy reading your post solutions! They are well explained!

    Thanks.

    Cheers.

    • First of all, thanks for your visiting! My solution is solely based upon the assumptions about slides of 2 and 3 element size. But I also proved the correctness of my assumption in all cases.

      For A= [4, 2, 2, 5, 1, 5, 8], the slice [2, 2], whose length is two, is smaller than yours.

      For A=[0, 8, 0, 0, -8, 0], the min-avg-slice would be [0, -8] or [-8, 0], whose average is -4. My code will return 3, which is the start index of [0, -8].

      PS: please notice that, the return value is the index of the min-avg-slice, NOT the average value.

      PPS: enjoy it!

  23. Hi!
    I’m really impressed with your solution. I had really like 1.5 h of thinking about what you meant but finally I got the idea. This is really amazing blog and I am sure I will find much more useful content here. Thanks for sharing your solution ( i failed today with mine =) ) .
    Julek

    • Hi Julek!

      Thanks for visiting my blog! This challenge is really hard. I spent more than one week and found out the solution with my friends together.

      Enjoy coding!

      • Sure. It is just another example of Codility cheating, i.e. posing mathematical problems as programming ones. There is no way you could solve it in 30 min unless you are utter genius. They should rename it to Mathility IMHO.

  24. Hi Sheng, brilliant deduction on the slice sizes.

    Here’s a solution in Java using prefix-sums as expected by the cordiality test.

  25. Guys! After all these long discussions, how do you get a job offer with codility? I know my question is irrelevant to the post. But codility is used by a lot of companies to hire people. Usually there is a exam with 4 questions and two hours and you know that if you do not solve all of them you won’t get hired. No matter anyone ever gets hired. The employer looks for somebody who can solve all the problems no matter if there is no one on earth who can solve all the problems. When I see that people are discussing here for months to find a solution for only one of the basic problems of this maniac website, I find it more and more unreasonable to spend time taking codility exams!
    Do you agree we me? Do you think codility is resonable? What is your reaction when you want to get hired in a web development company developing for example tourism websites and you have to first fail on a two hour of bloody frogs and trees mathematical questions to proove that you are not eligible to be paid anywhere?

    • My two cents, the bottom line is: codility is fair to every single applicant. All applicants are trying to solve the same question. And your solutions show your ability, to some extend. The problem is how the employer use codility, rather than codility itself.

      • The questions in codility are for students that need to prov for employers that they really know how to go from requirements to an application using a programing language. In my case, I remember I was ok in solving these king of problems after finishing my university studies but today, after almost 10 years of working as developer in the “real world”, I’m useless.

        • It is actually the major disadvantage of Codility. It only focuses on the programming skills, while cannot cover all your experience and ability, especially the project experience.

  26. Thanks for taking a read. I’m working on trying to condense this as much as possible but I’m getting bad values on the extreme values and large sequence. On the extreme I’m getting 99998 expected 0 and the large I’m getting 99988 expected 3. Any idea what is going on?

    • When i is len(A) – 2, the three_chunk is:
      three_chunk = sum(A[i:i+3])/3.0 = sum(A[ len(A) – 2 : len(A) + 1])/3.0
      Essentially it is:
      three_chunk = sum(A[ len(A) – 2 : len(A) + 1])/3.0 = sum(A[ len(A) – 2 : len(A)])/3.0 = (A[len(A) -2] + A[len(A) – 1]) / 3.0
      You are dividing the last TWO items with 3.0

      Hope it helps.

  27. Adding another Java solution 100/100 using your technique and prefix sums:

  28. Hi Cheng and Noamo, wise solution and explanation respectively !
    After all this resumes to prove the reduction to the absurdity, math. I’ve also tried to prove for an array of 3 elements analyzing the two pair, no conclusion, so no absurdity on that :p
    More of the same, but now backwards 🙂

    missing the triplet part. cut and past doesn’t work properly…

  29. Amazing.
    Have been a trying to get it not to timeout for complex cases for a while.
    Next time I’ll try longer before googling it though.
    thank you 🙂

  30. I think I am just too numb to recognize 2-slice and 3-slice are the only possible options for a global minimal.

    So, I guess I went the hard way to solve it. First try only granted me 80%. The second run, well, would be useless in any real testing scenario, got 100%.

    I know this is not the optimal solution in terms of the logic. But it proves that you still can get it done even without discovering the patterns. LOL

  31. Sheng, congrats for your solution as it really seems is the best. I only want to know how to demonstrate the following assertion which I think is the key for the entire mathematical problem solving:
    “If the sub-slices have different averages, some of them will have smaller average than MA.”

    How can it be demonstrated that if the sub-slices have different averages, some of them will have smaller average then the average of the hole parent slice?

    Sorry if it’s a stupid question, but I’d like to understand it mathematically.

    • Hello Bogdan, sorry for this late response. Let S be the whole slice, and S1, S2, …, SN be the sub-slices such that S = S1 + S2 + S3 + … + SN. If the sub-slices have DIFFERENT averages, avg(S1), avg(S2), …, avg(SN) are not always the same. Not losing the generality, we assume that avg(S1) is:
      1. for all i in [2, 3, …, N] such that, avg(S1) < = avg(Si);

      2. Exist at least one i in [2, 3, …, N] such that, avg(S1) < avg(Si);

      In other words, S1 is the sub-slice, which has the smallest average value among all the sub-slices.

      avg(S)
      =sum(S) / len(S)
      = (sum(S1) + sum(S2) + … + sum(SN)) / (len(S1) + len(S2) + .. + len(SN))
      = (avg(S1) * len(S1) + avg(S2) * len(S2) + … + avg(SN) * len(SN)) / (len(S1) + len(S2) + .. + len(SN))
      > (avg(S1) * len(S1) + avg(S1) * len(S2) + … + avg(S1) * len(SN)) / (len(S1) + len(S2) + .. + len(SN))
      = avg(S1) * (len(S1) + len(S2) + … + len(SN)) / (len(S1) + len(S2) + .. + len(SN))
      = avg(S1)

      That is, avg(S) > avg(S1). Proof done.

  32. @Bogdan This is a general mathematical approach when demonstrating certain statements. Given an array A of N integers where N >= 2, we can slice it up in slices of 2 and 3 numbers. Let’s call these slices S1, S2, … Sk.

    Now consider the average of all elements in A. Let’s call this average AvA. The sum of all numbers in A can be calculated as N * AvA.

    Now let’s consider the average of all slices that we have created. Let’s call them AvS1, AvS2, … AvSk. What this problem is essentially saying is that AvA cannot be smaller than Min(AvS1, AvS2, … , AvSk).

    Let’s assume for a moment that AvA is smaller than the smallest average of all the slices. Let’s call that smallest average MinAvS. Now think of the sum of each slice. Since the average of the slice is bigger or equal than MinAvS (we picked MinAvS because it was the smallest one of all, the sum of the numbers in the slice is bigger than 2 * MinAvS or 3 * MinAvS (depending on how many numbers in the slice). Now, if you add all slices together (and that would be our initial array A that has N integers), the result would be bigger or equal to N * MinAvS, right? But the result is N * AvA, as we have already established and since MinAvS is bigger that AvA we get to an impossible situation hence our assumption that AvA is smaller than the smallest average of all the slices is impossible.

    Here is my C# solution that scores 100% on codility:

  33. Hi Sheng, that’s one hell of a great take. I’m having doubts about your proof though.

    I get the 1st part of the proof: If a slice S of any length holds the global minimum average MA, than no other slice can have an average less than that. Ergo every split of S would yield sub-slices with averages >= MA. So far so good.

    But your further assumption is (quote) “If the sub-slices have different averages, some of them will have smaller average than MA” (why?), from which you conclude that all sub-slices of S must have the same average, MA.

    And here’s what bothers me: Since S contains its every sub-slice, the correlation between S and it’s sub-slices is independent of array elements outside of S, and therefore independent of whether the average of S is or isn’t a global minimum within its broader context. Hence your postulate implies that the average of every sequence must be equal to that of any sub-sequence of it, which is clearly not the case.

    On the other hand, your solution is clearly correct (it does score a perfect 100% on codility), so your postulate must be correct, too. I therefore assume that it’s not the postulate, but the proof/explanation that is flawed / misleading.

    I would very much like to understand the solution completely though. Care to give the proof an alternate take?

    • Hello ZOC, your question is great! For the “If the sub-slices have different averages, some of them will have smaller average than MA”, please refer to the newly updated part of the post.

      PS:
      The correlation between S and it’s sub-slices is independent of array elements outside of S: right
      The correlation between S and it’s sub-slices is independent of whether the average of S is or isn’t a global minimum within its broader context: agree
      IF we ONLY consider “the correlation between S and it’s sub-slices”, we CANNOT get the statement. “average of every sequence must be equal to that of any sub-sequence of it”. Correct.

      BUT, when we consider “the correlation between S and it’s sub-slices” UNDER the presumption “S is the slice with minimal average”, we CAN get the statement. “average of every sequence must be equal to that of any sub-sequence of it”. See the updated post for more details please.

  34. But, one thing I can’t understand is, how to prove that this method can return the smallest starting point among all possible slices with minimal average?

    • If you can understand the logic to find the min-avg slices with 2+ elements, the “smallest starting point among all possible slices with minimal average” is exactly the same as: find the position of the first (if there are 2+) smallest element. In line #8 and #12, the comparison is <, rather than <=.

      • But how about this case: there are 4 elements totally, and the first two or first three elements don’t have the minimal average, but the first four can have minimal average, and may be the second and the third element together can result in minimal average. I mean, how to prove that above case I mentioned is not possible?

        • It’s impossible. Please refer the statement ” (2) And all the longer slices with minimal average are built up with these 2-element and/or 3-element small slices.” and its proof.

          In your specific case, there are 4 elements totally, and the average of 4 elements is the minimal average. The we can divide the array into two subslices: [first, second], [third, fourth]. And it MUST be true: avg(4 elements) = avg([first, second]) = avg([third, fourth]) (Proof is available in section c).

          Our algorithm will find the position correctly with your case.

  35. This is my solution

  36. I don’t really see why we need to check for slices with both 2 and 3 items? I think checking for slices of 2 items is enough. Can any one give me an example that require checking a slice of 3 items?

  37. My solution in Go: http://pastebin.com/eSkQKjLw

  38. Hi Sheng.. I have a small query.. just by looking at this array [2,1,0,1,-100,1,2,3,-50].. the answer should be 4.. but your code is returning 3… I hope the test case is valid.

  39. This post was such a great help — thanks Sheng! This was a really hard problem for me to solve – much harder than any of the previous ones I tried. Here’s my C++ solution:

  40. Can’t say the proof is perfect. Just personal thoughts and deductions. However, it satisfied the Codility. There’re 4 element slice such as [-5, 2, 3, -5], which can easily prove we need to go further. Maybe somebody can give a example of 10 elements slice. But that’s not a problem. we can consider all all the slices contains 2, 3, 4,5,6 … 10 elements. because the time complexity still is O(N) unless we consider about all N elements slice. So it’s not a perfect proved theory but we can handle most situations in the world and that’s how we do things everyday. 🙂

    • I actually considered the 4-element slice in my proof #c . And I did not see any solid statements in your comment, that can say my proof is not correct.

      For example, with input [-5, 2, 3, -5], the result slice should be 2-element slice [-5, 2]. Please correct me if I was wrong.

  41. Hi guys. I noticed that none of you care about making two divisions per iteration when you could avoid them and use more additions instead. I looked for the smallest sum over two consecutive elements (sum2 in my code) and for the smallest sum over three consecutive elements (sum3 in my code). It’s only at the very end that I compute the matching averages.

  42. For those who can’t figure out the “average of 2 or average of 3” theorem, here goes another approach. Well, I didn’t figure out the “theorem” at the first time so there is no way I can discover it this time either. Here goes my second passed solution without using the “theorem”…

    Again, I don’t believe most people could recognize this “theorem” in a 30 minutes interview… or, I am too dummy…

    Update by micropentium6: The boundary condition is interesting here (Admin: line #13). Both > and >= will pass codility test. They both return the correct answer, however you can tell the actual min avg at certain given index could be wrong. Surprisingly, the returned indexes remain correct…

  43. My solution is very similar, but I used a helper function to find the averages. That way I can just focus on the logic used to find all the windows.

  44. Hi. Your solution returns 2 for this test case [-1, -2, -3, -4, 1, 2]. It is true that slice [2, 4] has the global Minimum Average but also slice [1,3] has the global minimum average. And the problem says that we must return the starting point of the slice with the global minimum average. If there are more than one slice, which is a case in this example, than theoretically even philosophically must not we return the leftest slice which has the global minimum average? Why we must return the index of the slice which has the global MA and the length of 2 or 3? Isn’t this mathematical “injustice” to not to return the slice which has the global average and is appeared before the one you are returning.

    • I would like to figure out a few errors in your comment:

      1. For [-1, -2, -3, -4, 1, 2], [1, 3] is NOT holding the global minimum average.

      2. My whole article is trying to prove that MA exists in 2- or 3-item slice. Please read before posting.

      3. Please double read the problem description: If there is more than one slice with a minimal average, you should return the smallest starting position of such a slice.

Leave a Reply

Your email address will not be published. Required fields are marked *