Question: http://codility.com/demo/take-sample-test/genomic_range_query
Question Name: GenomicRangeQuery
This is a typical case that uses more space for less time.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 | def solution(S, P, Q): result = [] DNA_len = len(S) mapping = {"A":1, "C":2, "G":3, "T":4} # next_nucl is used to store the position information # next_nucl[0] is about the "A" nucleotides, [1] about "C" # [2] about "G", and [3] about "T" # next_nucl[i][j] = k means: for the corresponding nucleotides i, # at position j, the next corresponding nucleotides appears # at position k (including j) # k == -1 means: the next corresponding nucleotides does not exist next_nucl = [[-1]*DNA_len, [-1]*DNA_len, [-1]*DNA_len, [-1]*DNA_len] # Scan the whole DNA sequence, and retrieve the position information next_nucl[mapping[S[-1]] - 1][-1] = DNA_len-1 for index in range(DNA_len-2,-1,-1): next_nucl[0][index] = next_nucl[0][index+1] next_nucl[1][index] = next_nucl[1][index+1] next_nucl[2][index] = next_nucl[2][index+1] next_nucl[3][index] = next_nucl[3][index+1] next_nucl[mapping[S[index]] - 1][index] = index for index in range(0,len(P)): if next_nucl[0][P[index]] != -1 and next_nucl[0][P[index]] <= Q[index]: result.append(1) elif next_nucl[1][P[index]] != -1 and next_nucl[1][P[index]] <= Q[index]: result.append(2) elif next_nucl[2][P[index]] != -1 and next_nucl[2][P[index]] <= Q[index]: result.append(3) else: result.append(4) return result |
In this one I implemented a Segmented Tree, which was aaaa looooootttt more code than yours @o@.
Segmented Tree is a more powerful and more universal solution for this kind of questions. The major reason, that I did not use it here, is: I did not master it. If you are willing, could you please share your code here, as another solution?
Hey Allesandro, could you please post your code, if you are ok to share.
(by the way I Subscribed to your blog, cool posts for those interested in programming challenges!)
Thanks a lot! These questions should be also helpful to the internship/job seekers, like me~
Here is my solution. Python, result 100/100. A bit different approach than yours, closer to Lesson 3 topic (Prefix Sums).
Sorry for bad tag usage in my previous post attempt 🙂
Thanks for sharing! I removed your previous mis-formatted comment to keep clean.
Great code!! Just making it a little more verbose without a dictionary for expressiveness :
Great! Thanks!
Solution in Golang without going to a map to convert nucleotide to number.
Thanks for sharing!
Why don’t set the initialize value of next_nucl to DNA_len instead of -1?
Yes, you can. It is more traditional and better to use DNA_len as the not-exist-item-index.
Trying to understand the time complexity problem here. This is apparently O(N*M) instead of the required O(N+M). Can someone explain why?
The for loop itself is O(M). And inside the loop, the sorted statement is O(NlogN). Totally, it is O(M*N*logN).
C# solution – similar approach
Adding a Java Solution 100/100, https://codility.com/demo/results/demoJFB5EV-EG8/
I know the ImpactFactorHolder class might be too much, but It was a little bit tricky for me on what was happening, my initial attempts I used a two dimensions arrays, but somehow I wasn’t able to deal with some of the test cases and adding that extra class helped me a little bit to understand and solve the problem.
I took the liberty of Implementing the original solution in a more pythonic way, using dictionaries, removing the “-1” trick, using reversed(), enumerate() and zip() instead of looping with an index
I had an idea but could not implement it due to practical number range limits. Small numbers may work.
Instead of impact 1,2,3,4; encode them as primes 2,3,5,7.
Instead of prefix-summing impacts, prefix multiplying the primes (equivalent to summing logs of primes).
Instead of subtracting prefix sums, use division.
For each segment, the division result is tested for divisibility of 2, then 3, then 5, then 7. If divisible, decode back to the minimum impact 1,2,3,4 correspondingly.
Yes, if they are short and the product is small, it works.
Hi,
I am trying out Codility exercises to help me learn coding as I am preparing for interviews. Many thanks to Sheng, and other contributors, for the enormous help he is providing to people like me who is trying to learn on our own.
For this problem, I cooked up this solution in Python but I am not sure if it matches all the time and space restrictions of the problem. Any help would be appreciated a lot. Thanks in advance.
Hi, thanks for visiting my blog! Unfortunately your solution does not match the time restrictions of the problem. Let the N be the length of the DNA S, and M be the length of queries (P and Q), the statement “min(set(SS[P[i]:(Q[i]+1)]))” is O(N), and it is inside one loop of O(M). Therefore, the total complexity is O(M*N).
Here’s a 100% score solution I did straight from codility in java. It’s definitely a hack’n’slash patch job to deal with some of the test cases, but I honestly didn’t understand some of the logic that others were employing.(Especially Luciano Issoe’s InRange variables in his last FOR loop. A quick explanation would be great =D)
Anyhow, hope this can help!
Hi Sheng,
This is a really great solution. How did you come up with this approach? Like is there any tricks or tips to know how I should approach the solution for this problem? I gave up after 3hrs solving since my approaches were wrong. I just want to know how you figured this problem out. Like ways to think and steps to approach the solution.
Thanks 🙂
Hi Tim. Sorry, nothing but practice more 🙂
Hello, here’s something I came up with in C, since there aren’t many C solutions I thought it could help. Seems kinda long and probably could use some optimisation.
Feel free to comment!
Thanks for C !
I scored 87% with this solution. The detected time complexity is O(N + M) but I get “Timeout Error” on the last check. Could you tell me why by any chance?
It’s O(N*M), not O(N+M). Because (‘A’ in S[i:j+1]) is O(N) in worst case.
Hi, I got 100% with a similar solution, also with “detected time complexity O(N + M)”:
Sheng, can you comment on why mine might have passed the performance checks? Should it not pass in general?
Thanks 🙂
The “detected time complexity” is not always accurate. Because “”‘A’ in sub_S” is an O(N) operation, I would take your solution as O(N*M).
Hi Sheng, i’ve made a similar code, that got 100/100 as well an it ilustrates pretty well what you’re trying to say:
It’s the exact same principle as the examples above, I just tried not to exceed in the if else repetition, but it does exactly the same thing.
On the worst case cenario, it is indeed O(N*M) since python uses linear search. If all the characters of the DNA were “T”, on the lasr position, it would be N*M, but since the avarege is lower than N*M it gets as N+M
Here a nice PHP solution
Please refer to “Guideline for Comments” (in the right column) to post code.
Javascript code 100%
Small improvement to @karol solution.
I came up with this.
Another solution in python , got 100/100 on codility, many thanks to all the contributors especially Sheng
Your are welcome :–)
Another Javascript solution:
Clean VB:
Hello,
I am more a C++ dev, but I tried it in python, and it also gets the holy 100/100.
I don’t understand why you don’t loop through the letter, as this is a fixed size loop it does not account in time diff, right ?
Also I find my cost function very ugly now that I see what you did :p.
Yes, it’s more a design style, rather than a performance concern 🙂
I’ve solved this using prime numbers, but I’m failing at performance tests. I can’t see why. my time complexity is O(N+M)
Hi,
Here’s my Python 3.6 solution, I think it’s a bit simpler than what I’ve seen here.
section
Javascript
Hi Sheng
Thanks!
Your solution led me in right direction.
Take a look at this “optimization”.
I wrote below answer to this… would love some insights…
Hi,
How is code O(N*M)
Here’s my solution, O(N + M):
Pretty neat and easy to understand solution. Thanks.
Here is my simple python solution O(N+.M) 100%
shorter version:
Here’s what I went with. Python. I went with a prefix sums approach.
My solution in C++
Received 100/100 with O(N+M).
Same solution, but a bit more intuitively written (Javascript):
Follow-along video: https://www.youtube.com/watch?v=4SyckIAmYXk