Saturday, September 3, 2016

A jam week

Google Code Jam 2016 Finals was the main event of the first August week (problems, results, top 5 on the left, analysis, live stream recording). There was very close competition at the top during the entire contest, but in the end the favorite Gennady.Korotkevich has pulled quite a bit ahead of the field - congratulations!

The problems were interesting in different ways this time, and it's hard to pick a single most beautiful one - but I will choose problem C "Gallery of Pillars". You are given two numbers n and r. There's a nxn grid of pillars, with a pillar of radius r centered in each point of the grid except the bottom-left corner. How many pillars are visible from the bottom-left corner?

This could initially seem to be a tedious but standard geometry problem, but the constraints were very high: n is up to 109, and r is between 10-6 and 0.5. These constraints eliminate any chance of passing for any solution that just tries to enumerate all visible pillars. Can you count them faster?

Distributed Google Code Jam 2016 Finals continued the exploration of the new field on Saturday (problems, results, top 5 on the left, analysis). Just like in non-distributed Code Jam, the last year's winner could retain the crown, but the fight was even more fierce this time. Congratulations Bruce!

This was also the first time I took a stab at solving distributed problems myself. My result was nothing too impressive (39 points from B-small, E-small and E-large), but now I can appreciate and share with you the nicer problems :)

The only problem I solved, problem E "gold", was very nice. You were looking for gold in a sequence of 1011 cells. You was also given the number of cells that contain gold, which was up to 107. Your program was executed on 100 machines. Each machine could ask questions about the location of the gold in the following way: given a cell number, the reply was either "this cell contains gold", "the closest cell with gold is to the left", "the closest cell with gold is to the right", or "the closest cells with gold to the left and to the right are at the same distance". One question took 0.2 microseconds, and the time limit was 15 seconds, so each worker could use at most 750000 questions, meaning that even between all 100 workers there was no time to query all 1011 input cells. The workers could also communicate with each other, each worker sending at most 8MB of data distributed between at most 5000 messages to other workers. How does one find all gold so quickly?

Finally, Codeforces Round 366 wrapped up the week on Sunday (problems, results, top 5 on the left, analysis). The problems turned out to be very tough, with two problems enough for second place, but it was of course better to solve three and win, just like kcm1700 did - congratulations!

In my last summary I've mentioned an unsolved Yandex problem: consider sequences of 2n parentheses of n different types such that there's exactly one opening and exactly one closing parenthesis of each type. A sequence is considered good if the parentheses can be colored with two colors in such a way that the subsequences formed by each color are balanced parenthesis sequences. How many good sequences exist for the given n? n is up to 500.

The first approach that comes to one's mind might be: let's consider the number of ways to pick two balanced parenthesis sequences of lengths k and n-k, and then multiply that by the number of ways to interleave them, and then by n! to assign the parentheses types.

This solution is incorrect because it might count a good sequence more than once, as there might be more than one way to color it with two colors in the way described above. In fact, each good sequence is counted at least twice, as we can swap the two interleaved sequences and get the same result! And in general, each good sequence is counted as many times as the number of valid colorings it has.

The next natural question is: how many valid colorings does a given good sequence have? This is relatively easy to determine: we can draw a graph between parentheses types, connecting two types if the corresponding parentheses must have different colors, which happens if the corresponding segments are intersecting but not containing one another. Now valid colorings of the parentheses correspond to colorings of this graph without two vertices of the same color sharing an edge. And that, in turn, is either 0 (in case the graph is not bipartite), or 2 to the power of the number of its connected components otherwise.

Even though the last observation doesn't lead to a solution directly, it provides valuable insight into the structure of the problem. We will call the sequences corresponding to a connected bipartite graph simple, and the rest complex. Each simple sequence has exactly two valid colorings. Each complex sequence can be decomposed in the following way: the connected component containing the first parenthesis, and the remaining connected components. That connected component corresponds to a simple sequence of smaller length, and the rest correspond to simple sequences that can be inserted anywhere into it.

Now we finally have a clear path towards the solution. We will count the number of colored simple and complex sequences of each length (so each sequence will be counted the number of times equal to its number of valid colorings). In order to count the number of simple sequences, we will subtract the number of complex sequences from the total number of sequences which we've determined above. And in order to count the complex sequences, we will try all possible sizes for the simple sequence containing the first parenthesis, and multiply the number of simple sequences of that size by the number of ways to insert more simple sequences into it.

The latter can be counted using two-dimensional dynamic programming: how many ways are there to insert several sequences with a total of n parentheses into a simple sequence with length k (in other words, with 2k+1 different available insertion positions).

And finally, after knowing the number of simple sequences, we need to find the total number of sequences, but without counting different colorings of the same sequence multiple times. This is counted by analogous two-dimensional dynamic programming, but without multiplying by 2 each time we add a new sub-sequence.

Thanks for reading, and check back later for the next week's summary!

No comments:

Post a Comment