Sunday, July 30, 2017

A week with two trees

This week's contests were clustered on Sunday. In the morning, the first day of IOI 2017 took place in Tehran (problems, results, top 5 on the left, video commentary). The "classical format" problems wiring and train were pretty tough, so most contestants invested their time into the marathon-style problem nowruz. Three contestants have managed to gain a decent score there together with 200 on the other problems, so it's quite likely that they will battle for the crown between themselves on day 2. Congratulations Attila, Yuta and Mingkuan!

A few hours later, Codeforces Round 426 gave a chance to people older than 20 :) (problems, results, top 5 on the left). The round contained a bunch of quite tedious implementation problems, and Radewoosh and LHiC have demonstrated that they are not afraid of tedious implementation by solving 4 out of 5. Well done!

In my previous summary, I have raised a sportsmanship question, asking whether it's OK to bail out of an AtCoder contest without submitting anything. The poll results came back as 42% OK, 28% not OK, and 30% saying that the contest format should be improved. As another outcome of that post, tourist and moejy0viiiiiv have shared their late submission strategies that do not consider bailing out as a benefit, so I was kind of asking the wrong question :) I encourage you to read the linked posts to see how the AtCoder format actually makes the competition more enjoyable.

I have also mentioned a bunch of problems in that post, so let me come back to one of them: you are given two rooted trees on the same set of 105 vertices. You need to assign integer weights to all vertices in such a way that for each subtree of each of the trees the sum of weights is either 1 or -1, or report that it's impossible.

First of all, we can notice that the value assigned to each vertex can be computed as (the sum of its subtree) - (the sum of subtrees of all its children). Since all said sums are either -1 or 1, the parity of this value depends solely on the parity of the number of children. So we can compare said parities in two trees, and if there's a mismatch, then there's definitely no solution.

Now, after playing a bit with a few examples, we can start to believe that in case all parities match, there is always a solution. During the actual contest I've implemented a brute force to make sure this is true. Moreover, we can make an even stronger hypothesis: there is always a solution where all values are -1,0 or 1. Again, my brute force has confirmed that this is most likely true.

For vertices where the value must be even, we can assume it's 0. For vertices where the value must be odd, we have two choices: -1 or 1. Let's call the latter odd vertices.

Our parity check guarantees that each subtree of each tree will have an odd number of odd vertices, in order to be able to get -1 or 1 in total. So we must either have k -1's and k+1 1's, or k+1 -1's and k 1's. Here comes the main idea: in order to achieve this, it suffices to split all odd vertices into pairs in such a way that in each subtree all odd vertices but one form k whole pairs, and each pair is guaranteed to have one 1 and one -1.

Having said that idea aloud, we can very quickly find a way to form the pairs: we would just go from leaves to the root of the tree, and each subtree will pass at most one unpaired odd vertex to its parent. There we would pair up as many odd vertices as possible, and send at most one further up.

Since we have not one, but two trees, we will get two sets of pairs. Can we still find an assignment of 1's and -1's that satisfies both sets? It turns out we can: the necessary and sufficient condition for such assignment to exist is for the graph formed by the pairs to be bipartite. And sure enough, since the graph has two types of edges, and each vertex has at most one edge of each type, any cycle must necessarily have the edge types alternate, and thus have even length. And when all cycles have even length, the graph is bipartite.

Looking back at the whole solution, we can see that the main challenge is to restrict one's solution in the right way. When we work with arbitrary numbers, there are just too many dimensions in the problem. When we restrict the numbers to just -1, 0, and 1, the problem becomes more tractable, and it becomes easier to see which ideas work and which don't. And when we add the extra restriction in form of achieving the necessary conditions by pairing up the vertices, the solution flows very naturally.

Thanks for reading, and check back next week!

Sunday, July 23, 2017

An unsportsmanlike week

Yandex.Algorithm 2017 Final Round took place both in Moscow and online on Tuesday (problems, results, top 5 on the left). My contest started like a nightmare: I was unable to solve any of the given problems during the first hour, modulo a quick incorrect heuristic submission on problem F. As more and more people submitted problems D (turned out to be a straightforward dynamic programming) and E (turned out to be about finding n-th Catalan number), I grew increasingly frustrated, but still couldn't solve them. After my wits finally came back to me after an hour, all problems suddenly seemed easy, and I did my best to catch up with the leaders, submitting 5 problems. Unfortunately, one of them turned out to have a very small bug, and I was denied that improbable victory :)

Congratulations to tourist, W4yneb0t and rng.58 on winning the prizes!

Here's the Catalan number problem that got me stumped. Two people are dividing a heap of n stones. They take stones in turns until none are left. At their turn, the first person can take any non-zero amount of stones. The second person can then also take any non-zero amount of stones, but with an additional restriction that the total amount of stones he has after this turn must not exceed the total amount of stones the first person has. How many ways are there for this process to go to completion, if we also want the total amount of stones of each person to be equal in the end?

TopCoder Open 2017 Round 2C was the first event of the weekend (problems, results, top 5 on the left, parallel round resultsmy screencast). The final 40 contestants for the Round 3 were chosen, and kraskevich advanced with the most confidence of all, as he was the only contestant to solve all problems. Congratulations!

This round contained a very cute easy problem. Two teams with n players each are playing a game. Each player has an integer strength. The game lasts n rounds. In the first round, the first player of the first team plays the first player of the second team, and the stronger player wins. In the second round, the first two players of the first team play the first two players of the second team, and the pair with the higher total strength wins. In the i-th round the strengths of the first i players in each team are added up to determine the winner. You know the strengths of each player, and the ordering of the players in the second team. You need to pick the ordering for the players in the first team in such a way that the first team wins exactly k rounds. Do you see the idea?

And finally, AtCoder Grand Contest 018 took place on Sunday (problems, results, top 5 on the left, analysis, my screencast with commentary). tourist has adapted his strategy to the occasion, submitting the first four problems before starting work on the last two, but in the end that wasn't even necessary as he also solved the hardest problem and won with a 15 minute margin. Well done!

As you can see in the screencast, I was also trying out this late submission strategy this time, and when I solved the first four and saw that Gennady has already submitted them, I was quite surprised. There was more than an hour left, so surely I'd be able to solve one more problem? And off I went, trying to crack the hardest problem because it gave more points and seemed much more interesting to solve than the previous one.

I have made quite a bit of progress, correctly simplifying the problem to the moment where the main idea mentioned in the analysis can be applied, but unfortunately could not come up with that idea. That left me increasingly anxious as the time ran out: should I still submit the four problems I have (which turned out to be all correct in upsolving), earning something like 40th place instead of 10th or so that I would've got had I submitted them right after solving? Or should I avoid submitting anything, thus not appearing in the scoreboard at all and not losing rating, but showing some possibly unsportsmanlike behavior?

I have to tell you, this is not a good choice to have. Now I admire people who can pull this strategy off without using the escape hatch even more :) To remind, the benefits of this strategy, as I see them (from comments in a previous post), are:
1) Not giving information to other competitors on the difficulty of the problems.
2) Not allowing other competitors to make easy risk/reward tradeoffs, as if they know the true scoreboard, they might submit their solutions with less testing when appropriate.

I ended up using the escape hatch, which left me feeling a bit guilty, but probably more uncertain than guilty. Do you think this is against the spirit of competition, as PavelKunyavskiy suggests? Please share your opinion in comments, and also let's have a vote:

Is it OK to bail out of a contest after a poor performance with late submit strategy at AtCoder?

Here's the hardest problem that led to all this confusion: You are given two rooted trees on the same set of 105 vertices. You need to assign integer weights to all vertices in such a way that for each subtree of each of the trees the sum of weights is either 1 or -1, or report that it's impossible. Can you do that?

In my previous summary, I have mentioned a hard VK Cup problem. You are given an undirected graph with 105 vertices and edges. You need to assign non-negative integer numbers not exceeding 106 to the vertices of the graph. The weight of each edge is then defined as the product of the numbers at its ends, while the weight of each vertex is equal to the square of its number. You need to find an assignment such that the total weight of all edges is greater than or equal to the total weight of all vertices, and at least one number is nonzero.

The first idea is: as soon as the graph has any cycle, we can assign number 1 to all vertices in the cycle, and 0 to all other vertices, and we'll get what we want. So now we can assume the graph is a forest, and even a tree.

Now, consider a leaf in that forest, and assume we know the number x assigned to the only vertex this leaf is connected to. If we now assign number y to this leaf, then the difference between the edge weight and the vertex weight will increase by x*y-y*y. We need to pick y to maximize this difference, which is finding the maximum of a quadratic function, which is achieved when y=x/2, and the difference increases by x2/4.

Now suppose we have a vertex that is connected to several leaves, and to one other non-leaf vertex for which we know the assigned number x. After we determine the number y for this vertex, all adjacent leaves must be assigned y/2, so we can again compute the difference as the function of y and find its maximum. It will be a quadratic function again, and the solution will look like x*const again.

Having rooted the tree in some way, this approach allows us to actually determine the optimal number for all vertices going from leaves to the root as multiples of the root number, and we can check if the overall delta is non-negative or not.

There's an additional complication caused by the fact that the resulting weights must be not very big integers. We can do the above computation in rational numbers to make all numbers integer, but they might become quite big. However, if we look at the weight differences that some small trees get, we can notice that almost all trees can get a non-negative difference, and come up with a case study that finds a subtree in a given tree which still has a non-negative difference but can be solved with small numbers. I will leave this last part as an exercise for the readers.

Wow, it feels good to have caught up with the times :) Thanks for reading, and check back next week!

A red-black week

Last week, Codeforces presented the problems from the VK Cup finals as two regular contests. First off, Codeforces Round 423 took place on Tuesday (problems, results, top 5 on the left, analysis). W4yneb0t has continued his string of excellent Codeforces performances with the best possible result - a victory, which has also catapulted him to the second place in the overall rating list. Well done!

In between the Codeforces rounds, TopCoder held its SRM 718 very early on Thursday (problems, results, top 5 on the left, anlaysis). The round seems to have been developing in a very exciting manner: three people submitted all three problems during the coding phase, then two of them lost points during the challenge phase, and the last remaining person with three problems failed the system test on the easiest problem! When the dust settled, snuke still remained in the first place thanks for his solution to the hard problem holding. Congratulations on the second SRM victory!

Codeforces Round 424 rounded up the week's contests (problems, results, top 5 on the left, analysis). TakanashiRikka was the fastest to solve the first four problems, and (probably not a sheer coincidence :)) the only contestant to have enough time to consider all cases in the hardest problem correctly while solving at least one other problem. Congratulations on the well-deserved first place!

Here's that tricky problem. You are given an undirected graph with 105 vertices and edges. You need to assign non-negative integer numbers not exceeding 106 to the vertices of the graph. The weight of each edge is then defined as the product of the numbers at its ends, while the weight of each vertex is equal to the square of its number. You need to find an assignment such that the total weight of all edges is greater than or equal to the total weight of all vertices, and at least one number is nonzero.

In my previous summary, I have mentioned a difficult IPSC problem: we start with a deck of 26 red and 26 black cards, and a number k (1<=k<=26). The first player takes any k cards from the deck, and arranges them in any order they choose. The second player takes any k cards from the remaining deck, and arranges them in any order they choose, but such that their sequence is different from the sequence of the first player. The remaining 52-2k cards are shuffled and dealt one by one. As soon as the last k dealt cards exactly match one of the player's sequences, that player wins. In case no match happens after the cards run out, we toss a coin, and each player wins with probability 50%. What is the probability of the first player winning, assuming both play optimally?

Solving this problem required almost all important programming contest skills: abstract mathematical reasoning, knowledge of standard algorithms, coming up with new ideas, good intuition about heuristics, and of course the programming skill itself.

We start off by noticing that the second player has a very simple way to achieve 50% winrate: he can just choose a sequence that is a complement of the first player's sequence (replace red cards by black and vice versa), and then everything is completely symmetric.

How can the second player achieve more? He has two resources: first, he can choose a string that is more likely to appear in the sequence of the remaining cards. Second, he can choose a string that, when it appears together with the string of the first player, tends to appear earlier.

The strings that are more likely to appear are those that leave an equal proportion of reds and blacks (after taking out the string of the first player once and the string of the second player twice), and have no borders (prefixes that are equal to suffixes). This is because we can count the number of ways a given string can appear by multiplying the number of positions it can appear in by the number of ways to place the remaining characters after the matching part is fixed. The number of ways to place the remaining characters is maximized then the remaining characters have equal numbers of blacks and reds. This slightly overcounts the number of ways because in some cases the string can appear more than once; the lack of borders minimizes the number of such occurrences.

The strings that tend to appear earlier when both appear are those which have a suffix which matches a prefix of the first player's string. At best, if the first player string is s+c, where s is a string of length k-1 and c is a character, the second player should pick his string from 'r'+s and 'b'+s. In this case as soon as there's a match of the first player's string not in the first position, we can have a >50% chance to have a match of our string one position before.

Now we can already make the first attempt at a solution: let's try likely candidates for the first player's best move - it should likely be among the strings that have the most appearances; the second player should then choose either another string with lots of appearances, or a string that counter-plays the first player's string in the manner described above. However, this is not enough to solve the problem - we will get a wrong answer.

As part of implementing the above solution, we had to also implement the function to count the sought probability for the given pair of strings. It's also not entirely trivial, and can be done by using dynamic programming where the state is the number of remaining red and black cards, and the state in the Aho-Corasick automaton of the two strings.

So, where do we go from there? Since we already have the function that computes the probability, we can now run it on all pairs of strings for small values of k and try to notice a pattern. We get something like this:

1 0.5 r b
2 0.5 rb br
3 0.3444170488792196 rbr rrb
4 0.35992624362382514 rrbb rrrb
5 0.3777939526283981 rrbrb brrbr
6 0.413011479190688 rbrbrr brbrbr
7 0.45319632265323256 rrbrrbb brrbrrb
8 0.4782049196004824 rrbbrrbb brrbbrrb

No obvious pattern seems to appear. However, we can notice that for large values of k, more precisely when 3k>52, the answer will be 0.5 simply because there is not enough remaining cards for either string to appear. So we only need to research the values of k between 9 and 17 now.

And here comes another key idea: we need to believe that by cutting enough branches early, our exhaustive search solution can run in a few minutes for all those values. At first, this seems improbable. For example, for k=16 we have 65536 candidates for each string, and four billion combinations in total, not to mention the Aho-Corasick on the inside. However, from our previous attempts at a solution we have some leads. More precisely, we know which strings of the second player are the most likely good answers for each string of the first player.

This allows us to get a good upper bound on the first player's score for each particular string reasonably quickly, which leads to the following optimization idea: let's run the search for all strings of the first player at the same time, and at each point we will take the "most promising" string - the one with the highest upper bound so far - and make one more step of the search for it, in other words try one more candidate for the second player's string, which may lower its upper bound. We continue this process until we arrive at the state where the most promising candidate does not have anything else to try, because we already ran through all possible second player strings for it - and this candidate then gives us the answer.

This search runs relatively fast because for most first player strings, our heuristics will give us an upper bound that is lower than the ultimate answer very quickly, and we will stop considering those strings further. It is still quite slow for larger values of k, so we need a second optimization on top: we can skip the Aho-Corasick part in the simple case where there's simply not enough cards of some color for the second player's string to appear. With those two optimizations, we can finally get all the answers in a few minutes.

Thanks for reading, and check back soon for this week's summary!

Sunday, July 16, 2017

A postcard week

The July 3 - July 9 week had a pretty busy weekend. On Saturday, IPSC 2017 has gathered a lot of current contestants together with veterans that come out of retirement just for this contest every year (problems, results, top 5 on the left, analysis). The (relatively) current contestants prevailed this time, with team Past Glory winning with a solid 2 point gap. Well done!

They were one of only two teams who has managed to solve the hardest problem L2. It went like this: we start with a deck of 26 red and 26 black cards, and a number k (1<=k<=26). The first player takes any k cards from the deck, and arranges them in any order they choose. The second player takes any k cards from the remaining deck, and arranges them in any order they choose, but such that their sequence is different from the sequence of the first player. The remaining 52-2k cards are shuffled and dealt one by one. As soon as the last k dealt cards exactly match one of the player's sequences, that player wins. In case no match happens after the cards run out, we toss a coin, and each player wins with probability 50%. What is the probability of the first player winning, assuming both play optimally?

Just an hour later, TopCoder Open 2017 Round 2B selected another 40 lucky advancers (problems, results, top 5 on the left, analysis, parallel round results, my screencast). dotory1158 had a solid point margin from his solutions, and did not throw it all away during the challenge phase, although the contest became much closer :) Congratulations on the win!

The final round of VK Cup 2017 took place in St Petersburg on Sunday (problems, results, top 5 on the left). Continuing the Snackdown trend from the last week, two-person teams were competing. The xray team emerged on top thanks to a very fast start - in fact, they already got enough points for the first place at 1 hour and 38 minutes into the contest, out of 3 hours. Very impressive!

And finally, AtCoder hosted its Grand Contest 017 on Sunday as well (problems, results, top 5 on the left, analysis). Once again the delayed submit strategy has worked very well for tourist, but this time the gap was so huge that the strategy choice didn't really matter. Way to go, Gennady!

Problem D in this round was about the well-known game of Hackenbush, more precisely green Hackenbush: you are given a rooted tree. Two players alternate turns, in each turn a player removes an edge together with the subtree hanging on this edge. When a player can not make a move (only the root remains), he loses. Who will win when both players play optimally?

If you haven't seen this game before, then I encourage you to try solving the problem before searching for the optimal strategies in the Internet (which has them). I think the solution is quite beautiful!

Thanks for reading, and check back for this week's summary.

A hammer week

TopCoder has hosted two SRMs during the June 26 - July 2 week. First off, SRM 716 took place on Tuesday (problems, results, top 5 on the left, analysis). ACRush came back to the SRMs after more than a year, presumably to practice for the upcoming TCO rounds. He scored more points than anybody else from problem solving - but it was only good enough for the third place, as dotory1158 and especially K.A.D.R shined in the challenge phase. Well done!

Later on the same day, Codeforces held its Round 421 (problems, results, top 5 on the left, analysis). There was a certain amount of unfortunate controversy with regard to problem A, so the results should be taken with a grain of salt - nevertheless, TakanashiRikka was the best on the remaining four problems and got the first place. Congratulations!

The second TopCoder round of the week, SRM 717, happened on Friday (problems, results, top 5 on the left, analysis, my screencast). I had my solution for the medium problem fail, but even without that setback I would finish behind Deretin - great job on the convincing win!

Here's what that problem was about. You are given two numbers n (up to 109) and m (up to 105). For each i between 1 and m, you need to find the number of permutations of n+i objects such that the first i objects are not left untouched by the permutation. As an example, when n=0 we're counting derangements of each size between 1 and m.

My solution for this problem involved Fast Fourier Transformation because, well, I had a hammer, and the problem was not dissimilar enough to a nail :) And it failed because I've reused FFT code from my library, which I've optimized heavily to squeeze under the time limit in a previous Open Cup round, and to which I've introduced a small bug during that optimization :(

And on the weekend, two-person teams competed for the fame and monetary prizes at the onsite finals of CodeChef Snackdown 2017 (problems, results, top 5 on the left, analysis). The last year's winner "Messages compress" were in the lead for long periods of time, but in the last hour they tried to get three problems accepted but got zero, which gave other teams a chance. Team Dandelion have seized that chance and won solving 9 problems out of 10. Way to go!

Thanks for reading, and check back for the next week's summary.

FBHC2017 Finals

There was one quite important competition that I forgot to mention two summaries back: Facebook Hacker Cup 2017 onsite finals took place on June 14 (problems, results, top 5 on the left, analysis, my screencast). In this round I have managed to make three different bugs in my solution to the second problem and the way those bugs combined led to my program passing the samples almost by accident, but of course it did not pass the final testing. On the bright side, not spending more time on this problem allowed me enough time to solve all other problems, so maybe the three bugs were actually a crucial component of the victory :)

Thanks for reading, and check back for the hopefully more complete next week's summary!

Saturday, July 15, 2017

A dynamic nimber week

The June 19 - June 25 week did not actually have any rounds that I'd like to mention, so let me turn back to the problem from the previous week's summary.

You are given an acyclic directed graph with n<=15 vertices and m arcs. Vertices 1 and 2 each contain a chip. Two players take alternating turns, each turn consisting of moving one of the chips along an arc of the graph. The player who can't make a valid move loses. We want to know which player wins if both play optimally. Now consider all 2m subsets of the graph's arcs; for how many of them does the first player win if we keep only the arcs from this subset?

Without the subset part, the problem is pretty standard. We need to compute the nimbers for all vertices of the graph, and the first player wins if and only if the nimbers for the first two vertices are different.

However, we do not have the time to even iterate over all subsets, and thus we can not apply this naive algorithm. Dynamic programming comes to the rescue, allowing to reuse some computations. The dynamic programming idea that is closest to the surface is: go in a topological ordering, and keep computing the nimbers. The nimber for a vertex depends on which arcs going from this vertex are included in the subset, and on the values of nimbers for the vertices reachable from this one, so our dynamic programming state should include only the values of nimbers for the already processed vertices, reducing the running time from 2m to something around the n-th Bell number. That is still not too little, but it turns out this approach could be squeezed to pass.

However, it turns out there is another beautiful dynamic programming idea that helps us move to the running time on the order of 3n. Instead of going vertex-by-vertex, we will now go nimber-by-nimber. For each consecutive nimber, we will try all possibilities for a subset a of vertices having this nimber. What are the requirements on such a subset? From the definition of nimbers we get:
  1. For each vertex in this subset, there must be an arc to at least one vertex with each smaller nimber.
  2. There must be no arcs within this subset.
Condition 2 is pretty easy to check, but condition 1 is not. However, we can notice that we can instead check:
  1. For each vertex that still has no assigned nimber (and thus will eventually have a higher nimber), there must be an arc to at least one vertex in this subset.
  2. There must be no arcs within this subset.
The new condition 1 guarantees that all higher nimbers will have arcs to all lower nimbers, so by the time we reach a certain nimber, we don't need to care about arcs to lower nimbers anymore. Now we can see that the state of our dynamic programming can simply be the last placed nimber and the set of vertices that do not yet have a nimber.

Finally, we can notice that the last placed nimber does not actually take part in the computations at all, so we can reduce the number of states n times more to just remembering the set of vertices that do not yet have a nimber, yielding overall complexity of O(n*3n).

Thanks for reading, and check back for the next week's summary!