Steamhammer has a UCB bug
Ack, Steamhammer has a typo in its UCB formula! A parenthesis is misplaced. What a blunder! In UCB1_bound(), change this inexcusable mistake:
return sqrt(2.0 * log(double(total)/tries));
to this:
return sqrt(2.0 * log(total) / tries);
The typecast made the error harder to see. It isn't even needed for log(), because since C++11 log() accepts an integer argument directly.
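For reference, here is a minimal sketch of UCB1 using the corrected bound to choose between options, such as stealing gas or not. The Arm struct, chooseArm(), and the parameter order of UCB1_bound() are assumptions made for illustration. This is not Steamhammer's actual code.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record for one option (arm): games played and games won with it.
struct Arm {
    int tries;
    int wins;
};

// UCB1 exploration term: sqrt(2 ln(total) / tries).
// Parameter order is an assumption. Requires tries >= 1 and total >= tries.
double UCB1_bound(int tries, int total)
{
    assert(tries > 0 && total >= tries);
    return std::sqrt(2.0 * std::log(total) / tries);
}

// Hypothetical chooser: returns the index of the arm with the highest
// upper confidence bound (win rate + exploration term).
// An arm that has never been tried is returned immediately.
std::size_t chooseArm(const std::vector<Arm> & arms)
{
    int total = 0;
    for (const Arm & a : arms)
    {
        total += a.tries;
    }

    std::size_t best = 0;
    double bestScore = -1.0;    // every real score is >= 0
    for (std::size_t i = 0; i < arms.size(); ++i)
    {
        if (arms[i].tries == 0)
        {
            return i;
        }
        double mean = double(arms[i].wins) / arms[i].tries;
        double score = mean + UCB1_bound(arms[i].tries, total);
        if (score > bestScore)
        {
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

An option that has been tried rarely gets a large bonus, so it gets tried again. For example, an option with 1 win in 2 games outranks one with 60 wins in 100, because its bonus (about 2.15) dwarfs the difference in win rate. As more games are played, the bonuses shrink and the observed win rates take over.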
The current Steamhammer 1.4.1 uses UCB only for deciding whether to steal gas, when AutoGasSteal is turned on. I had been wondering why it chose to steal gas so often against so many opponents. Was the gas steal really that effective? When I looked again at the code, I soon spotted the mistake.
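The behavior is approximately right when the number of games is small. When tries is 1, the two formulas agree exactly, because log(total)/1 equals log(total/1), and for a handful of games they stay close. That's how it passed my end-to-end tests. As the number of games goes up, it gets more and more wrong. The correct exploration term shrinks as an option is tried more often, but the buggy one depends only on the ratio total/tries. It never shrinks, so the bot keeps exploring. It's impossible to be too careful in testing. :-/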
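To see why the bug hides in small tests, here is a sketch that computes both versions side by side. The names buggyBound() and fixedBound(), and their parameter order, are made up for this comparison.

```cpp
#include <cassert>
#include <cmath>

// The misplaced parenthesis: sqrt(2 ln(total / tries)).
// Here the cast does matter: it prevents integer division.
double buggyBound(int tries, int total)
{
    assert(tries > 0 && total >= tries);
    return std::sqrt(2.0 * std::log(double(total) / tries));
}

// The intended UCB1 term: sqrt(2 ln(total) / tries).
double fixedBound(int tries, int total)
{
    assert(tries > 0 && total >= tries);
    return std::sqrt(2.0 * std::log(total) / tries);
}
```

With tries = 1 the two agree for any total. An option played in 50 of 100 games gets a buggy bonus of about 1.18 and a correct one of about 0.43. At 500 of 1000 games, the buggy bonus is still about 1.18, while the correct one has dropped to about 0.17. Win rates lie between 0 and 1, so a bonus that stays above 1 can outweigh any difference in results.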
The upcoming version will use UCB for opening selection. It won't apply it in the most direct way, as most bots do, but with a twist to cope with the large number of openings, too many to explore. Good thing I caught the bug in time.