Thursday, January 31, 2019

Is AlphaStar what intuition without reason looks like?

DeepMind, a group owned by Google, has been making waves recently with their game-playing AIs.  First there was the one that taught itself to play Atari games.  Then, most famously, they created AlphaGo, which went on to beat the world champion at Go.  That really generated waves since nobody had been expecting a computer to crack Go any time soon.  They've also done a few other things such as figuring out what shapes proteins fold into just from their chemical composition.

Last Thursday they revealed their newest creation, AlphaStar, the Starcraft 2 playing bot.  Starcraft 2 is the sequel to a game I played way back in high school.  It's an example of what is called a real-time strategy game or RTS.  The way these work is that you have a bunch of soldiers or other forces which you use to fight your opponent.  But at the same time you have other units under your command that can gather resources which you can use to create more units.  So you have to make tradeoffs between creating more resource gatherers to help you in the long run or more fighting units to help you right now.  And since you build your army as the game progresses you have to decide on its composition, which will hopefully perform well against your opponent's composition.  You're giving orders to all your units in real time: your worker might be in the middle of gathering resources when you say "Hey, build this building there" and then it starts doing that instead.  Also, the way these generally work is that you can't see the entire map at once but only those places where your units can see, so there's an aspect of scouting out what your opponent is doing and also preventing them from scouting what you're doing.

All of this is very different from a game like Go or Chess where the entire board is known and the players take turns moving.

All this raises the question of how AlphaStar works.  The team has written a blog entry about this but it doesn't shed a whole lot of light.  We know that it uses a deep neural network like AlphaGo did and we know that it decides on a variable amount of time to wait between actions.  More information will presumably be available when they publish their journal paper but I'm going to go and do some speculation here.

AlphaGo was a combination of deep neural networks and a tree search.  Strictly speaking it used Monte Carlo tree search rather than the alpha beta search that chess programs traditionally use, but alpha beta is the easier of the two to explain and the basic idea of looking ahead through possible moves is the same.  The way an alpha beta search works is that, at the top level, you look at all the possible moves you can make and choose the one that gives you the best result.  How do you know which one gives the best result?  Well, for the board after each move you run an alpha beta search from your opponent's position after you have made that move and assume that they make the best move from their position.  In chess you look at each of the dozen or so possible moves you could make, look at the dozen moves your opponent could make from each for 144 total, look at the dozen you could make in response to each of those for 1728 positions total...  If you hit a victory condition for you or your opponent you can stop searching, but usually the exponential explosion of possibilities will overwhelm the computer before that happens.  So a simple way to do it is to search to some depth and have some metric of who is ahead, like assigning point values to pieces and totaling those.  More sophisticated chess programs have more sophisticated ways of determining how good a given board position is.  They also have ways of guessing what the best move is and only considering sensible moves rather than exploring down every path to the same depth.
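
To make that concrete, here is a minimal sketch of a depth-limited alpha beta search in Python.  It is not taken from any real chess or Go program.  The game is a toy I made up for illustration (a pile of stones, each player takes 1 to 3, whoever takes the last stone wins), and the names MOVES, evaluate, alphabeta and best_move are all my own assumptions.  The evaluate function stands in for the "metric of who is ahead" that gets used when the search runs out of depth.

```python
import math

# Hypothetical toy game: take 1, 2 or 3 stones; taking the last stone wins.
MOVES = (1, 2, 3)


def evaluate(pile):
    """Stand-in heuristic for positions at the depth limit (0 = can't tell)."""
    return 0


def alphabeta(pile, depth, alpha=-math.inf, beta=math.inf):
    """Score a position from the point of view of the player about to move."""
    if pile == 0:
        return -1  # the opponent just took the last stone, so we have lost
    if depth == 0:
        return evaluate(pile)  # out of depth: fall back on the heuristic
    best = -math.inf
    for take in MOVES:
        if take > pile:
            break
        # The opponent's best result is our worst, hence the sign flips.
        score = -alphabeta(pile - take, depth - 1, -beta, -alpha)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # the opponent would never allow this line, stop looking
    return best


def best_move(pile, depth=10):
    """Try every legal move at the top level and keep the best-scoring one."""
    best_score, choice = -math.inf, None
    for take in MOVES:
        if take > pile:
            break
        score = -alphabeta(pile - take, depth - 1)
        if score > best_score:
            best_score, choice = score, take
    return choice, best_score


if __name__ == "__main__":
    print(best_move(5))           # search deep enough to see the end of the game
    print(best_move(5, depth=1))  # too shallow: only the heuristic is consulted
```

With enough depth the search finds that taking one stone from a pile of five leaves the opponent in a lost position.  With a depth of one it never reaches the end of the game and can only report what the heuristic says, which is why the quality of that heuristic matters so much in real games.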

AlphaGo's big breakthrough was using expertly trained neural networks to decide how good a board position was and which moves seemed promising and were worth exploring.  Neural networks work very well for that sort of thing.

But as far as I can tell AlphaStar works entirely with neural networks and doesn't have anything like the tree search framework that AlphaGo used.  That seems to have left it with some vulnerabilities of the sort that AlphaGo didn't have.  It won most of its games but in the final exhibition game the player who goes by the nickname MaNa was able to beat it.  Partly this was just good play, but there was a window where AlphaStar had the upper hand and was sending its army to go and destroy MaNa's base.  MaNa, to prevent this, sent a couple of his units in a flying vehicle to go and harass AlphaStar's production buildings.  AlphaStar saw this and turned its big army around to head off this threat.  Then MaNa loaded these units back into their transport and retreated, and AlphaStar moved its army back to the attack.

So far, so typical of a high level Starcraft 2 game.  But then MaNa did the same thing and AlphaStar responded the same way.  And it happened a third time.  Then a fourth.  At this point a human player would have seen that this would keep happening and adjusted their play.  But AlphaStar was, in a very robotic way, incapable of seeing the pattern and gave MaNa enough time to create a counter to its army and lost when MaNa finally attacked.

In his book, Thinking, Fast and Slow, Daniel Kahneman describes two different systems that seem to coexist in each of our brains.  There's system 1, which is fast, automatic, frequent, emotional, stereotypic, and unconscious.  Then there's system 2, which is slow, effortful, infrequent, logical, calculating, and conscious.  We use system 1 most of the time (to quote Kahneman, "We think much less than we think we think") but we also use system 2 for many tasks.

When I look at AlphaStar's performance in that game it looks like a system with a finely tuned intuition for which moves will be best at any given moment, a superbly trained system 1.  But at the same time it seems to utterly lack any reflective capability or abstract reasoning, which is to say a system 2.  AlphaGo had the framework of the tree search to fall back on when intuition failed and so effectively had both, but AlphaStar doesn't, rendering it vulnerable to humans who grasp its weakness.

Deep neural networks have taken the AI world by storm for good reason.  They seem to be able to duplicate basically anything a human system 1 can do.  But they can't substitute for system 2.  This is maybe a bit ironic since partial substitution for system 2 is what computers have historically been best at.  Still, it looks like we're at least one paradigm short of an AI that can fully replicate human intelligence.
