Gaming the System

All work, all play.

Managing Your Bag of Tricks (How to Decide between Known and Unknown Strategies)

Each time a strategy is employed, its predictive success rate changes. In Rock, Paper, Scissors a successful prediction is worth 1, while an unsuccessful prediction is worth 0 or -1. Define the success rate as (cumulative prediction value)/(number of predictions), where each evaluated prediction adds 1 to the number of predictions and adds its own value (+1, 0, or -1) to the cumulative prediction value. Under that definition, even a value-neutral prediction leads to a more accurate success rate, because it still adds a data point.
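The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the class name StrategyRecord and its fields are hypothetical, and returning 0.0 before any prediction has been evaluated is my own assumption, since the ratio is undefined at that point.

```python
from dataclasses import dataclass


@dataclass
class StrategyRecord:
    """Running tally for one strategy (hypothetical name, for illustration)."""

    cumulative_value: int = 0  # sum of +1 / 0 / -1 prediction values
    predictions: int = 0       # number of evaluated predictions

    def record(self, value: int) -> None:
        """Fold one evaluated prediction into the tally."""
        if value not in (-1, 0, 1):
            raise ValueError("prediction value must be -1, 0, or 1")
        self.cumulative_value += value
        self.predictions += 1

    @property
    def success_rate(self) -> float:
        """(cumulative prediction value) / (number of predictions)."""
        if self.predictions == 0:
            return 0.0  # assumption: treat an untested strategy as neutral
        return self.cumulative_value / self.predictions


record = StrategyRecord()
for value in (1, 1, 0, -1):  # two wins, one tie, one loss
    record.record(value)
print(record.success_rate)  # 0.25
```

Note that the tie still moved the rate (from 2/2 to 2/3) even though it added nothing to the cumulative value.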

The success rate of any given strategy is bounded by the random elements in that strategy. It will often be correct to include a random element in your predictions (for instance, when the most successful strategy available to you does not fully disambiguate the choice), and in that case even the most successful strategy cannot approach 100% predictive accuracy in the limit.
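A small worked example of that ceiling, under assumptions of my own: the strategy has narrowed the opponent's next throw to k candidates, it guesses uniformly among them, and only "hit or miss" is counted. The function name uniform_guess_hit_rate is hypothetical. However the opponent actually weights those k candidates, the chance the guess matches comes out to 1/k.

```python
from fractions import Fraction


def uniform_guess_hit_rate(opponent_distribution):
    """Chance that a uniform guess over the candidates matches the opponent.

    opponent_distribution maps each candidate throw to the probability
    (as a Fraction) that the opponent actually plays it.
    """
    k = len(opponent_distribution)
    if k == 0:
        raise ValueError("need at least one candidate throw")
    if sum(opponent_distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    # Each candidate is guessed with probability 1/k, independent of the opponent.
    return sum(p * Fraction(1, k) for p in opponent_distribution.values())


even = {"rock": Fraction(1, 2), "paper": Fraction(1, 2)}
skewed = {"rock": Fraction(9, 10), "paper": Fraction(1, 10)}
print(uniform_guess_hit_rate(even))    # 1/2
print(uniform_guess_hit_rate(skewed))  # 1/2
```

Exact fractions are used so the two results can be compared without floating-point noise.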

A continuation of the previous post’s rules for prioritizing strategy anticipation (basically cascading tiebreaks for picking the strategy that makes your next choice):

1. More successful over less successful.
2. Generating less ambiguous information about opponent over generating more ambiguous information about opponent.
3. Higher EV of ambiguous set over lower EV of ambiguous set.
4. Less complicated over more complicated.
5. More information to be gained (fewer previous data points) over less information to be gained (more previous data points).
6. Generating more ambiguous information about yourself over generating less ambiguous information about yourself.
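One way to sketch the cascading tiebreaks is as a tuple sort key, where a later rule only matters when every earlier rule ties. Everything named here is hypothetical: the Candidate fields are stand-ins for however each rule would actually be measured, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One known strategy with a hypothetical score for each rule."""

    name: str
    success_rate: float        # rule 1: higher is better
    opponent_ambiguity: float  # rule 2: lower is better
    ambiguous_set_ev: float    # rule 3: higher is better
    complexity: int            # rule 4: lower is better
    data_points: int           # rule 5: fewer is better
    self_ambiguity: float      # rule 6: higher is better


def priority_key(c: Candidate) -> tuple:
    """Tuples compare element by element, which gives the cascade.

    "Higher is better" fields are negated so that min() picks the winner.
    """
    return (
        -c.success_rate,
        c.opponent_ambiguity,
        -c.ambiguous_set_ev,
        c.complexity,
        c.data_points,
        -c.self_ambiguity,
    )


def choose(candidates):
    """Return the highest-priority candidate under rules 1 through 6."""
    return min(candidates, key=priority_key)


candidates = [
    Candidate("mirror_last_throw", 0.20, 0.5, 0.1, 2, 40, 0.3),
    Candidate("counter_last_throw", 0.20, 0.5, 0.1, 1, 40, 0.3),
    Candidate("frequency_count", 0.10, 0.2, 0.3, 1, 10, 0.9),
]
print(choose(candidates).name)  # counter_last_throw
```

The first two candidates tie on rules 1 through 3, so rule 4 (less complicated) decides. A tie here means exact equality; a real system would probably want to round or bucket the scores before comparing.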

While considering yesterday’s post, I was worried that I was too cavalierly dismissing short-term randomness, i.e. the system I conceptualized last night and in this post deprioritizes any strategy that randomly lost a round even though a random outcome doesn’t seem to provide information about the strategy. Recently, I realized that an A.I. following the above priority list may not need a step prioritizing accounting for randomness accurately. Basically, as newly-generated strategies increase in sophistication they will individually incorporate better and better methods for accounting for randomness.

Of course, it may be that without such a step, step 4 above would preclude any randomness-resistant strategies indefinitely, as they are necessarily more complicated than an otherwise similar non-resistant strategy. If so, the system does need some pressure that ensures more sophisticated strategies are considered, especially as misassessment of strategy incurs higher and higher costs.

The steps above provide enough structure to decide between already known strategies and to kick out to new-strategy generation when none of the known strategies historically outperform randomness.
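A minimal sketch of that kick-out, with assumptions stated up front: the function name select_or_generate is hypothetical, only rule 1 (success rate) is applied to keep it short, and the random baseline is taken to be 0.0, which is the expected value of uniform random play under +1/0/-1 scoring. Under +1/0 scoring the baseline would be different, so it is left as a parameter.

```python
from typing import Dict, Optional

# Assumption: with win = +1, tie = 0, loss = -1, uniform random play averages 0.
RANDOM_BASELINE = 0.0


def select_or_generate(
    success_rates: Dict[str, float],
    baseline: float = RANDOM_BASELINE,
) -> Optional[str]:
    """Pick the best known strategy, or None to signal "generate new ones".

    success_rates maps a strategy name to its historical success rate.
    """
    # Keep only strategies that have historically beaten randomness.
    better = {name: rate for name, rate in success_rates.items() if rate > baseline}
    if not better:
        return None  # nothing outperforms randomness: kick out to generation
    return max(better, key=better.get)


print(select_or_generate({"counter_last_throw": 0.2, "mirror_last_throw": -0.1}))
print(select_or_generate({"counter_last_throw": 0.0, "mirror_last_throw": -0.1}))
```

The first call returns "counter_last_throw"; the second returns None because no known strategy is strictly better than the baseline.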

The next system needed is one that will intelligently generate new strategies or sets of them so that the enormous set of currently unknown strategies doesn’t need to be brute-force filled in and then brute-force prioritized according to the above list.

More on that in a future post. And again, I hope at some point to get to the end of my rambling process and to find time to go back and polish up my thoughts. Input towards that end is always appreciated.

December 14, 2011 - Posted by | Education, Ethics, Game Design, Game Theory

5 Comments »

  1. Well, isn’t the optimal strategy to Rock Paper Scissors to randomly choose an option with 1/3 probability each iteration? Oftentimes games whose results are based on randomness have an optimal strategy that is also determined by randomness. Each individual datapoint (-1, 0, 1) is meaningless; the goal is to maximize the overall value of that strategy over an infinitely large set of games. It seems that AIs could handle this sort of decision making even better than humans can. Each individual is predisposed to selecting rock, paper, or scissors at a certain rate, and oftentimes based on factors that are actually -EV. The fact that there is no practical way to randomize your choice each time is why the game is more fun than, say, flipping a coin or rolling a die, but if two perfect computers played repeatedly, the outcomes would be the same.

    Comment by Chad Havas (@torerotutor) | December 14, 2011 | Reply

    • @Chad, there is no upper bound on RPS strategies. If two identical strategies played each other, then yes, the outcomes would be the same. But in your “two perfect computers” example (I am assuming you are substituting ‘computer’ for ‘strategy’) either strategy can improve by recognizing that its opponent is a copy of itself and subsequently exploit that fact. In turn, the other strategy can recognize that it is being accurately modeled and adjust to avoid exploitation. There is no such thing as a “perfect computer[/strategy]” in RPS because any opponent that recognizes they are playing against the “perfect” strategy immediately has the advantage.

      Comment by Tim Tschumy | December 14, 2011 | Reply

    • Actually, I realize you are probably saying that your two perfect computers are playing perfectly random strategies. It is true that this is an equilibrium when either:

      1. Each agent is aware of her ranking relative to all other agents in the player pool. That is, regardless of the initial strategies deployed by each agent, the bottom 50% will immediately switch to a completely random strategy. Then, of the remaining 50% the bottom half will also switch. And so on until only the agent with the best strategy is left (who is indifferent to playing their original strategy or random).

      2. Each agent is unaware of her relative ranking but her utility function weighs the neutral EV of a random strategy higher than the more variable outcome of attempting to outplay her opponents.

      So, if you have a player pool where agents desire to do better than neutral EV and are unaware of the source code employed by the other agents, the optimal strategy is almost certainly not purely random. These conditions are met in, for example, a contest where people submit code for an RPS AI to be played against unknown opponents (http://www.rockpaperazure.com/).

      Comment by Tim Tschumy | December 14, 2011 | Reply

  2. I was going to make this point in my other reply, but it seems more appropriate here. On one side we have our AI, gifted with the ability to procedurally generate new strategies the opponent might be using and to counter the most appropriate one at any given juncture.

    On the other side we have:

    1. An opponent utilizing a fixed set of strategies. The outcome here is that eventually our AI conceives all strategies in the opponent’s set and is able to leverage that to victory.
    2. An opponent playing randomly. Our AI never builds a set of strategies that contains the opponent’s set of strategies. Each time our AI puts the opponent on a particular strategy the evaluation of that strategy approaches a limit of “break even”.
    3. An opponent running the same AI. The set of strategies our AI conceives of roughly paces the set of strategies the opponent’s AI conceives of until each AI reaches the point of recognizing that none of its strategies outperforms randomness (that is, breaking even).
    3b. Or because there are always more strategies to evaluate, neither AI ever reaches that point.

    Think that covers all cases.

    Comment by bmoreno54 | December 14, 2011 | Reply

  3. For those interested, a fascinating RPS CS tournament was hosted years ago. The write-up is here: http://webdocs.cs.ualberta.ca/~darse/rsbpc.html I’m not enough of a wonk to understand all the coding behind it, but the descriptions of the winning bots are fascinating.

    Also, if anyone would like to play RPS on Twitter, follow @play140smash and challenge my handle @Wobbles for a game.

    Comment by Wobbles | December 15, 2011 | Reply

