Minimax
In game theory, minimax is a decision rule used to minimize the worst-case potential loss; in other words, a player considers all of the best opponent responses to his strategies, and selects the strategy such that the opponent's best response leaves him with as large a payoff as possible.
The name "minimax" comes from minimizing the loss involved when the opponent selects the strategy that gives maximum loss, and is useful in analyzing the first player's decisions both when the players move sequentially and when the players move simultaneously. In the latter case, minimax may give a Nash equilibrium of the game if some additional conditions hold.
Minimax is also useful in combinatorial games, in which every position is assigned a payoff. The simplest example is assigning a "1" to a winning position and "-1" to a losing one, but as this is difficult to calculate for all but the simplest games, intermediate evaluations (specifically chosen for the game in question) are generally necessary. In this context, the goal of the first player is to maximize the evaluation of the position, and the goal of the second player is to minimize the evaluation of the position, so the minimax rule applies. This, in essence, is how computers approach games like chess and Go, though various computational improvements can be made to the "naive" implementation of minimax.
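Formal definition
Suppose player $i$ chooses strategy $s_i$, and the remaining players choose the strategy profile $s_{-i}$. If $u_i(s)$ denotes the utility function for player $i$ on strategy profile $s$, the minimax of the game is defined as
$$\overline{u_i} = \min_{s_{-i}} \max_{s_i} u_i(s_i, s_{-i}).$$
Intuitively speaking, the minimax (for player $i$) is one of two equivalent formulations:
- The minimax is the smallest value that the other players can force player $i$ to receive, without knowing player $i$'s strategy
- The minimax is the largest value player $i$ can guarantee when he is told the strategies of all other players.
Similarly, the maximin is defined as
$$\underline{u_i} = \max_{s_i} \min_{s_{-i}} u_i(s_i, s_{-i}),$$
which can be intuitively understood as either of:
- The maximin is the largest value player $i$ can guarantee when he does not know the strategies of any other player
- The maximin is the smallest value the other players can force player $i$ to receive, while knowing player $i$'s strategy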
For example, consider the following payoff matrix (the rows represent the first player's choices and the columns represent the second player's choices):
P1 \ P2 | 1 | 2 | 3 |
1 | -1 | -2 | -1 |
2 | 2 | 2 | 1 |
3 | -1 | -1 | 0 |
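Both players have a choice among three strategies. In such a payoff matrix, from the first player's perspective:
- The maximin is the largest of the smallest values in each row
- The minimax is the smallest of the largest values in each column
So the maximin is the largest of -2, 1, and -1 (i.e. 1), and the minimax is the smallest of 2, 2, and 1 (i.e. 1).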
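The row and column rule above can be checked mechanically. The following is a minimal Python sketch, assuming the payoff matrix is stored as a list of rows holding the first player's payoffs; the function names `maximin` and `minimax` are chosen here for illustration only.

```python
def maximin(matrix):
    """Largest of the row minima: what the row player can guarantee."""
    return max(min(row) for row in matrix)


def minimax(matrix):
    """Smallest of the column maxima: what the column player can hold the row player to."""
    columns = zip(*matrix)  # transpose rows into columns
    return min(max(column) for column in columns)


# Payoff matrix from the example (rows: first player, columns: second player).
payoffs = [
    [-1, -2, -1],
    [2, 2, 1],
    [-1, -1, 0],
]

print(maximin(payoffs))  # 1
print(minimax(payoffs))  # 1
```

Both functions consider pure strategies only; mixed strategies need a different computation.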
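It is extremely important to note that
$$\underline{u_i} \le \overline{u_i},$$
i.e. the maximin is always at most the minimax.
This can be understood intuitively by noting that in the minimax, player $i$ effectively gets to choose his strategy after learning everyone else's, which can only increase his payoff.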
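The same inequality can also be derived in two lines. The sketch below writes $u_i$ for player $i$'s utility, $s_i$ for his strategy, and $s_{-i}$ for the other players' strategy profile; the primed symbols are auxiliary names introduced only for this derivation. A minimum over $s_{-i}$ is at most any particular value, which in turn is at most a maximum over $s_i$; since the left side does not depend on $s_{-i}'$ and the right side does not depend on $s_i'$, one can maximize the left and minimize the right.

```latex
\begin{aligned}
\min_{s_{-i}} u_i(s_i', s_{-i}) &\le u_i(s_i', s_{-i}') \le \max_{s_i} u_i(s_i, s_{-i}')
  \quad \text{for all } s_i',\ s_{-i}' \\
\Longrightarrow \quad
\max_{s_i'} \min_{s_{-i}} u_i(s_i', s_{-i}) &\le \min_{s_{-i}'} \max_{s_i} u_i(s_i, s_{-i}')
\end{aligned}
```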
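In the above example, the maximin and minimax are in fact equal. In this case (which does not always happen!), the minimax strategies for both players give a Nash equilibrium of the game.
This matters most in zero-sum games, in which the minimax always gives a Nash equilibrium of the game (once mixed strategies are allowed), since there the minimax and maximin are necessarily equal.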
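Minimax theorem
The minimax theorem establishes conditions under which the minimax and maximin of a function are equal. More precisely, the minimax theorem gives conditions under which
$$\max_{x} \min_{y} f(x, y) = \min_{y} \max_{x} f(x, y).$$
Formally,
Minimax theorem:
Let $X$ and $Y$ be two compact convex sets, and let $f: X \times Y \to \mathbb{R}$ be a continuous function on pairs $(x, y)$ with $x \in X$ and $y \in Y$. If $f$ is concave-convex, i.e.
- $f(x, \cdot)$ is convex for all fixed $x$
- $f(\cdot, y)$ is concave for all fixed $y$
then
$$\max_{x \in X} \min_{y \in Y} f(x, y) = \min_{y \in Y} \max_{x \in X} f(x, y).$$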
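The application of the minimax theorem to zero-sum games is especially important, as it becomes equivalent to the following statement.
For every zero-sum game with finitely many strategies, there exists a payoff $V$ and a mixed strategy for each player such that
- Given player 2's strategy, player 1 can achieve a payoff of at most $V$
- Given player 1's strategy, player 2 can achieve a payoff of at most $-V$
which is equivalent to establishing a Nash equilibrium.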
It is important to note that the minimax strategy may be mixed; in general,
It is not necessarily the case that the pure minimax strategy for each player leads to a Nash equilibrium.
For example, consider the payoff matrix
P1 \ P2 | 1 | 2 | 3 |
1 | 3 | -2 | 2 |
2 | -1 | 0 | 4 |
3 | -4 | -3 | 1 |
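The first player's minimax choice is strategy 2, and the second player's minimax choice is also strategy 2. But both players choosing strategy 2 does not lead to a Nash equilibrium: knowing that player 1 chooses strategy 2, player 2 would switch to strategy 1 (lowering player 1's payoff from 0 to -1), after which player 1 would prefer strategy 1, and so on. In fact, the mixed minimax strategy in which
- Player 1 chooses strategy 1 with probability $\frac{1}{6}$ and strategy 2 with probability $\frac{5}{6}$
- Player 2 chooses strategy 1 with probability $\frac{1}{3}$ and strategy 2 with probability $\frac{2}{3}$
is stable and represents a Nash equilibrium.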
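The stability of a mixed strategy pair can be checked directly: neither player should be able to do better with any pure strategy. The sketch below is an illustrative check, not a solver. It assumes the probabilities $\frac{1}{6}, \frac{5}{6}$ for player 1 and $\frac{1}{3}, \frac{2}{3}$ for player 2 (obtained by equalizing expected payoffs in the matrix above, with strategy 3 never played), and the helper names are hypothetical.

```python
from fractions import Fraction

# Payoffs to player 1 (rows: player 1, columns: player 2); player 2 receives the negative.
payoffs = [
    [3, -2, 2],
    [-1, 0, 4],
    [-4, -3, 1],
]

# Assumed mixed strategies; strategy 3 is played with probability 0 by both players.
p = [Fraction(1, 6), Fraction(5, 6), Fraction(0)]
q = [Fraction(1, 3), Fraction(2, 3), Fraction(0)]


def row_payoffs(matrix, q):
    """Expected payoff to player 1 of each pure row against the column mix q."""
    return [sum(a * qj for a, qj in zip(row, q)) for row in matrix]


def column_payoffs(matrix, p):
    """Expected payoff to player 1 of each pure column against the row mix p."""
    return [sum(pi * row[j] for pi, row in zip(p, matrix)) for j in range(len(matrix[0]))]


rows = row_payoffs(payoffs, q)     # -1/3, -1/3, -10/3
cols = column_payoffs(payoffs, p)  # -1/3, -1/3, 11/3
value = sum(pi * r for pi, r in zip(p, rows))

print(value)               # -1/3
print(max(rows) == value)  # True: player 1 cannot gain by deviating
print(min(cols) == value)  # True: player 2 cannot gain by deviating
```

Exact fractions are used instead of floats so that the equality checks are not affected by rounding.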
In combinatorial games
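In combinatorial games such as chess and Go, the minimax algorithm gives a method of selecting the next optimal move. Firstly, an evaluation function $f$ from the set of positions to the real numbers is required, representing the payoff to the first player. For example, a chess position with evaluation +1.5 is significantly in favor of the first player, while a position with evaluation $-\infty$ is a chess position in which White is checkmated. Once such a function is known, each player can apply the minimax principle to the tree of possible moves, thus selecting their next move by truncating the tree at some sufficiently deep point.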
More specifically, given a tree of possible moves in which the leaves have been evaluated using the function $f$, a player recursively assigns to each node an evaluation based on the following:
- If the node is at even depth, meaning that the first player is on move, the evaluation of the node is the maximum of the evaluations of its children.
- If the node is at odd depth, meaning that the second player is on move, the evaluation of the node is the minimum of the evaluations of its children.
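The two rules above translate almost directly into a recursive function. The following is a minimal sketch, assuming the game tree is given as nested Python lists whose leaves are numbers already produced by the evaluation function; the tree shown is a made-up example, not one taken from this article.

```python
def minimax_value(node, depth=0):
    """Evaluate a game tree given as nested lists with numeric leaves."""
    if not isinstance(node, list):
        return node  # leaf: already scored by the evaluation function
    child_values = [minimax_value(child, depth + 1) for child in node]
    # Even depth: first player is on move (maximize).
    # Odd depth: second player is on move (minimize).
    return max(child_values) if depth % 2 == 0 else min(child_values)


# Hypothetical two-level tree: the root has three children, each with two leaves.
tree = [[3, 5], [2, 9], [0, -1]]

print(minimax_value(tree))  # 3
```

Here the second player would reduce the three subtrees to 3, 2, and -1, so the first player picks the first move for an overall evaluation of 3.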
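For example, in the tree below the evaluations of the leaves are calculated first (with 99 and -99 representing a won and a lost game respectively); this fills in the bottom row of the tree. At depth 2, the first player is on move, so he should select the move that maximizes the evaluation. This means that each evaluation in the depth 2 row is the maximum of the numbers in its subtree. At depth 1, the second player is on move, so he should select the move that minimizes the evaluation. This means that each evaluation in the depth 1 row is the minimum of the numbers in its subtree. Finally, at depth 0, the first player is on move, so he should select the move that maximizes the evaluation, leading to an overall evaluation of 4.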
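Of course, games like chess and Go are much more complicated, with dozens of possible moves at any given moment (rather than the 1-3 in the above example). Thus, it is infeasible to completely solve these games using the minimax algorithm. Instead, the evaluation function is applied at a sufficiently deep point in the tree (for example, most modern chess engines search to a depth of somewhere between 16 and 18), and minimax is used to fill in the rest of the resulting, relatively small, tree.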
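Truncating the search at a fixed depth can be sketched as follows. This is an illustrative example under stated assumptions, not how any particular engine works: the game is a toy subtraction game (players alternately remove 1 to 3 stones, and whoever takes the last stone wins), and the helper names `moves`, `is_terminal`, and `evaluate` are hypothetical. The heuristic simply returns 0 for any undecided position.

```python
def moves(stones):
    """Positions reachable by removing 1, 2, or 3 stones."""
    return [stones - k for k in (1, 2, 3) if k <= stones]


def is_terminal(stones):
    return stones == 0


def evaluate(stones, maximizing):
    """Payoff to the first player; 0 means 'unknown' for undecided positions."""
    if stones == 0:
        # The player on move faces an empty pile, so the opponent took the last stone.
        return -1 if maximizing else 1
    return 0


def depth_limited_minimax(state, depth, maximizing, moves, evaluate, is_terminal):
    """Search `depth` plies ahead, then fall back on the heuristic `evaluate`."""
    if depth == 0 or is_terminal(state):
        return evaluate(state, maximizing)
    values = [
        depth_limited_minimax(child, depth - 1, not maximizing, moves, evaluate, is_terminal)
        for child in moves(state)
    ]
    return max(values) if maximizing else min(values)


print(depth_limited_minimax(5, 5, True, moves, evaluate, is_terminal))  # 1
print(depth_limited_minimax(5, 1, True, moves, evaluate, is_terminal))  # 0
print(depth_limited_minimax(4, 4, True, moves, evaluate, is_terminal))  # -1
```

With a full-depth search the first player is found to win from 5 stones and lose from 4, while a one-ply search from 5 stones sees only undecided positions and returns the heuristic value 0. A better evaluation function would narrow that gap.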