code/treeexp/notes.txt

search tree T (with branching factor b and depth d)
set of handles H, one per leaf (|H| = b^d)
search algorithm: chooses a move at each level until it arrives at a leaf,
                 returns the handle in that leaf
playout algorithm: selects a move at each level until it arrives at a leaf,
                 returns the outcome of the handle in that leaf
                 gets a selector
                 passes to the next level next-selector(selector, level,
                 n-levels)
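A minimal Python sketch of the playout described above. Assumptions not in
these notes: handles are modelled as Bernoulli arms with random means, leaves
are indexed by reading the moves as base-b digits, and the names make_handles,
leaf_index, uniform_selector and the selector signature (moves, b, rng) are
hypothetical. next_selector here just hands the same selector down; the real
rule depends on the playout algorithm being tested.

```python
import random

def make_handles(b, d, rng):
    # One handle per leaf, b**d in total. Each handle is represented by its
    # mean outcome (assumption: Bernoulli arms with uniformly random means).
    return [rng.random() for _ in range(b ** d)]

def leaf_index(moves, b):
    # The moves chosen at levels 0..d-1 are the base-b digits of the leaf index.
    index = 0
    for move in moves:
        index = index * b + move
    return index

def next_selector(selector, level, n_levels):
    # Placeholder rule: pass the same selector to the next level.
    return selector

def uniform_selector(moves, b, rng):
    # Hypothetical selector: pick one of the b moves uniformly at random.
    return rng.randrange(b)

def playout(handles, b, d, selector, rng):
    # Select a move at each level until a leaf is reached, then return a
    # sampled outcome (0.0 or 1.0) of the handle in that leaf.
    moves = []
    for level in range(d):
        moves.append(selector(moves, b, rng))
        selector = next_selector(selector, level, d)
    mean = handles[leaf_index(moves, b)]
    return 1.0 if rng.random() < mean else 0.0
```

Usage: rng = random.Random(0); handles = make_handles(2, 3, rng);
playout(handles, 2, 3, uniform_selector, rng) returns a single 0/1 outcome.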
regret is computed as
        the mean of the best handle minus the mean of the selected handle
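The regret definition above, as a small Python sketch. It assumes the true
mean of every handle is known to the experiment and that the search returns
the index of the selected handle; the function name regret and both argument
names are hypothetical.

```python
def regret(means, selected):
    # means: true mean outcome of each handle (one per leaf)
    # selected: index of the handle returned by the search algorithm
    # Regret is the best mean minus the mean of the selected handle.
    return max(means) - means[selected]
```

The regret is zero when the search returns a best handle and positive
otherwise.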
playout algorithms to compare:
        UCT: always UCB
        VCT: UVB once, then UCB
        UVT: always UVB
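One way to express the three algorithms above through
next-selector(selector, level, n-levels), as a Python sketch. UCB and UVB are
treated as opaque labels here, not implemented. Assumption: "once" in VCT is
read as "at the first level only". The names SCHEMES and selectors_by_level
are hypothetical.

```python
UCB = "UCB"  # stand-in label for the UCB selector
UVB = "UVB"  # stand-in label for the UVB selector

def uct_next(selector, level, n_levels):
    # UCT: every level uses UCB.
    return UCB

def vct_next(selector, level, n_levels):
    # VCT: whatever was used at this level, all later levels use UCB.
    return UCB

def uvt_next(selector, level, n_levels):
    # UVT: every level uses UVB.
    return UVB

# Each scheme is (selector at the first level, next-selector rule).
SCHEMES = {
    "UCT": (UCB, uct_next),
    "VCT": (UVB, vct_next),
    "UVT": (UVB, uvt_next),
}

def selectors_by_level(name, n_levels):
    # List the selector used at each level of one playout.
    selector, step = SCHEMES[name]
    used = []
    for level in range(n_levels):
        used.append(selector)
        selector = step(selector, level, n_levels)
    return used
```

Usage: selectors_by_level("VCT", 3) gives ["UVB", "UCB", "UCB"].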