Thompson sampling


In artificial intelligence, Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief, that is, a belief sampled from the player's current posterior distribution over how rewards are generated.

Consider a set of contexts $\mathcal{X}$, a set of actions $\mathcal{A}$, and rewards in $\mathbb{R}$. In each round, the player obtains a context $x \in \mathcal{X}$, plays an action $a \in \mathcal{A}$, and receives a reward $r \in \mathbb{R}$ drawn from a distribution that depends on the context and the action played. The aim of the player is to choose actions so as to maximize the cumulative reward.
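The round-by-round procedure can be illustrated with a minimal sketch. It makes simplifying assumptions that are not part of the text above: there is no context, rewards are binary (0 or 1), and the belief about each action is a Beta distribution starting from a uniform prior. The names `thompson_step` and `run_bandit` and the example probabilities are hypothetical, chosen only for illustration.

```python
import random


def thompson_step(successes, failures, rng):
    """Choose an action by drawing one belief per action and taking the best.

    Assumption: each action has a Beta(successes + 1, failures + 1)
    posterior over its unknown success probability (uniform prior).
    """
    samples = [
        rng.betavariate(s + 1, f + 1)
        for s, f in zip(successes, failures)
    ]
    # Act greedily with respect to the randomly drawn belief.
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_probs, rounds, seed=0):
    """Simulate a Bernoulli bandit and return (pulls per action, total reward).

    `true_probs` are the hidden success probabilities of a simulated
    environment; they are used only to generate rewards, never to choose.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    total_reward = 0
    for _ in range(rounds):
        action = thompson_step(successes, failures, rng)
        reward = 1 if rng.random() < true_probs[action] else 0
        # Update the belief for the action that was played.
        if reward:
            successes[action] += 1
        else:
            failures[action] += 1
        total_reward += reward
    pulls = [s + f for s, f in zip(successes, failures)]
    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = run_bandit([0.2, 0.8], rounds=2000, seed=1)
    print("pulls per action:", pulls, "total reward:", total)
```

Because each action is chosen from a random draw rather than from the posterior mean, actions with few observations still get played occasionally (exploration), while actions with strong evidence of high reward are played most of the time (exploitation). A contextual version would replace the per-action Beta beliefs with a posterior over a model that maps context and action to reward.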


...
Wikipedia

...