Thompson sampling

In artificial intelligence, Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists of choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
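The idea can be made concrete with a minimal sketch. It assumes a non-contextual bandit with Bernoulli (0/1) rewards and an independent uniform Beta(1, 1) prior on each arm's success probability. Under these assumptions the "randomly drawn belief" is one sample from each arm's Beta posterior, and the player acts greedily with respect to those samples. The function names `thompson_choose` and `run_bernoulli_bandit` are illustrative only.

```python
import random


def thompson_choose(successes, failures, rng=random):
    """Pick an arm by sampling once from each arm's Beta posterior.

    Assumes Bernoulli rewards and a Beta(1, 1) prior per arm, so the
    posterior of arm i is Beta(1 + successes[i], 1 + failures[i]).
    """
    # One random draw per arm forms the "randomly drawn belief".
    samples = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    # Act greedily with respect to that sampled belief.
    return max(range(len(samples)), key=lambda i: samples[i])


def run_bernoulli_bandit(true_means, rounds, seed=0):
    """Simulate Thompson sampling on arms with the given success probabilities."""
    rng = random.Random(seed)
    k = len(true_means)
    successes = [0] * k
    failures = [0] * k
    total = 0
    for _ in range(rounds):
        arm = thompson_choose(successes, failures, rng)
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate update: a reward of 1 bumps alpha, a reward of 0 bumps beta.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total += reward
    return successes, failures, total
```

For example, `run_bernoulli_bandit([0.1, 0.5, 0.9], 2000)` should end up pulling the third arm far more often than the others. Arms that look poor are still tried occasionally, because their posterior samples sometimes come out high. That random chance of being tried is how the heuristic balances exploration against exploitation.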

Consider a set of contexts $\mathcal{X}$, a set of actions $\mathcal{A}$, and rewards in $\mathbb{R}$. In each round, the player observes a context $x \in \mathcal{X}$, plays an action $a \in \mathcal{A}$, and receives a reward $r \in \mathbb{R}$ drawn from a distribution that depends on both the context and the chosen action. The player's aim is to choose actions so as to maximize the cumulative reward.
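The interaction protocol above can be simulated with a small sketch. It rests on several illustrative assumptions. The context set is finite, and contexts arrive uniformly at random. Rewards are Bernoulli. Each (context, action) pair has its own independent Beta(1, 1) prior. In each round the player draws one posterior sample per action for the observed context $x$, plays the argmax, and updates that pair's posterior with the observed reward. The names `contextual_thompson` and `reward_prob` are hypothetical; `reward_prob(x, a)` stands in for the unknown reward distribution of the environment.

```python
import random


def contextual_thompson(contexts, actions, reward_prob, rounds, seed=0):
    """Thompson sampling for a bandit with finitely many contexts.

    Assumptions (illustrative only): Bernoulli rewards, uniformly random
    contexts, and an independent Beta(1, 1) prior for each (context, action)
    pair. reward_prob(x, a) is a hypothetical simulator returning the
    probability that playing a in context x yields reward 1.
    """
    rng = random.Random(seed)
    # Posterior parameters [alpha, beta] for every (context, action) pair.
    post = {(x, a): [1, 1] for x in contexts for a in actions}
    cumulative = 0
    for _ in range(rounds):
        x = rng.choice(contexts)  # the player observes a context
        # Draw a belief for each action given x and play the best one.
        a = max(actions, key=lambda act: rng.betavariate(*post[(x, act)]))
        r = 1 if rng.random() < reward_prob(x, a) else 0  # receive a reward
        post[(x, a)][0] += r      # successes
        post[(x, a)][1] += 1 - r  # failures
        cumulative += r
    return post, cumulative


# Usage: the best action differs by context.
probs = {("morning", "a"): 0.9, ("morning", "b"): 0.1,
         ("evening", "a"): 0.1, ("evening", "b"): 0.9}
post, total = contextual_thompson(["morning", "evening"], ["a", "b"],
                                  lambda x, a: probs[(x, a)], rounds=2000)
```

Keeping a separate posterior per context only works when $\mathcal{X}$ is small and finite. Larger or continuous context sets usually call for a parametric reward model shared across contexts.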

...
Wikipedia