There is a wealth of literature describing how dopamine serves in reinforcement learning as a proxy for the value of a particular stimulus in conditioning tasks. These cues are context-dependent, and the plasticity is clearly adaptive. For example, attacking a lone animal may yield a food reward, but if that animal travels in a pack whose members protect each other, attacking may not be such a good idea. The lateral prefrontal cortex (LPFC) is believed to abstract sensory information into behavioral cues, and to be especially involved in this kind of context-dependent decision making.
One of the requirements for context-dependent decision making is that animals can respond to stimuli that have never been rewarded directly but have indirect associations with other stimuli that have. Animals have been shown to exhibit transitive inference (if A->B and B->C, then A->C), causal reasoning, and categorical inference, and none of these necessitates conscious reasoning. But how do LPFC neurons accomplish these tasks? There are two suggestions: temporal-difference learning, in which firing rates from upstream neurons would be integrated depending on the context, and model-based learning, in which a previously learned structure would be applied in a novel situation.
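To make the temporal-difference idea concrete, here is a minimal sketch (my own illustration, not the authors' model, with made-up learning-rate and discount parameters) of how TD(0) updates let value propagate backward along a stimulus chain A -> B -> C -> reward, so that A acquires value without ever being rewarded directly:

```python
# Hypothetical parameters for illustration only
alpha = 0.1   # learning rate (assumed)
gamma = 0.9   # discount factor (assumed)

V = {"A": 0.0, "B": 0.0, "C": 0.0}  # learned value of each stimulus
chain = ["A", "B", "C"]             # stimuli presented in order; reward follows C

for episode in range(200):
    for i, s in enumerate(chain):
        if i < len(chain) - 1:
            reward, next_value = 0.0, V[chain[i + 1]]
        else:
            reward, next_value = 1.0, 0.0  # reward delivered only after C
        # TD(0) update: move V(s) toward reward + gamma * V(next stimulus)
        V[s] += alpha * (reward + gamma * next_value - V[s])

print(V)  # A ends up with substantial value despite never being rewarded itself
```

After enough episodes the values settle near V(C) = 1, V(B) = gamma, V(A) = gamma squared, which is the sense in which a purely associative mechanism can support responses to indirectly rewarded stimuli.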
To differentiate between the two hypotheses, Pan et al. designed an asymmetric-reward choice experiment for three monkeys and recorded action potentials from single neurons extracellularly after inserting electrodes into the cortex. The monkeys chose between visual stimuli simply by gazing at the chosen stimulus for a duration that varied between 800 and 1200 ms, and their eye movements were recorded at a 500 Hz sampling rate. Here is their diagram of the choice experiment (Figure 1C from Pan et al., 2008):
There were two blocks in this design: reward instruction trials (RITs), in which the monkeys were trained to receive a larger reward for one response and a smaller reward for the other, and sequential paired-association trials (SPATs), in which the monkeys had to make a series of responses before receiving a reward. On the SPAT trials, the monkeys were forced to infer which choice would eventually lead to the larger reward (4 vs. 1). The monkeys learned both the RIT and the SPAT blocks well, and all three showed a higher rate of correct responses for the large-reward sequence.
The researchers then analyzed the neural data they had gathered, grouping each neuron as either a “reward”-type or a “stimulus-reward”-type depending on its response pattern. The “reward”-type cells responded in the same way to both types of stimuli (large- and small-reward). For the “stimulus-reward”-type neurons, however, one subgroup predicted a large reward for one category of stimuli, and another subgroup predicted a small reward for the other category. The authors hypothesize that through these distinct response patterns, the LPFC is able to represent category-based reward information, which could be used for category-based inference. The leap from this cellular category-based inference to context-dependent decision making is not too great, and that is what makes this work so fascinating.
Pan X, Sawa K, Tsuda I, Tsukada M, Sakagami M (2008). Reward prediction based on stimulus categorization in primate lateral prefrontal cortex. Nature Neuroscience 11(6): 703–712. doi:10.1038/nn.2128.