Cognitive neuroscience of visual object recognition

Object recognition is the ability to perceive an object's physical properties (such as shape, colour and texture) and apply semantic attributes to it (such as identifying the object as an apple). This process includes an understanding of the object's use, previous experience with the object, and how it relates to other objects. Humans can effectively identify and label an object regardless of its position or illumination. Humans are one of the few species capable of invariant visual object recognition. Both “front end” (sensory-driven) and “back end” (knowledge- or goal-driven) processing are required for a species to be able to recognize objects at varying distances, angles, lighting, and so on.

One model of object recognition, based on neuropsychological evidence, divides the process into four stages.

Within these stages, more specific processes take place to complete the different processing components. In addition, other models propose integrative hierarchies (combining top-down and bottom-up processing), as well as parallel processing, rather than this general bottom-up hierarchy.

Visual recognition processing has typically been viewed as a bottom-up hierarchy in which information is processed sequentially with increasing complexity. Lower-level cortical processors, such as the primary visual cortex, sit at the bottom of the hierarchy, and higher-level cortical processors, such as the inferotemporal cortex (IT), where recognition is facilitated, sit at the top. One of the best-known bottom-up hierarchical theories is David Marr's theory of vision.

In contrast, an increasingly popular theory of recognition processing emphasizes top-down processing. One model, proposed by Moshe Bar (2003), describes a "shortcut" in which early visual input is sent, partially analyzed, from the early visual cortex to the prefrontal cortex (PFC). The PFC generates possible interpretations of this crude input and sends them to the IT, activating relevant object representations that are then incorporated into the slower, bottom-up process. This "shortcut" is meant to minimize the number of object representations that must be considered during matching, thereby facilitating object recognition. Lesion studies have supported this proposal: individuals with PFC lesions show slower response times, suggesting that they rely on bottom-up processing alone.
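The efficiency argument behind the "shortcut" can be illustrated with a toy computation. This sketch is only an analogy: it is not a model of cortical processing and is not taken from Bar (2003). The feature vectors, the names ObjectRep, bottom_up_only and with_shortcut, and the parameter k (the number of candidate interpretations passed on) are all hypothetical. It assumes each stored object has a coarse description and a detailed one, and counts how many detailed comparisons are needed with and without a coarse pre-selection of candidates.

```python
from dataclasses import dataclass


# Hypothetical toy memory entry: a coarse description (e.g. rough outline)
# and a fine description (detailed features) of a stored object.
@dataclass(frozen=True)
class ObjectRep:
    name: str
    coarse: tuple[float, ...]
    fine: tuple[float, ...]


def distance(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bottom_up_only(fine_input, memory):
    """Match the detailed input against every stored representation.
    Returns (best label, number of detailed comparisons made)."""
    best = min(memory, key=lambda rep: distance(fine_input, rep.fine))
    return best.name, len(memory)


def with_shortcut(coarse_input, fine_input, memory, k=2):
    """Toy analogue of a top-down shortcut (hypothetical): a fast pass over
    the coarse input keeps the k closest candidates, and the detailed
    matching is run only on those candidates."""
    candidates = sorted(memory, key=lambda rep: distance(coarse_input, rep.coarse))[:k]
    best = min(candidates, key=lambda rep: distance(fine_input, rep.fine))
    return best.name, len(candidates)


# Illustrative, made-up data.
memory = [
    ObjectRep("apple", coarse=(1.0, 1.0), fine=(1.0, 1.0, 0.9, 0.1)),
    ObjectRep("orange", coarse=(1.0, 1.0), fine=(1.0, 1.0, 0.2, 0.8)),
    ObjectRep("pencil", coarse=(5.0, 0.2), fine=(5.0, 0.2, 0.5, 0.5)),
    ObjectRep("mug", coarse=(1.2, 1.5), fine=(1.2, 1.5, 0.1, 0.1)),
]
coarse_view = (1.0, 1.1)
fine_view = (1.0, 1.0, 0.85, 0.15)

print(bottom_up_only(fine_view, memory))                   # ('apple', 4)
print(with_shortcut(coarse_view, fine_view, memory, k=2))  # ('apple', 2)
```

Both routes reach the same label here, but the shortcut performs detailed matching on only two of the four stored objects. The design also shows the trade-off: if the coarse pass excluded the correct object, the shortcut alone would return the wrong label, which is one reason the model described above feeds the candidates back into the slower bottom-up process rather than replacing it.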

...
Wikipedia