Emergent human-like covert attention in feedforward convolutional neural networks
Abstract
Covert attention allows the selection of locations or features of the visual scene without moving the eyes. Cues and contexts predictive of a target’s location orient covert attention and improve perceptual performance. These performance benefits are widely explained by theories that treat covert attention as a limited resource, a zoom, a spotlight, or a weighting of visual information. However, such concepts are difficult to map to neuronal populations. We show that a feedforward convolutional neural network (CNN), trained on images to optimize target detection accuracy and with no explicit incorporation of an attention mechanism, a limited resource, or feedback connections, learns to use cues and contexts in the three most prominent covert attention tasks (Posner cueing, set size effects in search, and contextual cueing) and predicts the cue/context influences on human accuracy. The CNN’s cueing/context effects generalize across network training schemes, to peripheral and central pre-cues, discrimination tasks, and reaction time measures, and, critically, do not vary with reductions in network resources (size). The CNN shows comparable cueing/context effects to a model that optimally uses image information to make decisions (Bayesian ideal observer) but generalizes these effects to cue instances unseen during training. Together, the findings suggest that human-like behavioral signatures of covert attention in the three landmark paradigms might be an emergent property of task accuracy optimization in neuronal populations without positing limited attentional resources. The findings might explain recent behavioral results showing cueing and context effects across a variety of simple organisms with no neocortex, from archerfish to fruit flies.
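To illustrate how a cueing effect can arise from accuracy optimization alone, the following is a minimal sketch of a Bayesian ideal observer in a simplified two-location Posner detection task. It is not the model or the stimuli used in this work: the function name `simulate_ideal_observer`, the Gaussian internal responses, and the values of `d_prime` and `validity` are assumptions chosen for illustration only. The observer weights the likelihood ratio at each location by the prior probability that the target appears there, with no capacity limit anywhere in the computation.

```python
import numpy as np


def simulate_ideal_observer(d_prime=1.5, validity=0.8, n_trials=200_000, seed=0):
    """Simulate an ideal observer in a hypothetical two-location cueing task.

    Column 0 is the cued location, column 1 the uncued location.
    Returns (hit rate on valid trials, hit rate on invalid trials, false-alarm rate).
    """
    rng = np.random.default_rng(seed)

    # Target is present on half of the trials (assumed prior).
    present = rng.random(n_trials) < 0.5
    # When present, the target is at the cued location with probability `validity`.
    at_cued = rng.random(n_trials) < validity

    # Unit-variance Gaussian responses; the target shifts the mean by d_prime.
    signal = np.zeros((n_trials, 2))
    signal[present & at_cued, 0] = d_prime
    signal[present & ~at_cued, 1] = d_prime
    x = signal + rng.standard_normal((n_trials, 2))

    # Per-location likelihood ratio of "target here" versus "noise only".
    lr = np.exp(d_prime * x - d_prime**2 / 2)

    # Ideal observer: weight each location by its prior, then compare to 1
    # (equal priors for target present and target absent).
    weighted_lr = validity * lr[:, 0] + (1 - validity) * lr[:, 1]
    say_present = weighted_lr > 1.0

    hit_valid = say_present[present & at_cued].mean()
    hit_invalid = say_present[present & ~at_cued].mean()
    false_alarm = say_present[~present].mean()
    return hit_valid, hit_invalid, false_alarm


if __name__ == "__main__":
    hv, hi, fa = simulate_ideal_observer()
    print(f"valid hits: {hv:.3f}, invalid hits: {hi:.3f}, false alarms: {fa:.3f}")
```

Under these assumptions, the hit rate is higher when the target appears at the cued location than at the uncued one, and the difference disappears when `validity` is set to 0.5, that is, when the cue carries no information about the target's location.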