Humans, but Not Deep Neural Networks, Often Miss Giant Targets in Scenes
Abstract
Even with great advances in machine vision, animals are still unmatched in their ability to visually search complex scenes. Animals from bees [1, 2] to birds [3] to humans [4–12] learn about the statistical relations in visual environments to guide and aid their search for targets. Here, we investigate a novel manner in which humans utilize rapidly acquired information about scenes by guiding search toward likely target sizes. We show that humans often miss targets when their size is inconsistent with the rest of the scene, even when the targets were made larger and more salient and observers fixated the target. In contrast, we show that state-of-the-art deep neural networks do not exhibit such deficits in finding mis-scaled targets but, unlike humans, can be fooled by target-shaped distractors that are inconsistent with the expected target’s size within the scene. Thus, it is not a human deficiency to miss targets when they are inconsistent in size with the scene; instead, it is a byproduct of a useful strategy that the brain has implemented to rapidly discount potential distractors.