I'm reading through Gregory Bateson's Mind and Nature: A Necessary Unity.
[T]he fact of image formation remains almost totally mysterious. How it is done, we know not -- nor, indeed, for what purpose.
It is all very well to say that it makes a sort of adaptive sense to present only the images to consciousness without wasting psychological process on consciousness of their making. But there is no clear primary reason for using images at all or, indeed, for being aware of any part of our mental processes.
Speculation suggests that image formation is perhaps a convenient or economical method of passing information across some sort of interface. Notably, where a person must act in a context between two machines, it is convenient to have the machines feed their information to him or her in image form (Bateson 2002:24, emphasis in original).
Here, Bateson describes a military system for controlling antiaircraft fire, with the main point being that the aiming system involves the presentation and calibration of images to the gunner. He then continues:
The system contains two interfaces: sensory system-man and man-effector system. Of course, it is conceivable that in such a case, both the input information and the output information could be processed in digital form, without transformation into an iconic mode. But it seems to me that the iconic device is surely more convenient not only because, being human, I am a maker of mental images but also because at these interfaces images are economical or efficient. If this speculation is correct, then it would be reasonable to guess that mammals form images because the mental processes of mammals must deal with many interfaces (ibid. 24-25, emphasis added).
It seems to me that the idea of "efficiency" or "economy" here, when expressed in fitness terms, most probably comes down to time. Images are a fast way to transfer information. This kind of information transmission exploits the ability of the ancestral visual perception system to differentiate images by their fine properties. With a highly multidimensional presentation (two dimensions for the image, potentially a third from binocular vision, and further dimensions for brightness and color contrasts), a few iconic images and some relationships among them can be presented and distinguished very rapidly from thousands of other possible images.
In that sense, a picture really is worth a thousand words, or at least a sufficiently large number of words to take much longer to transmit and perceive.
In some contexts, images also have the advantage of being easier to learn. A verbal shorthand for aiming antiaircraft guns might be imagined ("10 o'clock for three clicks"), but even if it could be done as quickly as an image, it would be difficult to standardize among gunners (standardization would have to be taught) and would lack easy or rapid feedback. The use of such shorthand therefore is most common in situations where direct presentation of images is impossible.
Happily for our purposes, for most of our evolution ancient humans could not transmit images directly to each other. Language is a very effective means of communication for many purposes -- far more rapid than images for some kinds of information.
I wonder, though, if there is any more information in iconic visual communication than in equivalent verbal forms. Certainly the visual icons may be faster -- which is again why I think much of the advantage of images may have been their speed -- but the ease of learning icons may be exaggerated.
Bateson G. 2002. Mind and Nature: A Necessary Unity. Hampton Press, Cresskill NJ.