Visual Information Retrieval using Synthetic Imagery

September 2006 - present

Course: PhD project
Software: C#/C++
Funded by: NWO and BRICKS


Description

Our own visual imagination allows us to form mental images based on our memories and experiences. When we are learning new visual concepts, we often construct such images based on real objects or scenes to help understand or clarify the primary features associated with the concept. A real-life example is a journalist who is looking for a photo to accompany an article and asks an archivist to find it, as illustrated in Figure 1. The journalist has a scene in mind and tells the archivist what it looks like, in this case sky, grass and trees. The archivist imagines scenes that contain the concepts mentioned. However, because the image concepts might not be perfectly transferred, the imagined scenes do not necessarily resemble, at first, the image the journalist is thinking of. After talking back and forth to obtain more detailed information, the archivist is better able to imagine the scene and, as a result, to return suitable images from the photo database.

Figure 1. A journalist asks an archivist to find a particular kind of photo.

A new paradigm we call artificial imagination is the digital analogue of our own visual imagination. We endow the computer with the ability to intelligently synthesize images and present them to the user, asking whether or not they are relevant. These synthesized images are constructed in such a way that they target one or more particular features that are important to the query. Our idea is that the generated images (which are not in the database) are more in line with the user’s thoughts, and consequently the user will be able to select more images as relevant during an iteration.

Which images should be synthesized?

We base the synthesis on images on which feedback has already been given and their locations in feature space. Our approach is to examine their relevance in relation to the other database images and attempt to find points in feature space that will clarify uncertain or emphasize important features once feedback on them is given by the user. An approach inspired by evolutionary algorithms is very well suited for this task, because such algorithms take a population and evolve it towards better solutions. In our situation this leads to evolving a population of images towards a solution ideally containing all images of interest to the user. Our algorithm has four steps, just like an evolutionary algorithm.

Starting population: Pick a random selection from the database and show it to the user.
Crossover: Use previous positive feedback images to create various subsets. Create a query point for each subset by combining the contained images.
Mutation: Use the negative feedback to push the query points away to a more favorable area in feature space, and introduce random elements.
Survival: After synthesis of these points, they are presented to the user. We let the user choose which images are relevant and thus survive.
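
The crossover and mutation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: feature vectors are plain lists of floats, images are "combined" by averaging their vectors, and the function names and the push and noise parameters are hypothetical.

```python
import random
from itertools import combinations


def mean(points):
    """Component-wise mean of a collection of feature vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def crossover(positives, subset_size=2):
    """Create one query point per subset of the positive-feedback images.

    Assumption: the images in a subset are combined by averaging.
    """
    return [mean(subset) for subset in combinations(positives, subset_size)]


def mutate(point, negatives, push=0.1, noise=0.05, rng=random):
    """Push a query point away from the negatives and add random jitter.

    Assumption: the point is moved along the direction leading away from
    the centroid of the negative-feedback images.
    """
    if negatives:
        centroid = mean(negatives)
        point = [p + push * (p - c) for p, c in zip(point, centroid)]
    return [p + rng.uniform(-noise, noise) for p in point]


def next_generation(positives, negatives, rng=random):
    """Query points to synthesize and show to the user in the next round."""
    return [mutate(q, negatives, rng=rng) for q in crossover(positives)]
```

The survival step is not part of the sketch, because it is carried out by the user: the points returned by next_generation would be synthesized into images, and the user's feedback on them would supply the positives and negatives for the following round.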

How can we synthesize images?

In many feature spaces there is no direct one-to-one mapping between image features and image pixels, and as a result image reconstruction is not well-defined. Take for instance a feature space built on a color histogram: even if the histogram tells us how much red, green and blue were present in the original image, we cannot reconstruct the image, since we do not know which pixel should be assigned which color. Other feature spaces are more complex and image reconstruction can be near-impossible.
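
The histogram problem can be made concrete with a toy example. The sketch below assumes an image is a small grid of color names, which is a simplification for illustration only; it shows two different images that share exactly the same color histogram, so the histogram alone cannot tell us which one to reconstruct.

```python
from collections import Counter


def color_histogram(image):
    """Count how often each color occurs, ignoring pixel positions."""
    return Counter(pixel for row in image for pixel in row)


# Two different 2x2 images: vertical stripes and a checkerboard.
stripes = [["red", "blue"],
           ["red", "blue"]]
checkerboard = [["red", "blue"],
                ["blue", "red"]]

same_histogram = color_histogram(stripes) == color_histogram(checkerboard)
same_pixels = stripes == checkerboard
```

Here same_histogram is True while same_pixels is False: the mapping from pixels to histogram discards the positions, so it cannot be inverted.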

In our approach, we simultaneously use two different feature spaces: one that gives us great searching capability but for which image reconstruction is not well-defined, and a second that has the opposite qualities. Each image is therefore associated with a feature vector in both spaces. When a user gives feedback, we use the first feature space to search the image database and return the best matches, whereas we use the second feature space to synthesize images using our evolutionary-inspired algorithm.
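
A minimal sketch of this bookkeeping, assuming each database image simply stores one vector per feature space. The class and function names are hypothetical, Euclidean distance stands in for whatever similarity measure is actually used, and averaging stands in for the synthesis step.

```python
import math
from dataclasses import dataclass


@dataclass
class ImageEntry:
    name: str
    search_features: list      # good for matching, not invertible to pixels
    synthesis_features: list   # weaker for matching, but invertible to pixels


def nearest(database, query, k=1):
    """Return the k database images closest to a query in the search space."""
    return sorted(database,
                  key=lambda entry: math.dist(entry.search_features, query))[:k]


def blend_synthesis(entries):
    """Combine images in the synthesis space (assumption: by averaging)."""
    vectors = [entry.synthesis_features for entry in entries]
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

The sketch leaves out two steps: turning the blended synthesis vector into actual pixels, and extracting search features from the resulting image so that it can be used in a query.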

Here we illustrate the usefulness of synthetic images. If a user is looking for crosshatched images, but only an image containing horizontal lines and an image containing vertical lines are shown on screen, the algorithm is able to synthesize a crosshatched image (see Figure 2). If the user is looking for a color-adjusted version of an image, the algorithm can synthesize an appropriate image (see Figure 3). In our experiments, the search was steered more quickly in the correct direction when the synthetic images were used in queries.


   

Figure 2. Synthesis of a crosshatched image (right) from images containing horizontal (left) and vertical lines (middle).

   

Figure 3. Synthesis of a color-adjusted image (right) from textures containing the pattern (left) and the color (middle).

Of course, we realize that we have only used images that are actually textures (i.e. images containing patterns), rather than images that one sees in everyday life. In contrast with such general images, textures do not necessarily have any semantic meaning, and modifications to a texture generally result in a new valid texture, making our approach appropriate for this category of image search. As our method has no knowledge of semantic concepts (e.g. it cannot synthesize an image containing a dog on a beach given an image of a beach and another of a dog), it is not really suitable for general image search. However, new semantic image synthesis techniques appear very promising and we will look into them in the near future.

Publications

For more information and experimental results, take a look at the CIVR2007 and ICPR2008 papers in the publications section.