Beginner's Robotics Guide: Image Processing - I

THEORY OF IMAGE PROCESSING (Special Thanks to Pulkit Gaur)

Modern digital technology has made it possible to manipulate multi-dimensional signals with systems that range from simple digital circuits to advanced parallel computers. The goal of this manipulation can be divided into three categories:

* Image Processing: image in -> image out

* Image Analysis: image in -> measurements out

* Image Understanding: image in -> high-level description out

Image understanding requires an approach that differs fundamentally from the theme of this guide. Furthermore, we will restrict ourselves to two-dimensional (2D) image processing, although most of the concepts and techniques described here can be extended easily to three or more dimensions. Readers interested in greater detail than presented here, or in other aspects of image processing, are referred to the literature.

We begin with certain basic definitions. An image defined in the "real world" is considered to be a function of two real variables, for example, a(x,y) with a as the amplitude (e.g. brightness) of the image at the real coordinate position (x,y). An image may be considered to contain sub-images sometimes referred to as regions-of-interest, ROIs, or simply regions. This concept reflects the fact that images frequently contain collections of objects each of which can be the basis for a region. In a sophisticated image processing system it should be possible to apply specific image processing operations to selected regions. Thus one part of an image (region) might be processed to suppress motion blur while another part might be processed to improve color rendition.
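The idea of applying an operation only to a selected region can be sketched in a few lines of Python. This is a minimal illustration, not code from the original text: the nested-list image layout and the `apply_to_roi` helper are assumptions for the sake of the example.

```python
def apply_to_roi(image, top, left, height, width, op):
    """Apply the function op to every pixel inside a rectangular region.

    image is a nested list of grey levels (rows of pixel values).
    """
    for y in range(top, top + height):
        for x in range(left, left + width):
            image[y][x] = op(image[y][x])
    return image

# A 4x4 greyscale image; brighten only the 2x2 upper-left region,
# clamping at 255, and leave the rest of the image untouched.
img = [[10, 10, 10, 10] for _ in range(4)]
apply_to_roi(img, 0, 0, 2, 2, lambda v: min(v + 50, 255))
```

In a real system the regions would come from a segmentation step rather than fixed coordinates, but the principle of region-selective processing is the same.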

The amplitudes of a given image will almost always be either real numbers or integer numbers. The latter is usually a result of a quantization process that converts a continuous range (say, between 0 and 100%) to a discrete number of levels. In certain image-forming processes, however, the signal may involve photon counting which implies that the amplitude would be inherently quantized. In other image forming procedures, such as magnetic resonance imaging, the direct physical measurement yields a complex number in the form of a real magnitude and a real phase.


There are several methods of capturing images, and some of them are described in this section. Images can be divided into several types: there are light intensity images, and images based on the distance to the objects, the heat of the objects, or other physical quantities. This section deals only with light intensity images for light in the visible range. Image acquisition is divided into real-time acquisition and still-picture acquisition.


Real-time acquisition is capturing images of the scene at the time the image is to be used. This method of image acquisition is often done with video cameras that convert the image to an electrical signal. For video, pictures must be taken at a rate of at least 25 frames per second to produce a reasonable picture with no flicker. The cameras most commonly used for real-time acquisition capture 25 frames per second, each consisting of 625 lines (the British standard). This means that the signal does not consist of pixels and their colors but of a varying signal that describes the light intensity along each line.
This signal is not usable for image processing until it is converted to pixels describing the intensity of each color in the picture. This conversion is normally done by a frame grabber connected to a computer. The frame grabber converts the images to a usable format at the rate requested by the program connected to it. This sometimes causes problems for the frame grabber, because it cannot keep up with the rate at which the program wants the images. The number of colors or the resolution can be reduced to decrease the amount of data that has to be processed.
Real-time acquisition is the form of image acquisition that gives the most possibilities in machine vision, and it is therefore the most used type.


Acquisition of still pictures is less complicated than real-time acquisition. The method is either by camera or by scanner. Their principle of operation is the same, but the scanner moves a single row of recording elements over the image, while the camera has a two-dimensional array of recording elements. The advantage of the scanner is that its resolution is higher than that of a camera, but the scanner needs a more fixed distance to the object. The camera has the advantage of a lens that can focus on the image, and it is usually faster than the scanner, which has to move mechanically. Both scanner and camera record by means of CCD (Charge Coupled Device) elements. These are capacitors that hold a charge proportional to the light incident on the device. They can record color images if filters are used; this means that three times as many capacitors have to be used, which makes such cameras more expensive. The analogue signals from the capacitors are converted to a digital image, which is stored in the memory of the camera, or transferred directly to the computer in the case of a scanner. In the computer, the images are saved in the required format. Scanners and cameras can also be linked directly to the computer and thereby used directly in machine vision applications.
A digital camera is used in this project, and the image is downloaded from the camera to the computer. In the computer, it is stored in a format that is used in the following image processing. The different image formats are described in the following section.


This section describes some methods for describing and storing images. First, the basics of describing images in greyscale and color are covered. After that, formats for storing images are described, and the format chosen for this project is presented.


An image consists of a two-dimensional array of pixels. Each pixel is given a color or grey level corresponding to its number in the array and the image type.

There are three main types of images:

. Black/white
. Greyscale
. Colour

Black-and-white images, or binary images, have only two colors and can therefore be represented by one bit per pixel: either 0 (black) or 1 (white). Greyscale images have up to 256 grey levels, in which 0 is black and 255 is white. Some types of greyscale images have only 16 different levels and are stored in only 4 bits per pixel, compared to an image with 256 grey levels, which takes up 8 bits per pixel. The most used format is 256 grey levels, and that is the format used in this project.
Color images can have up to 16.7 million different colors. These are composed of three basic colors: red, green and blue. Each of the basic colors has an intensity level between 0 and 255, so this type of color image takes up 24 bits of memory per pixel. There are other types of color images in which the number of color combinations is reduced, so that the colors can be stored in a smaller amount of memory; these normally use a predefined mapping from each stored number to a specific color. For image processing it is an advantage to reduce the number of colors or grey levels to a minimum, to reduce the memory required. The number of colors must, however, always be sufficient to enable the wanted operations to be performed.
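The storage figures above follow directly from the bits per pixel of each image type. A quick sketch (the `storage_bytes` helper and the 640x480 resolution are chosen here for illustration):

```python
def storage_bytes(width, height, bits_per_pixel):
    """Uncompressed size in bytes of a width x height image."""
    return width * height * bits_per_pixel // 8

w, h = 640, 480
binary  = storage_bytes(w, h, 1)   # black/white: 1 bit per pixel
grey16  = storage_bytes(w, h, 4)   # 16 grey levels: 4 bits per pixel
grey256 = storage_bytes(w, h, 8)   # 256 grey levels: 8 bits per pixel
colour  = storage_bytes(w, h, 24)  # 24-bit RGB: 8 bits per basic color
```

For a 640x480 image this gives 38,400 bytes for binary, 153,600 for 16 grey levels, 307,200 for 256 grey levels, and 921,600 for 24-bit color, which illustrates why reducing the number of grey levels or colors pays off.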


Storing images can be done by several different methods.

. Vector images
. Pixel images
. Compressed images

Some images are stored as vectors, in which the image information is a vector description of the different lines in the image. An example of such an image is an AutoCAD drawing (.dwg). Pixel images are stored with a description of the color or grey level of each pixel. Before a program reads the image, it has to know the size and type of the image; in some image formats this information is included at the beginning of the file. This method of storing images takes a lot of memory space.
To reduce the size of the image file, different compression methods are used. They range from complicated mathematical operations to simple methods. A simple method of compression is to store a color together with the number of succeeding pixels that have the same color (run-length encoding). This method of compression works best for images in which the color does not change from pixel to pixel.
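The simple compression scheme just described, storing a pixel value together with the length of the run of identical pixels that follows, can be sketched as follows (the function names are mine, not from the original text):

```python
def rle_encode(pixels):
    """Run-length encode a flat list of pixel values as (value, count) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [(value, count) for value, count in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original pixel list."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

# One row of a binary image: long runs compress well.
row = [0, 0, 0, 255, 255, 0]
encoded = rle_encode(row)
```

On a binary image with large uniform areas the pair list is far shorter than the pixel list; on a noisy image that changes value at every pixel it can actually be longer, which is why this method suits images that "do not change color between each pixel".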


Thresholding is an operation in which a greyscale image is converted to a binary image. This is often done to simplify the image so that other image processing operations can be applied afterwards. An example is text recognition, in which a greyscale image is converted to a binary image that is easier to process when the characters are to be identified. The conversion to a binary image is done by changing all pixels with a grey level below the threshold level to 0 (black) and all pixels at or above the threshold level to 255 (white). The threshold level can be found by mathematical methods or set manually by studying the image and its histogram. Two types of errors can be made when choosing the threshold level.

Type 1: Not all the pixels of the wanted object are included in the new image.
Type 2: Pixels that should not be in the new image are included.
Before choosing the threshold level, it has to be determined which kind of error is the most acceptable. This determines whether the threshold should be set a little higher or lower. The following subsections describe methods of determining the threshold level.
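The thresholding rule itself (below the level becomes black, at or above becomes white) is a one-liner per pixel. A minimal sketch, with the image again assumed to be a nested list of grey levels:

```python
def threshold(image, level):
    """Convert a greyscale image to binary: 0 below level, 255 at or above it."""
    return [[0 if p < level else 255 for p in row] for row in image]

img = [[12, 200],
       [128, 90]]
binary = threshold(img, 128)
```

Raising `level` shrinks the white region (risking Type 1 errors); lowering it grows the white region (risking Type 2 errors).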


In manual thresholding, the threshold is determined from experience or by studying the colors in the image. It is an easy way to set the threshold.

Another manual method of determining the threshold level is to study the histogram of the image. A histogram is a graphical representation of the number of pixels at each grey level or color level. If there is a big difference between the background and the wanted object, it is possible to see which grey level the object has and which grey level the background has. A grey level between these two peaks is then chosen as the threshold.
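Computing such a histogram is straightforward. A sketch for an 8-bit greyscale image stored as a nested list (the `histogram` helper is illustrative, not from the original text):

```python
def histogram(image, levels=256):
    """Count how many pixels fall at each grey level."""
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    return hist

# A tiny bimodal image: dark background around 30, bright object around 200.
img = [[30, 32, 30, 200],
       [31, 30, 201, 199]]
hist = histogram(img)
```

Plotting `hist` for such an image shows two peaks (around 30 and around 200), and any level in the valley between them, say 128, separates object from background.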


Automatic thresholding is a method in which the threshold level is found by mathematical operations. If the size of the object is known, the threshold level can be set so that the corresponding number of pixels is present in the new image. The size of the object is, however, seldom known, and other methods must therefore be used.
An image often consists of two parts: the object and the background. The grey levels of the object usually follow one statistical distribution, and those of the background follow another.
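The known-object-size idea is often called p-tile thresholding: choose the level so that the desired fraction of pixels ends up in the object. A sketch, assuming a bright object on a dark background (the function name and example values are mine):

```python
def ptile_threshold(image, object_fraction):
    """Pick the grey level so that roughly object_fraction of the pixels
    lie at or above it (bright object on dark background assumed)."""
    pixels = sorted(p for row in image for p in row)
    # Index of the first pixel that should belong to the object.
    cut = int(len(pixels) * (1.0 - object_fraction))
    return pixels[min(cut, len(pixels) - 1)]

img = [[10, 20, 30, 240],
       [15, 25, 250, 245]]
# Object known to cover about 3 of the 8 pixels.
level = ptile_threshold(img, 3 / 8)
```

Thresholding `img` at the returned level marks exactly the three brightest pixels as object, matching the known size.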

Automatic detection of the threshold level is difficult in images in which the lighting is uneven, because the background and the object then have different grey levels depending on where in the image they are. One way of avoiding this problem is to change the lighting, but if that is not possible, other methods can be used. The image can be segmented into smaller sub-images, a threshold level found for each, and the sub-images recombined after thresholding.
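The segment-and-recombine approach can be sketched by thresholding each tile at its own mean grey level. This is one simple local scheme chosen for illustration; the tile size and the use of the mean as per-tile threshold are assumptions, not prescriptions from the text:

```python
def local_threshold(image, tile_h, tile_w):
    """Threshold each tile_h x tile_w tile of the image at that tile's
    mean grey level, then reassemble the tiles into one binary image."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for ty in range(0, h, tile_h):
        for tx in range(0, w, tile_w):
            tile = [image[y][x]
                    for y in range(ty, min(ty + tile_h, h))
                    for x in range(tx, min(tx + tile_w, w))]
            level = sum(tile) / len(tile)      # per-tile threshold
            for y in range(ty, min(ty + tile_h, h)):
                for x in range(tx, min(tx + tile_w, w)):
                    out[y][x] = 255 if image[y][x] >= level else 0
    return out

# Top row is dimly lit, bottom row brightly lit; a single global level
# would fail, but each row gets its own threshold here.
img = [[10, 20],
       [200, 220]]
binary = local_threshold(img, 1, 2)
```

Each 1x2 tile finds its locally brighter pixel, so the object is detected in both the dark and the bright half, which a single global threshold could not do for this image.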
Once the image has been converted to a binary image, it is possible to perform operations on it to obtain the wanted result.