Texture Segmentation Using Iterative K-means


<Contributed to MeLi Wiki by Professor Dimitrios Charlampidis, Department of Electrical Engineering, University of New Orleans>


Introduction

General information about texture segmentation:

The objective of texture segmentation is to identify different uniform textural regions in an image. A texture segmentation example is shown in Figure 1. The image on the left is the original image, consisting of three uniform textural regions. The image on the right presents an “ideal” segmentation result. In other words, the segmentation process produces an image of the same size as the original textural image, in which different colors represent different uniform textural regions of the original image. Texture segmentation is used in medical applications, such as the identification of skin lesions and of low-density regions in bones from X-ray images, and in remote sensing applications, such as the separation of different vegetation regions in satellite imagery.

Figure 1. A texture segmentation example.

There are several approaches to texture segmentation. Commonly, the segmentation process consists of two major steps, namely feature vector extraction and feature vector clustering. In feature vector extraction, several textural features are extracted from localized regions in the image. All features extracted from the same localized region comprise a single feature vector. In feature vector clustering, the feature vectors extracted from all localized regions are grouped into several groups. Ideally, all vectors belonging to the same group should correspond to visually similar local texture regions.

A specific approach to texture segmentation, based on feature vector extraction and feature vector clustering, is presented next. Feature extraction is based on image filtering and the fractal dimension, while clustering is based on K-means. Prior to presenting the texture segmentation technique, image filtering (or image masking) is described in detail.

Image filtering

The terms filter and filtering are used not only in image and signal processing, but also in everyday life. Examples include water filters, air filters, and fuel filters. In general, a filter accepts an input, such as a noisy image, water, or fuel, which is then converted into an output, such as a de-noised image, purified water, or purified fuel. Essentially, a filter is a medium that allows only certain components of the input to pass through. In image processing, filters are used in several applications including de-noising, smoothing, edge detection, and embossing, among others.

The term masking is used by the image processing community to describe the process of filtering, owing to the fact that the filters are commonly represented as small matrices or masks. On the other hand, the wider signal processing community associates filtering with the term convolution. Nevertheless, masking and convolution are similar operations, and their difference will be discussed in a later subsection.

In this subsection, the steps required in a filtering operation are presented via an example. As discussed earlier, since images are discrete, they are represented as two-dimensional matrices. Similarly, masks are also represented as two-dimensional matrices, usually smaller than images in terms of their size. Consider the following example of an image of size <math> 8 \times 8 </math>, and a mask of size <math> 2 \times 2 </math>. The particular mask is not necessarily utilized in a particular application, but is used here in order to demonstrate the masking operation. Pointers for choosing specific masks depending on the application at hand are discussed in a following subsection.

Table 1.jpg

The masking operation is performed by continuously sliding the mask over the image, until the center of the mask has visited every image pixel. At each mask location, image pixels are multiplied by the corresponding mask values, and the results of all multiplications are added to produce the final result. This result is then assigned as a pixel value in a new, masked image.

For the image and mask shown above, the masking operation works as follows. First, the center of the mask is placed at the upper left corner of the image, i.e. at the image pixel with coordinates <math> (0,0) </math>. Let us define the mask size as <math> M_{1} \times M_{2} </math>, where <math> M_{1} </math> is the mask height and <math> M_{2} </math> the mask width. If both the mask width and height are odd, the mask center is at the array location <math> ((M_{1}-1)/2,(M_{2}-1)/2) </math>. For instance, for a mask of size <math> 3 \times 5</math>, the center is at the mask location <math>(1,2)</math>. As a reminder, the coordinates of the upper left corner of the array representing the mask are considered to be <math>(0,0)</math>. In the case where the mask width or height is not odd, the mask center is considered to be one of the mask array locations close to <math> ((M_{1}-1)/2,(M_{2}-1)/2) </math>, since there is no array element located exactly at the center of the mask. For example, the mask center in the case where the height of the mask is even but the width is odd can be assigned at <math>(M_{1}/2,(M_{2}-1)/2)</math>, and in the case where both height and width of the mask are even can be assigned at <math>(M_{1}/2,M_{2}/2)</math>. Therefore, in the case where the mask size is <math>2 \times 2</math>, the center of the mask is considered to be at <math>(1,1)</math>.

In this example, the 2 of the mask overlaps with the 1 of the image. The rest of the mask values fall outside the image, where the image values are assumed to be equal to zero. The masking result for this particular point is equal to: <math> \color{Blue} \bold 1 \color{Black} \times 0 + \color{Blue} \bold 5 \color{Black} \times 0 + \color{Blue} \bold 0 \color{Black} \times 0 + \color{Blue} \bold 2 \color{Black} \times 1 = \color{Black} \bold 2</math>.

Table 2.jpg

Then, the mask is shifted one pixel to the right, and the process is repeated:

Table 3.jpg

In this case, the 2 of the mask overlaps with the 2 of the image. The 0 of the mask overlaps with the 1 of the image. The rest of the mask values are outside the image, where it is assumed that the image values are equal to zero.

Then, the result of masking for this particular point is <math> \color{Blue} \bold 1 \color{Black} \times 0 + \color{Blue} \bold 5 \color{Black} \times 0 + \color{Blue} \bold 0 \color{Black} \times 1 + \color{Blue} \bold 2 \color{Black} \times 2 = \color{Black} \bold 4</math>.

The process is then repeated until the mask reaches the rightmost pixel of the first image row (these calculations are not shown here). Then, the mask is taken back to the leftmost side, but now it is shifted down by one pixel, and is “centered” at the leftmost pixel of the second row, as shown next.

Table 4.jpg

In this case, the 2 of the mask overlaps with the 100 of the image. The 5 of the mask overlaps with the 1 of the image. The other mask values fall outside the image.

Then, the result of masking for this particular point is <math> \color{Blue} \bold 1 \color{Black} \times 0 + \color{Blue} \bold 5 \color{Black} \times 1 + \color{Blue} \bold 0 \color{Black} \times 0 + \color{Blue} \bold 2 \color{Black} \times 100 = \color{Black} \bold 2 \color{Black} \bold 0 \color{Black} \bold 5</math>.

Then, the mask is shifted by one pixel to the right and the process is repeated:

Table 5.jpg

In this case, the 2 of the mask overlaps with the 8 of the image. The 0 of the mask overlaps with the 100 of the image. The 5 of the mask overlaps with the 2 of the image, and the 1 of the mask overlaps with the 1 of the image.

Then, the result of masking for this particular point is <math> \color{Blue} \bold 1 \color{Black} \times 1 + \color{Blue} \bold 5 \color{Black} \times 2 + \color{Blue} \bold 0 \color{Black} \times 100 + \color{Blue} \bold 2 \color{Black} \times 8 = \color{Black} \bold 2 \color{Black} \bold 7</math>.

If the process is repeated for all pixels in the input image, the following masked image is obtained:

Table 6.jpg

The values shown in boldface are the results of masking presented above. The rest of the values can be obtained if the process is repeated for the other pixels as well. Thus, the array shown above is the image filtered by the <math> 2 \times 2</math> mask, and it is of the same size as the input image.
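For readers who prefer code, the sliding-window computation described above can be written down directly. The following is a minimal Python/NumPy sketch; the function name, the explicit loops, and the zero treatment of out-of-image pixels simply mirror the worked example and are not part of the original text.

<syntaxhighlight lang="python">
import numpy as np

def apply_mask(image, mask, center):
    """Slide the mask over the image and, at every pixel, sum the products
    of overlapping mask and image values.  Mask values that fall outside
    the image are treated as overlapping zeros, as in the example above."""
    H, W = image.shape
    M1, M2 = mask.shape
    cy, cx = center                        # e.g. (1, 1) for the 2x2 mask above
    out = np.zeros((H, W), dtype=float)
    for y in range(H):
        for x in range(W):
            total = 0.0
            for i in range(M1):
                for j in range(M2):
                    yy, xx = y + i - cy, x + j - cx
                    if 0 <= yy < H and 0 <= xx < W:
                        total += mask[i, j] * image[yy, xx]
            out[y, x] = total
    return out

# The 2x2 mask used in the example; its center is taken at (1, 1).
mask = np.array([[1, 5],
                 [0, 2]], dtype=float)
# masked = apply_mask(image, mask, center=(1, 1))   # image: the 8x8 array above
</syntaxhighlight>

With the four image values worked out above (1 and 2 in the first row, 100 and 8 in the second), this sketch reproduces the boldface results 2, 4, 205, and 27.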

Of course, different types of filters can be constructed using two-dimensional functions, such as the Gaussian envelope function:

<math> G(x,y)=\frac{1}{2 \pi \sigma^{2}}exp(- \frac{x^{2}+y^{2}}{2 \sigma^{2}}) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;(1) </math>

where <math>\sigma^{2}</math> is the variance of the Gaussian function, and the constant <math> 1/(2 \pi \sigma^{2})</math> is used for normalization purposes. In this case, a mask can be constructed by considering the central value of the mask as the point with coordinates <math>(x,y)=(0,0)</math>. The remaining mask values can be determined accordingly, as the coordinates x and y vary in the range <math>[-(M_{1}-1)/2,(M_{1}-1)/2]</math> and <math>[-(M_{2}-1)/2,(M_{2}-1)/2]</math>, respectively (here it was assumed that <math> M_{1} </math> and <math> M_{2} </math> are odd integers, so that the mask size is equal to <math> M_{1} \times M_{2}</math>). Of course, the Gaussian envelope can be defined for values of x and y ranging from <math>-\infty</math> to <math>+\infty</math>, resulting in an infinitely large mask. Practically, it only needs to be ensured that the mask is large enough to contain the most significant values of the envelope. The Gaussian envelope defined above is an omni-directional smoothing function and a lowpass filter.
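As a concrete illustration, such a mask can be obtained by sampling equation (1) on a grid of integer coordinates centered at <math>(0,0)</math>. The short sketch below assumes odd mask dimensions and an arbitrary choice of <math>\sigma</math>:

<syntaxhighlight lang="python">
import numpy as np

def gaussian_mask(M1, M2, sigma):
    """Sample the Gaussian envelope of equation (1) on an M1 x M2 grid,
    with the mask center placed at coordinates (x, y) = (0, 0).
    M1 and M2 are assumed to be odd."""
    y = np.arange(M1) - (M1 - 1) // 2
    x = np.arange(M2) - (M2 - 1) // 2
    X, Y = np.meshgrid(x, y)
    return np.exp(-(X**2 + Y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

# A 7x7 smoothing (lowpass) mask; if the mask is large enough relative to
# sigma, its values sum to approximately 1 because of the normalization.
g = gaussian_mask(7, 7, sigma=1.0)
</syntaxhighlight>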

The following function is an example of a filter that belongs in a more general class of functions called Gabor functions:

<math> G(x,y)=\frac{1}{2 \pi \sigma_{x} \sigma_{y}}exp(- \frac{x^{2}}{2 \sigma_{x}^{2}}-\frac{y^{2}}{2 \sigma_{y}^{2}})cos(2 \pi f_{0}x) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;(2) </math>

This filter has some directional properties (for instance, if <math> \sigma_{x} > \sigma_{y}</math>, then the filter will appear to be elongated with respect to the horizontal axis). The filter is a band-pass filter with respect to the horizontal axis, with central frequency <math>f_{0}</math>, while it is a low pass filter with respect to the vertical axis. In the case where <math>f_{0}=0</math>, the filter of equation (2) reduces to a Gaussian, elongated low-pass filter.

If it is desired to obtain a directional Gabor filter oriented in an arbitrary direction, <math>\theta</math>, clockwise, the following coordinate transformations can be performed:

<math> \begin{align} &x \rightarrow xcos(\theta)+ysin(\theta)\\ &y \rightarrow -xsin(\theta)+ycos(\theta) \;\;\;\;\;\;\;\;\;\;\;\;\;\;(3) \end{align} </math>

Some examples of Gabor filters for different <math>\sigma_{x}, \sigma_{y}</math>, and <math>f_{0}</math> parameters are shown in Figure 2, next. The gray-level intensity of the surface plots is associated to the value of the filter mask at the particular mask location (gray is 0, low and high intensities correspond to negative and positive mask values, respectively).

Figure 2: Gabor filter masks for different parameter values.
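A Gabor mask oriented at an angle <math>\theta</math> can be sampled in the same way, by applying the coordinate rotation of equation (3) before evaluating equation (2). The sketch below is illustrative only; the mask size and parameter values are arbitrary choices, not the ones used to generate Figure 2.

<syntaxhighlight lang="python">
import numpy as np

def gabor_mask(M1, M2, sigma_x, sigma_y, f0, theta):
    """Sample the Gabor function of equation (2) on an M1 x M2 grid,
    after rotating the coordinates clockwise by theta (radians), eq. (3).
    Setting f0 = 0 reduces the mask to an elongated Gaussian lowpass filter."""
    y = np.arange(M1) - (M1 - 1) // 2
    x = np.arange(M2) - (M2 - 1) // 2
    X, Y = np.meshgrid(x, y)
    Xr = X * np.cos(theta) + Y * np.sin(theta)      # x -> x cos(theta) + y sin(theta)
    Yr = -X * np.sin(theta) + Y * np.cos(theta)     # y -> -x sin(theta) + y cos(theta)
    envelope = np.exp(-Xr**2 / (2 * sigma_x**2) - Yr**2 / (2 * sigma_y**2))
    return envelope * np.cos(2 * np.pi * f0 * Xr) / (2 * np.pi * sigma_x * sigma_y)

# A bank of eight orientations, e.g. for the project described later.
bank = [gabor_mask(31, 31, 6.0, 3.0, 0.1, k * np.pi / 8) for k in range(8)]
</syntaxhighlight>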

Fractal Dimension:

Fractal Dimension (FD) has often been used in texture analysis. The variation method is one technique used to compute the FD of an object. It has been shown [1] that the variation method gives accurate and robust estimation of the FD of a surface. An image <math> Z(x,y)</math> of size <math>R \times R</math> can be considered as a surface of size <math>R \times R</math>, where its value at position <math>(x_{0},y_{0})</math> is <math>Z(x_{0},y_{0})</math>.

Based on the variation method, if a surface Z is a fractal, then Z is nowhere or almost nowhere differentiable. In other words, if <math>P(x,y,x^{'},y^{'})</math> is the slope of the line passing through points <math>(x,y,Z(x,y))</math> and <math>(x^{'},y^{'},Z(x^{'},y^{'}))</math>, then <math>\left | P(x,y,x^{'},y^{'}) \right |</math> goes to infinity as the point <math> (x^{'},y^{'}) </math> tends toward <math>(x,y)</math>. The fractal dimension is defined as the rate at which <math>\left | P(x,y,x^{'},y^{'}) \right |</math> goes to infinity. In practice, the FD using the variation method is calculated as follows.

The <math> \varepsilon^{th}</math> variation of Z centered at the point with coordinates <math>(x,y)</math> is defined as:

<math> V_{\varepsilon}(x,y)=\underset{dist((x,y),(s,t)) \leq \varepsilon}{max} Z(s,t)- \underset{dist((x,y),(s,t)) \leq \varepsilon}{min} Z(s,t) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;(4) </math>

where <math>dist((x,y),(s,t))=max(\left | x-s \right |,\left | y-t \right |)</math> and <math>\varepsilon > 0</math>. The integral of <math>V_{\varepsilon}(x,y)</math> approaches zero as <math>\varepsilon</math> approaches 0. The growth rate of this integral is directly related to the FD of Z. More specifically, the FD of the surface Z is given by:

<math> FD_{Z}=\Delta _{V}(Z)=\lim_{\varepsilon \rightarrow 0}\left ( 3-\frac{log\int \int V_{\varepsilon}(x,y)dxdy}{log \, \varepsilon} \right ) \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;(5) </math>

where the integrals are over the surface Z.

Equivalently, the FD of the surface is obtained by subtracting from 3 the slope of the line defined by the points <math>(log(\varepsilon), log\int \int V_{\varepsilon}(x,y)dxdy)</math> in a log-log plot. The computation of the fractal dimension of a discretized surface involves substituting summations for the integrals.

In other words, the FD can be computed locally in an <math>R \times R</math> square window of the image as follows. The difference <math>V_{\varepsilon /R}(x,y)</math> between the maximum and minimum grayscale values is computed in a small <math>T \times T</math> window centered at <math>(x,y)</math>, where <math>T=2\varepsilon +1<R</math>. Then, <math>V_{\varepsilon /R}(x,y)</math> is the <math> \varepsilon^{th}</math> variation located at the pixel in the <math>R \times R</math> window with coordinates <math>(x,y)</math>. The variation is computed in the same way for all pixels in the <math>R \times R</math> window, for <math>\varepsilon=1,2,3,...,\varepsilon_{max}</math>. If we define <math>E_{\varepsilon /R}</math> as the average of <math>V_{\varepsilon /R}(x,y)</math> over the <math>R \times R</math> window, then the FD is equal to 3 minus the slope of the line that best fits the points <math>(log(\varepsilon /R),log(E_{\varepsilon /R}))</math>, where <math>\varepsilon=1,2,3,...,\varepsilon_{max}</math>. The slope can be found using the least squares approach.
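A minimal sketch of this local computation is given below. The choice of <math>\varepsilon_{max}</math>, the clipping of neighborhoods at the window border, and the small constant guarding the logarithm are implementation choices, not part of the method description.

<syntaxhighlight lang="python">
import numpy as np

def fractal_dimension(window, eps_max=4):
    """Estimate the FD of a square R x R image window with the variation method.
    For each epsilon, V is the max-minus-min gray level over a (2*eps+1)^2
    neighborhood of every pixel, and E is the average of V over the window.
    The FD is 3 minus the least-squares slope of the points (log(eps/R), log(E))."""
    R = window.shape[0]
    log_eps, log_E = [], []
    for eps in range(1, eps_max + 1):
        V = np.empty(window.shape, dtype=float)
        for y in range(R):
            for x in range(R):
                nb = window[max(0, y - eps):y + eps + 1,
                            max(0, x - eps):x + eps + 1]
                V[y, x] = nb.max() - nb.min()
        log_eps.append(np.log(eps / R))
        log_E.append(np.log(V.mean() + 1e-12))   # guard against log(0) on flat windows
    slope = np.polyfit(log_eps, log_E, 1)[0]
    return 3.0 - slope
</syntaxhighlight>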

Some additional information is provided next, in order to give some intuitive understanding of how the FD characterizes a surface under the variation method. Essentially, the FD can be thought of as an indicator of surface roughness. A “rough” looking surface is associated with a higher FD than a smooth surface. The example presented in Figure 3 illustrates this concept, but, for simplicity, for a curve instead of a surface. It is shown that for the smooth curve, the variation computed over the total interval as the difference between the maximum and minimum values is significantly higher than the average variation computed over the two sub-intervals. More specifically, although the maximum values for the total interval and the first sub-interval are the same, the minimum value in the first sub-interval is much larger than the minimum value in the total interval. Similarly, although the minimum values for the total interval and the second sub-interval are the same, the maximum value in the second sub-interval is much smaller than the maximum value in the total interval. On the other hand, in the case of a rough curve, the variation computed over the total interval is similar to the average variation computed over the two sub-intervals.

It can be concluded that for a smooth surface, the variation changes significantly from large to smaller intervals, or from large to small <math>\varepsilon</math> values. Therefore, the slope of the line defined by points <math>(log(\varepsilon /R),log(E_{\varepsilon /R}))</math> is large. As a result, the FD defined as 3 minus the slope of the line is small. The opposite is true for a rough surface, for which the FD is large.

Figure 3: An example that illustrates the relation between the value of FD and roughness of the curve.

A careful examination of equations (4) and (5) confirms that if the image is multiplied by a constant (which implies that all image pixel values are multiplied by the same constant), or if a constant value is added to all image pixels, the FD is not affected. In other words, the FD is invariant to image contrast and intensity alterations.

Feature extraction:

Feature extraction can be performed by first filtering the original image using several Gabor filters in different orientations. An example is shown in Figure 4.

Figure 4: The original image (top) is filtered with three different Gabor filters to obtain three different filtered versions of the image (bottom).

Then, the FD is applied to all filtered versions of the image as shown in Figure 5 next. The result of applying the FD approach is a number of images equal to the number of filtered images. The pixel in the FD image located at <math>(x,y)</math> has a value equal to the FD computed in a localized window centered at pixel <math>(x,y)</math> in the filtered image.

Figure 5: Computing the FD for each filtered version of the image.

The feature vector associated to a particular pixel is obtained by concatenating all FD values corresponding to that pixel. For example, if ten Gabor filters are used, there will be ten different filtered versions of the image, and thus ten different FD values for each pixel. In this case, the feature vector associated to a specific pixel will consist of ten FD elements.
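Putting the previous pieces together, the feature extraction stage might look roughly as follows. This is only a sketch: it reuses the illustrative gabor_mask() and fractal_dimension() functions from the earlier snippets, filters with SciPy's correlate (zero padding at the borders), and computes the FD in a square window around every pixel, which is slow but mirrors the description above.

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import correlate

def extract_features(image, filter_bank, R=33, eps_max=4):
    """Return an (H, W, number-of-filters) array in which the vector at
    (y, x) collects the FD values of pixel (y, x) in every filtered image."""
    H, W = image.shape
    half = R // 2
    features = np.zeros((H, W, len(filter_bank)))
    for k, mask in enumerate(filter_bank):
        filtered = correlate(image.astype(float), mask, mode='constant')
        for y in range(H):
            for x in range(W):
                win = filtered[max(0, y - half):y + half + 1,
                               max(0, x - half):x + half + 1]
                features[y, x, k] = fractal_dimension(win, eps_max)
    return features
</syntaxhighlight>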

K-means algorithm

The K-means algorithm [2] is used for clustering data points or vectors into a number of clusters. For instance, assuming that a total of R vectors, <math>\bold x_{r}</math>, <math>r=1,2,...,R</math>, where <math>\bold x_{r}=\{x_{rl}\}, l=1,2,...,L</math>, of size L are available, the goal of the K-means algorithm is to group or cluster the R vectors into K groups. The steps of the K-means algorithm are presented next:

1. Initialization: A total of K vectors, <math>\bold c_{k}, k=1,2,...,K</math>, of the same size as the data vectors (i.e. L), are chosen. The vectors <math>\bold c_{k}</math> can be chosen completely randomly, or can be selected from the set of R original vectors, <math>\bold x_{r}</math>. These vectors are commonly called centroids. Each centroid is the representative vector of its own group.

2. Group Assignment: Each vector, <math>\bold x_{r}</math>, is assigned to one of the K groups, say <math>k_{0}</math>, if the distance of <math>\bold x_{r}</math> from the centroid associated to the <math>k_{0}-th</math> group is smaller than the distance from any other centroid. A commonly used distance measure is the Euclidean distance. More specifically, the square of the Euclidean distance of vector <math>\bold x_{r}</math> from centroid <math>\bold c_{k}</math> is defined as follows:

<math> E^{2}\{\bold x_{r}, \bold c_{k}\}=(x_{r1}-c_{k1})^2+(x_{r2}-c_{k2})^2+...+(x_{rL}-c_{kL})^2 </math>

3. Centroid Adaptation: Once all R vectors are assigned to one of the K groups, the centroids are adjusted to better represent the associated data vectors. More specifically, the k-th centroid is adapted as follows:

<math> \bold c_{k}^{(t)}=\frac{1}{R_{k}}\sum_{\bold x_{r} \in \; Group \; k} \bold x_{r} </math>

where <math>R_{k}</math> is the number of vectors assigned to group k, and the summation is over those <math>\bold x_{r}</math> vectors associated to group k. Essentially, the above equation re-calculates each one of the L elements of the centroid <math>\bold c_{k}</math> by finding the average of the corresponding element over all vectors associated to group k. The superscript t indicates that the particular centroid vector is the one computed at the t-th iteration.

4. Iteration/Stopping: Steps 2-3 are repeated until a specific stopping criterion is satisfied. For example, the algorithm can stop when a maximum number of iterations is reached. Alternatively, the algorithm can stop if the centroids do not change significantly. For instance, the algorithm can terminate if the sum of the K Euclidean distances between the K centroids at iteration t and the corresponding K centroids at iteration <math>t+1</math> drops below a threshold <math>T_{h}</math>:

<math> \sum_{k}E\{\bold c_{k}^{(t+1)}, \bold c_{k}^{(t)}\}<T_{h} </math>

The K-means algorithm steps 2 and 3 are presented next through an example.

11.jpg
12.jpg
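For completeness, a compact NumPy sketch of the four steps is given below. The random initialization from the data vectors, the maximum number of iterations, and the handling of empty groups are illustrative choices.

<syntaxhighlight lang="python">
import numpy as np

def kmeans(X, K, max_iter=100, threshold=1e-6, seed=None):
    """Cluster the rows of X (R vectors of length L) into K groups.
    Step 1: pick K data vectors as initial centroids.
    Step 2: assign every vector to its nearest centroid (Euclidean distance).
    Step 3: replace each centroid by the mean of the vectors assigned to it.
    Step 4: iterate until the total centroid movement drops below a threshold."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(X.shape[0], size=K, replace=False)]
    for _ in range(max_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([X[labels == k].mean(axis=0)
                                  if np.any(labels == k) else centroids[k]
                                  for k in range(K)])
        moved = np.linalg.norm(new_centroids - centroids, axis=1).sum()
        centroids = new_centroids
        if moved < threshold:
            break
    return labels, centroids
</syntaxhighlight>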

Iterative K-means for texture segmentation

The main problem with using the original K-means to directly cluster the extracted feature vectors is that a theoretically uniform textural region does not produce features having the same or similar values if these features are extracted from small, localized regions. For instance, a very small segment of a texture representing sand may resemble a small segment extracted from a region representing a rough metallic surface. A solution to the problem is to use large windows for feature extraction. However, in the case of texture segmentation, using large windows causes a problem close to boundaries between textures, where more than a single texture is present. In this case, the feature extraction window contains more than one textural region, and thus the extracted features are not representative of either region.

A solution is to use an iterative technique. First, features are extracted in relatively small windows. Then, each of the feature maps (such as the bottom images shown in Figure 5) is smoothed using a moving average filter of size <math>R \times R</math> (a mask of size <math>R \times R</math> in which all values are equal to <math>1/R^{2}</math>). Such smoothing ensures that features extracted from nearby regions obtain similar values. The original K-means is then applied to obtain a first segmentation result, i.e. a segmentation image (an image in which different textural regions are represented by different values). Next, a merging window of size <math>R \times R</math> is applied. The merging window examines all <math>R \times R</math> square regions in the segmentation image and replaces the central pixel of each square region with the value that represents the majority within that region. Then, pixels in the segmentation image are classified as either ambiguous or unambiguous. Ambiguous pixels are the ones close to a boundary between two or more textures; all other pixels are unambiguous. In order to determine which pixels are ambiguous, all windows of size <math>R \times R</math> in the segmentation image, after the merging window has been applied, are examined. If only one pixel value is present in the window (which implies that there is only one textural region present in that window), the central pixel of the <math>R \times R</math> window is marked as unambiguous; otherwise, it is marked as ambiguous. The process is repeated for progressively smaller values of R, until a user-determined minimum value of R is reached. The process is shown in Figure 6 next.

Figure 6: Iterative K-means approach.
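One possible organization of this loop is sketched below. It reuses the kmeans() function from the previous snippet and SciPy's uniform_filter as the moving-average mask. The text does not prescribe exactly how the labels of ambiguous pixels are refined as R decreases, so the strategy of re-clustering and updating only the ambiguous pixels, as well as the particular sequence of window sizes, should be read as assumptions.

<syntaxhighlight lang="python">
import numpy as np
from scipy.ndimage import uniform_filter

def iterative_kmeans_segmentation(features, K, R_values=(33, 17, 9)):
    """Iterative K-means segmentation of an (H, W, L) feature array."""
    H, W, L = features.shape
    labels = None
    ambiguous = np.ones((H, W), dtype=bool)          # initially all pixels are undecided
    for R in R_values:                               # progressively smaller windows
        smoothed = np.stack([uniform_filter(features[:, :, l], size=R)
                             for l in range(L)], axis=2)
        new_labels, _ = kmeans(smoothed.reshape(-1, L), K)
        new_labels = new_labels.reshape(H, W)
        if labels is None:
            labels = new_labels
        else:
            labels[ambiguous] = new_labels[ambiguous]    # keep unambiguous pixels fixed
        labels = majority_window(labels, R)          # merging window
        ambiguous = mixed_window(labels, R)          # windows containing >1 label
    return labels

def majority_window(labels, R):
    """Replace every pixel by the majority label in its R x R neighborhood."""
    H, W = labels.shape
    half = R // 2
    out = labels.copy()
    for y in range(H):
        for x in range(W):
            win = labels[max(0, y - half):y + half + 1,
                         max(0, x - half):x + half + 1]
            out[y, x] = np.bincount(win.ravel()).argmax()
    return out

def mixed_window(labels, R):
    """Mark a pixel as ambiguous if its R x R neighborhood holds more than one label."""
    H, W = labels.shape
    half = R // 2
    amb = np.zeros((H, W), dtype=bool)
    for y in range(H):
        for x in range(W):
            win = labels[max(0, y - half):y + half + 1,
                         max(0, x - half):x + half + 1]
            amb[y, x] = np.unique(win).size > 1
    return amb
</syntaxhighlight>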

Results

Some experimental results that compare the FD with energy, and the original with the iterative K-means, are presented next. The energy within an image window is defined as the sum of the squared values of all pixels within that window. The approach for obtaining energy feature vectors is similar to that of obtaining FD feature vectors. More specifically, the image is filtered by several Gabor filters, and the energy is computed for all filtered versions of the original image. Since the goal here is to present information about FD-based texture segmentation, no additional information about energy is provided here; additional information can be found in [1]. Comparisons between FD and energy are provided simply in order to demonstrate the importance of invariance to image intensity and contrast.

Figure 7: Comparison between FD and energy: (a) original image, (b) segmentation of the original image using FD, (c) segmentation of the original image using energy, (d) image subjected to multiplicative noise (local contrast alteration), (e) segmentation of the distorted image using FD, (f) segmentation of the distorted image using energy.

The example of Figure 8 presents a comparison between the original and the iterative K-means. The iterative K-means retains details around the boundaries without blurring them significantly. At the same time, no erroneous small regions are present inside homogeneous textural regions. If a large smoothing window is used together with the original K-means, no erroneous small regions are present inside homogeneous textural regions either. However, the boundaries between textures may be relatively smoothed out, and part of the detail is lost.

Figure 8: Two examples that present a comparison between original and iterative K-means.

Project: Texture Segmentation Using Iterative K-means

1. Pick a texture image of your choice, consisting of several homogeneous textural regions. If such an image is not available, create one on your own by appropriately merging homogeneous textural regions.

2. Implement a Gabor filter using a standard deviation and frequency of your choice. Also, select an appropriate mask size according to your choice of standard deviation.

3. Obtain a total of eight filters by rotating the above filter to seven other directions.

4. Implement a function that provides the Fractal Dimension for every pixel in an image.

5. Apply the Fractal Dimension function to all filtered versions of the original image.

6. Obtain the Fractal Dimension vectors associated to every pixel in the image. There should be as many vectors as the number of pixels in the image.

7. Implement the original K-means algorithm to cluster the Fractal Dimension vectors into a number of clusters. Choose a number of clusters equal to the actual number of uniform textural regions in the image.

8. Repeat step 7, considering more clusters than the actual number of uniform textural regions in the image.

9. Repeat step 7, considering fewer clusters than the actual number of uniform textural regions in the image.

10. Repeat steps 7, 8, and 9, considering different initial vectors in the K-means algorithm.

11. Implement the iterative K-means technique.

12. Repeat steps 7-10 for the iterative K-means technique.

13. Visually compare the results obtained using the original and iterative K-means, especially in regions close to boundaries between uniform textural regions.

14. Repeat all previous steps for different texture combinations and different textural region shapes.

Assessment of student knowledge on the subject matter

The following assessment can be provided to students both before and after they are introduced to the concepts presented in this document. It is expected that the answers to questions 1, 2, 5 and 6, and probably 4 and 7, may be at least partially known to students prior to the presentation of this material, depending on their background. In contrast, it is less likely that students have previously been introduced to the subject matter associated with the other questions. Questions 10 and 11 require some critical thinking from students, as the answers cannot be directly found in this material.

Assessment Questions:

1. What is texture segmentation in general? Can you provide some texture segmentation related applications?

2. Having a filter mask <math> M(x,y)</math>, what needs to be done in order to produce another mask which is a rotated version of <math> M(x,y)</math> by <math>30^{\circ }</math>?

3. What are some of the advantages of Fractal Dimension as a textural feature?

4. How many multiplications and additions are needed in order to filter an image of size <math>100 \times 100</math> by a mask of size <math>5 \times 5</math>?

5. Provide your own interpretation of what texture is.

6. How would you describe what clustering is?

7. What are the basic steps of the K-means algorithm?

8. What are the basic steps of the iterative K-means algorithm?

9. Why is iterative K-means useful in texture segmentation?

10. How could the texture segmentation method become rotational invariant (in the sense that the segmentation results would be practically identical even if the textural regions are rotated by an arbitrary angle)?

11. Do you think that the approach that you proposed to make the technique a rotational invariant one has some disadvantages with respect to the original technique as a trade off to the invariance property?

References

  1. B. Dubuc, C. Roques-Carmes, C. Tricot, and S.W. Zucker, "The variation method: a technique to estimate the fractal dimension of surfaces," Proc. SPIE Visual Communications and Image Processing II, vol. 845, pp. 241-248, 1987.
  2. S. Haykin, Neural Networks and Learning Machines, 3rd ed., Prentice Hall, 2008.