Abstract: The evolution of digital devices and computers has drawn increasing attention to document image analysis. Large numbers of paper documents have been transferred to and stored on digital devices. In this work we apply image enhancement techniques to reduce noise in degraded document images. Sample images are taken from the Document Image Binarization Contest (DIBCO) dataset. We apply contrast stretching, histogram equalization, noise filtering, Laplacian transformation, and global and local thresholding to remove show-through noise, uneven illumination noise, and shot noise from degraded document images using the open-source OpenCV library, and we report performance metrics for these methods.
Keywords- Document Image Enhancement, Contrast stretching, Histogram equalization, Laplacian transformation, Thresholding.
I. Introduction
Image processing is an active field of study with applications in many areas, one of which is document image analysis. The world is moving rapidly towards digitization, and information is being digitized along with it. Documents are a universal means of communication in day-to-day life. Image processing techniques are widely used in document image analysis (DIA) to transmit, process, store, analyse, enhance, and recognize document images [1]. At different stages, these methods are used to de-noise degraded document images. Degraded images are affected by uneven illumination noise, show-through noise, salt and pepper noise, and similar defects. To reduce this noise, the image processing techniques of contrast stretching, histogram equalization, noise filtering, Laplacian transformation, and thresholding are discussed here.
II. Related Works
Document image enhancement is essential for obtaining a uniform background and good quality in printed and handwritten document images. It aims to improve the readability of the text in a document and to reduce noise. The major artifacts in degraded document images are low contrast and uneven background illumination, show-through and shadow-through effects, damaged characters, and noisy black borders [2]. To correct these kinds of noise, several image processing techniques are applied using OpenCV.
Leung et al. [5] proposed a contrast enhancement method to increase the readability of text and a histogram equalization method to reduce background noise in degraded document images. Deivalakshmi et al. [6] proposed a median filter method to reduce salt and pepper noise in images. Otsu [7] proposed a global thresholding method that uses the zeroth- and first-order cumulative moments of the gray-level histogram. Cheriet et al. [8] proposed a modified version of Otsu's method based on recursive application. Feng et al. [9] proposed a local thresholding approach to overcome the difficulties encountered with low contrast, non-uniform illumination, and random noise; the threshold is determined by computing the mean, minimum, and variance of a local window. Firdousi et al. [10] surveyed various local thresholding methods used in document image binarization and explained popular techniques such as those of Sauvola, Niblack, and Bernsen.
III. Proposed Method
A. Contrast Stretching
Contrast stretching is used here to reduce show-through noise in document images. A show-through-affected image, taken from the DIBCO dataset, is shown in Fig. 1. The show-through effect occurs when information on the back side of a page interferes with the front side. Contrast stretching enhances the image by increasing or decreasing the intensity values. In a typical document with dark text on light paper, the foreground (text) pixels have lower intensity values than the background pixels.
The brightness of an image is adjusted by increasing or decreasing the output pixel value. In this work, adding a constant to the input pixel value produces the image referred to as the high contrast image (Fig. 2), and subtracting a constant produces the low contrast image (Fig. 3). The brightness and contrast adjustment is governed by two arbitrary constants, a and b [3].
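The adjustment equation is not reproduced in this text. The following is a minimal sketch that assumes the common linear form g(x, y) = a·f(x, y) + b, with a acting as a gain and b as an offset; this assignment of roles and the function name `adjust` are assumptions, not taken from the paper. It uses plain Python lists so that it runs without OpenCV.

```python
def adjust(image, a=1.0, b=0.0):
    """Apply g = a * f + b to every pixel (assumed linear form) and clip to 0..255."""
    return [[max(0, min(255, int(round(a * p + b)))) for p in row] for row in image]

# A tiny 2x2 grayscale image used only for illustration.
image = [[10, 100], [200, 250]]

brighter = adjust(image, a=1.0, b=50)   # add a constant to every pixel
darker = adjust(image, a=1.0, b=-50)    # subtract a constant from every pixel
```

Values that would leave the 8-bit range are clipped, so 250 + 50 saturates at 255 and 10 - 50 saturates at 0.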
B. Noise Filtering
Noise filtering removes unwanted information from an image. In this work, we implemented a median filter for document image enhancement.
The median filter is well suited to removing “salt and pepper noise” or “shot noise”. This noise appears as randomly occurring white and black pixels scattered across an image. Here the filter is used to smooth the degraded document image. It is also called a rank filter and works by re-ordering the pixel values in a neighbourhood.
Consider a 3×3 kernel placed over the image matrix. The median value is calculated by arranging the neighbourhood pixel values in numerical order and selecting the middle value from the sorted list, which then replaces the pixel value in the output image. The noisy input image and the smoothed median-filtered output image are shown in Fig. 4. Compared with the mean filter, the median filter is better at preserving edge pixels, and a larger kernel size produces stronger smoothing.
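The 3×3 procedure described above can be sketched in plain Python as follows. This is an illustration only: the border policy (border pixels copied unchanged) is an assumption, and the paper does not state which OpenCV call was used, although OpenCV provides `cv2.medianBlur` for this operation.

```python
def median_filter_3x3(image):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    Border pixels are copied unchanged (an assumed border policy).
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Collect the nine neighbourhood values and sort them.
            window = sorted(
                image[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
            out[y][x] = window[4]  # middle element of nine sorted values
    return out

# Flat grey patch with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
clean = median_filter_3x3(noisy)
```

Both outliers are replaced by the surrounding value 10, because an isolated extreme value never reaches the middle of the sorted window.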
C. Histogram Equalization
The histogram plays an important role in image enhancement. It plots the distribution of intensity values in an image, so altering the histogram changes the characteristics of the image. Histogram equalization is used here to remove uneven illumination noise in document images.
Histogram equalization alters the contrast of an image by non-linearly stretching its intensity range. Equalization maps a narrow distribution of intensity values to a wider distribution. The noisy input image and the equalized output image are shown in Fig. 5 and Fig. 6, and their histograms are shown in Fig. 7 and Fig. 8.
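As an illustration of how a narrow intensity range is mapped to a wider one, the sketch below implements the usual cumulative-histogram mapping in plain Python. The normalization used here (subtracting the smallest non-zero cumulative count) is one common convention and is an assumption; the paper does not give its formula. OpenCV provides `cv2.equalizeHist` for 8-bit grayscale images.

```python
def equalize(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows."""
    flat = [p for row in image for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1

    # Cumulative distribution of pixel counts.
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)

    n = len(flat)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in image]

    # Look-up table mapping each input level to its equalized level.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in image]

# Intensities squeezed into the narrow range 100..130.
image = [[100, 100, 110], [110, 120, 120], [120, 130, 130]]
equalized = equalize(image)
```

The input occupies only levels 100 to 130, while the output spans the full range from 0 to 255.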
D. Laplacian Transformation
The Laplacian operator is a second-order derivative operator. The Laplacian of an image f(x, y) is denoted by ∇²f and is given by Equation 2:
$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \qquad (2)$$
Here the Laplacian transformation is used to find the edges in document images. The degraded original document image is first converted into a grayscale image, and the noise is then reduced using a Gaussian smoothing filter. The smoothed image is convolved with the Laplacian mask, and the output is shown in Fig. 9. Because edges correspond to zero crossings of the Laplacian response, the operator is also known as a zero-crossing detector.
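The convolution step can be sketched in plain Python as below. The 4-neighbour mask is one standard discrete Laplacian and is an assumption, since the paper does not list its mask; the Gaussian smoothing step is omitted to keep the example short, and border pixels are simply set to zero. OpenCV provides `cv2.GaussianBlur` and `cv2.Laplacian` for the full pipeline.

```python
# A standard 4-neighbour discrete Laplacian mask (assumed, not from the paper).
LAPLACIAN = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]

def apply_mask(image, mask):
    """Apply a 3x3 mask to every interior pixel; border pixels are set to 0."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                mask[j][i] * image[y + j - 1][x + i - 1]
                for j in range(3)
                for i in range(3)
            )
    return out

# Vertical step edge: dark on the left (0), bright on the right (100).
image = [
    [0, 0, 100, 100, 100],
    [0, 0, 100, 100, 100],
    [0, 0, 100, 100, 100],
]
edges = apply_mask(image, LAPLACIAN)
```

In the middle row the response changes sign from +100 to -100 across the step and is zero in the flat region, which is the zero crossing that marks the edge.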
E. Thresholding
Thresholding is one of the main techniques used in image segmentation. It converts a grey-level image (256 intensity values) into a binary image (two intensity values, black and white) by selecting a threshold value.
1) Global thresholding: This method creates a binary image from a grey-level image by setting all pixels below the threshold value to zero and all pixels above it to one. Global thresholding is very fast compared with other methods, but it is not suitable for all types of document images, in particular those that contain both pictures and text. The global method is expressed as:
$$T(x, y) = \begin{cases} 1, & f(x, y) > t \\ 0, & \text{otherwise} \end{cases}$$
where f(x, y) is the input image, t is the threshold value, and T(x, y) is the thresholded image.
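A minimal sketch of global thresholding in plain Python follows. It assumes that pixels strictly above t map to one and all others, including pixels equal to t, map to zero; the handling of the equality case is an assumption. The threshold of 150 matches the manually selected value reported in the results. OpenCV provides `cv2.threshold` for this operation.

```python
def global_threshold(image, t):
    """Return T(x, y) = 1 where f(x, y) > t, else 0 (equality maps to 0 by assumption)."""
    return [[1 if p > t else 0 for p in row] for row in image]

image = [[30, 149, 150], [151, 200, 255]]
binary = global_threshold(image, 150)
```

A single comparison per pixel is all the method needs, which is why it is fast, but the same t is applied everywhere regardless of local lighting.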
2) Local thresholding: Also called adaptive thresholding, this method automatically selects a threshold value for each pixel based on the range of intensity values in its neighbourhood. It examines the relationship between the brightness of neighbouring pixels and adapts the threshold to the local intensity statistics.
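The paper does not say which local statistic was used. The sketch below assumes a mean-based rule, where each pixel is compared with the mean of its neighbourhood minus a constant c; the function name, the window radius, and the value of c are all illustrative assumptions. OpenCV offers a comparable rule through `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_MEAN_C`.

```python
def adaptive_threshold(image, radius=1, c=0):
    """Set a pixel to 1 if it exceeds (mean of its local window - c), else 0.

    The window is (2 * radius + 1) pixels wide and is clipped at the image border.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            local_t = sum(window) / len(window) - c
            out[y][x] = 1 if image[y][x] > local_t else 0
    return out

# One scan line with illumination rising from left to right.
# The dips at index 2 (30) and index 5 (90) stand for dark text strokes.
line = [[40, 60, 30, 100, 120, 90, 160, 180]]

local = adaptive_threshold(line, radius=1, c=15)
fixed = [[1 if p > 100 else 0 for p in row] for row in line]  # global rule, t = 100
```

With the local rule only the two text pixels become 0. With the fixed threshold of 100, the darker left part of the background is also set to 0, which illustrates why a single global value struggles under uneven illumination.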
IV. Results and Discussion
We have presented noise reduction methods for degraded document images using image enhancement techniques: contrast enhancement, histogram equalization, median filtering, Laplacian transformation, and thresholding. In contrast enhancement, the high contrast image gives better results than the low contrast image. The median filter is used to smooth the degraded document image; it can also be used to reduce salt and pepper noise and shot noise in document images. Histogram equalization is used to reduce non-uniform illumination. In global thresholding, we manually selected a threshold value of 150, whereas local thresholding selects the threshold value automatically according to the input pixels.
Figure 1. Original show-through noised image (DIBCO-2014)
Figure 2. High contrast image
Figure 3. Low contrast image
Figure 4. Median filter input and output image
Figure 5. Noisy input image (DIBCO-2014)
Figure 6. Histogram equalized output image
Figure 7. Histogram plot of input image
Figure 8. Histogram plot of equalized output image
Figure 9. Laplacian transformation output image
Figure 10. Global thresholding output image
Figure 11. Local thresholding output image
For the proposed image enhancement techniques, the performance metrics Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), Normalized Absolute Error (NAE), and Normalized Cross Correlation (NCC) are reported in Table I. The table summarizes the results obtained for the de-noised document images; lower MSE and NAE and higher PSNR indicate a smaller deviation between the compared images.
TABLE I. Performance metrics for de-noised document images using OpenCV
| Proposed Method | Normalized Cross Correlation (NCC) | Mean Squared Error (MSE) | Peak Signal to Noise Ratio (PSNR) | Normalized Absolute Error (NAE) |
|---|---|---|---|---|
| High Contrast | 1 | 0 | 99 | 0 |
| Global Threshold | 1 | 10.2965 | 38.0039 | 0.0165 |
| Adaptive Threshold | 1 | 11.6172 | 37.4798 | 0.0198 |
| Median Filter | 1 | 22.3331 | 34.6413 | 0.0192 |
| Histogram Equalized | 1.0372 | 191.0293 | 25.3198 | 0.4872 |
| Low Contrast | 1.0043 | 254.9990 | 24.0654 | 0.5370 |
| Laplacian Transform | 5.0212 | 250.8787 | 24.1362 | 37.2466 |
Figure 12. Histogram plot of performance metrics for the proposed methods.
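The paper does not state the formulas behind the four metrics. The sketch below assumes the commonly used definitions: MSE as the mean squared pixel difference, PSNR as 10·log10(255²/MSE), NAE as the sum of absolute differences divided by the sum of reference values, and NCC as Σ(f·g)/Σ(f²). The PSNR values in Table I agree with this PSNR definition for a peak of 255 (for example, an MSE of 10.2965 gives 38.0039). Returning 99 when the MSE is zero mirrors the High Contrast row and is an assumption about how that value arose.

```python
import math

def metrics(reference, processed, peak=255.0):
    """Compute MSE, PSNR, NAE and NCC between two equally sized grayscale images.

    The reference image must contain at least one non-zero pixel.
    """
    f = [p for row in reference for p in row]
    g = [p for row in processed for p in row]

    mse = sum((a - b) ** 2 for a, b in zip(f, g)) / len(f)
    # PSNR is undefined for identical images; 99 is an assumed cap (see lead-in).
    psnr = 99.0 if mse == 0 else 10 * math.log10(peak ** 2 / mse)
    nae = sum(abs(a - b) for a, b in zip(f, g)) / sum(abs(a) for a in f)
    ncc = sum(a * b for a, b in zip(f, g)) / sum(a * a for a in f)
    return {"MSE": mse, "PSNR": psnr, "NAE": nae, "NCC": ncc}

reference = [[100, 200], [50, 150]]
same = metrics(reference, reference)
changed = metrics(reference, [[110, 200], [50, 150]])  # one pixel differs by 10
```

For identical images the sketch gives an MSE of 0, a capped PSNR of 99, an NAE of 0, and an NCC of 1, the same pattern as the High Contrast row.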
V. Conclusion
Using image enhancement techniques, we reduced show-through noise and uneven illumination noise in document images. The highest PSNR values were achieved by high contrast enhancement, global and local thresholding, and median filtering.
References
[1] R.C. Gonzalez, R.E. Woods, “Digital Image Processing”, 3rd ed., Upper Saddle River, N.J.: Prentice Hall, 2008.
[2] D. Doermann and K. Tombre, Handbook of Document Image Processing and Recognition. London: Springer London, 2014.
[3] D. G. Bailey, Design for embedded image processing on FPGAs. Singapore: John Wiley & Sons (Asia), 2011.
[4] R. Szeliski, Computer vision algorithms and applications. London: Springer, 2011.
[5] C.-C. Leung, K.-S. Chan, H.-M. Chan, and W.-K. Tsui, “A new approach for image enhancement applied to low-contrast–low-illumination IC and document images,” Pattern Recognition Letters, vol. 26, no. 6, pp. 769–778, 2005.
[6] S. Deivalakshmi, S. Sarath, and P. Palanisamy, “Detection and removal of Salt and Pepper noise in images by improved median filter,” 2011 IEEE Recent Advances in Intelligent Computational Systems, pp. 363–368, 2011.
[7] N. Otsu, “A Threshold Selection Method from Gray-Level Histograms,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, 1979.
[8] M. Cheriet, J. Said, and C. Suen, “A recursive thresholding technique for image segmentation,” IEEE Transactions on Image Processing, vol. 7, no. 6, pp. 918–921, 1998.
[9] M.-L. Feng and Y.-P. Tan, “Contrast adaptive binarization of low quality document images,” IEICE Electronics Express, vol. 1, no. 16, pp. 501–506, 2004.
[10] R. Firdousi and S. Parveen, “Local Thresholding Techniques in Image Binarization,” International Journal of Engineering and Computer Science, vol. 3, no. 3, pp. 4062–4065, 2014.