
Coupled Snakelets for Curled Text-Line Segmentation from Warped Document Images

Syed Saqib Bukhari, Faisal Shafait, Thomas Breuel

International Journal on Document Analysis and Recognition (IJDAR), Vol. 15, Pages 1-16, Springer, 2012
Camera-captured, warped document images usually contain curled text-lines because of distortions caused by the camera's perspective view and page curl. Warped document images can be transformed into planar document images using monocular dewarping techniques, which improves optical character recognition accuracy and human readability. Curled text-line segmentation is a crucial initial step for most monocular dewarping techniques. Existing curled text-line segmentation approaches are sensitive to geometric and perspective distortions. In this paper, we introduce a novel curled text-line segmentation algorithm that adapts active contours (snakes). Our algorithm performs text-line segmentation by estimating pairs of x-lines and baselines. It estimates a local x-line and baseline pair on each connected component by jointly tracing the top and bottom points of neighboring connected components; finally, each group of overlapping pairs is considered a segmented text-line. Our algorithm has achieved a curled text-line segmentation accuracy of above 95% on the DFKI-I (CBDAR 2007 dewarping contest) dataset, which is significantly better than previously reported results on this dataset.

BibTeX:

@article{Bukhari2012,
       abstract = {Camera-captured, warped document images usually contain curled
text-lines because of distortions caused by the camera's perspective view and
page curl. Warped document images can be transformed into planar document
images using monocular dewarping techniques, which improves optical character
recognition accuracy and human readability. Curled text-line segmentation is a
crucial initial step for most monocular dewarping techniques. Existing curled
text-line segmentation approaches are sensitive to geometric and perspective
distortions. In this paper, we introduce a novel curled text-line segmentation
algorithm that adapts active contours (snakes). Our algorithm performs
text-line segmentation by estimating pairs of x-lines and baselines. It
estimates a local x-line and baseline pair on each connected component by
jointly tracing the top and bottom points of neighboring connected components;
finally, each group of overlapping pairs is considered a segmented text-line.
Our algorithm has achieved a curled text-line segmentation accuracy of above
95\% on the DFKI-I (CBDAR 2007 dewarping contest) dataset, which is
significantly better than previously reported results on this dataset.},
       number = {}, 
       month = {}, 
       year = {2012}, 
       title = {Coupled Snakelets for Curled Text-Line Segmentation from Warped Document Images}, 
       journal = {International Journal on Document Analysis and Recognition (IJDAR)}, 
       volume = {15}, 
       pages = {1--16}, 
       publisher = {Springer}, 
       author = {Syed Saqib Bukhari and Faisal Shafait and Thomas Breuel}, 
       keywords = {},
       url = {http://www.dfki.de/web/forschung/publikationen/renameFileForDownload?filename=Bukhari-Coupled-Snakelets-IJDAR12.pdf&file_id=uploads_1257}
}