Document mosaicing

Document mosaicing is a process that stitches multiple, overlapping images of a document together in order to produce one large, high resolution composite. The document is slid under a stationary, over-the-desk camera by hand until all parts of the document are snapshotted by the camera’s field of view. As the document slid under the camera, all motion of the document is coarsely tracked by the vision system. The document is periodically snapshotted such that the successive snapshots are overlap by about 50%. The system then finds the overlapped pairs and stitches them together repeatedly until all pairs are stitched together as one piece of document.

The document mosaicing can be divided into four main processes.

In this process, the motion of the document slid under the camera is coarsely tracked by the system. Tracking is performed by a process called simple correlation process. In the first frame of snapshots, a small patch is extracted from the center of the image as a correlation template as shown in Figure 1. The correlation process is performed in the four times size of the patch area of the next frame. The motion of the paper is indicated by the peak in the correlation function. The peak in the correlation function indicates the motion of the paper. The template is resampled from this frame and the tracking continues until the template reaches the edge of the document. After the template reaches the edge of the document, another snapshot is taken and the tracking process performs repeatedly until the whole document is imaged. The snapshots are stored in an ordered list in order to pair the overlapped images in later processes.s

Feature detection is the process of finding the transformation which aligns one image with another. There are two main approaches for feature detection.

Each image is segmented into a hierarchy of columns, lines and words in order to match with the organised sets of features across images. Skew angle estimation and columns, lines and words finding are the examples of feature detection operations.

Firstly, the angle that the rows of text make with the image raster lines (skew angle) is estimated. It is assumed to lie in the range of ±20°. A small patch of text in the image is selected randomly and then rotated in the range of ±20° until the variance of the pixel intensities of the patch summed along the raster lines is maximised. See Figure 2.

In order to ensure that the found skew angle is accurate, the document mosaic system performs calculation at many image patches and derive the final estimation by finding the average of the individual angles weighted by the variance of the pixel intensities of each patch.

In this operation, the de-skewed document is intuitively segmented into a hierarchy of columns, lines and words. The sensitivity to illumination and page coloration of the de-skewed document can be removed by applying a Sobel operator to the de-skewed image and thresholding the output to obtain the binary gradient, de-skewed image. See Figure 3.

...
Wikipedia