A visual signature-based identification method of low-resolution document images and its exploitation to automate indexing of multimodal recordings

Behera, Ardhendu and Ingold, Rolf (2006) A visual signature-based identification method of low-resolution document images and its exploitation to automate indexing of multimodal recordings. Doctoral thesis, University of Fribourg, Switzerland.

This thesis investigates methods for building an efficient system for the document-based automatic indexing and retrieval (DocMIR) of multimedia data captured in multimodal environments such as meetings and conferences. Empirical image processing, video segmentation and document analysis approaches are studied to bridge the gap between temporal data and static information. The proposed system focuses on two major tasks: document-based video segmentation and low-resolution document image identification.
Several hours of captured audio-visual data must be divided into distinct, smaller segments that provide useful access points. During a presentation, the projected documents are often captured as a video stream. They can serve as meaningful semantic pointers because they appear at a specific time, remain in visual focus for a definite duration and summarize the presenter’s discourse at that time. Existing video segmentation approaches are not applicable in this scenario, since the videos are captured with low-resolution devices such as webcams. To overcome this limitation, the proposed feature-based segmentation technique relies on stability rather than change in the video sequence, and it does not require any document identification method to confirm a change.
Identification of low-resolution document images is also required to link the original electronic documents with the temporally segmented multimedia data. The proposed identification method uses a Visual Signature consisting of a Layout Signature and a Color Signature. This signature-based approach was chosen for fast and efficient matching, to meet the needs of real-time applications. It also overcomes the problems of poor resolution, noise, complex backgrounds and varying lighting conditions in the capture environment.
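The abstract does not give the thesis's actual features, distance measure or thresholds, so the following is only a minimal Python sketch of the general idea of segmenting on stability rather than change. It assumes each frame is a grayscale image given as a list of rows of 0-255 values. The names `coarse_histogram`, `l1_distance` and `stable_segments`, the histogram feature, and the `threshold` and `min_len` parameters are all hypothetical choices for illustration, not the method described in the thesis.

```python
from typing import List, Sequence, Tuple


def coarse_histogram(frame: List[List[int]], bins: int = 8) -> List[float]:
    """Hypothetical frame feature: normalized coarse grayscale histogram."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for v in row:
            # Map a 0-255 intensity to one of `bins` equal-width bins.
            counts[min(v * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def stable_segments(features: List[Sequence[float]], threshold: float,
                    min_len: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame indices of stable runs.

    A run continues while consecutive frames differ by less than
    `threshold`; only runs of at least `min_len` frames are kept, so
    short unstable stretches (transitions) produce no segment at all.
    """
    segments: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(features) + 1):
        # The current run ends at the last frame, or when the next frame
        # differs too much from the previous one.
        if i == len(features) or l1_distance(features[i - 1], features[i]) >= threshold:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = i
    return segments
```

Because segments are defined by what stays still, a slide that remains in view for at least `min_len` frames yields one segment, while brief disturbances such as camera motion or a presenter walking past are dropped rather than treated as boundaries.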
Visual features, namely colors and their spatial distribution, and layout features are extracted and structured hierarchically to form the Color Signature and the Layout Signature, respectively. Signature matching is based on both sequential fusion and multi-level linear and non-linear fusion of the various visual features. The performance of the proposed technique was compared with existing approaches on real data recorded at meetings and conferences and found to be significantly better. The strong performance of these techniques demonstrates the usefulness of documents as an additional modality and a natural interface for interacting with multimedia data captured in multimodal environments.
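As a rough illustration of signature-based matching, the sketch below builds a hypothetical hierarchical color signature (the mean RGB of each cell in 1x1, 2x2 and 4x4 grids) and a crude layout stand-in (the fraction of dark pixels in each horizontal band). It then combines the two distances with a weighted sum, which is one simple form of linear fusion. Images are assumed to be lists of rows of `(r, g, b)` tuples. The feature definitions, weights and function names are assumptions made for illustration. The thesis's actual Layout Signature, Color Signature, and sequential or non-linear fusion schemes are not reproduced here.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Signature = Dict[str, List[float]]


def mean_color(img: Image, r0: int, r1: int, c0: int, c1: int) -> List[float]:
    """Mean RGB over rows r0..r1-1 and columns c0..c1-1 (zeros if empty)."""
    n = (r1 - r0) * (c1 - c0)
    if n == 0:
        return [0.0, 0.0, 0.0]
    sums = [0, 0, 0]
    for r in range(r0, r1):
        for c in range(c0, c1):
            for k in range(3):
                sums[k] += img[r][c][k]
    return [s / n for s in sums]


def color_signature(img: Image, levels: int = 3) -> List[float]:
    """Hypothetical hierarchical color signature: level L splits the
    image into a 2**L x 2**L grid and stores each cell's mean RGB."""
    h, w = len(img), len(img[0])
    sig: List[float] = []
    for level in range(levels):
        g = 2 ** level
        for i in range(g):
            for j in range(g):
                sig.extend(mean_color(img, i * h // g, (i + 1) * h // g,
                                      j * w // g, (j + 1) * w // g))
    return sig


def layout_signature(img: Image, bands: int = 8, dark: int = 128) -> List[float]:
    """Crude layout stand-in: fraction of dark pixels per horizontal band."""
    h, w = len(img), len(img[0])
    sig: List[float] = []
    for b in range(bands):
        r0, r1 = b * h // bands, (b + 1) * h // bands
        n = (r1 - r0) * w
        count = sum(1 for r in range(r0, r1) for px in img[r] if sum(px) / 3 < dark)
        sig.append(count / n if n else 0.0)
    return sig


def visual_signature(img: Image) -> Signature:
    return {"color": color_signature(img), "layout": layout_signature(img)}


def mean_abs_diff(a: List[float], b: List[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def fused_distance(q: Signature, ref: Signature,
                   w_color: float = 0.5, w_layout: float = 0.5) -> float:
    """Linear fusion: weighted sum of per-signature distances, each in [0, 1]."""
    d_color = mean_abs_diff(q["color"], ref["color"]) / 255.0
    d_layout = mean_abs_diff(q["layout"], ref["layout"])
    return w_color * d_color + w_layout * d_layout


def identify(query: Image, references: Dict[str, Signature]) -> str:
    """Return the name of the reference whose signature is closest to the query."""
    q = visual_signature(query)
    return min(references, key=lambda name: fused_distance(q, references[name]))
```

Reference signatures are computed once, so each query needs only one signature extraction and a scan over compact vectors. This kind of saving is what makes signature-based matching attractive for real-time use.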

Item Type: Thesis (Doctoral)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing and Information Systems
Date Deposited: 26 Mar 2015 12:48
URI: http://repository.edgehill.ac.uk/id/eprint/6355

