Abstract

Documents are of great importance to governmental and educational institutions because of their scientific, cultural and legal value. Degraded documents suffer from background and foreground defects such as non-uniform density, dirt, spots, liquid stains and missing words, so methods for processing degraded documents must be developed. The general aim of this thesis was to develop an intelligent system for the maintenance and retrieval of documents in educational institutions. The proposed system enhances degraded documents by removing noise and abnormal spots from the background and foreground using the Adaptive Thresholding technique. The Hough transform is used to deskew documents. The Maximally Stable Extremal Regions (MSER) technique is used to extract text features for recognition. The extracted text is spellchecked using the Levenshtein distance to reduce the error rate. Missing words are recovered from the extracted text by two methods: the first is an N-gram model, and the second is a set of steps that searches for other occurrences of the missing word by matching the words before and after it, either within the same degraded document or in other documents belonging to the same subject area. The proposed system was tested and evaluated on degraded printed documents written in English. The following performance measures were used to evaluate it: Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), F-Measure, Accuracy, Negative Rate Metric (NRM), Misclassification Penalty Metric (MPM) and Distance Reciprocal Distortion Metric (DRD). High performance ratios of 99% were achieved, yielding readable text.
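The Levenshtein-based spellchecking step mentioned above can be illustrated with a minimal sketch. This is not the thesis implementation: the function names `levenshtein` and `correct`, the sample vocabulary, and the `max_distance` threshold of 2 are all assumptions chosen for illustration.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word, or the word itself if none is near.

    Hypothetical helper: max_distance is an assumed threshold, not a value
    taken from the thesis.
    """
    candidates = list(vocabulary)
    if not candidates:
        return word
    best = min(candidates, key=lambda v: levenshtein(word.lower(), v))
    return best if levenshtein(word.lower(), best) <= max_distance else word


# Example: a typical OCR confusion of "m" with "rn".
vocabulary = ["document", "degraded", "threshold"]
print(correct("docurnent", vocabulary))
```

A word with no sufficiently close match is returned unchanged, so the sketch never forces a correction onto text it cannot match to the vocabulary.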