Abstract

Information retrieval (IR) is the science of searching for information in documents, searching for the documents themselves, searching for metadata that describe documents, or searching within databases, whether standalone relational databases or hypertext networked databases such as the World Wide Web, for text, sound, images, or data. Field association terms (FA terms) are the terms that indicate each subject-matter category in a classification scheme. In this thesis, co-word analysis, which counts and analyzes the co-occurrence of keywords in the publications on a given subject, is used to measure the relations among a selected sample of FA terms in a common field. The thesis outlines information retrieval, co-word analysis, and power link analysis. It reviews previous work on retrieval precision (RP) and focuses on how the power link can be used as a tool to improve the FA terms that the proposed algorithm extracts from a corpus. The thesis presents a modified method that produces an improved FA term dictionary by using co-word and power link analysis. The modified method calculates the levels of FA terms by giving different weights to terms according to their position in the document. It uses the power link concept, together with modified rules, to classify scientific papers into their proper fields. Instead of being treated as a whole, a given document is divided into three parts: the title, the abstract, and the body. Each term is given a weight that depends on where it occurs in the document. The greatest weight is given to terms in the title, followed by the abstract, and then the body.
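The position-dependent weighting described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the weight values (3, 2, 1), the tokenizer, and the names `SECTION_WEIGHTS`, `tokenize`, and `weighted_term_scores` are assumptions made for illustration. The abstract states only that the title receives the greatest weight, followed by the abstract and then the body.

```python
import re
from collections import Counter

# Hypothetical weights: the abstract only states title > abstract > body.
SECTION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def weighted_term_scores(document, weights=SECTION_WEIGHTS):
    """Add, for every occurrence of a term, the weight of the section it occurs in."""
    scores = Counter()
    for section, weight in weights.items():
        for term in tokenize(document.get(section, "")):
            scores[term] += weight
    return scores


# A made-up document, already split into its three parts.
doc = {
    "title": "Retrieval of field terms",
    "abstract": "Field terms improve retrieval precision",
    "body": "We extract terms from a corpus and measure precision",
}
scores = weighted_term_scores(doc)
print(scores.most_common(3))
```

In this toy document, "terms" occurs once in each part and so scores 3 + 2 + 1 = 6, while "corpus" occurs only in the body and scores 1. A term that appears in the title therefore counts for more than one that appears only in the body, which is the behavior the abstract describes.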
Results on the data used show an improvement in precision, recall, and F-measure for perfect FA terms (Level 1); with different data, the proposed method may also give an improvement at Level 2 and Level 3.

The thesis is organized into four chapters:
Chapter 1: Reviews the definitions and concepts related to information retrieval, FA terms, and co-word analysis, and presents the relation between these fields. It also discusses the methods of IR system evaluation.
Chapter 2: Reviews power link analysis and the real-word spell checker based on power links, including the main steps of this method and its applications in various fields. It also presents the traditional algorithm for calculating the levels of FA terms based on power link analysis, and the methods for correcting spelling errors by using the power link concept. This survey indicates that the relation between these areas has not been studied before.
Chapter 3: Presents the modified algorithm for calculating the perfect FA terms, and the Continuity and Transition theme used to detect the different parts of each document. It also presents the Python language, which was used to implement the modified system. Finally, it presents the experiments applied to a set of documents (scientific papers) and a comparison between the traditional and proposed methods, which helps in evaluating the system.
Chapter 4: Concludes the thesis and lists important future work.
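For reference, the precision, recall, and F-measure used to report the results can be computed as in the following Python sketch. It assumes the balanced F-measure (F1) and treats the extracted and reference FA terms as plain sets. The function name `precision_recall_f1` and the sample terms are illustrative only and are not taken from the thesis.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, F1) for a set of retrieved items against a reference set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        # Balanced F-measure: harmonic mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: four extracted terms, three reference terms, two in common.
extracted = {"retrieval", "precision", "corpus", "weather"}
reference = {"retrieval", "precision", "recall"}
p, r, f = precision_recall_f1(extracted, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Here two of the four extracted terms are correct (precision 0.5) and two of the three reference terms are found (recall 2/3), which gives F1 = 4/7.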