Title page for ETD etd-01042007-194007


Type of Document Dissertation
Author Punuru, Janardhana R
URN etd-01042007-194007
Title Knowledge-Based Methods for Automatic Extraction of Domain-Specific Ontologies
Degree Doctor of Philosophy (Ph.D.)
Department Computer Science
Advisory Committee
Advisor Name Title
Jianhua Chen Committee Chair
Donald H. Kraft Committee Member
Guoli Ding Committee Member
Sukhamay Kundu Committee Member
Edward F. Watson Dean's Representative
Keywords
  • wordnet
  • machine learning
  • semantic relations extraction
  • concept extraction
  • ontology learning
  • Knowledge-based system
Date of Defense 2006-12-08
Availability unrestricted
Abstract
Semantic web technology aims at developing methodologies for representing large amount of knowledge in web accessible form. The semantics of knowledge should be easy to interpret and understand by computer programs, so that sharing and utilizing knowledge across the Web would be possible. Domain specific ontologies form the basis for knowledge representation in the semantic web. Research on automated development of ontologies from texts has become increasingly important because manual construction of ontologies is labor intensive and costly, and, at the same time, large amount of texts for individual domains is already available in electronic form. However, automatic extraction of domain specific ontologies is challenging due to the unstructured nature of texts and inherent semantic ambiguities in natural language. Moreover, the large size of texts to be processed renders full-fledged natural language processing methods infeasible.

In this dissertation, we develop a set of knowledge-based techniques for automatic extraction of ontological components (concepts, taxonomic and non-taxonomic relations) from domain texts. The proposed methods combine information retrieval metrics, lexical knowledge-base(like WordNet), machine learning techniques, heuristics, and statistical approaches to meet the challenge of the task. These methods are domain-independent and automatic approaches.

For extraction of concepts, the proposed WNSCA+{PE, POP} method utilizes the lexical knowledge base WordNet to improve precision and recall over the traditional information retrieval metrics. A WordNet-based approach, the compound term heuristic, and a supervised learning approach are developed for taxonomy extraction. We also developed a weighted word-sense disambiguation method for use with the WordNet-based approach. An unsupervised approach using log-likelihood ratios is proposed for extracting non-taxonomic relations. Further more, a supervised approach is investigated to learn the semantic constraints for identifying relations from prepositional phrases. The proposed methods are validated by experiments with the Electronic Voting and the Tender Offers, Mergers, and Acquisitions domain corpus. Experimental results and comparisons with some existing approaches clearly indicate the superiority of our methods.

In summary, a good combination of information retrieval, lexical knowledge base, statistics and machine learning methods in this study has led to the techniques efficient and effective for extracting ontological components automatically.

Files
  Filename       Size       Approximate Download Time (Hours:Minutes:Seconds) 
 
 28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access 
  punuru_thesis.pdf 647.14 Kb 00:02:59 00:01:32 00:01:20 00:00:40 00:00:03

Browse All Available ETDs by ( Author | Department )

If you have more questions or technical problems, please Contact LSU-ETD Support.