Type of Document: Dissertation
Author: Grant, Kevin Paul
URN: etd-1109103-231721
Title: Machine Learning Techniques for Efficient Query Processing in Knowledge Base Systems
Degree: Doctor of Philosophy (Ph.D.)
Department: Computer Science
Advisory Committee:
- Jianhua Chen, Committee Chair
- Donald Kraft, Committee Member
- Robert Mathews, Committee Member
- Sukhamay Kundu, Committee Member
- Evangelos Triantaphyllou, Dean's Representative
Keywords:
- machine learning
- knowledge base systems
- probabilistic heuristic estimates
- query processing
Date of Defense: 2003-10-16
Availability: unrestricted
Abstract
In this dissertation we propose a new technique for efficient query processing in knowledge base systems. Query processing in knowledge base systems poses strong computational challenges because of combinatorial explosion: at any point during query processing there may be too many subqueries available for further exploration. Overcoming this difficulty requires effective mechanisms for selecting, from among these subqueries, the most promising ones for further processing.
Inspired by existing work on stochastic logic programs, compositional modeling, and probabilistic heuristic estimates, we create a new, nondeterministic method for selecting subqueries during query processing. Specifically, we use probabilistic heuristic estimates to make the necessary decisions. This approach combines subquery properties, knowledge base properties, and previous query processing experience with conditional probability theory to derive a probability of success for each subquery. These probabilities of success are then used to select the next subquery for further processing. The underlying, property-specific probabilities of success are learned via a machine learning process that uses a set of training sample queries.
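As a rough illustration of the selection idea described above, the following minimal sketch estimates a success probability per subquery property from recorded training outcomes and then picks the next subquery at random, weighted by those estimates. It is not the dissertation's algorithm: the names `SuccessEstimator` and `select_subquery`, the use of a single property label per subquery, the add-one smoothing, and the weighted random draw are all assumptions made for the example.

```python
import random
from collections import defaultdict


class SuccessEstimator:
    """Hypothetical estimator of P(success | property) from training outcomes."""

    def __init__(self):
        self.successes = defaultdict(int)
        self.trials = defaultdict(int)

    def record(self, prop, succeeded):
        # Store the outcome of one training subquery that had this property.
        self.trials[prop] += 1
        if succeeded:
            self.successes[prop] += 1

    def probability(self, prop):
        # Add-one smoothing (an assumption): unseen properties default to 0.5.
        return (self.successes[prop] + 1) / (self.trials[prop] + 2)


def select_subquery(subqueries, estimator, rng=random):
    """Choose one (name, property) pair, weighted by estimated success probability."""
    weights = [estimator.probability(prop) for _, prop in subqueries]
    return rng.choices(subqueries, weights=weights, k=1)[0]


# Usage with made-up property labels and training outcomes.
estimator = SuccessEstimator()
for _ in range(3):
    estimator.record("indexed", True)
    estimator.record("unindexed", False)

candidates = [("q1", "indexed"), ("q2", "unindexed"), ("q3", "unseen")]
chosen = select_subquery(candidates, estimator, rng=random.Random(0))
```

Because the draw is weighted rather than greedy, subqueries with low estimated success are still explored occasionally, which is one simple way to read "nondeterministic" here.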
In this dissertation we present our new methodology and the algorithms used in both the training and query processing phases of the system. We also present a method for determining the minimum training set size needed to obtain probability estimates whose maximum error stays within any desired limit.
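The abstract does not say how the minimum training set size is derived. One standard way to get a bound of this kind, shown here only as an assumed stand-in, is Hoeffding's inequality: for a probability estimated as a sample frequency over n independent trials, n >= ln(2/delta) / (2 * epsilon^2) guarantees an error of at most epsilon with probability at least 1 - delta. The function name `min_training_size` and the parameters `epsilon` and `delta` are hypothetical.

```python
import math


def min_training_size(epsilon, delta):
    """Samples needed so a frequency estimate is within epsilon of the true
    probability with confidence at least 1 - delta (Hoeffding bound)."""
    if not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))


# Usage: error at most 0.05 with 95% confidence.
n = min_training_size(epsilon=0.05, delta=0.05)  # 738
```

The bound applies to a single estimated probability; covering several property-specific probabilities at once would need a smaller per-estimate `delta`, for example via a union bound.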
Filename: Grant_dis.pdf
Size: 425.83 Kb
Approximate Download Time (Hours:Minutes:Seconds):
- 28.8 Modem: 00:01:58
- 56K Modem: 00:01:00
- ISDN (64 Kb): 00:00:53
- ISDN (128 Kb): 00:00:26
- Higher-speed Access: 00:00:02