Title page for ETD etd-11172006-091519


Type of Document Master's Thesis
Author Mohammed, Riyazuddin
Author's Email Address rmoham1@lsu.edu
URN etd-11172006-091519
Title Information Analysis of DNA Sequences
Degree Master of Science (M.S.)
Department Electrical & Computer Engineering
Advisory Committee
  Advisor Name     Title
  Subhash C Kak    Committee Chair
  Hsiao-Chun Wu    Committee Member
  Xue-Bin Liang    Committee Member
Keywords
  • correlation
  • entropy
  • randomness
  • information theory
  • junk DNA
  • divergence
Date of Defense 2006-10-19
Availability unrestricted
Abstract
Differentiating the informational content of the coding (exon) and non-coding (intron) regions of a DNA sequence is one of the central problems of genomics. Introns are estimated to make up nearly 95% of the DNA, and since they do not appear to be translated into amino acids, they have been termed “junk DNA.” Although it is widely believed that the non-coding regions of genomes play no role in cell growth and evolution, a demonstration that these regions carry useful information would tend to falsify this belief. In this thesis, we use entropy as a measure of information, modifying the entropy expression to account for the varying lengths of the sequences. Exons are usually much shorter than introns, so the entropy values must be normalized before they can be compared. A length-correction strategy was employed using randomly generated strings of nucleotide bases, built from the same alphabet and of the same size as the exons in question. The distance between exons and introns is calculated from their probability distributions. We found that the n-tuples in DNA sequences do not follow Zipf’s distribution; a modified power distribution derived from Zipf’s distribution, found by trial and error, closely models the codon frequencies. Correlation and divergence tests were also performed. Our analysis shows that introns carry nearly as much information as exons, disproving the notion that they carry no information. The entropy findings of this thesis are likely to be useful in further studies of other challenging problems, such as the analysis of symmetry models of the genetic code.
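As a rough illustration of the kind of comparison the abstract describes, the sketch below computes the Shannon entropy of codon (3-tuple) frequencies and divides it by the average entropy of random base strings of the same length. This is not the thesis's modified entropy expression, which is not given in this record. The function names, the use of non-overlapping tuples, the uniform random baseline, and the choice of Jensen-Shannon divergence as the distance are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def ntuple_counts(seq, n=3):
    """Count non-overlapping n-tuples (codons when n=3); a trailing partial tuple is dropped."""
    return Counter(seq[i:i + n] for i in range(0, len(seq) - n + 1, n))


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the empirical distribution given by the counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def length_corrected_entropy(seq, n=3, trials=200, seed=0):
    """Entropy of seq divided by the mean entropy of random strings of equal length.

    Assumption: the baseline draws bases uniformly from {A, C, G, T}.
    """
    rng = random.Random(seed)
    observed = shannon_entropy(ntuple_counts(seq, n))
    baseline = 0.0
    for _ in range(trials):
        rand_seq = "".join(rng.choice("ACGT") for _ in range(len(seq)))
        baseline += shannon_entropy(ntuple_counts(rand_seq, n))
    return observed / (baseline / trials)


def js_divergence(p_counts, q_counts):
    """Jensen-Shannon divergence, in bits, between two empirical distributions.

    Assumption: chosen here only as one symmetric, bounded distance measure.
    """
    p_total = sum(p_counts.values())
    q_total = sum(q_counts.values())
    divergence = 0.0
    for key in set(p_counts) | set(q_counts):
        p = p_counts.get(key, 0) / p_total
        q = q_counts.get(key, 0) / q_total
        m = 0.5 * (p + q)
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence
```

For two hypothetical sequences `exon` and `intron`, calling `length_corrected_entropy(exon)` and `length_corrected_entropy(intron)` gives values on a comparable scale despite the difference in length, and `js_divergence(ntuple_counts(exon), ntuple_counts(intron))` gives a distance between their codon distributions.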
Files
  Filename              Size        Approximate Download Time (Hours:Minutes:Seconds)
                                    28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  Mohammed_thesis.pdf   358.43 Kb   00:01:39     00:00:51    00:00:44       00:00:22        00:00:01
