Title page for ETD etd-06082010-092441

Type of Document Dissertation
Author Suslu, Ibrahim Hakki
Author's Email Address ihsuslu@cct.lsu.edu, isuslu1@lsu.edu
URN etd-06082010-092441
Title Choosing Between Remote I/O versus Staging in Distributed Environment
Degree Doctor of Philosophy (Ph.D.)
Department Computer Science
Advisory Committee
Advisor Name Title
Kosar, Tevfik Committee Chair
Allen, Gabrielle Committee Member
Karki, Bijaya Committee Member
Van Scotter, James R Committee Member
Ishak, Sherif Dean's Representative
Keywords
  • distributed computing
  • grid computing
  • data grid
  • remote I/O
  • staging
Date of Defense 2010-05-17
Availability unrestricted
Abstract
Today, scientific applications and experiments have become increasingly complex and more demanding in terms of their computational and data requirements. The amount of data generated and used has grown at a very rapid rate. Tens or hundreds of terabytes of data for a single application are very common today, and petabytes and even exabytes of data will be very common in a few years. One of the major challenges in distributed computing environments is how to access these large datasets remotely over the network.

Data staging and remote I/O are the most widely used data access methods for distributed applications. Application developers generally choose one over the other intuitively, without making any scientific comparison specific to their applications, since there is no generic model available that they can use.

In this thesis, we develop generic models and set guidelines for application developers to help them choose the most appropriate data access method for their application. We define the parameters that potentially affect the end-to-end performance of distributed applications that need to access remote data.

To achieve our goal, we implement a series of synthetic benchmark applications to simulate different data access patterns. We run these benchmark applications on different distributed computing settings with different parameters, such as network bandwidth, server and client capabilities, and data access ratio. We also use different remote I/O protocols to show the importance of the protocol in making a decision. We use regression analysis to develop applicable generic models for comparing different data access methods, and test our models in a real life application.

The main contribution of our thesis is generic models that can be applied to most data-intensive distributed applications to decide the best data access technique for those applications. Our models provide the scientists and application developers an opportunity to choose the best data access method before actually running the application.
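
The trade-off between the two access methods can be illustrated with a toy cost comparison. The sketch below is not the regression model developed in the thesis; it is a simplified illustration under stated assumptions. The function names, the `protocol_efficiency` factor, and the cost formulas are all hypothetical, chosen only to show how bandwidth and data access ratio could shift the decision.

```python
# Toy comparison of staging versus remote I/O (illustrative assumptions only).
# Sizes are in megabytes (MB); rates are in megabits per second (Mbps).

def staging_time(file_size_mb, bandwidth_mbps, local_read_mbps, access_ratio):
    """Assumed cost: copy the whole file, then read the needed fraction locally."""
    transfer = file_size_mb * 8 / bandwidth_mbps
    local_read = access_ratio * file_size_mb * 8 / local_read_mbps
    return transfer + local_read


def remote_io_time(file_size_mb, bandwidth_mbps, access_ratio, protocol_efficiency):
    """Assumed cost: read only the needed fraction over the network.

    protocol_efficiency (0 to 1) is a hypothetical factor standing in for
    the overhead of a particular remote I/O protocol.
    """
    effective_mbps = bandwidth_mbps * protocol_efficiency
    return access_ratio * file_size_mb * 8 / effective_mbps


def choose_method(file_size_mb, bandwidth_mbps, local_read_mbps,
                  access_ratio, protocol_efficiency):
    """Return the method with the lower estimated time under this toy model."""
    s = staging_time(file_size_mb, bandwidth_mbps, local_read_mbps, access_ratio)
    r = remote_io_time(file_size_mb, bandwidth_mbps, access_ratio, protocol_efficiency)
    return "staging" if s < r else "remote I/O"


if __name__ == "__main__":
    # Application touches 10% of a 1000 MB file, then the whole file.
    print(choose_method(1000, 100, 1000, 0.1, 0.5))
    print(choose_method(1000, 100, 1000, 1.0, 0.5))
```

In this toy model, remote I/O tends to win when only a small fraction of the file is accessed, and staging tends to win as the access ratio approaches one. A real decision would depend on measured behavior, as the abstract describes.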

  Filename        Size      Approximate Download Time (Hours:Minutes:Seconds)
                            28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access
  susludiss.pdf   3.46 Mb   00:16:00     00:08:14    00:07:12       00:03:36        00:00:18
