Title page for ETD etd-04282011-135154

Type of Document Master's Thesis
Author Thota, Abhinav S.
Author's Email Address athota1@tigers.lsu.edu
URN etd-04282011-135154
Title Efficient Large-Scale Replica-Exchange Simulations on Distributed Production Infrastructure
Degree Master of Science (M.S.)
Department Computer Science
Advisory Committee
Advisor Name            Title
Shantenu Jha            Committee Chair
Gabrielle Allen         Committee Co-Chair
Supratik Mukhopadhyay   Committee Member
Keywords
  • Teragrid
  • LONI
  • MD
  • replica-exchange
  • HPC
  • NAMD
Date of Defense 2011-04-25
Availability unrestricted
Abstract
Replica-Exchange (RE) methods represent a class of algorithms that involve a large number of loosely-coupled ensembles and are used to understand physical phenomena ranging from protein folding dynamics to binding affinity calculations. We develop a framework for RE that supports different replica pairing and coordination mechanisms and that can use a wide range of production cyberinfrastructure concurrently.

Additionally, our framework uses a flexible pilot-job implementation, which enables effective resource allocation for multiple replicas.

We characterize the performance of two different RE algorithms - synchronous and asynchronous - at unprecedented scales on production distributed infrastructure (Teragrid and LONI). The synchronous RE algorithm is implemented with a centralized master, while the asynchronous RE algorithm is implemented with both centralized and decentralized replica management schemes.

We evaluate the performance of the different algorithms and implementations when we scale up the number of replicas (up to 256) on a single machine and when we scale out across 2 and 4 machines. The synchronous and asynchronous algorithms perform similarly when the number of replicas is small. As the number of replicas increases, however, the synchronization cost in the synchronous RE increases the total time to completion. In the centralized asynchronous RE, the cost of managing many replicas in a centralized manner also increases the time to completion, but not as much as in the synchronous RE. The decentralized asynchronous RE scales much better with an increasing number of replicas. When scaled out across many machines, the performance of the synchronous RE depends on whether the machines are homogeneous or heterogeneous: a heterogeneous infrastructure means increased synchronization costs. We also run tests to see whether one of the algorithms is better suited to achieving more crosswalks and temperature mixing, that is, better sampling.
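The synchronization cost described above can be illustrated with a toy timing model. This is a minimal sketch under stated assumptions, not the thesis framework: the function names, the `durations[r][s]` layout (run time of replica `r` in step `s`), and the sample numbers are all hypothetical. It assumes that in synchronous RE every replica waits at a barrier for the slowest one before each exchange, and it treats asynchronous RE as an idealized lower bound in which no replica ever waits for an exchange partner.

```python
def synchronous_makespan(durations):
    """Total time when all replicas wait at a barrier after every step.

    durations[r][s] is the (hypothetical) run time of replica r in step s.
    Each step costs as much as its slowest replica.
    """
    return sum(max(step) for step in zip(*durations))


def barrier_idle_time(durations):
    """Total time replicas spend idle waiting for the slowest one per step."""
    return sum(max(step) - d for step in zip(*durations) for d in step)


def asynchronous_lower_bound(durations):
    """Idealized asynchronous time: no replica ever waits for a partner.

    This ignores partner-matching and management overhead, so it is only
    a lower bound, not a model of a real asynchronous implementation.
    """
    return max(sum(replica) for replica in durations)


# Hypothetical per-step run times (arbitrary units), 2 replicas x 3 steps.
homogeneous = [[10, 10, 10], [10, 10, 10]]    # identical machines
heterogeneous = [[10, 10, 10], [15, 15, 15]]  # second machine is slower
mixed = [[10, 20, 10], [20, 10, 20]]          # run times fluctuate per step

if __name__ == "__main__":
    for name, d in [("homogeneous", homogeneous),
                    ("heterogeneous", heterogeneous),
                    ("mixed", mixed)]:
        print(name,
              "sync:", synchronous_makespan(d),
              "idle:", barrier_idle_time(d),
              "async lower bound:", asynchronous_lower_bound(d))
```

In this toy model the homogeneous case has no idle time, while the heterogeneous and fluctuating cases accumulate idle time at every barrier, which is the kind of cost the abstract attributes to synchronous RE on heterogeneous infrastructure.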

Filename          Size       Approximate Download Time (Hours:Minutes:Seconds)
                             28.8 Modem  56K Modem  ISDN (64 Kb)  ISDN (128 Kb)  Higher-speed Access
thota_thesis.pdf  580.22 Kb  00:02:41    00:01:22   00:01:12      00:00:36       00:00:03
