Juno: Reconfigurable Middleware for Heterogeneous Content Networking
Co-authored with Gareth Tyson, Andreas Mauthe, and Thomas Plagemann
Data Decomposition in Biomedical e-Science Applications
Y. Mohammed, S. Shahand, V. Korkhov, A. Luyf, B. van Schaik, M. Caan, A. van Kampen, M. Palmblad, and S. Olabarriaga. In Proceedings of the 7th IEEE International Conference on e-Science, 2011
As the focus of e-Science moves toward the fourth paradigm and data-intensive science, data access remains dependent on the architecture of the underlying e-Science infrastructure. Such an architecture is in general job-driven, i.e., a (grid) job is a sequence of commands that run on the same worker node. Making use of the infrastructure requires a parallelized application, which is achieved foremost by data decomposition. In the general practice of parallel programming, data decomposition depends on the programmer’s experience and knowledge of the data and the algorithm/application. On the other hand, data mining scientists have an established foundation for data decomposition: automatic decomposition methods are already in use, and methodologies and patterns are defined. Our experience in porting biomedical applications to the Dutch e-Science infrastructure shows that the data decomposition used to gain parallelism fits, to some degree, a subgroup of the data mining decomposition patterns, i.e., object set decomposition. In this paper we discuss porting three biomedical packages to a grid computing environment, two for medical imaging and one for DNA sequencing. We show how the data access of the applications was reengineered around the executables to make use of the parallel capacity of the e-Science infrastructure.
DynaSched: a dynamic Web service scheduling and deployment framework for data-intensive Grid workflows
S. Shahand, S. Turner, W. Cai, and M. Khademi. Procedia Computer Science, 2010.
Grid computing boosts productivity by maximizing resource utilization and simplifying access to resources which are shared among virtual organizations. Recently, the Grid and Web Service communities have established a set of common interests and requirements. The latest version of the Globus Toolkit implements the Web Service Resource Framework (WSRF) specifications which have been formulated to cover these interests. We leverage the Globus Toolkit to address some limitations in supporting the dynamic nature of large-scale Grid and data-intensive workflow executions.
Dynamic Web Service deployment fits well into the dynamic nature of the Grid and opens new ways of managing workflow executions on the Grid. In this article, we present the design and evaluation of a dynamic Web Service scheduling and deployment framework (DynaSched) that supports the workflow management of dynamic services. Dynamic Web Service deployment on the Grid allows jobs to be executed at the same site where the input data is located. The empirical studies show that the designed framework decreases data-intensive workflow execution time by minimizing communication costs. We argue that the framework ensures more flexible, fault-tolerant workflows. The system is based on Open Grid Services Architecture specifications and is WSRF-compliant.
Front-ends to Biomedical Data Analysis on Grids
S. Shahand, M. Santcroos, Y. Mohammed, V. Korkhov, A. C. M. Luyf, A. van Kampen and S. Olabarriaga. In Proceedings of the HealthGrid, 2011.
The e-infrastructure for bioscience (e-BioInfra) is a platform integrating various services and middleware to facilitate access to grid resources for biomedical researchers at the Academic Medical Center of the University of Amsterdam. In the past six years the user interfaces with the e-BioInfra have evolved from command-line interfaces to a Java desktop application, and later to an easy-to-use web application for selected biomedical data analysis. This evolution represents improvements to accommodate the requirements of a broader range of biomedical researchers and applications. In this paper we present the current user interfaces and analyse their usage considering the typical biomedical data analysis on the e-BioInfra, the roles assumed by the users in the various phases of data analysis life-cycle, and the user profiles. We observe that in order to support a wide spectrum of user profiles, with different expertise and requirements, a platform must offer a variety of user interfaces addressed to each user profile.
Provenance for distributed biomedical workflow execution
S. Madougou, M. Santcroos, A. Benabdelkader, B.D.C. van Schaik, S. Shahand, V. Korkhov, A.H.C. van Kampen, S.D. Olabarriaga. In Proceedings of HealthGrid 2012 (HealthGrid Applications and Technologies Meet Science Gateways for Life Sciences).
Scientific research has become very data- and compute-intensive because of the progress in data acquisition and measurement devices, which is particularly true in Life Sciences. To cope with this deluge of data, scientists use distributed computing and storage infrastructures. The use of such infrastructures itself introduces new challenges to the scientists in terms of proper and efficient use. Scientific workflow management systems play an important role in facilitating the use of the infrastructure by hiding some of its complexity. Although most scientific workflow management systems are provenance-aware, not all of them come with provenance functionality out of the box. In this paper we describe the improvement and integration of a provenance system into an e-infrastructure for biomedical research based on the MOTEUR workflow management system. The main contributions of the paper are: presenting an OPM implementation using a relational database backend for the provenance store; providing an e-infrastructure with a comprehensive provenance system; defining a generic approach to provenance implementation, potentially suitable for other workflow systems and application domains; and demonstrating the value of this system based on use cases presenting the provenance data through a user-friendly web interface.
Evolution of grid-based services for Diffusion Tensor Image analysis
M.W.A. Caan, S. Shahand, F.M. Vos, A.H.C. van Kampen, S.D. Olabarriaga. In Journal of Future Generation Computer Systems, 2012.
Analyzing Diffusion Tensor Image data of the human brain of large study groups is complex and demands new, sophisticated and computationally intensive pipelines that can efficiently be executed. We present our progress over the past five years in the development and porting of the DTI analysis pipeline to a grid infrastructure. Starting with simple jobs submitted from the command-line, we moved towards a workflow-based implementation and finally into the e-BioInfra Gateway, which offers a web interface for the execution of selected biomedical data analysis software on the Dutch Grid. This gateway is currently being actively used by neuroscientists and for educational purposes.
Distributed Execution of Workflow Using Parallel Partitioning
M. Hedayat, W. Cai, S. Turner, and S. Shahand. In IEEE International Symposium on Parallel and Distributed Processing with Applications, 2009.
Grid computing is a fundamental technology for large scale distributed resource sharing. Workflow management is becoming one of the most important Grid services. A lot of research work has been done on different issues involved in workflow management systems. The focus of this paper is on three areas: workflow partitioning, enactment and data movement. A new workflow management system called parallel and distributed workflow management system (PDWMS) is proposed. In this system the execution of workflow is done by a network of collaborative engines. To achieve this target, the original abstract workflow (input of the system) is partitioned into parallel parts, using a new proposed partitioning algorithm. PDWMS’s data movement, which is categorized into local and global models, uses a peer-to-peer approach.

