Legion: Lessons Learned Building a Grid Operating System
Andrew S. Grimshaw and Anand Natrajan
Department of Computer Science, University of Virginia

Abstract: Legion was the first integrated grid middleware architected from first principles to address the complexity of grid environments. Just as a traditional operating system provides an abstract interface to the underlying physical resources of a machine, Legion was designed to provide a powerful virtual machine interface layered over the distributed, heterogeneous, autonomous and fault-prone physical and logical resources that constitute a grid. We believe that without solid, integrated, operating-system-like grid middleware, grids will fail to cross the chasm from bleeding-edge supercomputing users to more mainstream computing. This paper provides an overview of the architectural principles that drove Legion, a high-level description of the system with complete references to more detailed explanations, and the history of Legion from its inception in August 1993 through commercialization. We present a number of important lessons, both technical and sociological, learned while developing and deploying Legion.

1 Introduction

Grids (once called Metasystems [20-23]) are collections of interconnected resources harnessed together to satisfy the varied needs of users [24, 25]. The resources may be administered by different organizations and may be distributed, heterogeneous and fault-prone. The ways in which users interact with these resources, as well as the usage policies for the resources, may vary widely. A grid infrastructure must manage this complexity so that users can interact with resources as easily and smoothly as possible.
Our definition, which is also a popular one, is the following. A grid system, also called a grid, gathers resources (desktop and hand-held hosts, devices with embedded processing resources such as digital cameras and phones, or tera-scale supercomputers) and makes them accessible to users and applications in order to reduce overhead and accelerate projects. A grid application is an application that operates in a grid environment, or is "on" a grid system. Grid system software (or middleware) is software that facilitates writing grid applications and manages the underlying grid infrastructure. The resources in a grid typically share at least some of the following characteristics:
The above definitions of a grid and a grid infrastructure are necessarily general. What constitutes a "resource" is a deep question, and the actions a user performs on a resource can vary widely. For example, a resource has traditionally been defined as a "machine", or more specifically "CPU cycles on a machine", and the actions users perform on such a resource include "running a job" and "checking availability in terms of load". These definitions and actions are legitimate, but limiting. Today, resources can be as diverse as a "biotechnology application", a "stock market database" and a "wide-angle telescope", with actions such as "run if license is available", "join with user profiles" and "procure data from specified sector", respectively. A grid can encompass all such resources and user actions. Therefore a grid infrastructure must be designed to accommodate this variety of resources and actions without compromising basic principles such as ease of use, security and autonomy. A grid enables users to share processing, applications and data securely across systems with the above characteristics, in order to facilitate collaboration, faster application execution and easier access to data. More concretely, this means being able to:
This paper describes one of the major Grid projects of the last decade, Legion, from its roots as an academic Grid project to its current status as the only complete commercial Grid offering [3, 5, 6, 8-11, 14, 17-19, 22, 23, 26-29, 31-53]. Legion builds on decades of research in distributed and object-oriented systems, and borrows many, if not most, of its concepts from the literature [54-88]. Rather than reinvent the wheel, the Legion team sought to combine solutions and ideas from a variety of projects such as Eden/Emerald [54, 59, 61, 89], Clouds [73], AFS [78], Coda [90], CHOICES [91], PLITS [69], Locus [82, 87] and many others. What differentiates Legion from its progenitors is the scope and scale of its vision. While most previous projects focused on a particular aspect of distributed systems, such as distributed file systems, fault tolerance or heterogeneity management, the Legion team strove to build a complete system that addressed all of the significant challenges presented by a grid environment. To do less would have left end-users and application developers to deal with the remaining problems themselves. In a sense, Legion was modeled after the power grid: the underlying infrastructure manages all the complexity of power generation, transmission, distribution and fault management, so that end-users can focus on issues more relevant to them, such as which appliance to plug in and how long to use it. Similarly, Legion was designed to operate on a massive scale, across wide-area networks and between mutually distrustful administrative domains, whereas most earlier distributed systems focused on the local area, typically a single administrative domain. Beyond expanding the scale and scope of the vision for distributed systems, Legion contributed technically in areas as diverse as resource scheduling and high-performance I/O.
Three of the more significant technical contributions were 1) the extension of the traditional event model to ExoEvents [13], 2) the naming and binding scheme that supports both flexible semantics and lazy cache coherence [11], and 3) a novel security model [16] built on the premise that there is no trusted third party. What differentiates Legion first and foremost from contemporary Grid projects such as Globus¹ [92-99] is that Legion was designed and engineered from first principles to meet a set of articulated requirements, and that Legion focused from the beginning on ease of use and extensibility. The Legion architecture and implementation were the result of a software engineering process that followed the usual form of:
This contrasts with the approach used in other projects of starting with some basic functionality, seeing how it works, adding or removing functionality, and iterating towards a solution. Second, Legion focused from the very beginning on the end-user experience by providing a transparent, reflective, abstract virtual machine that could be readily extended to support different application requirements. In contrast, the Globus approach was to provide a basic set of tools with which users write grid applications, leaving them to manage the underlying tools explicitly.

The remainder of this paper is organized as follows. We begin with a discussion of the fundamental requirements for any complete Grid architecture; these requirements continue to guide the evolution of our Grid software. Next we present some of the principles and philosophy underlying the design of Legion. We then introduce some of Legion's architectural features and delve slightly deeper into its implementation, to convey an understanding of grids in general and of Legion in particular. Detailed technical descriptions exist elsewhere in the literature and are cited. We follow with a brief history of Legion and its transformation into a commercial grid product, Avaki 2.5, and then present the major lessons, not all of them technical, learned during the course of the project. We close with a few observations on trends in grid computing. Keep in mind that the objective here is not to provide a detailed description of Legion, but to provide a perspective with complete references to papers that provide much more detail.