[New Zealand Digital Library] [Computer Science Technical Reports] [Query Results] ---------------------------------------------------------------------------- 1 1 Structuring Fault-Tolerant Object Systems for Modularity in a Distributed Environment Santosh K. Shrivastava Department of Computing Science, University of Newcastle upon Tyne, Newcastle upon Tyne, NE1 7RU, UK. Daniel L. McCue Xerox Corporation Webster, New York 14850, USA. Abstract The object-oriented approach to system structuring has found widespread acceptance among designers and developers of robust computing systems. In this paper we propose a system structure for distributed programming systems that support persistent objects and describe how such properties as persistence, recoverability etc. can be implemented. The proposed structure is modular, permitting easy exploitation of any distributed computing facilities provided by the underlying system. An existing system constructed according to the principles espoused here is examined to illustrate the practical utility of the proposed approach to system structuring. Index Terms Object-Oriented Systems, Persistent Objects, Migration, Replication, Fault Tolerance, Distributed Systems, Atomic Actions, Atomic Transactions. The work reported here has been supported in part by grants from the UK Science and Engineering Research Council (Grant Numbers GR/F38402, GR/F06494 and GR/H81078) and ESPRIT projects ISA (Project Number 2267) and BROADCAST (Basic Research Project Number 6360) ---------------------------------------------------------------------------- 2 2 1. Introduction One computational model that has been advocated for constructing robust distributed applications is based upon the concept of using nested atomic actions (nested atomic transactions) controlling operations on persistent (long-lived) objects. In this model, each object is an instance of some class. The class defines the set of instance variables each object will contain and the operations or methods that determine the externally visible behaviour of the object. The operations of an object have access to the instance variables and can thus modify the internal state of that object. It is assumed that, in the absence of failures and concurrency, the invocation of an operation produces consistent (class specific) state changes to the object. Atomic actions can then be used to ensure that consistency is preserved even in the presence of concurrent invocations and failures. Designing and implementing a programming system capable of supporting such 'objects and actions' based applications by utilising existing distributed system services is a challenging task. Support for distributed computing on currently available systems varies from the provision of bare essential services, in the form of networking support for message passing, to slightly more advanced services for interprocess communication (e.g., remote procedure calls), naming and binding (for locating named services) and remote file access. The challenge lies in integrating these services into an advanced programming environment. In this paper we present an architecture which we claim to be modular in nature: the overall system functionality is divided into a number of modules which interact with each other through well defined narrow interfaces. We then describe how this facilitates the task of implementing the architecture on a variety of systems with differing support for distributed computing. In the next section, we present an 'object and action' model of computation, indicating how a number of distribution transparency mechanisms can be integrated within that model. Section 3 then identifies the major system components and their interfaces and the interactions of those components. The proposed system structure is based upon a retrospective examination of a distributed system - Arjuna [11, 23, 29] - built at Newcastle. The main aspects of this system are presented in section 4 and are examined in light of the discussion in the preceding section. In this section we also describe how the modular structure of the system has enabled us to port it on to a number of distributed computing platforms. The Arjuna system thus demonstrates the practicality of the proposed approach to system structuring discussed in section 3. ---------------------------------------------------------------------------- 3 3 2. Basic Concepts and Assumptions It will be assumed that the hardware components of the system are computers (nodes), connected by a communication subsystem. A node is assumed to work either as specified or simply to stop working (crash). After a crash, a node is repaired within a finite amount of time and made active again. A node may have both stable (crash-proof) and nonstable (volatile) storage or just non-stable storage. All of the data stored on volatile storage is assumed to be lost when a crash occurs; any data stored on stable storage remains unaffected by a crash. Faults in the communication subsystem may result in failures such as lost, duplicated or corrupted messages. Well known network protocol techniques are available for coping with such failures, so their treatment will not be discussed further. We assume that processes on functioning nodes are capable of communicating with each other. To develop our ideas, we will first describe some desirable transparency properties a distributed system should support. It is common to say that a distributed system should be 'transparent' which means that it can be made to behave, where necessary, like its nondistributed counterpart. There are several complementary aspects to transparency [1]: ? Access transparency mechanisms provide a uniform means of invoking operations of both local and remote objects, concealing any ensuing network-related communications; ? Location transparency mechanisms conceal the need to know the whereabouts of an object; knowing the name of an object is sufficient to be able to access it; ? Migration transparency mechanisms build upon the previous two mechanisms to support movement of objects from node to node to improve performance or faulttolerance; ? Concurrency transparency mechanisms ensure interference-free access to shared objects in the presence of concurrent invocations; ? Replication transparency mechanisms increase the availability of objects by replicating them, concealing the intricacies of replica consistency maintenance; ? Failure transparency mechanisms help exploit the redundancy in the system to mask failures where possible and to effect recovery measures. As stated earlier, we are considering a programming system in which application programs are composed out of atomic actions (atomic transactions) manipulating persistent ---------------------------------------------------------------------------- 4 4 (long-lived) objects. Atomic actions can be nested. We will be concerned mainly with tolerating 'lower-level' hardware related failures such as node crashes. So, it will be assumed that, in the absence of failures, the invocation of an operation produces consistent (class specific) state changes to the object. Atomic actions then ensure that only consistent state changes to objects take place despite failures. We will consider an application program initiated on a node to be the root of a computation. Distributed execution is achieved by invoking operations on objects which may be remote from the invoker. An operation invocation upon a remote object is performed via a remote procedure call (RPC). Since many object-oriented languages define operation invocation to be synchronous [30], RPC is a natural communications paradigm to adopt for the support of access transparency in object-oriented languages. Furthermore, all operation invocations may be controlled by the use of atomic actions which have the properties of (i) serialisability, (ii) failure atomicity, and (iii) permanence of effect. Serialisability ensures that concurrent invocations on shared objects are free from interference (i.e., any concurrent execution can be shown to be equivalent to some serial order of execution). Some form of concurrency control policy, such as that enforced by two-phase locking, is required to ensure the serialisability property of actions. Failure atomicity ensures that a computation will either be terminated normally (committed), producing the intended results (and intended state changes to the objects involved) or aborted producing no results and no state changes to the objects. This atomicity property may be obtained by the appropriate use of backward error recovery, which can be invoked whenever a failure occurs that cannot be masked. Typical failures causing a computation to be aborted include node crashes and communication failures such as the continued loss of messages. It is reasonable to assume that once a top-level atomic action terminates normally, the results produced are not destroyed by subsequent node crashes. This is ensured by the third property, permanence of effect, which requires that any committed state changes (i.e., new states of objects modified in the atomic action) are recorded on stable storage. A commit protocol is required during the termination of an atomic action to ensure that either all the objects updated within the action have their new states recorded on stable storage (committed), or, if the atomic action aborts, no updates get recorded [5, 13]. The object and atomic action model provides a natural framework for designing fault-tolerant systems with persistent objects. In this model, a persistent object not in use is normally held in a passive state with its state residing in an object store or object database and activated on demand (i.e., when an invocation is made) by loading its state and methods from the object store to the volatile store, and associating a server process for ---------------------------------------------------------------------------- 5 5 receiving RPC invocations. Atomic actions are employed to control the state changes to activated objects, and the properties of atomic actions given above ensure failure transparency. Atomic actions also ensure concurrency transparency, through concurrency control protocols, such as two-phase locking. Access transparency is normally provided by integrating an RPC pre-processor into the program development cycle which produces "stub" code for both the application and the object implementation. A variety of naming, binding and caching strategies are possible to achieve location and migration transparencies. Normally, the persistent state of an object resides on a single node in one object store, however, the availability of an object can be increased by replicating it and thus storing it in more than one object store. Object replicas must be managed through appropriate replica-consistency protocols to ensure that the object copies remain mutually consistent. In a subsequent section we will describe how such protocols can be integrated within action based systems to provide replication transparency. We assume some primitive features from a heterogeneous distributed system: (i) The state of any object can have a context independent representation (i.e., free of references to a specific address-space). This implies that objects can be de-activated for storage or transmission over a network. (ii) Executable versions of the methods of an object are available on all the nodes of interest. This implies that objects can be moved throughout the network simply by transmitting their states. (iii) Machine-independent representations of data can be obtained for storage or transmission. This requirement is related to, but distinct from (i) in that this property enables interpretation of the passive state of an object in an heterogeneous environment. Several prototype object-oriented systems have been built, often emphasising different facets of the overall functionality. For example, systems such as Argus [14], Arjuna [11, 23, 29], SOS [28] and Guide [4] have emphasised fault-tolerance and distribution aspects, languages such as PS-Algol [3], Galileo [2] and E [25] have contributed to our understanding of persistence as a language feature, while efforts such as [12] have contributed to the understanding of the design of object stores and their relationship to database systems. We build on these efforts and describe the necessary features of a modular distributed programming system supporting persistent objects. ---------------------------------------------------------------------------- 6 6 3. System Structure 3.1. Computation model and System modules With the above discussion in mind, we will first present a simple client-server based model for accessing and manipulating persistent objects and then identify the main system modules necessary for supporting the model. As stated earlier, we will consider an application program initiated on a single node to be the root of a computation; distributed execution is achieved by invoking operations on objects which may be remote from the invoker. We assume that for each persistent object there is at least one node (say a) which, if functioning, is capable of running an object server which can execute the operations of that object (in effect, this would require that a has access to the executable binary of the code for the object's methods as well as the persistent state of the object stored on some object store). Before a client can invoke an operation on an object, it must first be connected or bound to the object server managing that object. It will be the responsibility of a node, such as a, to provide such a connection service to clients. If the object in question is in a passive state, then a is also responsible for activating the object before connecting the requesting client to the server. In order to get a connection, an application program must be able to obtain location information about the object (such as the name of the node where the server for the object can be made available). We assume that each persistent object possesses a unique, system given identifier (UID). In our model an application program obtains the location information in two stages: (i) by first presenting the application level name of the object (a string) to a globally accessible naming service; assuming the object has been registered with the naming service, the naming service maps this string to the UID of the object; (ii) the application program then presents the UID of the object to a globally accessible binding service to obtain the location information. Once an application program (client) has obtained the location information about an object it can request the relevant node to establish a connection (binding) to the server managing that object. The typical structure of an application level program is shown below: ; ; In our model, bindings are not stable (do not survive the crash of the client or server). Bindings to servers are created as objects enter the scope in the application program. If some bound server subsequently crashes then the corresponding binding is broken and not repaired within the lifetime of the program (even if the server node is functioning again); all the surviving bindings are explicitly broken as objects go out of the ---------------------------------------------------------------------------- 7 7 scope of the application program. An activated object which is no longer in use - because it is not within the scope of any client application - will not have any clients bound to its server; this object can be de-activated simply by destroying the association between the object and the server process, and discarding the volatile image of the object (recall that the object will always have its latest committed state stored in some stable object store). The disk representation of an object in the object store may differ from its volatile store representation (e.g., pointers may be represented as offsets or UIDs). Our model assumes that an object is responsible for providing the relevant state transformation operations that enable its state to be stored and retrieved from the object store. The server of an activated object can then use these operations during abort or commit processing. Further, we assume that each object is responsible for performing appropriate concurrency control to ensure serialisability of atomic actions. In effect this means that each object will have a concurrency control object associated with it. In the case of locking, each method of an object will have an operation for acquiring, if necessary, a (read or write) lock from the associated 'lock manager' object before accessing the object's state; the locks are released when the commit/abort operations are executed. We can now identify the main modules of a distributed programming system, and the services they provide for supporting persistent objects. ? Atomic Action module: provides atomic action support to application programs in form of operations for starting, committing and aborting atomic actions; ? RPC module: provides facilities to clients for connecting (disconnecting) to object servers and invoking operations on objects; ? Naming module: provides a mapping from user-given names of objects to UIDs; ? Binding module: provides a mapping from UIDs to location information such as the identity of the host where the server for the object can be made available; ? Persistent Object Support module: provides object servers and access to stable storage for objects. The relationship amongst these modules is depicted in Figure 1. Every node in the system will provide RPC and Atomic Action modules. Any node capable of providing object servers and/or (stable) object storage will in addition contain a Persistent Object Support module. A node containing an object store can provide object storage services via ---------------------------------------------------------------------------- 8 8 its Persistent Object Support module. Nodes without stable storage may access these services via their local RPC module. Naming and Binding modules are not necessary on every node since their services can also be utilised through the services provided by the RPC module. . . . ----------------------------------------------------------------------------