Uncovering Database Access Optimizations in the Middle Tier with TORPEDO
A popular architecture for enterprise applications is one of a stateless object-based server accessing persistent data through Object-Relational mapping software. The reported benefits of using Object-Relational mapping software are increased developer productivity, greater database portability and improved runtime performance over hand-written SQL due to caching. In spite of these supposed benefits, many software architects are suspicious of the "black box" nature of O-R mapping software. Discerning how O-R mapping software actually accesses a database is difficult.
The Testbed of Object Relational Products for Enterprise Distributed Objects (TORPEDO) is designed to reveal the sophistication of O-R mapping software in accessing databases in single server and clustered environments. TORPEDO defines a set of realistic application level operations that detect a significant set of database access optimizations. TORPEDO supports two standard Java APIs for O-R mapping, namely, Container Managed Persistence (CMP 2.0) and Java Data Objects (JDO). TORPEDO also supports the TopLink and Hibernate APIs. There are dozens of commercial and open-source O-R mapping products supporting these APIs. Results from running TORPEDO on different O-R mapping systems are comparable.
We provide sample results from running TORPEDO on popular O-R mapping solutions. We describe why the optimizations TORPEDO reveals are important and how the application level operations detect the optimizations.
[Appeared in Proceedings of the 21st IEEE International Conference on Data Engineering. April 2005. Tokyo, Japan.]
ACID is Good. Take it in Short Doses.
ACID transactions are extremely important for creating reliable distributed applications. This article is about why ACID is good for you, why ACID doesn't work in long doses, why you shouldn't give up and what concepts, models and technologies you can take in longer doses.
[With Mark Little. Appeared on The Server Side. October, 2004]
Distributed Xbean Applications
XML has emerged as the universal standard for exchanging and externalizing data. Software products of all kinds are being upgraded to "support XML." Typically this means they can import and export XML data. But just defining standard representations for exchanging data is insufficient. The data need to be integrated with existing applications and databases and processed by programs written in some programming language. This paper describes distributed applications constructed from Xbeans. Xbeans are Java Beans that manipulate XML data. With the appropriate set of Xbeans and a Java Bean design tool, it is possible to build useful distributed applications with little or no programming.
[Appeared in Proceedings of the 2nd International Symposium on Distributed Objects and Applications, published by IEEE Computer Society Press, September 2000]
Lowering the bar of the DOM API
A few easy steps to begin accessing XML data in Java
XML is a popular way to represent data in a portable, vendor-neutral, readable format. The Document Object Model (DOM) is an application programmer's interface to XML data. Unfortunately, the DOM is a fairly complex API with a steep learning curve. But if you know the DTD of the data you are accessing, it's not too difficult. This article illustrates a few easy steps to begin accessing XML data using the DOM in Java.
[IBM DeveloperWorks, March, 2000]
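As a rough illustration of the kind of steps the DOM article above refers to, here is a minimal sketch using the standard `javax.xml.parsers` and `org.w3c.dom` APIs found in current JDKs. It is not taken from the article, which may have used a different parser. The class name `DomSketch`, the method `titles` and the `title` element name are assumptions chosen for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not code from the article.
final class DomSketch {

    // Returns the text of every <title> element in the document, in document order.
    // The element name "title" is an assumption about the data's DTD.
    static List<String> titles(String xml) throws Exception {
        // Step 1: parse the XML text into a DOM tree.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Step 2: start from the root element.
        Element root = doc.getDocumentElement();

        // Step 3: look up the descendant elements of interest by tag name.
        NodeList nodes = root.getElementsByTagName("title");

        // Step 4: read the text content of each matching element.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }
}
```

Knowing the DTD in advance is what makes the tag-name lookup in step 3 safe: the caller already knows which elements exist and where text is allowed.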
Build distributed applications with Java and XML
XML is a popular way to represent data in a portable, vendor-neutral, readable format. But what if you need to send XML data across a process boundary in a distributed application? Bruce Martin examines three approaches to accomplishing that in Java.
[JavaWorld, February, 2000]
Asynchronous Notifications Among Distributed Objects.
Distributed object systems typically support synchronous requests from one distributed object to another. Often, a more decoupled style of communication among distributed objects is appropriate. We describe an object service called event channel that decouples distributed object communication. We describe SunSoft's implementation of the event channel and illustrate its use in a stock trading application.
[With Yeturu Aahlad, Mod Marathe and Chung Le. In Proceedings of the 2nd USENIX Conference on Object-Oriented Technologies and Systems, June 17, 1996, Toronto, Canada.]
Many relational and object-oriented database systems provide referential integrity and compound operations on related objects using relationship mechanisms. Distributed object systems are emerging to support applications that access objects across distributed, heterogeneous system boundaries. Because the fundamental assumptions of distributed, heterogeneous, federated computing systems differ from database systems, supporting object relationships in such an environment requires different approaches to the representation and manipulation of relationships than those traditionally used in database systems. This paper describes the Relationship Service for SunSoft's Distributed Object Environment (DOE). We describe the fundamental assumptions of distributed object systems and motivate our design in that context.
[With R. G. G. Cattell. In Proceedings of the 20th International Conference on Very Large Data Bases, edited by Bocca, Jarke and Zaniolo, published by Morgan Kaufmann. September, 1994. pp. 730-739.]
COSS: The Common Object Services Specifications.
The Object Management Group (OMG) is promoting standards for distributed object systems among system software vendors. The OMG has currently adopted two sets of standards, known as CORBA and COSS. CORBA is the core communication mechanism which all OMG objects use: it enables distributed objects to operate on each other. COSS defines standard services that support the integration of distributed objects.
[In Proceedings of the SIGMOD Conference 1994: 479]
Long-lived Concurrent Activities.
Transactions provide programmers with execution guarantees that simplify the task of implementing concurrent activities. However, the traditional transaction model does not apply to long-lived concurrent activities. We present five examples of such activities. For each example we argue why the traditional transaction model does not apply to the activity. Through the examples we explain how the traditional transaction model can be extended to support the long-lived concurrent activities. We identify a dichotomy of extensions: three that preserve transaction isolation and one that isolates cooperation. The first three extensions provide transaction guarantees more flexibly by capturing application semantics and structure. The last extension allows localized cooperation between a group of transactions.
[With Claus Pedersen. In Distributed Object Management, edited by Ozsu, Dayal and Valduriez, published by Morgan Kaufmann. Also in Proceedings of the International Workshop on Distributed Object Management, August, 1992, Edmonton, Canada.]
An Object-Based Taxonomy of Distributed Computing Systems.
A hierarchy of questions and answers about the features of distributed computing systems leads to an overall system description and facilitates system comparisons.
[With Claus Pedersen and James-Bedford Roberts. In Readings in Distributed Computing Systems, published by IEEE Computer Society Press. Also in IEEE Computer special issue on distributed computing systems, August, 1991.]
The Separation of Interface and Implementation in C++.
A C++ class declaration combines the external interface of an object with the implementation of that interface. It is desirable to be able to write client code that depends only on the external interface of a C++ object and not on its implementation. Although C++ encapsulation can hide the implementation details of a class from client code, the client must refer to the class name and thus depends on the implied implementation as well as its interface.
In this paper, we review why separating an object's interface from its implementation is desirable and present a C++ programming style supporting a separate interface lattice and multiple implementation lattices. We describe minor language extensions that make the distinction between the interface lattice and implementation lattice apparent to the C++ programmer. Implementations are combined using standard C++ multiple inheritance. The operations of an interface are given by the union of operations of its contained interfaces. Variables and parameters are typed by interfaces. We describe how a separate interface lattice and multiple implementation lattices are realized in standard C++ code.
[In The Evolution of C++, edited by Jim Waldo, published by MIT Press. Also in Proceedings of the 3rd USENIX C++ Conference, April, 1991, Washington, D.C.]
Transaction Guarantees and Long-lived Concurrent Activities.
Transactions provide programs with the guarantees of isolation and failure atomicity. These are desirable guarantees to provide application programmers in a distributed environment since concurrency and partial failures are inherent. Programmers of object-oriented applications that execute in environments supporting transaction guarantees need not be concerned with concurrent access to shared objects.
We present an example of two long-lived concurrent activities, moving and editing a large document in a distributed object-oriented environment. We use this example to discuss the inadequacies of the traditional model and to discuss requirements for flexibly providing transaction guarantees to a class of long-lived object-oriented applications.
[With Claus Pedersen. In Proceedings of the 1990 OOPSLA/ECOOP Workshop on Transactions and Objects, Ottawa, Canada, October, 1990. Edited by Martin and Ramamritham.]
Experience with PARPC.
PARPC provides an interprocess communication mechanism based on the semantics of a procedure call. PARPC programs always execute a single logical thread of control but may execute multiple physical threads of control. PARPC provides users with a well defined, high level network process model of execution and a familiar program development model supporting heterogeneous, non-uniform environments. The administrative overhead of PARPC is minimal because users administer their own distributed programs and existing UNIX mechanisms for access control and resource accounting are utilized. Our experiences indicate that PARPC has been an effective system for the development and administration of distributed programs.
[With Walter Burkhard, Jehan-Francois Paris and Charles Bergan. In Proceedings of the 1989 Winter USENIX Technical Conference, San Diego, California, February, 1989.]
Concurrency Control vs. Concurrent Programming.
[In Proceedings of the Workshop on Object-based Concurrent Programming, 1988 OOPSLA Conference, San Diego, California, September, 1988.]
Concurrent Nested Object Computations.
This dissertation presents nested objects, a new model of concurrency that considers levels of data abstraction. Shared nested objects are instances of abstract data types that are implemented by other, lower level, shared nested objects. Nested objects are more general than transactions. Traditional one level read-write transaction models and atomic object models are special cases of the nested object model.
Nested objects allow two novel types of computations that are usually considered non-serializable and thus incorrect. Externally serializable computations leave top level objects in states that could be produced by some serial execution of the computations. However, lower level objects may be left in states that could never be produced by serial executions of the computations.
Semantically verifiable nested object computations are truly nonserializable. Objects at all levels can be left in states that no serial execution of the computations could produce. Since operation semantics are visible at all levels, the correctness of nonserializable computations can be argued. Semantically verifiable, nonserializable computations are achieved by weakening conflict specifications.
A practical nested object scheduling algorithm is presented. An experimental implementation demonstrates that maintaining additional semantic information is manageable and beneficial. The algorithm provides a performance improvement over two-phase locking for a benchmark program.
Nested objects are an attractive paradigm for modeling concurrent activities. Nested objects retain the desirable specification properties of transactions but are more general; endless computations can be modeled and long-lived computations can execute efficiently. Traditional program verification techniques can be used to show the correctness of nested object specifications. The modeling capabilities of nested objects are illustrated with a solution to the Dining Philosophers problem.
[Ph.D. dissertation, Department of Computer Science and Engineering, University of California, San Diego, June, 1988.]
Modeling Concurrent Activities with Nested Objects.
Concurrent activities have been formally modeled by two different approaches: either by modeling shared control flow or by modeling shared data. Hoare's communicating sequential processes embrace the first approach while transaction models in database and distributed systems embrace the second. Modeling control requires global reasoning about the ordering of shared events; transaction models only require local reasoning but are unable to capture some concurrent activities that can be modeled as communicating sequential processes. Nested objects do not suffer from either of these drawbacks. We present our model for nested objects and demonstrate it by expressing three variations of the dining philosophers problem. Finally, we show how nested objects may exhibit non-serializable behavior and still be considered correct.
[In Proceedings of the 7th International Conference on Distributed Computing Systems, Berlin, West Germany, September, 1987. Published by IEEE Press.]
PARPC: A System for Parallel Procedure Calls.
A parallel procedure call executes a procedure in n different address spaces in parallel. The calling code remains blocked while the n procedures execute. As each procedure result becomes available, the caller is unblocked to execute a statement to process the result of the returned call. After executing the statement, the caller reblocks to wait for the next result. This continues until the caller breaks out of the parallel remote procedure call or until no further results are available.
Programs making parallel procedure calls are modeled as network processes. A network process is a tree of local processes that communicate by making parallel procedure calls. We report on the design and implementation of PARPC, a system for developing and executing such programs on a network of UNIX hosts.
[With Charles Bergan and Brian Russ. In Proceedings of the 1987 International Conference on Parallel Processing, St. Charles, Illinois, August, 1987. Published by the Pennsylvania State University Press.]
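The calling pattern described in the PARPC abstract above (block, wake once per returned result, run a statement, block again, stop on break-out or when no results remain) can be approximated in modern Java. The following is only an illustrative sketch and not PARPC's actual interface: the class `ParallelCallSketch`, the method `callAll` and the use of local threads in place of remote address spaces are all assumptions.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Predicate;

// Hypothetical illustration; local threads stand in for remote address spaces.
final class ParallelCallSketch {

    // Starts every call in parallel. The caller blocks, is unblocked once per
    // result to run `handler`, then blocks again. If `handler` returns false
    // the caller breaks out early; otherwise the loop ends when no results
    // remain. Returns the number of results that were handled.
    static <T> int callAll(List<Callable<T>> calls, Predicate<T> handler)
            throws InterruptedException, ExecutionException {
        if (calls.isEmpty()) {
            return 0;
        }
        ExecutorService pool = Executors.newFixedThreadPool(calls.size());
        CompletionService<T> results = new ExecutorCompletionService<>(pool);
        try {
            for (Callable<T> call : calls) {
                results.submit(call);
            }
            int handled = 0;
            for (int i = 0; i < calls.size(); i++) {
                // Block until the next result, in completion order, is available.
                T result = results.take().get();
                handled++;
                if (!handler.test(result)) {
                    break; // the caller breaks out of the parallel call
                }
            }
            return handled;
        } finally {
            pool.shutdownNow(); // abandon any calls that are still running
        }
    }
}
```

A completion service is used here because it hands results back in the order they finish rather than the order they were submitted, which matches the "as each procedure result becomes available" behavior in the abstract.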
The Gemini Replicated File System Test-bed.
The Gemini system is a replicated file system test-bed designed for local area networks and is built using ordinary host UNIX machines. Gemini was designed as a means to empirically test consistency and recovery schemes for replicated files in a distributed environment. A principal objective is to protect files against a fixed number of host and network failures while maintaining consistent data. Gemini replicated files are implemented as several copies of ordinary files that reside on distinct hosts.
We present the Gemini system test-bed design and discuss three consistency and recovery schemes: voting with witnesses, dynamic voting, and semi-synchronous control. Empirical and analytic performance results are presented.
[With Walter Burkhard and Jehan-Francois Paris. In Proceedings of the 3rd International Conference on Data Engineering, Los Angeles, California, February, 1987. Published by IEEE Press. Also in Information Sciences, an International Journal.]
Experience with A Module Package in Producing Production Quality PASCAL Programs.
A Module Package (AMP) is a preprocessor to a PASCAL compiler to support data encapsulation and modular system development. Experience with AMP in developing a software product at Amdahl Corporation has demonstrated its utility and robustness.
[With Sally Warren and Charles Hoch. In Proceedings of the 6th International Software Engineering Conference, Tokyo, Japan, September, 1982, Published by IEEE Press.]
Patents
Method and Apparatus for Managing Relationships Among Objects in a Distributed Object Environment.
A method and apparatus for managing relationships among objects in a distributed object environment includes a method and apparatus for determining whether two or more object references refer to identical objects; a method and apparatus for providing a unique identifier for an object; a method and apparatus for checking role types for the formation of relationships; and a method and apparatus for caching role and object locations in roles in a relationship. In the method and apparatus for determining whether two or more object references refer to the same object, a unique object identifier is compared to determine if the objects referred to by the object references are identical. The unique identifier is provided by concatenating information identifying the machine address of the process that created the object in addition to the process ID, the time of creation and a process counter. In the method and apparatus for checking role types, information including the number of roles and is passed to a relationship factory which determines whether the number, types and cardinality of the roles passed to the factory are consistent with the relationship object to be created. The method and apparatus also includes caching of object references and roles for objects related to a given object in that object's role. The methods and apparatus of the invention thus provide valuable tools for managing relationship among objects in a distributed object environment efficiently.
[With Jeff Dinkins, Mark Hapner. Patent Number 5,815,710. Awarded September 29, 1998.]
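The identifier scheme summarized in the abstract above (machine address, process ID, time of creation and a process counter, concatenated and then compared to decide identity) can be sketched as follows. This is an illustration only, not the patented implementation: the class `ObjectIdSketch`, the field order, the `:` separator and the string representation are assumptions.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration of the identifier layout described in the abstract.
final class ObjectIdSketch {

    // Per-process counter, so that two objects created by the same process
    // at the same instant still receive different identifiers.
    private static final AtomicLong PROCESS_COUNTER = new AtomicLong();

    // Concatenates the four components named in the abstract.
    // The ordering and the ':' separator are assumptions made for this sketch.
    static String newId(String machineAddress, long processId, long creationTimeMillis) {
        return machineAddress + ":" + processId + ":" + creationTimeMillis
                + ":" + PROCESS_COUNTER.incrementAndGet();
    }

    // A stand-in for an object reference that carries the object's identifier.
    static final class Ref {
        final String objectId;

        Ref(String objectId) {
            this.objectId = objectId;
        }

        // Two references denote the same object exactly when their identifiers match.
        boolean isIdentical(Ref other) {
            return objectId.equals(other.objectId);
        }
    }
}
```

Comparing identifiers rather than references lets two distinct references to one object be recognized as identical, which is the property the abstract describes.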
Method and Apparatus for Determining Equality of Objects in a Distributed Object Environment.
A method and apparatus for managing relationships among objects in a distributed object environment includes a method and apparatus for determining whether two or more object references refer to identical objects; a method and apparatus for providing a unique identifier for an object; a method and apparatus for checking role types for the formation of relationships; and a method and apparatus for caching role and object locations in roles in a relationship. In the method and apparatus for determining whether two or more object references refer to the same object, a unique object identifier is compared to determine if the objects referred to by the object references are identical. The unique identifier is provided by concatenating information identifying the machine address of the process that created the object in addition to the process ID, the time of creation and a process counter. In the method and apparatus for checking role types, information including the number of roles and is passed to a relationship factory which determines whether the number, types and cardinality of the roles passed to the factory are consistent with the relationship object to be created. The method and apparatus also includes caching of object references and roles for objects related to a given object in that object's role. The methods and apparatus of the invention thus provide valuable tools for managing relationship among objects in a distributed object environment efficiently.
[With Jeff Dinkins, Mark Hapner. Patent Number 6,697,877. Awarded February 24, 2004.]