The SPIN Distributed Transaction Manager

Last Modified : Sat Mar 29 12:34:04 1997




1. Executive Summary

This transaction manager provides file I/O with transactional properties. Its function is very similar to that of Camelot.

Distributed transaction support is still rudimentary.

Overall structure.

2. Roadmap

src
Extension source.
lib
This directory contains the system call interface library libtrans.a and its source code. The library is used by user-space applications that use the service. The directory also contains an RVM emulator.
malloc
Contains trans_malloc and trans_free, a set of persistent heap management routines.
rvmbench
RVM SOSP benchmark. This is a very simple debit/credit benchmark used in the RVM SOSP paper.
oo7
The OO7 benchmark.

OSF/1 version

The transaction service is designed to run on both SPIN and OSF (UNIX). On OSF, the extension and the user application are linked together into a single binary. The OSF version is only for debugging. To create the OSF version, edit the top-level Makefile and uncomment
#export TRANSTARGET=osf
and comment out
export TRANSTARGET=spin

Testing the Transaction System

3. Using the Transaction System

There are two sets of interfaces, one for in-kernel extensions and the other for user-space applications.

3.1 Extension Interface

There are two interfaces that are visible to clients.
Transaction.i3
Transaction management is separated from the actual file management. This makes it easy to add a new kind of transactional I/O service without changing Transaction.[im]3. (However, the log manager does not easily accommodate a new kind of storage manager.)
Storage.i3
Storage actually provides the file I/O facility.

3.2 User space application interface

TransSyscall
There is also an RVM-compatible layer.

2. Design

2.1 Logging

Transactional properties are guaranteed by a write-ahead log (WAL). Both the storage manager and the transaction manager use the log: the storage manager uses it to keep storage consistent, and the transaction manager uses it to support 2PL. Modifications made through storage managers are logged using either an undo-redo protocol or a redo protocol. The flag passed to Transaction.Begin determines which protocol is used.
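
The difference between the two protocols can be illustrated with a minimal Python sketch. This is not the SPIN code (which is written in Modula-3); the names LogRecord, Log, write, commit, and abort, and the undo_redo flag, are hypothetical. The sketch assumes the conventional distinction: an undo-redo record keeps both the old and the new bytes, so the update can be applied in place and rolled back on abort, while a redo record keeps only the new bytes and the update is deferred until commit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogRecord:
    tid: int               # transaction that made the change
    offset: int            # byte offset in the storage
    new: bytes             # redo image
    old: Optional[bytes]   # undo image, or None for a redo-only record


class Log:
    """An in-memory stand-in for the write-ahead log."""

    def __init__(self) -> None:
        self.records: List[LogRecord] = []

    def append(self, rec: LogRecord) -> None:
        self.records.append(rec)


def write(log: Log, storage: bytearray, tid: int, offset: int,
          data: bytes, undo_redo: bool) -> None:
    # The log record is always appended before storage is touched.
    old = bytes(storage[offset:offset + len(data)]) if undo_redo else None
    log.append(LogRecord(tid, offset, data, old))
    if undo_redo:
        # Undo-redo: safe to update in place, the old image is logged.
        storage[offset:offset + len(data)] = data


def commit(log: Log, storage: bytearray, tid: int) -> None:
    # Replay redo images; harmless for updates already applied in place.
    for r in log.records:
        if r.tid == tid:
            storage[r.offset:r.offset + len(r.new)] = r.new


def abort(log: Log, storage: bytearray, tid: int) -> None:
    # Restore undo images in reverse order; redo-only records never
    # reached storage, so there is nothing to restore for them.
    for r in reversed(log.records):
        if r.tid == tid and r.old is not None:
            storage[r.offset:r.offset + len(r.old)] = r.old
```

In this sketch an aborted undo-redo transaction pays for a rollback, whereas an aborted redo transaction costs nothing because its changes were never applied.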

2.2 Distributed Transactions

The API for a client on a remote site is the same as in the local case. First, the client has to call Storage.pin. This call always makes an RPC to the server.

Storage.pin maps the server's storage contents into the client's memory. As an internal optimization, it checks the versions of the pages in the region and does not transfer pages that are already up to date on the client.
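
The version check can be sketched as follows. This is only an illustration in Python, not the actual Storage.pin; the Server and Client classes, the per-page integer version, and the in-process call standing in for the RPC are all assumptions.

```python
from typing import Dict, Tuple

Page = Tuple[int, bytes]  # (version, contents); hypothetical layout


class Server:
    def __init__(self, pages: Dict[int, Page]) -> None:
        self.pages = pages

    def pin(self, first: int, count: int,
            client_versions: Dict[int, int]) -> Dict[int, Page]:
        """Return only the pages whose version differs on the client."""
        stale: Dict[int, Page] = {}
        for n in range(first, first + count):
            version, data = self.pages[n]
            if client_versions.get(n) != version:
                stale[n] = (version, data)
        return stale


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: Dict[int, Page] = {}

    def pin(self, first: int, count: int) -> int:
        """Pin a region; return how many pages had to be transferred."""
        versions = {n: self.cache[n][0]
                    for n in range(first, first + count) if n in self.cache}
        # The call to the server is always made, even if nothing is stale.
        fresh = self.server.pin(first, count, versions)
        self.cache.update(fresh)
        return len(fresh)
```

A second pin of an unchanged region still costs one round trip in this sketch, but moves no page data.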

On transaction commit, all the undo-redo logs are transmitted to the server. The server modifies its own storage contents using that log. Thus, after commit, the server and the client have the same contents.
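
As a rough picture of the commit step, the following Python sketch encodes a client's log records into a byte stream and replays them on the server side. The wire format (an 8-byte offset and a 4-byte length in network byte order, followed by the data) and the function names are invented for illustration; the real record layout is not described here. Only the redo part of each record is shown.

```python
import struct
from typing import Iterable, List, Tuple

Record = Tuple[int, bytes]       # (offset, new bytes); hypothetical
HEADER = struct.Struct("!QI")    # offset, length (assumed wire format)


def encode(records: Iterable[Record]) -> bytes:
    """Serialize records for transmission, e.g. over a TCP connection."""
    return b"".join(HEADER.pack(off, len(data)) + data
                    for off, data in records)


def decode(buf: bytes) -> List[Record]:
    """Parse a byte stream produced by encode()."""
    out: List[Record] = []
    pos = 0
    while pos < len(buf):
        off, n = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        out.append((off, buf[pos:pos + n]))
        pos += n
    return out


def apply(storage: bytearray, records: Iterable[Record]) -> None:
    """Replay the records against a storage image."""
    for off, data in records:
        storage[off:off + len(data)] = data
```

Applying the same records on both sides is what leaves the client and the server with identical contents after commit.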

Currently, TCP is used as the communication protocol.

2.3 Recovery

Log recovery is somewhat unusual. The log manager first opens a log file and immediately starts the recovery. It looks at each log record and upcalls the local storage manager or the transaction manager to perform the undo or redo. When it detects an unterminated transaction (e.g., prepared but not committed), it forks off a thread that polls the storage manager or transaction manager to resolve it.

This approach has both advantages and disadvantages.

The advantage is that recovery is faster, because it needs only one scan through the log file. Also, the log file can be truncated automatically, because once recovery is complete, the log used in the recovery is certain to be no longer needed.

The downside is that the log manager has to know about every kind of participant in the system. This means that you cannot add a new type of storage manager without changing the log manager's recovery code.

The primary motivation for this design is log truncation.
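
The single-scan recovery described above might look roughly like this in Python. It is a simplified sketch, not the WAL module: the record layout, the record kind names, and the handlers table are hypothetical, and the sketch returns the set of unresolved transactions instead of forking a polling thread. It also assumes the log is truncated only when no transaction is left unresolved.

```python
from typing import Callable, Dict, List, Set, Tuple

Record = Tuple[str, int, object]   # (kind, tid, payload); hypothetical
Handler = Callable[[int, object], None]


def recover(log: List[Record], handlers: Dict[str, Handler]) -> Set[int]:
    """Scan the log once, upcalling a handler for each data record.

    Returns the ids of transactions that were prepared but never
    committed or aborted.
    """
    in_doubt: Set[int] = set()
    for kind, tid, payload in log:
        if kind == "prepare":
            in_doubt.add(tid)
        elif kind in ("commit", "abort"):
            in_doubt.discard(tid)
        else:
            # The log manager must know a handler for every record kind;
            # an unknown kind raises KeyError. This is the coupling that
            # makes adding a new storage manager type hard.
            handlers[kind](tid, payload)
    if not in_doubt:
        # Recovery is complete, so the old log is no longer needed.
        log.clear()
    return in_doubt
```

The handlers table makes the trade-off visible: one pass and automatic truncation, at the cost of a log manager that is hard-wired to the set of record kinds it can dispatch.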

3. Modules

The transaction manager contains many modules. Here is the big picture.
Transaction
Transaction manages all the transactions initiated on the local host. "Initiated" means you called Transaction.Begin on the local host and started the transaction.
TransLocal
This is a synonym for Transaction.T. It represents a transaction initiated on the local host.
TransRemote
The Transaction module also acts as a manager for remote connections. TransRemote is an internal transaction that represents a
Storage
Storage is an abstract object representing storage devices participating in local transactions. It can be local or proxy.
StorageLocal
Local storages are the real storages. Each one maintains a disk on the local host and exports the methods for local clients.
StorageProxy
A proxy storage is a channel of communication to a storage on a remote host.
StorageRemote
It sounds confusing, but there is also a module called StorageRemote. It acts as an RPC server-side stub: it accepts a call from the StorageProxy of a remote host and forwards it to the local storage.
WAL
WAL stands for Write-Ahead Log. This module provides a generic logging service.
LockMan
The lock manager.
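
The relationship among Storage, StorageLocal, StorageProxy, and StorageRemote listed above can be sketched in Python as follows. The class names follow the module names, but the read and write methods, the handle dispatcher, and the plain function call standing in for the network are assumptions made for illustration; the real modules are written in Modula-3 and communicate over RPC.

```python
from abc import ABC, abstractmethod
from typing import Callable


class Storage(ABC):
    """What a transaction sees, whether the storage is local or remote."""

    @abstractmethod
    def read(self, offset: int, size: int) -> bytes: ...

    @abstractmethod
    def write(self, offset: int, data: bytes) -> None: ...


class StorageLocal(Storage):
    """The real storage; a bytearray stands in for the local disk."""

    def __init__(self, size: int) -> None:
        self._disk = bytearray(size)

    def read(self, offset: int, size: int) -> bytes:
        return bytes(self._disk[offset:offset + size])

    def write(self, offset: int, data: bytes) -> None:
        self._disk[offset:offset + len(data)] = data


class StorageRemote:
    """Server-side stub: forwards an incoming call to the local storage."""

    def __init__(self, local: StorageLocal) -> None:
        self._local = local

    def handle(self, op: str, *args):
        if op == "read":
            return self._local.read(*args)
        if op == "write":
            return self._local.write(*args)
        raise ValueError(f"unknown operation: {op}")


class StorageProxy(Storage):
    """Client-side channel to a storage on another host."""

    def __init__(self, send: Callable) -> None:
        self._send = send  # stands in for the network transport

    def read(self, offset: int, size: int) -> bytes:
        return self._send("read", offset, size)

    def write(self, offset: int, data: bytes) -> None:
        self._send("write", offset, data)
```

Wiring a proxy directly to a stub, as in StorageProxy(StorageRemote(local).handle), shows the call path: the client talks to StorageProxy, which reaches StorageRemote on the server, which in turn calls StorageLocal.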

yasushi@cs.washington.edu