Distributed transaction support is still rudimentary.
The service can be used by both user processes and extensions. From a user process, the interface looks very much like Camelot's. From an extension, the interface is a little unnatural because all data transfer has to be done using a collection of page-sized buffers.
It has its own cache manager and memory object pager. There is no VM overhead when it is used from user space!
When the process crashes, all the outstanding transactions are safely aborted.
The log manager is separate from both the storage manager and the transaction manager. Its interface is similar to Quicksilver's, i.e., it is low level.
The lock manager is separate from the storage manager. This makes it easy to use the lock manager in other data services.
Although there is only one data server in the system (the storage manager), the transaction manager is designed in such a way that it can control any kind of service.
It is strictly two-phase; no short-circuiting as in Quicksilver is allowed.
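The idea of a transaction manager that drives any kind of service through a strictly two-phase protocol can be sketched as follows. This is only an illustration in Python: the `Participant` class and its `prepare`, `commit`, and `abort` methods are hypothetical names, not the actual interfaces of the service.

```python
class Participant:
    """Hypothetical service controlled by the transaction manager."""

    def __init__(self, name, vote=True):
        self.name = name
        self.vote = vote          # what prepare() will answer
        self.state = "active"

    def prepare(self):
        self.state = "prepared" if self.vote else "aborted"
        return self.vote

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Run both phases in full; no phase is skipped for any participant."""
    # Phase 1: ask every participant to prepare and collect all votes.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if every vote was yes, otherwise abort everyone.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"
```

Because the coordinator only relies on the three methods, any service that provides them could take part, which is the point of keeping the transaction manager independent of the storage manager.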
libtrans.a and its source code. This is used by user-space applications that use the service. It also contains an RVM emulator.
trans_malloc and trans_free, persistent heap management routines.
Makefile
and uncomment
#export TRANSTARGET=osf
and comment out
export TRANSTARGET=spin
!>script -b
!>script trans.rc
!>sphinx exec ~/spin/user/trans/rvmbench/spin/rvmbench
See also rvmbench documentation.
!>script -b
!>script trans.rc
!>sphinx exec ~/spin/user/trans/oo7/spin/OO7 -g -r /efs/rds_data ~/spin/user/trans/oo7/Config.tiny
!>sphinx exec ~/spin/user/trans/oo7/spin/OO7 -b -r /efs/rds_data ~/spin/user/trans/oo7/Config.tiny 1 t1 t2a t2b
See also oo7 documentation.
The transactional properties are guaranteed using a write-ahead log (WAL).
Both the storage manager and the transaction manager use the log. The storage manager uses it to keep storage consistent, and the transaction manager uses it to support two-phase commit. Modifications to the storage manager are logged using either an undo-redo protocol or a redo protocol. Which protocol to use is determined by the flag given to Transaction.Begin.
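The difference between the two logging protocols can be illustrated with a toy storage manager. This is a sketch in Python, not the real implementation: `WalStore`, its methods, and the `undo_redo` flag name are hypothetical, and the rule that only undo-redo transactions can be rolled back is an assumption of the sketch.

```python
class WalStore:
    """Toy storage manager that logs every change before applying it."""

    def __init__(self, size=8):
        self.data = bytearray(size)
        self.log = []            # stand-in for the write-ahead log
        self.modes = {}          # tid -> True if undo-redo, False if redo
        self.next_tid = 1

    def begin(self, undo_redo=True):
        # The protocol is fixed per transaction by a flag at begin time.
        tid = self.next_tid
        self.next_tid += 1
        self.modes[tid] = undo_redo
        return tid

    def write(self, tid, offset, new):
        record = {"tid": tid, "offset": offset, "redo": bytes(new)}
        if self.modes[tid]:
            # Undo-redo: also keep the before-image of the bytes.
            record["undo"] = bytes(self.data[offset:offset + len(new)])
        self.log.append(record)                       # log first ...
        self.data[offset:offset + len(new)] = new     # ... then modify

    def abort(self, tid):
        # Assumption of this sketch: only undo-redo transactions roll back.
        if not self.modes[tid]:
            raise RuntimeError("redo transaction has no undo records")
        for record in reversed(self.log):
            if record["tid"] == tid:
                off = record["offset"]
                self.data[off:off + len(record["undo"])] = record["undo"]
```

A redo record is smaller because it carries only the new bytes; the undo-redo record pays for the before-image so that the change can be reverted in place.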
The API when the client is on a remote site is the same as in the local case.
First, the client has to call Storage.pin.
This call always makes an RPC to the server.
Storage.pin maps the server's storage contents into client memory. An internal optimization checks the version of each page in the region, and pages that are already up to date on the client are not transferred.
On transaction commit, all the undo-redo logs are transmitted to the server. The server modifies its own storage contents using that log. Thus, after commit, the server and the client have the same contents.
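The remote pin and commit steps can be sketched as below. This is a hedged illustration in Python: the `Server` and `Client` classes, the per-page version counters, and the tiny page size are all assumptions made for the example, and a direct method call stands in for the RPC. Only the redo part of each log record is modelled.

```python
PAGE = 4  # tiny page size, chosen only for the example


class Server:
    def __init__(self, npages):
        self.pages = [bytearray(PAGE) for _ in range(npages)]
        self.versions = [1] * npages

    def pin(self, client_versions):
        """Return only the pages whose client copy is missing or stale."""
        fresh = {}
        for n, version in enumerate(self.versions):
            if client_versions.get(n) != version:
                fresh[n] = (version, bytes(self.pages[n]))
        return fresh

    def apply_log(self, log):
        """Replay the client's log so server storage matches the client."""
        for page, offset, new in log:
            self.pages[page][offset:offset + len(new)] = new
            self.versions[page] += 1


class Client:
    def __init__(self, server):
        self.server = server
        self.pages = {}      # page number -> bytearray
        self.versions = {}   # page number -> version last seen
        self.log = []

    def pin(self):
        # Always contacts the server, but may receive no page data.
        fresh = self.server.pin(self.versions)
        for n, (version, data) in fresh.items():
            self.pages[n] = bytearray(data)
            self.versions[n] = version
        return len(fresh)    # number of pages actually transferred

    def write(self, page, offset, new):
        self.log.append((page, offset, bytes(new)))
        self.pages[page][offset:offset + len(new)] = new

    def commit(self):
        # Ship the log; the server replays it on its own storage.
        self.server.apply_log(self.log)
        for page, _, _ in self.log:
            self.versions[page] += 1   # stay in step with the server
        self.log = []
```

In this sketch the first `pin` transfers every page, while a second `pin` with nothing changed on the server transfers none, even though the call to the server still happens.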
Currently, TCP is used for the communication protocol.
2.3 Recovery
Log recovery is kind of weird. The log manager first opens a log file and immediately starts recovery. It looks at each log record and upcalls the local storage manager or the transaction manager to do undo or redo. When it detects unterminated transactions (e.g., prepared but not committed), it forks off a thread to poll the storage manager or transaction manager to resolve them.
This approach has both advantages and disadvantages. The advantage is that recovery is quicker, because only one scan through the log file is needed. Also, the log file can be truncated automatically, because once recovery is complete, we are sure that the log used in the recovery is no longer needed. The downside is that the log manager has to know about every manager in the system. This means that you can't add a new type of storage manager without changing the log manager's recovery code. The primary motivation for this design is log truncation.
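The single-scan recovery with upcalls can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the record layout (`owner`, `kind`, `tid`), the `handlers` table, and the `recover` function are hypothetical, and the unresolved transactions are simply returned instead of being polled from a forked thread.

```python
def recover(log, handlers):
    """Scan the log once, upcalling the manager that owns each record."""
    pending = {}                       # tid -> owner of an unresolved prepare
    for record in log:
        owner, kind, tid = record["owner"], record["kind"], record["tid"]
        # Upcall: the owning manager decides whether to undo or redo.
        # An owner missing from the table raises KeyError, which mirrors
        # the need to change recovery code when a new manager is added.
        handlers[owner](record)
        if kind == "prepare":
            pending[tid] = owner
        elif kind in ("commit", "abort"):
            pending.pop(tid, None)
    # Transactions still listed here were prepared but never terminated.
    return pending
```

The `handlers` table is where the downside shows up: the log manager must be told about every manager before it can scan a log that contains their records.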
3. Modules
The transaction manager contains many modules. Here is the
big picture.
yasushi@cs.washington.edu