File Data Cache Documentation
The file data cache stores recently accessed file data in pages of physical memory. It can potentially cache data for any file system, providing a simple interface by which any file system can add, delete, read, and write cached files. The default implementation of the cache uses a global LRU policy to choose victim blocks for replacement; however, the replacement mechanism is implemented with events that can be caught and handled to define different replacement schemes. Although mapping is not yet implemented, the cache is designed to provide a mappable memory object per cached file. In this initial implementation, all physical pages of memory are pinned: the cache is not paged.
The following is a description of the components that make up the
cache implementation. It is organized by module/interface name.
- FileDataCache: the external interface to the cache. FileDataCache allows clients to interact with the cache using a file-like interface. FileDataCache carries out file system requests by dividing the region to be accessed into cache blocks and then calling per-file access methods for those blocks.
- Buffer: the container for pages of data. Buffer encapsulates physical pages of memory; buffers are analogous to cache blocks. Buffer is a subtype of DoubleList.EltT, allowing it to be stored easily in linked lists. Buffers are the basic unit of allocation, deallocation, and replacement in the cache.
- BaseMObjCache.i3: the container for unallocated buffers. Initially, all buffers are stored in the BaseMObjCache. As file data is cached, these buffers are allocated to per-file containers. When files abandon allocated buffers, they are returned to the free pool in the BaseMObjCache.
- FileMObjCache.i3: the per-file container for buffers. When a file is added to the cache, a FileMObjCache object is created. The FileMObjCache maintains the buffers for a file along with any metadata for the file (e.g., size, backing file system) and for each buffer (e.g., locks, offset into the file). This data is maintained in an array of block descriptors.
- Victim.i3: exports an interface to control cache block replacement. Victim tracks all allocated buffers. Whenever a buffer is accessed, Victim is notified by the per-file container. When new data must be cached and no buffers are available in the free pool, the file container asks Victim to steal a block; Victim then causes a file container to release an allocated buffer, choosing the file container and the buffer based on the reported access patterns. The default implementation of Victim is a global LRU strategy. However, the Victim interface was designed so that its events could be interposed and different schemes could be developed.
- DoubleList.i3: implements a doubly-linked list, providing O(1) insertion and deletion. DoubleLists are used to maintain the free pool, as well as in the default implementation of Victim to maintain the LRU list. Members of DoubleLists must be subtypes of DoubleList.EltT, such as buffers. Although FileMObjCache.Ts are not subtypes of DoubleList.EltT, they contain a public DoubleList.EltT subtype field that allows them to be inserted (somewhat awkwardly) into DoubleLists. Although this functionality is not currently used, it may be helpful in a dynamically installed implementation of Victim. Both FileMObjCache.T and BaseMObjCache.T contain DoubleList fields, providing easily accessible per-file or per-cache lists.
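The way FileDataCache divides a request into per-block accesses can be sketched as follows. This is a hypothetical Python illustration, not the actual Modula-3 code; the block size and the signature of the per-block access method are assumptions.

```python
BLOCK_SIZE = 8192  # assumed cache block size; the real value is implementation-defined

def read_region(file_cache, offset, length, access_block):
    """Split [offset, offset+length) into block-aligned pieces and
    invoke the per-file access method once per cache block."""
    data = bytearray()
    end = offset + length
    while offset < end:
        block_no = offset // BLOCK_SIZE          # which cache block
        block_off = offset % BLOCK_SIZE          # offset within that block
        chunk = min(BLOCK_SIZE - block_off, end - offset)
        data += access_block(file_cache, block_no, block_off, chunk)
        offset += chunk
    return bytes(data)
```

A write request would be split the same way, with the per-block method storing rather than fetching data.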
The cache locking scheme can be somewhat confusing. The expected locks are present to avoid data races when multiple threads access DoubleLists or the file-to-FileMObjCache table in FileDataCache.m3. FileMObjCache, however, has a more interesting locking scheme, with locks at three levels:
- Block: there is a ReaderWriterLock associated with each block descriptor. ReaderLocks allow multiple readers and no writers, while WriterLocks allow only a single writer. A ReaderLock must be obtained when the data or metadata of a block is being read. A WriterLock must be obtained when the data or metadata of a block is being written, or when the block is being swapped in or out. Holding a ReaderLock or WriterLock does not inhibit access to other blocks in the file.
- File Metadata: there is a MUTEX associated with each FileMObjCache.T. This MUTEX protects against concurrent access to the metadata associated with a file (including size, number of resident pages, number of hits and misses, etc.). The MUTEX must be held when reading or writing file metadata. It should not be held while, e.g., swapping out a block; in fact, holding the MUTEX does not prevent access to blocks.
- File Access: there is a monitor-like object associated with each
FileMObjCache.T. It can be used to restrict
access to the entire file, such as to delete it from the cache or
restructure its array of block descriptors. This lock allows for multiple
readers and writers unless access is explicitly disallowed. When access is
explicitly disallowed, any ongoing accesses can run to completion.
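The block-level ReaderWriterLock semantics described above can be sketched as follows. This is a minimal Python illustration of the multiple-readers/single-writer discipline; the method names and implementation are assumptions, not the actual interface.

```python
import threading

class ReaderWriterLock:
    """Allows multiple concurrent readers, or a single writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # count of active readers
        self._writer = False    # is a writer active?

    def reader_lock(self):
        with self._cond:
            while self._writer:             # readers wait out any writer
                self._cond.wait()
            self._readers += 1

    def reader_unlock(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()     # wake a waiting writer

    def writer_lock(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()           # writers need exclusive access
            self._writer = True

    def writer_unlock(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

Because each block descriptor carries its own lock, contention on one block never blocks access to other blocks of the same file.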
There are two obstacles to caching a file system. The first is
ensuring that file objects are of the proper type, and the second is
ensuring that the cache is interposed between the file system and the
disk (or other storage device).
File objects must be of the proper type because of the cache's reliance on the Modula-3 table generic. A table is used in FileDataCache to translate from a file system File.T object to a FileMObjCache.T object. Since File.T is the table key, the File.i3 interface must have Hash() and Equal() procedures, but it does not. Instead, the File.T object has a public data field of type FileId.T, and the FileId.i3 interface does have Hash() and Equal() procedures. Unfortunately, my experience has indicated that the FileId field does not provide unique identification for files (for some file systems, every file had a FileId field consisting entirely of zeroes).
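The role Hash() and Equal() play in making FileId usable as a table key can be sketched in Python, where the equivalent hooks are __hash__ and __eq__. This is a hypothetical illustration; the FileId class and its raw-bytes representation are assumptions.

```python
class FileId:
    """Hypothetical stand-in for FileId.T: an identifier that supports
    hashing and equality, so it can serve as a table key."""

    def __init__(self, raw: bytes):
        self.raw = raw

    def __hash__(self):        # plays the role of FileId.Hash()
        return hash(self.raw)

    def __eq__(self, other):   # plays the role of FileId.Equal()
        return isinstance(other, FileId) and self.raw == other.raw

# The file-to-cache-object table is keyed by FileId rather than by the
# File object itself, since File lacks Hash()/Equal().
cache_table = {}
cache_table[FileId(b"\x00\x01\x02")] = "per-file cache object"
```

Note that if every file reports an all-zero FileId, as observed for some file systems, all files collide on one table entry; the key mechanism works only as well as the identifiers it is given.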
There are many approaches to fixing the file type problem. A few are
suggested below:
There are also several approaches to interposing the cache
into the filing service. A few are suggested below:
- The first approach is currently in use and corresponds to the first approach to fixing the file type problem above. The file system to be cached is copied. Its existing read() and write() methods are used as the swapin and swapout parameters to AddFile() in the FileDataCache interface. The read() and write() methods are then replaced with methods that invoke cache routines. An example of this approach is the CacheUFS file system in /spin/ddion/spin/user/fs/cufs.
- A second approach is much more efficient in its code reuse. Rather than copying an entire file system, a new File object can be created that is a subtype of the File object of the file system being cached. The read() and write() methods can be overridden to invoke the cache, and the supertype read() and write() methods can be used as the swapin and swapout parameters to AddFile().
- A third approach uses the dispatcher to interpose on the read() and write() methods of a file, inserting calls to the cache. Unfortunately, this approach is not yet possible, since the dispatcher is unable to interpose on object methods.
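The subtyping approach (the second one above) can be sketched as follows. This is a hypothetical Python illustration: the File base class, the TinyCache helper, and the method names are assumptions standing in for the Modula-3 types, with the supertype read() playing the role of the swapin routine.

```python
class File:
    """Hypothetical base File object of the file system being cached."""
    def read(self, offset, length):
        return b"x" * length            # stand-in for a real disk read

class TinyCache:
    """Hypothetical minimal cache keyed by (offset, length)."""
    def __init__(self):
        self.blocks = {}
        self.hits = 0

    def lookup(self, offset, length):
        data = self.blocks.get((offset, length))
        if data is not None:
            self.hits += 1
        return data

    def insert(self, offset, data):
        self.blocks[(offset, len(data))] = data

class CachedFile(File):
    """Subtype that interposes the cache on read(); the supertype
    method serves as the swapin routine on a miss."""
    def __init__(self, cache):
        self.cache = cache

    def read(self, offset, length):
        data = self.cache.lookup(offset, length)
        if data is not None:
            return data                      # cache hit
        data = super().read(offset, length)  # swapin via the supertype
        self.cache.insert(offset, data)
        return data
```

write() would be overridden the same way, with the supertype write() acting as the swapout routine.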
The cache implementation comes with a shell interface. Currently,
the shell commands are very limited. A collection of cache
statistics is available with the command:
cache stat
at the SPIN shell. In order to keep statistics, the KeepStats
constant must be TRUE at compile time.
There are a number of enhancements that would improve the file data cache implementation, including but not limited to the following:
- Invalidation by timeout: sometimes a block of data should be invalidated or refreshed after a period of time; NFS is an example of a file system that invalidates its cache with timeouts. Incorporating timeouts into the cache implementation would not be difficult; in fact, since the SPIN kernel provides an interface for setting alarms, it would be quite easy. Timeouts could be implemented directly in the default cache, with the timeout value set as a parameter to the AddFile() call. Alternatively, and preferably, timeouts could be implemented entirely through the Victim interface. By interposing a new Victim implementation via event handling, a timeout mechanism could be dynamically installed. The timer would begin when the alloc buffer event is raised and would be cleared at each ref event. If the alarm fires, the Victim module could cause the FileMObjCache module to release the offending buffer.
- Asynchronous flushes: in the current implementation, writes are stored in the cache until the block is purged or the flush() method is invoked. A more interesting approach would be to provide an asynchronous flush. This would allow a flush to be invoked frequently without hurting the latency of file operations.
- Cleanup of file containers: FileMObjCache.T objects currently are not cleaned up. Once AddFile() is called, FileMObjCache.T objects live in the cache table until they are explicitly deleted. This is an unnecessary waste of memory. An alternative approach would be to move FileMObjCache.T objects into an alternate data structure once they have no resident pages. If this alternate structure reaches a certain threshold, FileMObjCache.T objects are purged and picked up by the kernel garbage collector. The alternate data structure could be a doubly-linked list; FileMObjCache.T objects contain a DoubleList.EltT field that enables them to belong to a list structure. The list could be ordered by least recently abandoned, making the victim choice clear.
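The global LRU policy that the default Victim implements, driven by ref and steal notifications, can be sketched as follows. This is a hypothetical Python illustration; the event names and the use of an ordered map are assumptions about the real event-based interface.

```python
from collections import OrderedDict

class LRUVictim:
    """Sketch of the default Victim policy: every buffer reference moves
    the buffer to the most-recently-used end of an ordered map, and
    stealing removes the least-recently-used buffer."""

    def __init__(self):
        self._lru = OrderedDict()   # buffer id -> owning file container

    def note_ref(self, buf, owner):
        """Called by a per-file container whenever a buffer is accessed."""
        self._lru.pop(buf, None)    # remove any old position
        self._lru[buf] = owner      # reinsert at the MRU end

    def note_free(self, buf):
        """Called when a buffer is returned to the free pool."""
        self._lru.pop(buf, None)

    def steal(self):
        """Choose and remove the least recently used buffer, returning
        it along with the container that must release it."""
        buf, owner = next(iter(self._lru.items()))
        del self._lru[buf]
        return buf, owner
```

An interposed timeout or least-recently-abandoned policy would replace only the bookkeeping inside these notifications, which is what the event-based Victim design is meant to allow.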
ddion@cs.washington.edu
Last Modified: 21 June 1997