Overview
FileStorm is a simple file storage API that can be used where the Java Content Repository is overkill, BLOBs in a
database are not appropriate, and FTP does not quite do the trick. FileStorm allows applications to perform CRUD operations
on content through a single interface, locally or remotely.
Features and Limitations
- Multiple implementations: file system, cached, HTTP.
- Embeddable.
- Easy to extend and customize.
- Does not support transactions.
Design
FileStorm is designed as follows:
- The Store interface specifies CRUD-like operations that are performed on content.
- The StoreFactory class creates instances of the different Store implementations provided by the framework.
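To make this division of responsibilities concrete, here is a minimal sketch of a CRUD-style contract, an in-memory implementation and a factory. The names `ContentStore`, `InMemoryContentStore` and `ContentStores` are hypothetical. They illustrate the interface/factory split and are not FileStorm's actual `Store` and `StoreFactory` signatures, which should be checked in the javadoc.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical CRUD-style contract playing the role of FileStorm's Store interface.
// I/O failures are rethrown unchecked to keep the sketch short.
interface ContentStore {
    void put(String name, InputStream content);   // create or overwrite
    InputStream get(String name);                  // read; null if nothing is bound to name
    boolean delete(String name);                   // remove; false if nothing was bound
}

// Hypothetical in-memory implementation, handy for unit tests.
class InMemoryContentStore implements ContentStore {
    private final Map<String, byte[]> entries = new ConcurrentHashMap<>();

    @Override
    public void put(String name, InputStream content) {
        try {
            entries.put(name, content.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public InputStream get(String name) {
        byte[] data = entries.get(name);
        return data == null ? null : new ByteArrayInputStream(data);
    }

    @Override
    public boolean delete(String name) {
        return entries.remove(name) != null;
    }
}

// Hypothetical factory: callers obtain stores here and depend only on the interface.
final class ContentStores {
    private ContentStores() {}

    static ContentStore newInMemoryStore() {
        return new InMemoryContentStore();
    }
}
```

Because callers only see the interface, a file-system, caching or HTTP-backed implementation can be swapped in without changing client code. That is the property the framework's factory is meant to provide.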
Implementations
The framework provides different implementations of the Store interface. They
are explained one by one below.
FileStore
The FileStore class implements the Store interface over the file system. A base directory must be provided,
under which file operations are performed. To create an instance of FileStore, use its
constructor or the StoreFactory.
The FileStore supports hierarchical naming: the names under which content is bound can be hierarchical,
corresponding to file paths relative to the base directory. Here is an example of how to use the FileStore
class:
Store store = StoreFactory.newFileStore(new File("working"), 1024);
// input corresponds to a java.io.InputStream
store.put("documents/file.doc", input);
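The following is a rough sketch of how hierarchical names can map onto a base directory. `DirectoryStore` is a hypothetical class, not FileStorm's FileStore. The rejection of names that escape the base directory (such as `../outside.txt`) is a safeguard added here and is not something the documentation describes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical file-system store: a name such as "documents/file.doc"
// becomes the file <baseDir>/documents/file.doc.
class DirectoryStore {
    private final Path baseDir;

    DirectoryStore(Path baseDir) {
        this.baseDir = baseDir.toAbsolutePath().normalize();
    }

    // Resolves a store name to a file, rejecting names that point outside the base directory.
    Path resolve(String name) {
        Path file = baseDir.resolve(name).normalize();
        if (!file.startsWith(baseDir) || file.equals(baseDir)) {
            throw new IllegalArgumentException("Invalid name: " + name);
        }
        return file;
    }

    void put(String name, InputStream content) throws IOException {
        Path file = resolve(name);
        Files.createDirectories(file.getParent());   // create intermediate directories
        Files.copy(content, file, StandardCopyOption.REPLACE_EXISTING);
    }

    // Returns an open stream (caller closes it), or null if nothing is bound to the name.
    InputStream get(String name) throws IOException {
        Path file = resolve(name);
        return Files.exists(file) ? Files.newInputStream(file) : null;
    }

    boolean delete(String name) throws IOException {
        return Files.deleteIfExists(resolve(name));
    }
}
```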
CachingFileStore
The CachingFileStore extends the FileStore and adds caching logic on top of it: the CachingFileStore
interacts with a delegate Store with which it synchronizes its state (the delegate acting as the master copy).
The caching store's content is kept in the file system (thereby reusing the functionality provided by the parent FileStore
class). A caching timeout is provided at instantiation time: content is automatically refreshed when the timeout expires.
Here is an example:
Store master = StoreFactory.newFileStore(new File("working/master"), 1024);
Store cache = StoreFactory.newCachingFileStore(
    new File("working/cache"), // local copy, kept apart from the master's directory
    master,                    // delegate acting as the master copy
    1024,
    30000);                    // caching timeout
// input corresponds to a java.io.InputStream
master.put("documents/file.doc", input);
input = cache.get("documents/file.doc");
// process input...
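The refresh-on-timeout behavior can be sketched as follows. `TimedReadCache` is hypothetical and keeps its local copy in memory rather than in the file system. The master is modeled as a plain function from name to content. Timestamps are treated as milliseconds, which is an assumption, since the documentation does not state the unit of the timeout (30000 in the example above). The clock is injectable so the expiry can be tested deterministically.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch of timeout-based refresh: reads are served from a local copy
// until it is older than the timeout, then the content is fetched again from the master.
class TimedReadCache {
    private static final class Entry {
        final byte[] content;
        final long fetchedAt;

        Entry(byte[] content, long fetchedAt) {
            this.content = content;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Function<String, byte[]> master;   // reads from the master copy; null if absent
    private final long timeoutMillis;
    private final LongSupplier clock;                 // current time in milliseconds
    private final Map<String, Entry> local = new HashMap<>();

    TimedReadCache(Function<String, byte[]> master, long timeoutMillis, LongSupplier clock) {
        this.master = master;
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    synchronized byte[] get(String name) {
        long now = clock.getAsLong();
        Entry e = local.get(name);
        if (e == null || now - e.fetchedAt >= timeoutMillis) {
            byte[] fresh = master.apply(name);   // refresh from the master copy
            if (fresh == null) {
                local.remove(name);              // content no longer exists on the master
                return null;
            }
            e = new Entry(fresh, now);
            local.put(name, e);
        }
        return e.content;
    }
}
```

With this design a cache may serve stale content for up to one timeout period after the master changes. That is the usual trade-off when, as noted below, the master is written by admin applications and the cache is used for reads.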
The above usage is not "standard": usually, a caching store is used for read-only operations, and the master
is used directly by admin applications to perform the write operations.
HTTP File Store
The StoreServlet and StoreClient classes work together to provide a Store that is accessible over the network.
Note that the servlet can also be conveniently accessed through web browsers.
The architecture is as follows:
- The StoreServlet wraps a FileStore, handling GET/PUT/DELETE HTTP requests and translating them into operations on the wrapped instance.
- The StoreClient implements the Store interface over an HTTP client that connects to the servlet.
The Servlet
The servlet is configured as follows (see the javadoc):
<?xml version="1.0"?>
<web-app>
<display-name>Web App Example</display-name>
<servlet>
<servlet-name>testServlet</servlet-name>
<display-name>FileStorm Test Servlet</display-name>
<servlet-class>org.sapia.filestorm.http.StoreServlet</servlet-class>
<init-param>
<param-name>store.basedir</param-name>
<param-value>${user.dir}/etc/servlet</param-value>
</init-param>
<!-- this one is optional - defaults to 1024 -->
<init-param>
<param-name>store.bufsize</param-name>
<param-value>500</param-value>
</init-param>
<init-param>
<param-name>store.response.caching.seconds</param-name>
<param-value>30</param-value>
</init-param>
<init-param>
<param-name>store.put.enabled</param-name>
<param-value>true</param-value>
</init-param>
<init-param>
<param-name>store.delete.enabled</param-name>
<param-value>true</param-value>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>testServlet</servlet-name>
<url-pattern>/*</url-pattern>
</servlet-mapping>
</web-app>
The store.basedir and store.bufsize initialization parameters are passed to the constructor of the wrapped FileStore; the remaining
parameters configure the servlet itself. As mentioned, the servlet maps GET/PUT/DELETE requests to the corresponding methods on the
store instance. The path information (HttpServletRequest.getPathInfo()) is interpreted as the name of the resource on which the
operation is performed. For example, given a store servlet published at http://localhost:8080/store (where store is the context path),
the URL http://localhost:8080/store/documents/file.doc designates the resource named documents/file.doc.
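The URL-to-name mapping described above can be sketched as a small helper. `ResourceNames` is hypothetical. It assumes the `/*` servlet mapping from the web.xml, under which the path information is everything after the context path, and it strips the leading slash to obtain the store name.

```java
import java.net.URI;

// Hypothetical helper mirroring the mapping described above.
final class ResourceNames {
    private ResourceNames() {}

    // Converts path info such as "/documents/file.doc" into a store name; null if no resource is named.
    static String fromPathInfo(String pathInfo) {
        if (pathInfo == null || pathInfo.isEmpty() || pathInfo.equals("/")) {
            return null;
        }
        return pathInfo.startsWith("/") ? pathInfo.substring(1) : pathInfo;
    }

    // Derives the store name from a full URL, given the servlet's context path (e.g. "/store").
    static String fromUrl(String url, String contextPath) {
        String path = URI.create(url).getPath();   // decoded path component
        if (!path.equals(contextPath) && !path.startsWith(contextPath + "/")) {
            throw new IllegalArgumentException("URL is outside context " + contextPath);
        }
        return fromPathInfo(path.substring(contextPath.length()));
    }
}
```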
The HTTP Client
On the client side, a StoreClient is used to interact with the servlet through the usual Store
interface:
Store store = StoreFactory.newStoreClient("http://localhost:8080/store");
// input corresponds to a java.io.InputStream
store.put("documents/file.doc", input);
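The next sketch shows what a client like StoreClient has to send over the wire. It uses the JDK's `java.net.http` client (Java 11+). `RawStoreHttp` is hypothetical and is not how StoreClient is implemented. It assumes the following:
- Resources are addressed as `<base URL>/<name>`.
- Names contain only URL-safe characters, since no encoding is done.
- PUT and DELETE have been enabled on the servlet.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical raw-HTTP equivalent of Store operations against the servlet.
final class RawStoreHttp {
    private final HttpClient client = HttpClient.newHttpClient();
    private final String baseUrl;

    RawStoreHttp(String baseUrl) {
        this.baseUrl = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
    }

    // PUT uploads content under the given name.
    HttpRequest putRequest(String name, byte[] content) {
        return HttpRequest.newBuilder(URI.create(baseUrl + name))
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    // GET downloads the content bound to the name.
    HttpRequest getRequest(String name) {
        return HttpRequest.newBuilder(URI.create(baseUrl + name)).GET().build();
    }

    // DELETE removes the content bound to the name.
    HttpRequest deleteRequest(String name) {
        return HttpRequest.newBuilder(URI.create(baseUrl + name)).DELETE().build();
    }

    // Sends a request and returns the HTTP status code, discarding the body.
    int send(HttpRequest request) throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}
```

Usage against the servlet configured above would be `http.send(http.putRequest("documents/file.doc", bytes))`. A real client would also map error status codes back to exceptions on the Store interface.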
Browser Access
The servlet can be accessed through a web browser by entering the URL of the desired file. For example,
given the servlet configured above, we could enter http://localhost:8080/store/documents/file.doc
in the browser's address bar.
The servlet uses the JavaBeans Activation Framework (originally from Sun) to determine the MIME content type of stored files. In addition,
repeated GET requests to the servlet can be avoided by setting the store.response.caching.seconds parameter (see the servlet's javadoc and
the example configuration above). Its value is used to set the Cache-Control response header, which instructs client browsers to cache
the downloaded files.
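The response headers for a download can be sketched as below. `DownloadHeaders` is hypothetical, and it makes two substitutions that should be noted plainly:
- For MIME detection it swaps in the JDK's `URLConnection.guessContentTypeFromName` in place of the Activation Framework the servlet actually uses.
- The `public, max-age=N` format of the Cache-Control value is an assumption, not the servlet's documented output.

```java
import java.net.URLConnection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the headers a GET response might carry.
final class DownloadHeaders {
    private DownloadHeaders() {}

    static Map<String, String> forResource(String name, int cachingSeconds) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Guess the MIME type from the file extension; fall back to a generic binary type.
        String type = URLConnection.guessContentTypeFromName(name);
        headers.put("Content-Type", type != null ? type : "application/octet-stream");
        // Only emit Cache-Control when a positive caching period is configured.
        if (cachingSeconds > 0) {
            headers.put("Cache-Control", "public, max-age=" + cachingSeconds);
        }
        return headers;
    }
}
```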
As a last note, since allowing web browsers to access the servlet directly can pose security problems in the case of PUT and DELETE
requests, these are disabled by default. To enable them, set the store.put.enabled and store.delete.enabled initialization parameters
(as illustrated in the web.xml given above).
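The parameter handling and the PUT/DELETE policy described above can be sketched together. `ServletSettings` is hypothetical, and a plain `Map` stands in for `ServletConfig.getInitParameter(...)`. The following defaults and responses come from different sources:
- The 1024 buffer size and disabled PUT/DELETE follow the documentation.
- Treating store.basedir as required is an assumption.
- Replying 403 to disabled methods and 405 to other methods is also an assumption.

```java
import java.util.Map;

// Hypothetical sketch: turns the servlet's init parameters into settings and
// applies the PUT/DELETE policy described above.
final class ServletSettings {
    final String baseDir;
    final int bufSize;
    final boolean putEnabled;
    final boolean deleteEnabled;

    private ServletSettings(String baseDir, int bufSize, boolean putEnabled, boolean deleteEnabled) {
        this.baseDir = baseDir;
        this.bufSize = bufSize;
        this.putEnabled = putEnabled;
        this.deleteEnabled = deleteEnabled;
    }

    static ServletSettings from(Map<String, String> params) {
        String baseDir = params.get("store.basedir");
        if (baseDir == null || baseDir.isBlank()) {
            throw new IllegalArgumentException("store.basedir is required");   // assumption
        }
        return new ServletSettings(
                baseDir,
                Integer.parseInt(params.getOrDefault("store.bufsize", "1024")),           // documented default
                Boolean.parseBoolean(params.getOrDefault("store.put.enabled", "false")),    // disabled by default
                Boolean.parseBoolean(params.getOrDefault("store.delete.enabled", "false"))); // disabled by default
    }

    // Returns 0 if the request may proceed, otherwise the HTTP status to reply with.
    int check(String method) {
        switch (method) {
            case "GET":
                return 0;
            case "PUT":
                return putEnabled ? 0 : 403;
            case "DELETE":
                return deleteEnabled ? 0 : 403;
            default:
                return 405;
        }
    }
}
```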
Conclusion
FileStorm is a simple API that can easily be extended and used in many different ways:
- Wrap a StoreClient in a CachingFileStore to improve the performance of read operations.
- Use rsync to replicate content between distributed FileStores organized in a master-slave topology: the master is used for admin/read/write operations, and the slaves are used to load-balance read operations.
- etc.