Cantaloupe offers a sophisticated and customizable caching subsystem that is capable of meeting a variety of needs while remaining easy to use. Several tiers of cache are available: client-side caching hints, a source cache, a derivative cache, and an info cache.
Cantaloupe can provide caching hints to clients using a Cache-Control response header, which is configurable via the cache.client.* keys in the configuration file. To enable this header, set the cache.client.enabled key to true.
The default settings look something like the following; the exact keys and values may vary between versions:
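# Approximate cache.client.* defaults; consult the sample configuration for your version.
cache.client.enabled = true
cache.client.max_age = 2592000
cache.client.shared_max_age =
cache.client.public = true
cache.client.private = false
cache.client.no_cache = false
cache.client.no_store = false
cache.client.must_revalidate = false
cache.client.proxy_revalidate = false
cache.client.no_transform = true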
These are reasonable defaults that tell clients they can keep cached images for 2,592,000 seconds (30 days).
The Cache-Control header will be returned only in HTTP 2xx responses.
Commonly, source images are served from a local filesystem using FilesystemResolver. In that case, they are already as local as they can be, so there would be no point in caching them (although a derivative cache could still be of great benefit).
As explained in the Resolvers section, though, images do not have to be served from a local filesystem—they can also be served from a remote web server, cloud storage, or wherever. The source cache can be beneficial when one of these non-filesystem sources performs more poorly than desired. Setting cache.server.source to FilesystemCache will cause all source images from non-FilesystemResolvers to be automatically downloaded and stored in the source cache.
Another reason to use a source cache is to work around incompatibilities between certain processors and resolvers. Some processors are only capable of reading source images located on the filesystem. By setting StreamProcessor.retrieval_strategy to CacheStrategy and configuring FilesystemCache, the source cache will be used to handle these incompatible processor/resolver combinations by automatically pre-downloading source images. This makes it possible to use something like OpenJpegProcessor with AmazonS3Resolver.
The source cache is integrated into the larger caching architecture, so all of the information about modes of operation and maintenance is applicable to both the source and derivative caches.
Note that unlike the derivative cache, there is only one available source cache implementation—FilesystemCache—and it will be used independently of the derivative cache.
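Putting this together, a minimal source cache configuration might look like the following; the pathname is a hypothetical example:

# Download sources from non-filesystem resolvers into the source cache
cache.server.source = FilesystemCache
# Let processors that can only read files retrieve their sources from the cache
StreamProcessor.retrieval_strategy = CacheStrategy
# Root of the cache tree (hypothetical path)
FilesystemCache.pathname = /var/cache/cantaloupe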
The derivative cache caches post-processed images in order to spare the computational expense of processing the same image request over and over again. Derivative caches are pluggable, in order to enable different cache stores.
Derivative caching is recommended in production, as it will greatly reduce load on the server and improve response times accordingly. There are other ways of caching derivatives, such as by using a caching reverse proxy, but the built-in derivative cache is custom-tailored for this application and easy enough to set up.
Derivative caching is disabled by default. To enable it, set cache.server.derivative.enabled to true, and set cache.server.derivative to the name of a cache, such as FilesystemCache.
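For example, a minimal derivative cache configuration using FilesystemCache might look like this; the time-to-live value is only illustrative:

cache.server.derivative.enabled = true
cache.server.derivative = FilesystemCache
# Illustrative time-to-live, in seconds (see the maintenance notes below)
cache.server.ttl_seconds = 2592000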
The derivative cache can be bypassed on a per-request basis by supplying a cache=false query parameter in the URL. When this parameter is present, the derivative cache will not be read from, nor written to, whether or not it is enabled. The Cache-Control header will also be omitted from responses.
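For example, assuming the default standalone port and the IIIF Image API 2.x endpoint, a request like the following would bypass the derivative cache:

http://localhost:8182/iiif/2/image.tif/full/full/0/default.jpg?cache=false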
The info cache (added in version 3.4) caches image info objects in the Java heap, independently of the derivative cache. When both are enabled, the info cache acts as a "level 1" cache in front of the "level 2" derivative cache: infos are read from the info cache first, falling back to the derivative cache when they are not found there.
The info cache can be enabled or disabled via the cache.server.info.enabled configuration key.
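For example, to enable it:

cache.server.info.enabled = true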
The info cache is cluster-safe: when multiple instances are sharing the same derivative cache, there will never (for more than a short period of time) be an info in an instance's info cache that isn't also present in the derivative cache.
The info cache does not respect the cache.server.ttl_seconds configuration key. Its content never expires.
The maximum size of the info cache is hard-coded to a reasonable percentage of the maximum heap size, and is not configurable. As infos are very small, the maximum size is unlikely to ever be reached.
The info cache is not persisted; its contents will be lost when the application exits.
The source and derivative caches can be configured to operate in one of two ways:

- Conservatively (cache.server.resolve_first = true): source images are looked up and verified to exist before any cached content is returned.
- Aggressively (cache.server.resolve_first = false): cached content is returned immediately, if available, without checking the source first.

Because cached content is not automatically deleted after becoming invalid, there will likely be a certain amount of invalid content taking up space in the cache at any given time. Without periodic maintenance, the amount can only grow. If this is a problem, it can be dealt with manually or automatically.
To purge all expired content, launch with the -Dcantaloupe.cache.purge_expired option.
To purge all content, expired or not, launch with the -Dcantaloupe.cache.purge option.
To purge all content related to a given identifier, expired or not, launch with the -Dcantaloupe.cache.purge=identifier option.
When any of these arguments are present at launch, the application will run in a special mode in which the web server is not started, and will exit when done. Any of these tasks can be run in a separate process, on the live cache store, while the main server instance remains running.
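For example, with the standalone distribution, a purge-only run might be invoked like this (the WAR filename and configuration path are hypothetical):

java -Dcantaloupe.config=/path/to/cantaloupe.properties -Dcantaloupe.cache.purge_expired -jar cantaloupe.war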
See the HTTP API documentation for more information.
Since version 2.2, a "cache worker" is available that will periodically clean and purge expired items from the cache automatically. (See the cache.server.worker.* configuration options.)
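A sketch of enabling it, assuming the .enabled and .interval keys found in the sample configuration (the exact key names may vary between versions):

# Run the cache worker (assumed key name)
cache.server.worker.enabled = true
# Seconds between runs (assumed key name; illustrative value)
cache.server.worker.interval = 86400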
Most caches (with the exception of HeapCache) age-limit their content based on last-accessed or last-modified time. Depending on the amount of source content served, the varieties of derivatives generated, the time-to-live setting, and how often maintenance is performed, the cache may grow very large. Its size is not tracked, as this would be either expensive, or, for some cache implementations, impossible. Managing the cache size is therefore the responsibility of the administrator, and it can be accomplished by any combination of:
- adjusting the time-to-live (the cache.server.ttl_seconds configuration key);
- purging or cleaning, either manually or via the cache worker, often enough to keep up with growth.

FilesystemCache caches content in a filesystem tree rooted at FilesystemCache.pathname/. The depth of the intermediate subdirectories within the tree, and the length of their names, are configurable via FilesystemCache.dir.depth and FilesystemCache.dir.name_length. Cache files are created with a .tmp extension and moved into place when closed for writing.
This cache is process-safe: it is safe to point multiple server instances at the same cache directory.
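A sketch of a FilesystemCache configuration; the pathname and the directory structure values are only examples:

cache.server.derivative = FilesystemCache
# Root of the cache tree (hypothetical path)
FilesystemCache.pathname = /var/cache/cantaloupe
# Example intermediate directory structure
FilesystemCache.dir.depth = 3
FilesystemCache.dir.name_length = 2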
HeapCache, available since version 3.4, caches derivative images and metadata in the Java heap, which is the main area of RAM available to the JVM. This is the fastest of the caches, with the main drawback being that it cannot be shared across instances.
Unlike most of the other caches, this one does not age-limit content. When the target size (HeapCache.target_size) has been exceeded, the minimum number of least-recently-accessed items needed to bring it back down to that size are purged. (The configured target size may be safely changed while the application is running.)
Because this cache is not time-limited, cache.server.ttl_seconds does not apply, and, if enabled, the cache worker will have nothing to do, so will remain idle.
When using this cache, ensure that your heap is able to grow large enough to accommodate the desired target size (using the -Xmx VM option), and that you have enough RAM to accommodate this size.
This cache can persist its contents to disk using the HeapCache.persist and HeapCache.persist.filesystem.pathname configuration keys. When persistence is enabled, the contents of the cache will be written to a file at shutdown and loaded back in at startup. If persistence is disabled, the cache contents will be lost when the application exits.
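A sketch of a HeapCache configuration; the target size is given here as a plain byte count and the persistence pathname is hypothetical (check the sample configuration for the accepted size syntax):

cache.server.derivative = HeapCache
# Illustrative target size (1 GB, expressed in bytes)
HeapCache.target_size = 1073741824
# Optional persistence to disk (hypothetical path)
HeapCache.persist = true
HeapCache.persist.filesystem.pathname = /var/cache/cantaloupe/heap.cache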
Some thought was given to storing cached data using the same on-disk format used by FilesystemCache, so that persisted data would be compatible between these caches. Unfortunately, this is not possible because of the one-way hashing used in the FilesystemCache format.
JdbcCache caches derivative images and metadata in relational database tables. To use this cache, a JDBC driver for your database must be installed on the classpath.
JdbcCache is tested with the H2 database. It is known to not work with the official PostgreSQL driver, as of version 9.4.1207. Other databases may work, but are untested.
JdbcCache can be configured with the following options:
- JdbcCache.url: JDBC connection URL; for example, jdbc:postgresql://localhost:5432/mydatabase.
- JdbcCache.user: username of the database user.
- JdbcCache.password: password of the database user.
- JdbcCache.image_table: name of the table in which to cache derivative images.
- JdbcCache.info_table: name of the table in which to cache image infos.
JdbcCache will not create its schema automatically—this must be done manually using the following commands, which may have to be altered slightly for your particular database:
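The following is only an illustrative sketch with assumed column names and types, showing one table for derivative images and one for infos; consult the documentation for your version for the exact statements:

-- Table name must match JdbcCache.image_table; columns are illustrative.
CREATE TABLE IF NOT EXISTS derivative_image (
  -- Serialized operation list identifying the derivative
  operations VARCHAR(4096) NOT NULL,
  -- The derivative image itself
  image BLOB,
  -- Used for expiration and purging decisions
  last_accessed TIMESTAMP
);

-- Table name must match JdbcCache.info_table; columns are illustrative.
CREATE TABLE IF NOT EXISTS info (
  identifier VARCHAR(4096) NOT NULL,
  -- Serialized image info (JSON)
  info VARCHAR(8192) NOT NULL,
  last_accessed TIMESTAMP
);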
JdbcCache uses write transactions and is process-safe: it is safe to point multiple server instances at the same database tables.
AmazonS3Cache caches derivative images and metadata into an Amazon Simple Storage Service (S3) bucket.
AmazonS3Cache is configured (excepting credentials) using the following configuration keys:
- AmazonS3Cache.bucket.name: name of the bucket in which to store cached content.
- AmazonS3Cache.bucket.region: name of the AWS region in which the bucket resides, such as us-east-1. Can be commented out or left blank to use a default region. (See S3 Regions.)
- AmazonS3Cache.object_key_prefix: prefix to prepend to the keys of all objects written by the cache.
See the Credentials Sources information for AmazonS3Resolver. AmazonS3Cache works the same way, except that the credentials-related configuration keys, if you choose to use them, are different:
- AmazonS3Cache.access_key_id
- AmazonS3Cache.secret_key
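Putting it together, a configuration might look like the following; the bucket name, key prefix, and credentials are placeholders:

cache.server.derivative = AmazonS3Cache
AmazonS3Cache.bucket.name = my-cantaloupe-cache
AmazonS3Cache.bucket.region = us-east-1
AmazonS3Cache.object_key_prefix = cache/
# Optional; credentials may also come from other sources
AmazonS3Cache.access_key_id = AKIA...
AmazonS3Cache.secret_key = ...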
AzureStorageCache caches derivative images and metadata into a Microsoft Azure Storage container. It can be configured with the following options:
- AzureStorageCache.account_name: name of the Azure Storage account.
- AzureStorageCache.account_key: access key for the storage account.
- AzureStorageCache.container_name: name of the container in which to store cached content.
- AzureStorageCache.object_key_prefix: prefix to prepend to the keys of all objects written by the cache.
RedisCache, available since version 3.4, caches derivative images and metadata using the Redis data structure store. It supports the following configuration options:
- RedisCache.host: hostname or IP address of the Redis server.
- RedisCache.port: port on which the Redis server is listening.
- RedisCache.ssl: whether to connect using SSL/TLS.
- RedisCache.password: password used to authenticate, if required.
- RedisCache.database: number of the Redis database in which to store cached content.
Unlike the other caches, cache policy is configured on the Redis side, and cache.server.ttl_seconds will have no effect with this cache. Likewise, if enabled, the cache worker will have nothing to do, so will remain idle.
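A sketch, assuming a local Redis instance on its default port and database (the password is a placeholder):

cache.server.derivative = RedisCache
RedisCache.host = localhost
RedisCache.port = 6379
RedisCache.ssl = false
RedisCache.password = changeme
RedisCache.database = 0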