3.3

This version of the manual refers to an earlier version of the software.

Resolvers

Resolvers locate and provide access to source images. A resolver translates an identifier in a request URL to an image locator, such as a pathname, in the particular type of underlying storage it is written to interface with. It can then check whether the underlying object exists and is accessible, and if so, provide access to it to other image server components. The rest of the image server does not need to know where an image resides, be it on the filesystem, in a database, on a remote web server, or in cloud storage—it can simply ask the current resolver, whatever it may be, to provide a generic interface from which an image can be read.

The interface in question may be a stream or a file. All resolvers can provide stream access, but only FilesystemResolver can provide file access. This distinction is important because not all processors can read from streams.

Selection

In a typical configuration, one resolver will handle all requests. It is also possible to configure the image server to select a resolver dynamically upon each request depending on the image identifier.

Static Resolver

When the resolver.static configuration key is set to the name of a resolver, that resolver will be used for all requests.

Dynamic Resolvers

When a static resolver is not flexible enough, it is also possible to serve images from different sources simultaneously. For example, you may have some images stored on a filesystem, and others stored on Amazon S3. If you can differentiate their sources based on their identifier in code—either by analyzing the identifier string, or performing some kind of service request—you can use the delegate script mechanism to write a simple Ruby method to tell the image server which resolver to use for a given request.

To enable dynamic resolver selection, set the resolver.delegate configuration key to true. Then, implement the get_resolver(identifier) method in the delegate script, which takes in an identifier and returns a resolver name (like FilesystemResolver, HttpResolver, etc.). For example:

module Cantaloupe
  def self.get_resolver(identifier)
    # Here, you would perform some kind of analysis on `identifier`:
    # parse it, look it up in a web service or database...
    # and then return the name of the resolver to use to serve it.
    'FilesystemResolver'
  end
end

See the Delegate Script section for general information about the delegate script.
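
As a slightly fuller sketch of the method above, the following variant chooses a resolver from a prefix on the identifier. The prefixes (s3- and http-) and the RESOLVERS_BY_PREFIX constant are hypothetical and only illustrate the idea; any rule that can be computed from the identifier would work the same way.

```ruby
module Cantaloupe
  # Hypothetical mapping from identifier prefix to resolver name.
  RESOLVERS_BY_PREFIX = {
    's3-'   => 'AmazonS3Resolver',
    'http-' => 'HttpResolver'
  }.freeze

  def self.get_resolver(identifier)
    RESOLVERS_BY_PREFIX.each do |prefix, resolver|
      return resolver if identifier.to_s.start_with?(prefix)
    end
    # Identifiers without a known prefix are assumed to be on the filesystem.
    'FilesystemResolver'
  end
end
```

With this in place, an identifier of s3-map.jpg would be served by AmazonS3Resolver, and image.jpg by FilesystemResolver.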

Which Resolver Should I Use?

I want to serve images located…
On a filesystem…	…and the identifiers I use in URLs will correspond predictably to filesystem paths	FilesystemResolver with BasicLookupStrategy
On a filesystem…	…and filesystem paths will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier	FilesystemResolver with ScriptLookupStrategy
On a local or remote web server…	…and the identifiers I use in URLs will correspond predictably to URL paths	HttpResolver with BasicLookupStrategy
On a local or remote web server…	…and URL paths will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier	HttpResolver with ScriptLookupStrategy
As binaries or BLOBs in a SQL database		JdbcResolver
In the cloud		AmazonS3Resolver or AzureStorageResolver

Implementations

FilesystemResolver

FilesystemResolver maps a URL identifier to a filesystem path, for retrieving images on a local or attached filesystem. Besides being the most compatible resolver, it is also the most efficient, and it may be the only practical option for serving very large source images.

For images with extensions that are missing or unrecognized, this resolver will check the "magic number" to determine type, which will add some overhead. It is therefore slightly more efficient to serve images with extensions.
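
To make the idea of a "magic number" check concrete, here is a minimal sketch in Ruby. It is not the resolver's actual implementation; the MAGIC_NUMBERS table and the sniff_media_type helper are hypothetical, and only a handful of well-known signatures are covered.

```ruby
# Leading byte signatures of a few common image formats.
MAGIC_NUMBERS = {
  'image/jpeg' => ["\xFF\xD8\xFF".b],
  'image/png'  => ["\x89PNG\r\n\x1A\n".b],
  'image/gif'  => ['GIF87a'.b, 'GIF89a'.b],
  'image/tiff' => ["II*\x00".b, "MM\x00*".b]
}.freeze

# Returns a media type string, or nil if the leading bytes are not recognized.
def sniff_media_type(leading_bytes)
  bytes = leading_bytes.to_s.b
  MAGIC_NUMBERS.each do |media_type, signatures|
    return media_type if signatures.any? { |sig| bytes.start_with?(sig) }
  end
  nil
end

# Typical use: read only the first few bytes of the file.
#   File.open(pathname, 'rb') { |f| sniff_media_type(f.read(8)) }
```

The extra file read is the overhead mentioned above, which is why a recognizable extension is cheaper.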

FilesystemResolver supports two distinct lookup strategies, defined by the FilesystemResolver.lookup_strategy configuration option.

BasicLookupStrategy

BasicLookupStrategy locates images by concatenating a pre-defined path prefix and/or suffix. For example, with the following configuration options set:

# Note trailing slash!
FilesystemResolver.BasicLookupStrategy.path_prefix = /usr/local/images/
FilesystemResolver.BasicLookupStrategy.path_suffix =

An identifier of image.jpg in the URL will resolve to /usr/local/images/image.jpg.

It's also possible to include a partial path in the identifier using URL-encoded slashes (%2F) as path separators. subdirectory%2Fimage.jpg in the URL would then resolve to /usr/local/images/subdirectory/image.jpg.
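
The concatenation described above can be sketched as follows. The basic_lookup_pathname helper is hypothetical, and for simplicity it decodes only %2F, whereas a real server would decode the whole identifier.

```ruby
# Hypothetical helper: builds a pathname from an identifier as it appears in
# the URL, using the configured path prefix and suffix.
def basic_lookup_pathname(url_identifier, path_prefix, path_suffix = '')
  # Simplification: only encoded slashes are decoded here.
  identifier = url_identifier.to_s.gsub(/%2F/i, '/')
  path_prefix.to_s + identifier + path_suffix.to_s
end

basic_lookup_pathname('subdirectory%2Fimage.jpg', '/usr/local/images/')
```

Because the prefix and identifier are joined as plain strings, a missing trailing slash on the prefix would produce /usr/local/imagesimage.jpg, which is why the trailing slash matters.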

If you are operating behind a reverse proxy that is not capable of passing encoded URL characters through without decoding them, see the slash_substitute configuration key.

To prevent arbitrary directory traversal, BasicLookupStrategy will strip out ..{path separator} and {path separator}.. from identifiers before resolving the path.
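
A minimal sketch of this kind of sanitization is shown below. The strip_traversal helper is hypothetical and is not taken from the server's source. In particular, repeating the substitution until the string stops changing is an assumption made here, so that an input like ....// cannot collapse into ../ after a single pass.

```ruby
# Hypothetical helper: removes "../" and "/.." sequences from an identifier.
def strip_traversal(identifier, separator = File::SEPARATOR)
  cleaned = identifier.to_s.dup
  loop do
    stripped = cleaned.gsub("..#{separator}", '').gsub("#{separator}..", '')
    # Stop once a full pass leaves the string unchanged.
    break if stripped == cleaned
    cleaned = stripped
  end
  cleaned
end

strip_traversal('../../etc/passwd', '/')
```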

Note: omitting path_prefix may be dangerous. The shallower the prefix path, the more of the filesystem is exposed.

ScriptLookupStrategy

Sometimes, BasicLookupStrategy will not offer enough control. Perhaps you want to serve images from multiple filesystems, or perhaps your identifiers are opaque and you need to perform a database or web service request to locate the corresponding images. With this lookup strategy, you can tell FilesystemResolver to invoke a method in your delegate script and capture the pathname it returns.

The delegate script method, get_pathname(identifier), will take in an identifier string and should return a pathname, if available, or nil, if not. (See the Delegate Script section for general information about the delegate script.) Examples follow:

Example 1: Query a PostgreSQL database to find the pathname corresponding to a given identifier

require 'java'

java_import 'org.postgresql.Driver'
java_import 'java.sql.DriverManager'

module Cantaloupe
  module FilesystemResolver
    JDBC_URL = 'jdbc:postgresql://localhost:5432/mydatabase'
    JDBC_USER = 'myuser'
    JDBC_PASSWORD = 'mypassword'

    # DriverManager is actually java.sql.DriverManager
    # (https://docs.oracle.com/javase/8/docs/api/java/sql/DriverManager.html);
    # we are calling Java API via JRuby.
    # By making the connection static, we can avoid reconnecting every time
    # get_pathname() is called, which would be slow and expensive.
    @@conn = DriverManager.get_connection(JDBC_URL, JDBC_USER, JDBC_PASSWORD)

    def self.get_pathname(identifier)
      begin
        # Note the use of prepared statements to prevent SQL injection.
        sql = 'SELECT pathname FROM images WHERE identifier = ? LIMIT 1'
        stmt = @@conn.prepare_statement(sql)
        stmt.set_string(1, identifier.to_s)
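        results = stmt.execute_query
        # results.next returns false when no row matched the identifier.
        return nil unless results.next
        pathname = results.getString(1)
        # The column may be NULL, in which case getString returns nil.
        return (pathname && pathname.length > 0) ? pathname : nil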
      ensure
        stmt.close if stmt
      end
    end
  end
end

Note that several common Ruby database libraries (like the mysql2 and pg gems) use native extensions, which won't work in JRuby. Instead, the example above uses the JDBC API via the JRuby-Java bridge. For this to work, a JDBC driver for your database needs to be installed on the Java classpath and referenced in a java_import statement.

Example 2: Query a web service to find the pathname corresponding to a given identifier

This very simple imaginary web service will return a pathname in the response body if an image was found, and an empty response body if not.

require 'net/http'
require 'cgi'

module Cantaloupe
  module FilesystemResolver
    def self.get_pathname(identifier)
      uri = 'http://example.org/webservice/' + CGI.escape(identifier)
      uri = URI.parse(uri)

      http = Net::HTTP.new(uri.host, uri.port)
      request = Net::HTTP::Get.new(uri.request_uri)
      response = http.request(request)
      return nil if response.code.to_i >= 400

      (response.body.length > 0) ? response.body.strip : nil
    end
  end
end

Example 3: Query Solr to find the pathname corresponding to a given identifier

In this variation on the previous example, our web service is Solr, and instead of parsing a text string out of the response body, we treat the body as Ruby code.

require 'net/http'
require 'cgi'

module Cantaloupe
  module FilesystemResolver
    def self.get_pathname(identifier)
      uri = 'http://localhost:8983/solr/collection1/select?q=' +
          CGI.escape('id:"' + identifier + '"') +
          '&fl=pathname_si&wt=ruby'
      uri = URI.parse(uri)

      http = Net::HTTP.new(uri.host, uri.port)
      request = Net::HTTP::Get.new(uri.request_uri)
      response = http.request(request)
      return nil if response.code.to_i >= 400

      results = eval(response.body)['response']['docs']
      results.any? ? results.first['pathname_si'] : nil
    end
  end
end
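
Evaluating a response body as Ruby code means trusting the Solr server completely. As an alternative sketch, the same lookup can ask for Solr's JSON response format (wt=json) and parse it with Ruby's json library. The SOLR_SELECT_URL constant and the pathname_from_solr_json helper are hypothetical names introduced here; the collection and field names are carried over from the example above.

```ruby
require 'net/http'
require 'uri'
require 'json'

module Cantaloupe
  module FilesystemResolver
    SOLR_SELECT_URL = 'http://localhost:8983/solr/collection1/select'

    # Extracts the pathname from a Solr JSON response body, or returns nil
    # if no document matched.
    def self.pathname_from_solr_json(body)
      parsed = JSON.parse(body)
      docs = (parsed['response'] || {})['docs'] || []
      docs.any? ? docs.first['pathname_si'] : nil
    end

    def self.get_pathname(identifier)
      uri = URI.parse(SOLR_SELECT_URL)
      # encode_www_form takes care of escaping the query parameters.
      uri.query = URI.encode_www_form(
        q: %(id:"#{identifier}"), fl: 'pathname_si', wt: 'json'
      )
      response = Net::HTTP.get_response(uri)
      return nil unless response.is_a?(Net::HTTPSuccess)

      pathname_from_solr_json(response.body)
    end
  end
end
```

Keeping the parsing in its own method makes it possible to check the response handling without a running Solr instance.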

HttpResolver

HttpResolver maps a URL identifier to an HTTP or HTTPS resource, for retrieving images from a web server.

It is preferable to use this resolver with source images with recognizable file extensions. For images with an extension that is missing or unrecognizable, it will issue an HTTP HEAD request to the server to check the Content-Type header. If the type cannot be inferred from that, an error response will be returned.

HttpResolver supports two distinct lookup strategies, defined by the HttpResolver.lookup_strategy configuration option.

BasicLookupStrategy

BasicLookupStrategy locates images by concatenating a pre-defined URL prefix and/or suffix. For example, with the following configuration options set:

# Note trailing slash!
HttpResolver.BasicLookupStrategy.url_prefix = http://example.org/images/
HttpResolver.BasicLookupStrategy.url_suffix =

An identifier of image.jpg in the URL will resolve to http://example.org/images/image.jpg.

If you are operating behind a reverse proxy that is not capable of passing encoded URL characters through without decoding them, see the slash_substitute configuration key.

ScriptLookupStrategy

Sometimes, BasicLookupStrategy will not offer enough control. Perhaps you want to serve images from multiple URLs, or perhaps your identifiers are opaque and you need to run a database or web service request to locate them. With this lookup strategy, you can tell HttpResolver to invoke a method in your delegate script and capture the URL it returns.

The delegate script method, get_url(identifier), will take in an identifier string and should return a URL, if available, or nil, if not. See the Delegate Script section for general information about the delegate script, and the FilesystemResolver ScriptLookupStrategy section for examples of similar scripts.
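
A minimal sketch of such a method follows. It assumes, by analogy with the FilesystemResolver examples, that get_url lives in a Cantaloupe::HttpResolver module. The "collection:filename" identifier format, the COLLECTION_BASE_URLS table, and the host names are all hypothetical.

```ruby
require 'erb'

module Cantaloupe
  module HttpResolver
    # Hypothetical mapping from collection name to the server that hosts it.
    COLLECTION_BASE_URLS = {
      'maps'   => 'http://maps.example.org/images/',
      'photos' => 'https://photos.example.org/originals/'
    }.freeze

    # Expects identifiers of the form "collection:filename".
    def self.get_url(identifier)
      collection, filename = identifier.to_s.split(':', 2)
      base_url = COLLECTION_BASE_URLS[collection]
      return nil if base_url.nil? || filename.nil? || filename.empty?

      # Percent-encode the filename so that spaces etc. are valid in a URL.
      base_url + ERB::Util.url_encode(filename)
    end
  end
end
```

Returning nil for an unknown collection or an empty filename tells the resolver that no image is available for the identifier.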

JdbcResolver

JdbcResolver maps a URL identifier to an RDBMS BLOB field, for retrieving images from a relational database. It does not require a custom schema and can adapt to any schema. The trade-off is that several delegate script methods must be implemented to supply the information needed to run the SQL queries.

Cantaloupe does not include any JDBC drivers, so a driver JAR for the desired database must be obtained separately and saved somewhere on the classpath.

The JDBC connection is initialized by the JdbcResolver.url, JdbcResolver.user, and JdbcResolver.password configuration options. If the user or password are not necessary, they can be left blank. The connection string must use your driver's JDBC syntax:

jdbc:postgresql://localhost:5432/my_database
jdbc:mysql://localhost:3306/my_database
jdbc:microsoft:sqlserver://example.org:1433;DatabaseName=MY_DATABASE

Consult the driver's documentation for details.

Then, the resolver needs to be told:

The database value corresponding to a given identifier
The media type corresponding to that value
The SQL statement that retrieves the BLOB value corresponding to that value

Database Identifier Retrieval Method

This method takes in an unencoded URL identifier and returns the corresponding database value of the identifier.

module Cantaloupe
  module JdbcResolver
    def self.get_database_identifier(url_identifier)
      # If URL identifiers map directly to values in the database, simply
      # return url_identifier. Otherwise, you could transform it, perform
      # a service request to look it up, etc.
      url_identifier
    end
  end
end
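
As an illustration of transforming the identifier, the sketch below assumes a hypothetical schema in which the database stores identifiers without a file extension, so that a URL identifier of 12345.jpg corresponds to a database value of 12345.

```ruby
module Cantaloupe
  module JdbcResolver
    # Assumption: database identifiers carry no file extension.
    def self.get_database_identifier(url_identifier)
      # Remove one trailing extension such as ".jpg" or ".tif", if present.
      url_identifier.to_s.sub(/\.[A-Za-z0-9]+\z/, '')
    end
  end
end
```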

Media Type Retrieval Method

This method supplies the media (MIME) type corresponding to the value returned by the get_database_identifier method. If the media type is stored in the database, the method can return an SQL statement that retrieves it, as in this example:

module Cantaloupe
  module JdbcResolver
    def self.get_media_type
      'SELECT media_type ' +
          'FROM some_table ' +
          'WHERE some_identifier = ?'
    end
  end
end

If the URL identifier will always have a known, valid image extension, like .jpg, .tif, etc., this method can return nil, and Cantaloupe will infer the media type from the extension.

BLOB Retrieval SQL Method

The get_lookup_sql method returns an SQL statement that selects the BLOB value corresponding to the value returned by the get_database_identifier method.

module Cantaloupe
  module JdbcResolver
    def self.get_lookup_sql
      # The + operators are required; without them only the last string
      # would be returned.
      'SELECT image_blob_column ' +
          'FROM some_table ' +
          'WHERE some_identifier = ?'
    end
  end
end

AmazonS3Resolver

AmazonS3Resolver maps a URL identifier to an Amazon Simple Storage Service (S3) object, for retrieving images from Amazon S3. It can be configured with the following options:

AmazonS3Resolver.access_key_id: An access key associated with your AWS account. (See AWS Security Credentials.)
AmazonS3Resolver.secret_key: A secret key associated with your AWS account. (See AWS Security Credentials.)
AmazonS3Resolver.bucket.name: Name of the bucket containing the images to be served.
AmazonS3Resolver.bucket.region: Name of a region to send requests to, such as us-east-1. Can be commented out or left blank to use a default region. (See S3 Regions.)
AmazonS3Resolver.lookup_strategy: The strategy to use to look up images based on their URL identifier. See below.

BasicLookupStrategy

BasicLookupStrategy locates images by passing the URL identifier as-is to S3, with no additional configuration necessary or possible.

ScriptLookupStrategy

When your URL identifiers don't match your Amazon S3 object keys, ScriptLookupStrategy is available to tell AmazonS3Resolver to capture the object key returned by a method in your delegate script.

The delegate script method, get_s3_object_key(identifier), will take in an identifier string and should return an S3 object key string, if available, or nil, if not. See the Delegate Script section for general information about the delegate script, and the FilesystemResolver ScriptLookupStrategy section for an example of a similar script.
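
A minimal sketch of such a method follows. It assumes, by analogy with the FilesystemResolver examples, that the method lives in a Cantaloupe::AmazonS3Resolver module. The bucket layout, in which objects are grouped under a key prefix made of the first two characters of the identifier, is hypothetical.

```ruby
module Cantaloupe
  module AmazonS3Resolver
    # Hypothetical layout: "<first two characters>/<identifier>",
    # e.g. "ab/abcdef.jpg" for an identifier of "abcdef.jpg".
    def self.get_s3_object_key(identifier)
      id = identifier.to_s
      # Identifiers too short to have a two-character prefix are not found.
      return nil if id.length < 2

      "#{id[0, 2]}/#{id}"
    end
  end
end
```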

AzureStorageResolver

AzureStorageResolver maps a URL identifier to a Microsoft Azure Storage blob, for retrieving images from Azure Storage. It can be configured with the following options:

AzureStorageResolver.account_name: The name of your Azure account.
AzureStorageResolver.account_key: A key to access your Azure Storage account.
AzureStorageResolver.container_name: Name of the container from which to serve images.
AzureStorageResolver.lookup_strategy: The strategy to use to look up images based on their URL identifier. See below.

BasicLookupStrategy

BasicLookupStrategy locates images by passing the URL identifier as-is to Azure Storage, with no additional configuration necessary or possible.

ScriptLookupStrategy

When your URL identifiers don't match your Azure Storage blob keys, ScriptLookupStrategy is available to tell AzureStorageResolver to capture the blob key returned by a method in your delegate script.

The delegate script method, get_azure_storage_blob_key(identifier), will take in an identifier string and should return a blob key string, if available, or nil, if not. See the Delegate Script section for general information about the delegate script, and the FilesystemResolver ScriptLookupStrategy section for an example of a similar script.
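
A minimal sketch of such a method follows. It assumes, by analogy with the FilesystemResolver examples, that the method lives in a Cantaloupe::AzureStorageResolver module. The BLOB_KEYS table is a hypothetical in-memory index; in practice the lookup might instead query a database or web service.

```ruby
module Cantaloupe
  module AzureStorageResolver
    # Hypothetical index from URL identifier to blob key.
    BLOB_KEYS = {
      'cat' => 'animals/felines/cat-master.tif',
      'dog' => 'animals/canines/dog-master.tif'
    }.freeze

    def self.get_azure_storage_blob_key(identifier)
      # Hash lookup returns nil for identifiers that are not in the index.
      BLOB_KEYS[identifier.to_s]
    end
  end
end
```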