4.0

This version of the manual refers to an earlier version of the software.

Sources

Sources were called resolvers in versions prior to 4.0.

Sources provide access to source images. A source translates the identifier in a request URI into a source image locator, such as a pathname, for the particular type of storage it interfaces with. After checking that the underlying object exists and is accessible, it provides access to that object to other application components in a generalized way, so that the rest of the application does not need to know whether an image resides on a filesystem, on a remote web server, or somewhere else.

Sources may provide access as a stream or a file. All sources can provide stream access, but only FilesystemSource can provide file access. This distinction is important because not all processors can read from streams. See Image Considerations for background on why this is an issue.

Selection

In a typical configuration, one source will handle all requests. It's also possible to select a source dynamically depending on the image identifier.

Static Source

When the source.static configuration key is set to the name of a source, that source will handle all requests.

Dynamic Sources

When a static source is not flexible enough, you can serve images from different sources within the same application instance. For example, you may have some images stored on a filesystem and others stored in an S3 bucket. If you can determine the source from the identifier in code, either by analyzing the identifier string or by performing some kind of service request, you can use the delegate script mechanism to write a method that tells the image server which source to use for a given request.

To enable dynamic source selection, set the source.delegate configuration key to true, and implement the source() delegate method. For example:

class CustomDelegate
  def source(options = {})
    identifier = context['identifier']

    # Here, you would perform some kind of analysis on `identifier`:
    # parse it, look it up in a web service or database...
    # and then return the name of the source to use to serve it.
    'FilesystemSource'
  end
end

See the Delegate Script section for general information about the delegate script.
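
As a slightly more concrete sketch of the analysis step, the method below routes identifiers by prefix. The `s3-` prefix convention is purely hypothetical. The `attr_accessor :context` line and the manual `context` assignment only stand in for what the application is assumed to supply on each request, so that the snippet can run on its own.

```ruby
class CustomDelegate
  # Assumed to be populated by the application before each request.
  attr_accessor :context

  def source(options = {})
    identifier = context['identifier'].to_s

    # Hypothetical convention: identifiers beginning with "s3-" live in S3,
    # and everything else lives on the local filesystem.
    identifier.start_with?('s3-') ? 'S3Source' : 'FilesystemSource'
  end
end

# Simulate two requests outside of the application.
delegate = CustomDelegate.new
delegate.context = { 'identifier' => 's3-photo.jpg' }
puts delegate.source # prints S3Source
delegate.context = { 'identifier' => 'photo.jpg' }
puts delegate.source # prints FilesystemSource
```

Remember that this method is only consulted when the source.delegate configuration key is set to true.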

Which Source Should I Use?

I want to serve images located…

On a filesystem…	…and the identifiers I use in URLs will correspond predictably to filesystem paths	FilesystemSource with BasicLookupStrategy
On a filesystem…	…and filesystem paths will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier	FilesystemSource with ScriptLookupStrategy
On a local or remote web server…	…and the identifiers I use in URLs will correspond predictably to URL paths	HttpSource with BasicLookupStrategy
On a local or remote web server…	…and URL paths will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier	HttpSource with ScriptLookupStrategy
In S3…	…and the identifiers I use in URLs will correspond predictably to object keys	S3Source with BasicLookupStrategy
In S3…	…and object keys will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier	S3Source with ScriptLookupStrategy
In Azure Storage		AzureStorageSource
As binaries or BLOBs in a SQL database		JdbcSource

Implementations

FilesystemSource

FilesystemSource maps a URL identifier to a filesystem path, for retrieving images on a local or mounted filesystem. This is the most efficient and compatible source and may be the only option for serving very large source images.

Lookup Strategies

Two distinct lookup strategies are supported, defined by the FilesystemSource.lookup_strategy configuration option.

BasicLookupStrategy

BasicLookupStrategy locates images by concatenating an identifier with a pre-defined path prefix and/or suffix. For example, with the following configuration options set:

# Note trailing slash!
FilesystemSource.BasicLookupStrategy.path_prefix = /usr/local/images/
FilesystemSource.BasicLookupStrategy.path_suffix =

An identifier of image.jpg in the URL will resolve to /usr/local/images/image.jpg.

It's also possible to include a partial path in the identifier using URL-encoded slashes (%2F) as path separators. subdirectory%2Fimage.jpg in the URL would then resolve to /usr/local/images/subdirectory/image.jpg.
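
The concatenation described above can be illustrated roughly as follows. `basic_lookup` is a hypothetical helper written for this illustration; it is not part of the application, and it leaves out the identifier sanitization that the real strategy performs.

```ruby
require 'uri'

# Hypothetical helper that mimics prefix + identifier + suffix concatenation.
def basic_lookup(encoded_identifier, prefix, suffix = '')
  # Decode percent-encoded characters such as %2F back into slashes.
  identifier = URI.decode_www_form_component(encoded_identifier)
  prefix + identifier + suffix
end

puts basic_lookup('image.jpg', '/usr/local/images/')
# prints /usr/local/images/image.jpg
puts basic_lookup('subdirectory%2Fimage.jpg', '/usr/local/images/')
# prints /usr/local/images/subdirectory/image.jpg
```

Without the trailing slash on the prefix, the first call would yield /usr/local/imagesimage.jpg, which is why the configuration comment calls it out.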

If you are operating behind a reverse proxy that is not capable of passing encoded URL characters through without decoding them, see the slash_substitute configuration key.

To prevent arbitrary directory traversal, BasicLookupStrategy will recursively strip out ../, /.., ..\, and \.. from identifiers before resolving the path.
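
To show why the stripping has to be recursive, here is a minimal sketch of the idea. `strip_traversal` is a hypothetical function, not the application's actual implementation: a single pass over an input like `....//` would leave a fresh `../` behind, so the passes repeat until nothing changes.

```ruby
# The four sequences named above.
TRAVERSAL_SEQUENCES = ['../', '/..', '..\\', '\\..'].freeze

# Hypothetical sketch: remove the sequences repeatedly until a full pass
# leaves the string unchanged.
def strip_traversal(identifier)
  result = identifier.dup
  loop do
    before = result
    TRAVERSAL_SEQUENCES.each { |seq| result = result.gsub(seq, '') }
    break if result == before
  end
  result
end

puts strip_traversal('../../secret.jpg')  # prints secret.jpg
puts strip_traversal('....//etc/passwd')  # prints etc/passwd
```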

Consider making the path to which FilesystemSource.BasicLookupStrategy.path_prefix is set as deep as possible. The shallower the path, the more of the filesystem will be exposed.

ScriptLookupStrategy

Sometimes, BasicLookupStrategy will not offer enough control. Perhaps you want to serve images from multiple filesystems, or perhaps your identifiers are opaque and you need to perform a database or web service request to locate the corresponding images. With this lookup strategy, you can tell FilesystemSource to invoke a delegate method and capture the pathname it returns.

The delegate method, filesystemsource_pathname(), returns a pathname if available, or nil if not. (See the Delegate Script section for general information about the delegate script.) Examples follow:

Example 1: Query a PostgreSQL database to find the pathname corresponding to a given identifier

require 'java'

java_import 'org.postgresql.Driver'
java_import 'java.sql.DriverManager'

class CustomDelegate

  JDBC_URL = 'jdbc:postgresql://localhost:5432/mydatabase'
  JDBC_USER = 'myuser'
  JDBC_PASSWORD = 'mypassword'

  # By making the connection static, we can avoid reconnecting every time
  # the method is called, which would be expensive.
  # See: https://docs.oracle.com/javase/8/docs/api/java/sql/DriverManager.html
  @@conn = DriverManager.get_connection(JDBC_URL, JDBC_USER, JDBC_PASSWORD)

  def filesystemsource_pathname(options = {})
    identifier = context['identifier']
    begin
      # Note the use of prepared statements, which are safer than
      # string concatenation.
      sql = 'SELECT pathname FROM images WHERE identifier = ? LIMIT 1'
      stmt = @@conn.prepare_statement(sql)
      stmt.set_string(1, identifier)
      results = stmt.execute_query
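      # `next` returns false when no row matches the identifier.
      return nil unless results.next
      pathname = results.get_string(1)
      # Treat a NULL or blank value as "not found". (`present?` comes from
      # ActiveSupport, which is not available here.)
      return (pathname.nil? || pathname.strip.empty?) ? nil : pathname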
    ensure
      stmt&.close
    end
  end

end

Note that several common Ruby database libraries (like the mysql and pg gems) use native extensions, which won't work in JRuby. Instead, the example above uses the JDBC API via the JRuby-Java bridge. For this to work, a JDBC driver for your database must be installed on the Java classpath and referenced in a java_import statement.

Example 2: Query a web service to find the pathname corresponding to a given identifier

This very simple imaginary web service will return a pathname in the response body if an image was found, and an empty response body if not.

require 'net/http'
require 'cgi'

class CustomDelegate
  def filesystemsource_pathname(options = {})
    identifier = context['identifier']
    uri = 'http://example.org/webservice/' + CGI.escape(identifier)
    uri = URI.parse(uri)

    http = Net::HTTP.new(uri.host, uri.port)
    request = Net::HTTP::Get.new(uri.request_uri)
    response = http.request(request)
    return nil if response.code.to_i >= 400

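    # An empty or whitespace-only body means that no image was found.
    # (`present?` comes from ActiveSupport, which is not available here.)
    body = response.body.to_s.strip
    body.empty? ? nil : body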
  end
end

Format Inference

Like all sources, FilesystemSource needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:

If the file's filename contains an extension, the format is inferred from that.
If unsuccessful, and the identifier contains a filename extension, the format is inferred from that.
If unsuccessful, an attempt is made to infer a format from the file's magic bytes.
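
The fallback order above can be sketched as a chain of attempts. Everything in this snippet is illustrative: the function names, the small extension table, and the handful of signatures are assumptions, and the application recognizes more formats than are shown here.

```ruby
# Illustrative subsets only.
KNOWN_EXTENSIONS = {
  '.jpg' => 'jpg', '.jpeg' => 'jpg', '.png' => 'png',
  '.gif' => 'gif', '.tif' => 'tif', '.tiff' => 'tif'
}.freeze

MAGIC_BYTES = {
  "\xFF\xD8\xFF".b => 'jpg',
  "\x89PNG".b      => 'png',
  'GIF8'.b         => 'gif'
}.freeze

def format_from_extension(name)
  KNOWN_EXTENSIONS[File.extname(name.to_s).downcase]
end

def format_from_magic(bytes)
  bytes = bytes.to_s.b
  MAGIC_BYTES.each { |signature, format| return format if bytes.start_with?(signature) }
  nil
end

# Try the file's extension, then the identifier's, then the file's contents.
def infer_format(pathname, identifier)
  format_from_extension(pathname) ||
    format_from_extension(identifier) ||
    (File.file?(pathname) ? format_from_magic(File.binread(pathname, 8)) : nil)
end

puts infer_format('/usr/local/images/scan.png', 'abc123')  # prints png
puts infer_format('/usr/local/images/abc123', 'scan.jpg')  # prints jpg
```

Reading magic bytes is the only step that touches the file's contents, which is presumably why it comes last.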

HttpSource

HttpSource maps a URL identifier to an HTTP or HTTPS resource, for retrieving images from a web server. It uses a Jetty HTTP client internally.

Lookup Strategies

HttpSource supports two distinct lookup strategies, defined by the HttpSource.lookup_strategy configuration option.

BasicLookupStrategy

BasicLookupStrategy locates images by concatenating an identifier with a pre-defined URL prefix and/or suffix. For example, with the following configuration options set:

# Note trailing slash!
HttpSource.BasicLookupStrategy.url_prefix = http://example.org/images/
HttpSource.BasicLookupStrategy.url_suffix =

An identifier of image.jpg in the URL will resolve to http://example.org/images/image.jpg.

It's possible to include a partial path in the identifier using URL-encoded slashes (%2F) as path separators. subpath%2Fimage.jpg in the URL would then resolve to http://example.org/images/subpath/image.jpg.

It's also possible to use a full URL as an identifier by leaving both of the above keys blank. In that case, an identifier of http%3A%2F%2Fexample.org%2Fimages%2Fimage.jpg in the URL will resolve to http://example.org/images/image.jpg.
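
If you build such identifiers in client code, the encoding can be produced with Ruby's standard library. This is only a sketch of the client side; how your application constructs image URLs is up to you.

```ruby
require 'uri'

source_url = 'http://example.org/images/image.jpg'

# Percent-encode the whole URL so it can be used as a single identifier.
identifier = URI.encode_www_form_component(source_url)
puts identifier
# prints http%3A%2F%2Fexample.org%2Fimages%2Fimage.jpg

# Decoding restores the original URL.
puts URI.decode_www_form_component(identifier) == source_url
# prints true
```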

If you are operating behind a reverse proxy that is not capable of passing encoded URL characters through without decoding them, see the slash_substitute configuration key.

ScriptLookupStrategy

Sometimes, BasicLookupStrategy will not offer enough control. Perhaps you want to serve images from multiple URLs, or perhaps your identifiers are opaque and you need to run a database or web service request to locate them. With this lookup strategy, you can tell HttpSource to invoke the httpsource_resource_info() delegate method and capture the URL it returns.

See the Delegate Script section for general information about the delegate script, and the FilesystemSource ScriptLookupStrategy section for examples of similar methods.

Authentication

HTTP Basic authentication is supported.

When using BasicLookupStrategy, auth info is set globally in the HttpSource.BasicLookupStrategy.auth.basic.username and HttpSource.BasicLookupStrategy.auth.basic.secret configuration keys.
When using ScriptLookupStrategy, auth info can be returned from the delegate method.
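
A minimal sketch of a delegate method that supplies per-resource credentials might look like the following. It assumes that httpsource_resource_info() may return a hash with `uri`, `username`, and `secret` keys; confirm the exact return format in the Delegate Script section before relying on it. The lookup table and credentials are hypothetical, and `attr_accessor :context` is included only so that the snippet runs on its own.

```ruby
class CustomDelegate
  # Assumed to be populated by the application before each request.
  attr_accessor :context

  # Hypothetical lookup table; a real script might query a database instead.
  RESOURCES = {
    'public-1' => {
      'uri' => 'http://example.org/images/public-1.jpg'
    },
    'private-1' => {
      'uri'      => 'https://example.org/private/private-1.jpg',
      'username' => 'myuser',
      'secret'   => 'mypassword'
    }
  }.freeze

  # Returns resource info for the current identifier, or nil if unknown.
  def httpsource_resource_info(options = {})
    RESOURCES[context['identifier']]
  end
end

delegate = CustomDelegate.new
delegate.context = { 'identifier' => 'private-1' }
puts delegate.httpsource_resource_info['uri']
# prints https://example.org/private/private-1.jpg
```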

Format Inference

Like all sources, HttpSource needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:

If the path component of the URI contains an extension, the format is inferred from that.
If unsuccessful, and the identifier has a filename extension, the format is inferred from that.
If unsuccessful, a GET request is sent containing a Range header specifying a small range of data from the beginning of the resource.
1. If the response includes a Content-Type header, and its value is specific enough (i.e. not application/octet-stream), a format will be inferred from that.
2. Otherwise, a format will be inferred from the magic bytes in the response body.
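
The last step, deciding between the Content-Type header and the magic bytes, can be sketched like this. The function name and both lookup tables are assumptions made for illustration, covering only a few formats.

```ruby
# Illustrative subsets only.
FORMATS_BY_MEDIA_TYPE = {
  'image/jpeg' => 'jpg', 'image/png' => 'png',
  'image/gif' => 'gif', 'image/tiff' => 'tif'
}.freeze

FORMATS_BY_SIGNATURE = {
  "\xFF\xD8\xFF".b => 'jpg',
  "\x89PNG".b      => 'png',
  'GIF8'.b         => 'gif'
}.freeze

# Prefer a specific Content-Type; otherwise fall back to the first bytes of
# the (ranged) response body.
def format_from_response(content_type, body)
  # Drop any parameters, e.g. "image/png; charset=binary".
  media_type = content_type.to_s.split(';').first.to_s.strip.downcase
  bytes = body.to_s.b
  FORMATS_BY_MEDIA_TYPE[media_type] ||
    FORMATS_BY_SIGNATURE.find { |signature, _| bytes.start_with?(signature) }&.last
end

puts format_from_response('image/png', '')
# prints png
puts format_from_response('application/octet-stream', "\xFF\xD8\xFF\xE0".b)
# prints jpg
```

Because application/octet-stream is not in the media type table, it falls through to the signature check, mirroring the "specific enough" rule above.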

JdbcSource

JdbcSource maps a URL identifier to a BLOB field, for retrieving images from a relational database. It does not require a custom schema and can adapt to any schema. The drawback is that some delegate methods must be implemented in order to obtain the information needed to run the SQL queries.

The application does not include any JDBC drivers, so a driver JAR for the desired database must be obtained separately and saved somewhere on the classpath.

The JDBC connection is initialized by the JdbcSource.url, JdbcSource.user, and JdbcSource.password configuration options. If a user or password is not needed, the corresponding option can be left blank. The connection string must use your driver's JDBC syntax:

jdbc:postgresql://localhost:5432/my_database
jdbc:mysql://localhost:3306/my_database
jdbc:sqlserver://example.org:1433;databaseName=MY_DATABASE

Consult the driver's documentation for details.

Then, the source needs to be told:

The database value corresponding to a given identifier
The media type corresponding to that value
The SQL statement that retrieves the BLOB value corresponding to that value

Database Identifier Retrieval Method

This method takes in an unencoded URL identifier and returns the corresponding database value of the identifier.

class CustomDelegate
  def jdbcsource_database_identifier(options = {})
    # If URL identifiers map directly to values in the database, simply
    # return the identifier from the request context. Otherwise, you
    # could transform it, perform a service request to look it up, etc.
    context['identifier']
  end
end

Media Type Retrieval Method

This method returns either a media (MIME) type corresponding to the value returned by the jdbcsource_database_identifier() method, or an SQL statement that retrieves it. The example below assumes that the media type is stored in the database, so it returns an SQL statement.

class CustomDelegate
  def jdbcsource_media_type(options = {})
    'SELECT media_type ' +
        'FROM some_table ' +
        'WHERE some_identifier = ?'
  end
end

This method may return nil; see Format Inference.
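
If the media type is not stored in the database, the method could instead return a media type string directly. The sketch below derives one from the identifier's filename extension and returns nil for anything it does not recognize. The extension table is an illustrative subset, and `attr_accessor :context` is included only so that the snippet runs on its own.

```ruby
class CustomDelegate
  # Assumed to be populated by the application before each request.
  attr_accessor :context

  # Illustrative subset of extensions.
  MEDIA_TYPES = {
    '.jpg' => 'image/jpeg', '.jpeg' => 'image/jpeg',
    '.png' => 'image/png', '.tif' => 'image/tiff', '.tiff' => 'image/tiff'
  }.freeze

  # Returns a media type string, or nil to defer to format inference.
  def jdbcsource_media_type(options = {})
    extension = File.extname(context['identifier'].to_s).downcase
    MEDIA_TYPES[extension]
  end
end

delegate = CustomDelegate.new
delegate.context = { 'identifier' => 'scan-001.TIF' }
puts delegate.jdbcsource_media_type # prints image/tiff
```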

BLOB Retrieval SQL Method

This method returns an SQL statement that selects the BLOB value corresponding to the value returned by the jdbcsource_database_identifier() method.

class CustomDelegate
  def jdbcsource_lookup_sql(options = {})
    'SELECT image_blob_column ' +
        'FROM some_table ' +
        'WHERE some_identifier = ?'
  end
end

Format Inference

Like all sources, JdbcSource needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:

If the media type retrieval method returns either a recognized media type, or an SQL query that can be invoked to obtain a recognized media type, the corresponding format will be used.
If the source image's URI identifier has a recognized filename extension, the format will be inferred from that.
Otherwise, the blob retrieval SQL will be executed to obtain a small range of data from the beginning of the resource, and an attempt will be made to infer a format from its "magic bytes."

S3Source

S3Source maps a URL identifier to an Amazon Simple Storage Service (S3) object, for retrieving images from S3. S3Source can work with both AWS and non-AWS S3 endpoints.

Credentials Sources

S3Source can obtain its credentials from several different sources. It will first consult the S3Source.access_key_id and S3Source.secret_key keys in the application configuration. If those are not set, it will fall back to the strategy employed by the AWS SDK's DefaultAWSCredentialsProviderChain; see its class documentation for details.

Lookup Strategies

BasicLookupStrategy

BasicLookupStrategy locates images by concatenating an identifier with a pre-defined path prefix and/or suffix. For example, with the following configuration options set:

# Note trailing slash!
S3Source.BasicLookupStrategy.path_prefix = path/prefix/
S3Source.BasicLookupStrategy.path_suffix =

An identifier of image.jpg in the URL will resolve to path/prefix/image.jpg within the bucket.

It's also possible to include a partial path in the identifier using URL-encoded slashes (%2F) as path separators. subpath%2Fimage.jpg in the URL would then resolve to path/prefix/subpath/image.jpg.

If you are operating behind a reverse proxy that is not capable of passing encoded URL characters through without decoding them, see the slash_substitute configuration key.

ScriptLookupStrategy

When your URL identifiers don't match your S3 object keys, ScriptLookupStrategy is available to tell S3Source to capture the object key returned by a method in your delegate script. The s3source_object_info() method returns a hash containing bucket and key keys, if an object is available, or nil if not. See the Delegate Script section for general information about the delegate script, and the FilesystemSource ScriptLookupStrategy section for examples of similar methods.
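
For illustration, a delegate method along these lines could map identifiers to buckets and keys. The `collection-name` identifier convention, the `myorg-` bucket naming, and the `images/` key prefix are all hypothetical, and the use of string hash keys is an assumption. `attr_accessor :context` is included only so that the snippet runs on its own.

```ruby
class CustomDelegate
  # Assumed to be populated by the application before each request.
  attr_accessor :context

  # Hypothetical scheme: "maps-atlas.jpg" lives in bucket "myorg-maps"
  # under the key "images/atlas.jpg".
  def s3source_object_info(options = {})
    identifier = context['identifier'].to_s
    match = identifier.match(/\A(?<collection>[a-z0-9]+)-(?<name>.+)\z/)
    return nil unless match

    {
      'bucket' => "myorg-#{match[:collection]}",
      'key'    => "images/#{match[:name]}"
    }
  end
end

delegate = CustomDelegate.new
delegate.context = { 'identifier' => 'maps-atlas.jpg' }
info = delegate.s3source_object_info
puts info['bucket'] # prints myorg-maps
puts info['key']    # prints images/atlas.jpg
```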

Format Inference

Like all sources, S3Source needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:

If the object key has a recognized filename extension, the format will be inferred from that.
Otherwise, if the source image's URI identifier has a recognized filename extension, the format will be inferred from that.
Otherwise, a GET request will be sent with a Range header specifying a small range of data from the beginning of the resource.
1. If a Content-Type header is present in the response, and is specific enough (i.e. not application/octet-stream), a format will be inferred from that.
2. Otherwise, a format will be inferred from the magic bytes in the response body.

AzureStorageSource

AzureStorageSource maps a URL identifier to a Microsoft Azure Storage blob, for retrieving images from Azure Storage. It can be configured with the following options:

Configuration

AzureStorageSource.account_name: The name of your Azure account.
AzureStorageSource.account_key: A key to access your Azure Storage account.
AzureStorageSource.container_name: Name of the container from which to serve images.
AzureStorageSource.lookup_strategy: The strategy to use to look up images based on their URL identifier. See below.

As of version 4.0, this source also supports shared access signatures as an alternative to hard-coding the account name, account key, and container name.

Lookup Strategies

BasicLookupStrategy

BasicLookupStrategy locates images by passing the URL identifier as-is to Azure Storage, with no additional configuration possible.

ScriptLookupStrategy

When your URL identifiers don't match your blob keys, ScriptLookupStrategy is available to tell AzureStorageSource to capture the blob key returned by a method in your delegate script.

The delegate method, azurestoragesource_blob_key(), returns a blob key string if available, or nil if not. See the Delegate Script section for general information about the delegate script, and the FilesystemSource ScriptLookupStrategy section for examples of similar methods.
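
A minimal sketch of such a method follows. The blob layout, in which blobs are grouped under a two-character prefix and stored with a .tif extension, is entirely hypothetical, and `attr_accessor :context` is included only so that the snippet runs on its own.

```ruby
class CustomDelegate
  # Assumed to be populated by the application before each request.
  attr_accessor :context

  # Hypothetical layout: identifier "ab1234" maps to blob "ab/ab1234.tif".
  def azurestoragesource_blob_key(options = {})
    identifier = context['identifier'].to_s
    return nil if identifier.empty?

    "#{identifier[0, 2]}/#{identifier}.tif"
  end
end

delegate = CustomDelegate.new
delegate.context = { 'identifier' => 'ab1234' }
puts delegate.azurestoragesource_blob_key # prints ab/ab1234.tif
```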

Format Inference

Like all sources, AzureStorageSource needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:

If the blob key has a recognized filename extension, the format will be inferred from that.
Otherwise, if the source image's URI identifier has a recognized filename extension, the format will be inferred from that.
Otherwise, a HEAD request will be sent. If a Content-Type header is present in the response, and is specific enough (i.e. not application/octet-stream), a format will be inferred from that.
Otherwise, a GET request will be sent with a Range header specifying a small range of data from the beginning of the resource, and a format will be inferred from the magic bytes in the response body.