Sources provide access to source images, translating request URI identifiers into source image locators, such as pathnames, in a particular type of underlying storage. After verifying that an underlying object exists and is accessible, a source can provide access to it to other application components in a generalized way.
All sources can provide access to streams from which to read a resource, but only FilesystemSource can provide access to files. This distinction is important because not all processors can read from streams.
Of the sources that provide stream access, not all support random access. Random access can be of enormous benefit when using certain processor/source format combinations that can fully exploit it, which are currently:
Source | Type | Random Access |
---|---|---|
FilesystemSource | FileSource | ✓ |
HttpSource | StreamSource | ✓* |
S3Source | StreamSource | ✓* |
AzureStorageSource | StreamSource | ✓* |
JdbcSource | StreamSource | × |
* Using chunking
In a simple configuration, one source supplies all requests. But it's also possible to select a source dynamically depending on the image identifier.
When the source.static configuration key is set to the name of a source, that source will supply all requests.
When a static source is not flexible enough, it is also possible to serve images from different sources. For example, you may have some images stored on a filesystem, and others stored in an S3 bucket. If you can differentiate their sources based on their identifier in code—either by analyzing the identifier string, or performing some kind of service request—you can implement a delegate method to tell the image server from which source it should obtain the image.
To enable dynamic source selection, set the source.delegate configuration key to true, and implement the source() delegate method. For example:
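A minimal sketch of such a method follows. It assumes a delegate class that exposes the request through a context hash with an 'identifier' key; the class name, the context accessor, and the "s3-" identifier prefix convention are assumptions made for illustration and should be checked against the delegate script shipped with your version.

```ruby
# Hypothetical delegate class; the class name, the `context` hash, and the
# "s3-" identifier prefix convention are assumptions made for illustration.
class CustomDelegate
  attr_accessor :context

  # Returns the name of the source that should supply the current request.
  def source(options = {})
    identifier = context['identifier'].to_s
    # Assumed convention: identifiers starting with "s3-" live in S3,
    # and everything else lives on the local filesystem.
    identifier.start_with?('s3-') ? 'S3Source' : 'FilesystemSource'
  end
end
```

Analyzing the identifier string, as here, is the cheapest option; a service request could be substituted inside the method when identifiers carry no such hint.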
I want to serve images located… | | |
---|---|---|
On a filesystem… | …and the identifiers I use in URLs will correspond predictably to filesystem paths | FilesystemSource with BasicLookupStrategy |
On a filesystem… | …and filesystem paths will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier | FilesystemSource with ScriptLookupStrategy |
On a web server… | …and the identifiers I use in URLs will correspond predictably to URL paths | HttpSource with BasicLookupStrategy |
On a web server… | …and URL paths will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier | HttpSource with ScriptLookupStrategy |
In S3… | …and the identifiers I use in URLs will correspond predictably to object keys | S3Source with BasicLookupStrategy |
In S3… | …and object keys will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier | S3Source with ScriptLookupStrategy |
In Azure Storage… | …and the identifiers I use in URLs will correspond predictably to object keys | AzureStorageSource with BasicLookupStrategy |
In Azure Storage… | …and object keys will need to be looked up (in a SQL database, search server, index file, etc.) based on their identifier | AzureStorageSource with ScriptLookupStrategy |
As binaries or BLOBs in a SQL database | | JdbcSource |
FilesystemSource maps a URL identifier to a filesystem path. This is the most compatible source, and usually the most efficient as well.
Two distinct lookup strategies are supported, defined by the FilesystemSource.lookup_strategy configuration option.
BasicLookupStrategy locates images by concatenating an identifier with a pre-defined path prefix and/or suffix. For example, with the following configuration options set:
An identifier of image.jpg in the URL will resolve to /usr/local/images/image.jpg.
It's also possible to include a partial path in the identifier using URL-encoded slashes (%2F) as path separators. subdirectory%2Fimage.jpg in the URL would then resolve to /usr/local/images/subdirectory/image.jpg.
If you are operating behind a reverse proxy that is not capable of passing encoded URL characters through without decoding them, see the slash_substitute configuration key.
To prevent arbitrary directory traversal, BasicLookupStrategy will recursively strip out ../, /.., ..\, and \.. from identifiers before resolving the path.
Set FilesystemSource.BasicLookupStrategy.path_prefix to the deepest possible path. The shallower the path, the more of the filesystem will be exposed.
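The resolution and sanitization behavior described above can be sketched roughly as follows. This is not the application's actual implementation; the constant and method names are hypothetical, the prefix value is taken from the example above, and the identifier is assumed to have been URL-decoded already.

```ruby
# Illustrative values only; in the application these would come from the
# FilesystemSource.BasicLookupStrategy.path_prefix key and its suffix
# counterpart.
PATH_PREFIX = '/usr/local/images/'
PATH_SUFFIX = ''

# The four sequences the text says are stripped from identifiers.
TRAVERSAL_SEQUENCES = ['../', '/..', '..\\', '\\..'].freeze

# Repeatedly removes traversal sequences until none remain, so that input
# such as "....//" cannot collapse back into "../" after a single pass.
def strip_traversal(identifier)
  result = identifier.dup
  loop do
    stripped = TRAVERSAL_SEQUENCES.reduce(result) { |s, seq| s.gsub(seq, '') }
    return stripped if stripped == result
    result = stripped
  end
end

# Concatenates prefix, sanitized identifier, and suffix into a pathname.
def resolve_pathname(identifier)
  PATH_PREFIX + strip_traversal(identifier) + PATH_SUFFIX
end
```

With these values, an identifier of ../../etc/passwd resolves to /usr/local/images/etc/passwd, which stays under the prefix. This is why a deep prefix limits what a malformed identifier can reach.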
Sometimes, BasicLookupStrategy will not offer enough control. Perhaps you want to serve images from multiple filesystems, or perhaps your identifiers are opaque and you need to perform a database or web service request to locate the corresponding images. With ScriptLookupStrategy, you can tell FilesystemSource to invoke a delegate method and capture the pathname it returns.
The delegate method, filesystemsource_pathname(), should return a pathname if available, or nil if not. Examples follow:
Note that several common Ruby database libraries (like the mysql and pgsql gems) use native extensions. These won't work in JRuby. Instead, use the JDBC API via the JRuby-Java bridge. For this to work, a JDBC driver for your database must be available on the Java classpath and referenced in a java_import statement.
This very simple mock web service returns a pathname in the response body when an image exists, and an empty response body if not.
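A delegate method that consults a web service of the kind just described might be sketched as follows. The endpoint URL, the identifier query parameter, the class name, and the context accessor are all hypothetical; only the contract (return a pathname or nil) comes from the text above.

```ruby
require 'net/http'
require 'uri'

class CustomDelegate
  attr_accessor :context

  # Hypothetical endpoint; replace with your own lookup service.
  LOOKUP_ENDPOINT = 'http://localhost:8080/lookup'

  # Returns the pathname reported by the lookup service, or nil when the
  # service responds with an error or an empty body.
  def filesystemsource_pathname(options = {})
    body = fetch_body(context['identifier'])
    body.nil? || body.strip.empty? ? nil : body.strip
  end

  private

  # Performs the HTTP request; kept separate so it can be stubbed in tests.
  def fetch_body(identifier)
    uri = URI(LOOKUP_ENDPOINT)
    uri.query = URI.encode_www_form(identifier: identifier)
    response = Net::HTTP.get_response(uri)
    response.is_a?(Net::HTTPSuccess) ? response.body : nil
  end
end
```

Because this method may run once per request, a slow lookup service adds directly to image latency.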
Like all sources, FilesystemSource needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:
HttpSource maps a URL identifier to an HTTP or HTTPS resource, for retrieving images from a web server. It uses an OkHttp client internally.
HttpSource supports two distinct lookup strategies, defined by the HttpSource.lookup_strategy configuration option.
BasicLookupStrategy locates images by concatenating an identifier with a pre-defined URL prefix and/or suffix. For example, with the following configuration options set:
An identifier of image.jpg in the URL will resolve to http://example.org/images/image.jpg.
A partial path can be included in the identifier by URL-encoding the path separator slashes (%2F). subpath%2Fimage.jpg in the URL would then resolve to http://example.org/images/subpath/image.jpg.
It's also possible to use a full URL as an identifier by leaving both of the above keys blank. In that case, an identifier of http%3A%2F%2Fexample.org%2Fimages%2Fimage.jpg in the URL will resolve to http://example.org/images/image.jpg.
If you are operating behind a reverse proxy that is not capable of passing encoded URL characters through without decoding them, see the slash_substitute configuration key.
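The three resolution cases above (plain identifier, encoded partial path, and full encoded URL with blank prefix and suffix) can be illustrated with a small sketch. The method and parameter names are hypothetical and are not configuration key names; the decoding shown is a plain percent-decode, not necessarily what the application does internally.

```ruby
# Decodes %XX escapes without treating "+" as a space.
def percent_decode(str)
  str.gsub(/((?:%[0-9A-Fa-f]{2})+)/) do
    [$1.delete('%')].pack('H*').force_encoding('UTF-8')
  end
end

# Resolves a URL-encoded identifier against an optional prefix and suffix.
def resolve_url(encoded_identifier, url_prefix: '', url_suffix: '')
  url_prefix + percent_decode(encoded_identifier) + url_suffix
end

prefix = 'http://example.org/images/'
puts resolve_url('image.jpg', url_prefix: prefix)
puts resolve_url('subpath%2Fimage.jpg', url_prefix: prefix)
puts resolve_url('http%3A%2F%2Fexample.org%2Fimages%2Fimage.jpg')
```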
Sometimes, BasicLookupStrategy will not offer enough control. Perhaps you want to serve images from multiple URLs, or perhaps your identifiers are opaque and you need to run a database or web service request to locate them. With ScriptLookupStrategy, you can tell HttpSource to invoke the httpsource_resource_info() delegate method and capture the request info (URL and optionally authentication credentials and/or request headers) it returns.
See the FilesystemSource ScriptLookupStrategy section for examples of similar methods.
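As a rough illustration, the method could return a hash along the following lines. The hash key names ('uri', 'username', 'secret', 'headers') are an assumption based on the request info described above and should be verified against the sample delegate script for your version; the lookup table, credentials, and header are placeholders.

```ruby
class CustomDelegate
  attr_accessor :context

  # Hypothetical mapping of opaque identifiers to URLs; a real
  # implementation might query a database or web service instead.
  RESOURCES = {
    'abc123' => 'http://example.org/images/image.jpg'
  }.freeze

  # Returns request info for the current identifier, or nil if unknown.
  def httpsource_resource_info(options = {})
    uri = RESOURCES[context['identifier']]
    return nil unless uri
    {
      'uri'      => uri,
      'username' => 'reader',     # placeholder credential
      'secret'   => 'change-me',  # placeholder credential
      'headers'  => { 'X-Requested-By' => 'image-server' }
    }
  end
end
```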
HTTP Basic authentication is supported. With BasicLookupStrategy, credentials are set in the HttpSource.BasicLookupStrategy.auth.basic.username and HttpSource.BasicLookupStrategy.auth.basic.secret configuration keys.
Like all sources, HttpSource needs to be able to figure out the format of a source image before it can be served. It uses the strategy below to do this.
- If the HEAD response contains a Content-Type header with a recognized value that is specific enough (not application/octet-stream, for example), a format is inferred from that.
- If the HEAD response contains an Accept-Ranges: bytes header, a GET request is sent containing a Range header specifying a small range of data from the beginning of the resource, and a format is inferred from the magic bytes in the response entity.

Since version 4.1, this source supports random access by requesting small chunks of image data as needed, as opposed to all of it. This may improve efficiency, possibly massively, when reading small portions of large images in certain formats (see below). Conversely, it may reduce efficiency when reading large portions of images.
In order for this technique to work:
- The HttpSource.chunking.enabled configuration key must be set to true.
- The server must support the Range header, as advertised by the presence of an Accept-Ranges: bytes header in a HEAD response.

Note that when chunking is in effect, the processor.stream_retrieval_strategy configuration key is ignored, effectively behaving as if it were set to StreamStrategy. (See Retrieval Strategies.) Chunking is meant to obviate the expense of the other strategies.
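To make the idea of chunked random access concrete, the sketch below computes which fixed-size chunks cover a requested byte region and the Range header value for each. It is an illustration of the general technique, not the application's implementation; the method name and the chunk size used in the example are arbitrary.

```ruby
# Returns the Range header values needed to cover `length` bytes starting at
# `offset`, when data is fetched in fixed-size chunks aligned to chunk_size.
def chunk_ranges(offset, length, chunk_size)
  return [] if length <= 0
  first = offset / chunk_size                 # index of first chunk touched
  last  = (offset + length - 1) / chunk_size  # index of last chunk touched
  (first..last).map do |i|
    start = i * chunk_size
    "bytes=#{start}-#{start + chunk_size - 1}"
  end
end

# Reading 100 bytes at offset 1000 with 512-byte chunks touches two chunks.
puts chunk_ranges(1000, 100, 512).inspect
# → ["bytes=512-1023", "bytes=1024-1535"]
```

Reading a small region costs only a few requests regardless of resource size, while reading most of a large image costs many requests. That is the trade-off described above.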
The HTTP client library used by this source changed in version 5.0. The old client (Jetty) used $JAVA_HOME/jre/lib/security/cacerts as its trust store, whereas the new client (OkHttp) uses the value of the javax.net.ssl.trustStore VM argument. After migrating from 4.1, if you encounter errors like "PKIX path building failed," try setting the value of that VM argument to the aforementioned path, or the path of some other trust store.
S3Source maps a URL identifier to a Simple Storage Service (S3) object. S3Source can work with both AWS and non-AWS S3 endpoints.
Credentials are obtained from the following sources in order of priority:
1. The aws.accessKeyId and aws.secretKey system properties
2. The AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables
3. The S3Source.access_key_id and S3Source.secret_key keys in the application configuration
4. Container credentials (if the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable is set and the security manager has permission to access it)

BasicLookupStrategy locates images by concatenating an identifier with a pre-defined path prefix and/or suffix. For example, with the following configuration options set:
An identifier of image.jpg in the URL will resolve to path/prefix/image.jpg within the bucket.
It's also possible to include a partial path in the identifier using URL-encoded slashes (%2F) as path separators. subpath%2Fimage.jpg in the URL would then resolve to path/prefix/subpath/image.jpg.
If you are operating behind a reverse proxy that is not capable of passing encoded URL characters through without decoding them, see the slash_substitute configuration key.
When your URL identifiers don't match your S3 object keys, ScriptLookupStrategy is available to tell S3Source to capture the object key returned by a method in your delegate class. The s3source_object_info() method should return a hash containing bucket and key keys, if an object is available, or nil if not. See the FilesystemSource ScriptLookupStrategy section for examples of similar methods.
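A minimal sketch of such a method is shown below. The return shape (a hash with bucket and key keys, or nil) follows the description above; the "collection:name" identifier convention, the bucket naming scheme, the key layout, the class name, and the context accessor are all hypothetical.

```ruby
class CustomDelegate
  attr_accessor :context

  # Returns a hash with 'bucket' and 'key' keys, or nil if the identifier
  # does not follow the assumed "<collection>:<name>" convention.
  def s3source_object_info(options = {})
    identifier = context['identifier'].to_s
    collection, name = identifier.split(':', 2)
    return nil if name.nil? || name.empty?
    { 'bucket' => "images-#{collection}", 'key' => "masters/#{name}.tif" }
  end
end
```

Deriving the bucket from the identifier is one way to serve several buckets from a single source.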
Like all sources, S3Source needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:
- A GET request is sent with a Range header specifying a small range of data from the beginning of the resource.
- If a Content-Type header is present in the response, and is specific enough (i.e. not application/octet-stream), a format is inferred from that.

This source has supported random access since version 4.1. See the HttpSource section for an explanation of this feature.
AzureStorageSource maps a URL identifier to a Microsoft Azure Blob Storage blob.
BasicLookupStrategy locates images by passing the URL identifier as-is to Azure Storage, with no additional configuration possible.
When your URL identifiers don't match your blob keys, ScriptLookupStrategy is available to tell AzureStorageSource to capture the blob key returned by a method in your delegate class.
The delegate method, azurestoragesource_blob_key(), should return a blob key string if available, or nil if not. See the FilesystemSource ScriptLookupStrategy section for examples of similar methods.
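As a sketch, the method below derives a blob key from a hexadecimal identifier by sharding on its first two characters. The identifier format, the sharding layout, the .jp2 extension, the class name, and the context accessor are assumptions chosen for illustration; only the contract (return a key string or nil) comes from the text above.

```ruby
class CustomDelegate
  attr_accessor :context

  # Returns a blob key such as "de/deadbeef01.jp2", or nil when the
  # identifier is not a lowercase hexadecimal string of 8+ characters.
  def azurestoragesource_blob_key(options = {})
    identifier = context['identifier'].to_s
    return nil unless identifier.match?(/\A[0-9a-f]{8,}\z/)
    "#{identifier[0, 2]}/#{identifier}.jp2"
  end
end
```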
Like all sources, AzureStorageSource needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this:
- A HEAD request is sent. If a Content-Type header is present in the response, and is specific enough (i.e. not application/octet-stream), a format is inferred from that.
- A GET request is sent with a Range header specifying a small range of data from the beginning of the resource, and a format is inferred from the magic bytes in the response body.

This source has supported random access since version 4.1. See the HttpSource section for an explanation of this feature.
JdbcSource maps a URL identifier to a BLOB field in a relational database. It does not require a custom schema and can adapt to any schema, but some delegate methods must be implemented in order to obtain the information needed to run the SQL queries.
The application does not include any JDBC drivers, so a driver JAR for the desired database must be obtained separately and saved somewhere on the classpath.
The JDBC connection is initialized by the JdbcSource.url, JdbcSource.user, and JdbcSource.password configuration options. If the user or password is not necessary, it can be left blank. The connection string must use your driver's JDBC syntax:
jdbc:postgresql://localhost:5432/my_database
jdbc:mysql://localhost:3306/my_database
jdbc:microsoft:sqlserver://example.org:1433;DatabaseName=MY_DATABASE
Consult the driver's documentation for details.
Then, the source needs to be told:
This method takes in an unencoded URL identifier and returns the corresponding database value of the identifier.
This method should return a media (MIME) type corresponding to the value returned by the jdbcsource_database_identifier() method. If the media type is stored in the database, this example will return an SQL statement to retrieve it.
This method may return nil; see Format Inference.
This method should return an SQL statement that selects the BLOB value corresponding to the value returned by the jdbcsource_database_identifier() method.
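The three methods described above might be sketched together as follows. The method names jdbcsource_media_type() and jdbcsource_lookup_sql(), the "?" placeholder for the database identifier, the items table, and its filename, media_type, and image columns are assumptions made for illustration; adapt them to your schema and check the names against your version's sample delegate script.

```ruby
class CustomDelegate
  attr_accessor :context

  # Maps the unencoded URL identifier to its database value. Here the two
  # are assumed to be identical.
  def jdbcsource_database_identifier(options = {})
    context['identifier']
  end

  # Returns an SQL statement selecting the stored media type, with "?"
  # standing in for the database identifier (assumed placeholder syntax).
  def jdbcsource_media_type(options = {})
    'SELECT media_type FROM items WHERE filename = ?'
  end

  # Returns an SQL statement selecting the BLOB column for the identifier.
  def jdbcsource_lookup_sql(options = {})
    'SELECT image FROM items WHERE filename = ?'
  end
end
```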
Like all sources, JdbcSource needs to be able to figure out the format of a source image before it can be served. It uses the following strategy to do this: