This manual refers to a future version of the software that is still in development. It may be incomplete or inaccurate and the software may change before a final release.

Metadata

Overview

Some image formats support embedded metadata, which may contain information about image characteristics, as well as user- and/or device-supplied information about authoring, copyright, camera settings, and so on. This metadata may be encoded in a standard cross-format encoding like EXIF, IPTC IIM, or XMP, or it may use an encoding that is specific to a particular image format. More than one encoding may be present in the same file.

During a response, an entirely new image is generated and returned to the client. It is sometimes desirable for this image to carry over some or all of the source image's metadata, or to have new metadata added to it.

Standards

EXIF
Typically written by devices like cameras and scanners and not meant to be edited. Uses the TIFF file format (consisting of "directories," "tags," and "fields") for serialization, and many terms overlap with the ones in the "Baseline TIFF" specification. Terms can also be represented in XMP.
IPTC IIM
Serialization and vocabulary developed by the IPTC. Simple binary encoding with a fixed vocabulary. The serialization is not used much anymore, but the vocabulary has been migrated to the "IPTC Core" schema used in XMP.
XMP
RDF-based encoding developed by Adobe. Widely supported across many formats and even non-image media files. May contain terms from the EXIF and IPTC standards as well as from other vocabularies. The current de facto embedded metadata standard.
Native standards
Some image formats define their own metadata standards. PNG and TIFF define small term vocabularies, for example.
EXIF IPTC IIM XMP Native
GIF × × ×
JPEG ×
JPEG2000 × ×
PNG × ×
TIFF Stored in baseline IFD and several sub-IFDs Stored in sub-IFD 33723 Stored in sub-IFD 700 Baseline IFD tags

Processor Support

Not all processors are metadata-aware; see the table of processor-supported features.

Copying & Mutating

While Cantaloupe is capable of reading several different source metadata formats, it can write only XMP. This simplifies the metadata API and reduces crosswalking challenges, as XMP is flexible enough to express EXIF and IIM with no information loss. XMP can also be embedded into many image formats and is widely supported by imaging software.

XMP is serialized as RDF/XML. A very simple XMP packet might look like:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
                     xmlns:aux="http://ns.adobe.com/exif/1.0/aux/"
                     xmlns:xmp="http://ns.adobe.com/xap/1.0/"
                     xmlns:photoshop="http://ns.adobe.com/photoshop/1.0/"
                     xmlns:dc="http://purl.org/dc/elements/1.1/">
        <aux:Lens>5.4-10.8mm</aux:Lens>
        <aux:FlashCompensation>0/1</aux:FlashCompensation>
        <aux:Firmware>Firmware Version 1.00</aux:Firmware>
        <aux:OwnerName>Ansel Adams</aux:OwnerName>
        <xmp:CreateDate>2002-07-14T09:01:42</xmp:CreateDate>
        <xmp:ModifyDate>2002-07-14T09:01:42</xmp:ModifyDate>
        <xmp:CreatorTool>Photos 1.5</xmp:CreatorTool>
        <photoshop:DateCreated>2002-07-14T09:01:42</photoshop:DateCreated>
        <dc:subject>
            <rdf:Bag>
                <rdf:li>Mountains</rdf:li>
                <rdf:li>Scenery</rdf:li>
                <rdf:li>Landscapes</rdf:li>
            </rdf:Bag>
        </dc:subject>
    </rdf:Description>
</rdf:RDF>

The following sections use this packet for a crash course on practical metadata manipulation. XMP is a complex standard built on RDF, which is complex in its own right, so what follows is no substitute for the several hundred pages of reading needed to fully grasp the two standards.
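To get a feel for how the packet above is structured, here is a minimal sketch that reads the dc:subject values out of a trimmed copy of it with Ruby's REXML library. The constant and method names (SAMPLE_XMP, NAMESPACES, subjects) are made up for illustration and are not part of the application. Plain XML parsing like this is fragile, because RDF/XML allows the same statements to be serialized in several different ways, so it only suits packets whose layout is known in advance.

```ruby
require 'rexml/document'

# A trimmed copy of the example packet, keeping only dc:subject.
SAMPLE_XMP = <<~XML
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
                     xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:subject>
        <rdf:Bag>
          <rdf:li>Mountains</rdf:li>
          <rdf:li>Scenery</rdf:li>
          <rdf:li>Landscapes</rdf:li>
        </rdf:Bag>
      </dc:subject>
    </rdf:Description>
  </rdf:RDF>
XML

# Prefix-to-URI mapping used to resolve the prefixes in the XPath expression.
NAMESPACES = {
  'rdf' => 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
  'dc'  => 'http://purl.org/dc/elements/1.1/'
}.freeze

# Returns the text of every rdf:li inside a dc:subject bag.
# (Hypothetical helper; assumes the bag is written in element form.)
def subjects(xmp_string)
  doc = REXML::Document.new(xmp_string)
  REXML::XPath.match(doc, '//dc:subject/rdf:Bag/rdf:li', NAMESPACES)
              .map(&:text)
end

puts subjects(SAMPLE_XMP).join(', ')
```

Working with a parsed RDF model instead of raw XML avoids the dependence on one particular serialization.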

Verbatim Copying

The above XMP packet can be copied verbatim from a source image into derivative images quite easily:

def metadata(options = {})
  context['metadata']['xmp_string']
end
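A slightly more defensive variant is sketched below. It assumes that a source image may have no XMP at all, in which case the xmp_string key would be missing or empty, and that returning nil is an acceptable way to signal "no metadata" (the method in the Modifying Properties example also falls through to nil). The ExampleDelegate class is a hypothetical stand-in that exists only to make the snippet self-contained.

```ruby
# Hypothetical stand-in for the delegate object; only `context` matters here.
class ExampleDelegate
  attr_accessor :context

  # Returns the source XMP unchanged, or nil when there is none to copy.
  def metadata(options = {})
    xmp = context.dig('metadata', 'xmp_string')
    return nil if xmp.nil? || xmp.strip.empty?

    xmp
  end
end

delegate = ExampleDelegate.new
delegate.context = { 'metadata' => { 'xmp_string' => '<rdf:RDF/>' } }
puts delegate.metadata.inspect
```

Using Hash#dig means a context without a metadata key yields nil instead of raising NoMethodError.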

Adding Properties

Continuing with the example above, we assume that the source image's IPTC metadata contains a copyright statement that we'd like to copy into derivative images' XMP data.

According to the IPTC IIM standard, the tag most likely to contain a copyright statement is CopyrightNotice (p. 39).

def metadata(options = {})
  metadata = context['metadata']
  iptc = metadata['iptc']
  if iptc
    # Find the dataset whose tag name is CopyrightNotice, if there is one.
    copyright = iptc.find { |d| d['tagName'] == 'CopyrightNotice' }
    if copyright
      # TODO: write this
    end
  end
end
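One possible way to fill in the missing step is sketched below, under stated assumptions. It takes the copyright statement as a plain string, because this section does not show which key of the IPTC dataset holds the value, and wraps it in a new, minimal XMP packet as dc:rights. In XMP, dc:rights is a language alternative, hence the rdf:Alt with an x-default entry. The helper names xml_escape and xmp_with_rights are hypothetical. This sketch builds a fresh packet and does not merge with any XMP already present in the source image.

```ruby
# Escapes the characters that are unsafe in XML text content.
# The ampersand must be replaced first so later entities are not re-escaped.
def xml_escape(text)
  text.to_s.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;')
end

# Builds a minimal XMP packet whose only property is dc:rights.
# (Hypothetical helper; `statement` is the copyright text to embed.)
def xmp_with_rights(statement)
  <<~XMP
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about=""
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:rights>
          <rdf:Alt>
            <rdf:li xml:lang="x-default">#{xml_escape(statement)}</rdf:li>
          </rdf:Alt>
        </dc:rights>
      </rdf:Description>
    </rdf:RDF>
  XMP
end

puts xmp_with_rights('Copyright 2002 Adams & Co.')
```

Inside the delegate method, the returned string would take the place of the TODO, once the statement has been extracted from the dataset.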

Modifying Properties

In this example, we want to remove any dc:title property that may be present in the source XMP and add our own. We use Apache Jena to do this, working with the parsed model available under the xmp_model key.

java_import java.io.StringWriter

def metadata(options = {})
  metadata = context['metadata']
  model = metadata['xmp_model']
  if model
    # Search for dc:title properties.
    prop = model.createProperty('http://purl.org/dc/elements/1.1/title')
    it = model.listStatements(nil, prop, nil)

    # Remove them.
    it.removeNext while it.hasNext

    # Add a custom dc:title property.
    res = model.createResource
    obj = model.createLiteral('Hello world', false)
    stmt = model.createStatement(res, prop, obj)
    model.add(stmt)

    # Write the model to XML and return it.
    writer = nil
    begin
      writer = StringWriter.new
      model.write(writer)
      return writer.toString
    ensure
      writer&.close
    end
  end
  nil
end

Implementation Notes

  • After metadata is initially read from a source image, it may be cached, in which case subsequent requests will read it from the cache rather than the source image. In this case, changes to the source image's metadata will not be reflected in the application until the cached metadata becomes invalid and is re-read. If you need to change a source image's metadata, you should manually purge any cached content relating to it afterwards.
  • IPTC IIM supports many different character encodings, but Cantaloupe supports only UTF-8, ASCII, and ISO Latin 1, all of which get converted to UTF-8 internally.
  • XMP sidecar files are not supported.
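As an illustration of the character encoding note above, the following sketch shows what converting ISO Latin 1 bytes to UTF-8 looks like in plain Ruby. It is not the application's internal code, and latin1_to_utf8 is a hypothetical helper name.

```ruby
# Reinterprets raw bytes as ISO Latin 1 and transcodes them to UTF-8.
# `dup` keeps the caller's string untouched.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

raw = "Caf\xE9".b # 0xE9 is e-acute in ISO Latin 1
converted = latin1_to_utf8(raw)
puts converted.encoding
puts converted.bytesize
```

ASCII input passes through unchanged, since ASCII is a subset of both encodings.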