Metadata embedded in image files

8th December 2014

Q – What is metadata in the context of digital asset management and multi-media libraries?

A – Information associated with a digital asset (e.g. an image, video or audio file) for the purpose of describing it and its various attributes.

Classes of metadata

Broadly, metadata will comprise some or all of the following classes of information: Descriptive, Technical and Administrative.

Storing metadata

Metadata can be embedded within the digital asset or stored separately in an associated database; often both methods are used. This article is specifically about metadata embedded (stored) within the image file itself.

Technical outline

Metadata is embedded separately from the pixel data that makes up the image. Adobe’s TIFF format set out the original method for embedding metadata in image files, which has since been adopted by others. The schema field data (IPTC-IIM and/or Exif) is stored as blocks, and this is referred to as Image Resource Block (IRB) format. Sets of IRB data can be nested together, allowing multiple schemas in the same file. There can be drawbacks, however, in terms of size limits within the file header.

To get round this, Adobe introduced XMP in 2001. Based on XML (Extensible Markup Language), it is a more flexible storage method and offers more space for information.

Unlike Image Resource Block format, XMP places far fewer limits on languages, characters or data size. An XMP packet is Unicode text that can be embedded in the image file itself or, where the file format does not support embedding, carried alongside its accompanying image in what’s termed a ‘sidecar’ file. Storing metadata together with the image data inside the file provides complete encapsulation (akin to the glue that used to stick a caption to the back of a photograph), which means both types of data can be shared and exchanged reliably as one unit. Metadata that is stored within the image file is referred to as embedded metadata.
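Because an XMP packet is ordinary XML text, it can be located inside a file's bytes and parsed with general-purpose tools. The following is a minimal sketch in Python, not a complete XMP reader: it assumes the packet is wrapped in an x:xmpmeta element, is encoded as UTF-8 and stores the title as a Dublin Core dc:title language alternative. The function names and the sample data are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs used by RDF and Dublin Core inside XMP packets.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def extract_xmp_packet(data: bytes):
    """Return the first <x:xmpmeta> element found in the bytes, or None."""
    match = re.search(rb"<x:xmpmeta\b.*?</x:xmpmeta>", data, re.DOTALL)
    # Assumption: the packet is UTF-8, the most common XMP encoding.
    return match.group(0).decode("utf-8") if match else None


def read_dc_title(xmp_xml: str):
    """Return the first dc:title entry from an XMP packet, or None."""
    root = ET.fromstring(xmp_xml)
    item = root.find(".//dc:title/rdf:Alt/rdf:li", NS)
    return item.text if item is not None else None


# Made-up sample: a few binary bytes followed by a tiny XMP packet.
SAMPLE = (
    b"\xff\xd8\x00\x01"
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" '
    b'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Harbour at dawn'
    b"</rdf:li></rdf:Alt></dc:title>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b"\x00\x02"
)

packet = extract_xmp_packet(SAMPLE)
title = read_dc_title(packet) if packet else None
```

Scanning for the element by pattern is a shortcut for illustration; a production reader would follow the container format's own rules for where the packet lives.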

Embedded metadata standards

There are several embedded metadata standards in use, including IPTC-IIM, IPTC Core, IPTC Extension and EXIF.

The IPTC schemas began life as a multimedia Information Interchange Model (IIM), created by the International Press Telecommunications Council to aid news and media organisations when captioning and cataloguing early digital images. It was later adopted by Adobe for use in Photoshop.

This original ‘legacy’ schema, IPTC-IIM, is the most widely used and includes fields identifying an image’s creator and/or rights holder, capture time, location, caption, headline, title, copyright notice and so on.
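To give a feel for how these fields are stored, IPTC-IIM data is a sequence of small 'datasets', each starting with the marker byte 0x1C followed by a record number, a dataset number, a two-byte length and the value. The Python sketch below reads a handful of record 2 fields from such a sequence. It is an illustration under assumptions: values are treated as UTF-8 (real files may use other encodings), extended-length datasets are not handled, and make_dataset is a hypothetical helper used only to build sample data.

```python
import struct

# A few record 2 dataset numbers and the fields they carry.
IIM_FIELDS = {
    5: "title",       # Object Name
    25: "keywords",   # Keywords (repeatable)
    80: "creator",    # By-line
    105: "headline",  # Headline
    116: "copyright", # Copyright Notice
    120: "caption",   # Caption/Abstract
}


def parse_iim(data: bytes, encoding: str = "utf-8"):
    """Collect known record 2 datasets into a dict of lists of strings."""
    fields = {}
    pos = 0
    while pos + 5 <= len(data) and data[pos] == 0x1C:
        record, dataset, length = struct.unpack_from(">BBH", data, pos + 1)
        if length & 0x8000:
            break  # extended-length dataset: out of scope for this sketch
        value = data[pos + 5 : pos + 5 + length]
        if record == 2 and dataset in IIM_FIELDS:
            text = value.decode(encoding, errors="replace")
            fields.setdefault(IIM_FIELDS[dataset], []).append(text)
        pos += 5 + length
    return fields


def make_dataset(record: int, dataset: int, text: str) -> bytes:
    """Hypothetical helper: encode one dataset for the sample below."""
    raw = text.encode("utf-8")
    return struct.pack(">BBBH", 0x1C, record, dataset, len(raw)) + raw


sample = (
    make_dataset(2, 80, "Jane Doe")
    + make_dataset(2, 25, "harbour")
    + make_dataset(2, 25, "dawn")
)
iim = parse_iim(sample)
```

Values are collected into lists because some fields, such as keywords, may legitimately appear more than once.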

IPTC Core and IPTC Extension build on this by including more descriptive and administrative information, a robust data format in the shape of XMP (Extensible Metadata Platform) and fields supporting the needs of stock photography and cultural heritage organisations.

EXIF, which is both a storage format and a schema, includes technical information about an image and capture method, such as exposure settings, capture time, GPS location and model of camera (or device) that took it.
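As a rough illustration of EXIF as a storage format, the data is laid out like a small TIFF file: a header giving the byte order, then a directory (IFD) of 12-byte entries, each holding a tag number, a type, a count and either the value or an offset to it. The Python sketch below reads the camera make and model from such a block. It assumes the block has already been isolated from the image (in a JPEG it follows the "Exif" identifier in an APP1 segment), handles only text-typed tags, and uses made-up sample values; in practice an established library or tool would normally do this work.

```python
import struct

# Tag numbers for two text fields in the first directory (IFD0).
EXIF_TAGS = {0x010F: "Make", 0x0110: "Model"}
ASCII_TYPE = 2  # type code for null-terminated text


def read_ifd0_ascii(tiff: bytes):
    """Return the known ASCII tags found in IFD0 of a TIFF-style block."""
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        raise ValueError("unknown byte-order mark")
    magic, ifd_offset = struct.unpack_from(order + "HI", tiff, 2)
    if magic != 42:
        raise ValueError("not a TIFF header")
    (count,) = struct.unpack_from(order + "H", tiff, ifd_offset)
    result = {}
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i
        tag, typ, n = struct.unpack_from(order + "HHI", tiff, entry)
        if typ != ASCII_TYPE or tag not in EXIF_TAGS:
            continue
        if n <= 4:
            # Short values are stored inline in the entry itself.
            raw = tiff[entry + 8 : entry + 8 + n]
        else:
            # Longer values live elsewhere; the entry holds their offset.
            (offset,) = struct.unpack_from(order + "I", tiff, entry + 8)
            raw = tiff[offset : offset + n]
        result[EXIF_TAGS[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return result


def build_sample_tiff() -> bytes:
    """Build a tiny little-endian block with made-up Make and Model values."""
    header = b"II" + struct.pack("<HI", 42, 8)
    entries = (
        struct.pack("<HHII", 0x010F, ASCII_TYPE, 5, 38)  # value at offset 38
        + struct.pack("<HHI", 0x0110, ASCII_TYPE, 3) + b"X1\x00\x00"  # inline
    )
    ifd = struct.pack("<H", 2) + entries + struct.pack("<I", 0)
    return header + ifd + b"Acme\x00"


camera = read_ifd0_ascii(build_sample_tiff())
```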

Here’s a screen shot of some EXIF data.


Why is embedded metadata useful?

First and foremost, much of it, particularly the technical data, is captured automatically with no effort required whatsoever, and it stays with the image unless it is deliberately removed.

In the context of digital asset management and online libraries, good systems will be able to extract these data and map them to fields in the database so that they can be searched and displayed as required. Similarly, it should be possible from within a digital asset management system to overwrite existing embedded metadata or create new embedded metadata.
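The extract-and-map step can be sketched in a few lines of Python using an in-memory SQLite database. Everything here is hypothetical and for illustration only: the table layout, the column names, the FIELD_MAP mapping and the sample values are assumptions, not the design of any particular digital asset management system.

```python
import sqlite3

# Hypothetical mapping from embedded field names to database columns.
FIELD_MAP = {
    "creator": "creator",
    "caption": "description",
    "Model": "camera_model",
}


def index_asset(conn, filename, embedded):
    """Insert one asset row, copying only the embedded fields we map."""
    row = {"filename": filename}
    for source, column in FIELD_MAP.items():
        if source in embedded:
            row[column] = embedded[source]
    # Column names come from FIELD_MAP above, never from user input;
    # the values themselves are passed as bound parameters.
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO assets ({columns}) VALUES ({placeholders})",
        tuple(row.values()),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets "
    "(filename TEXT, creator TEXT, description TEXT, camera_model TEXT)"
)

# Made-up metadata, as it might come out of an extraction step.
index_asset(
    conn,
    "harbour.jpg",
    {"creator": "Jane Doe", "caption": "Harbour at dawn", "Model": "X1",
     "unmapped": "ignored"},
)

# Once mapped to columns, the embedded caption is searchable.
rows = conn.execute(
    "SELECT filename FROM assets WHERE description LIKE ?", ("%dawn%",)
).fetchall()
```

Keeping the mapping in one table-like structure makes it easy to see, and change, which embedded fields feed which database fields.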

