Horizon defines metadata simply as data about data; more specifically, it is ancillary data that provides the information needed to interpret a primary set of data intelligently. For an image dataset containing temperature measurements, for example, the metadata could include the measurement units (e.g. Celsius), the instruments used, the date and time of the measurements, and the parameters describing the domain or world coordinate system within which the image data reside. Metadata as part of a dataset allows the dataset to be ``self-describing'', so that an application can adjust its handling of a dataset according to the values of the metadata. Of course, users must share some convention on the meaning of the metadata.
Self-description is a critical feature when transporting the data
between different visualization and analysis tools which have
different models or conventions for handling data. The metadata can
then serve as a guide for converting the data between the different
conventions. For this reason, data formats specifically designed for
transporting scientific data, such as FITS and HDF, provide support
for metadata.
Like the HDF and FITS formats, Horizon refines the definition of a
metadatum to be a name-value pair, where the name is a string
identifying the information contained in the metadatum and the value
is the actual value. The value can be of any type, primitive
(e.g. integer or floating-point) or complex (e.g. a Java Object).
A collection of metadatum objects is, of course, a metadata set. Note
that metadata names are conceptually different from the fields in a
class definition when considered in the context of an object-oriented
program: metadata should be arbitrarily accessible by their name at
runtime, which is not (in general) the case for class field names.
For instance, it is not straightforward for an application to allow
the user to request at runtime the value of any field of any arbitrary
object by typing the object and field names into a text input
box.
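The distinction between metadata names and class fields can be illustrated with a minimal Java sketch. The class and method names here (MetadataSketch, MetadataSet, put, get, names) are hypothetical and are not taken from Horizon's actual API, and the stored values are purely illustrative; the sketch assumes only that a metadata set behaves like a map from string names to values of arbitrary type.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; these names are not Horizon's API.
class MetadataSketch {

    /** A metadata set: name-value pairs retrievable by name at runtime. */
    static final class MetadataSet {
        private final Map<String, Object> items = new LinkedHashMap<>();

        /** Stores a metadatum; the value may be a boxed primitive or any object. */
        void put(String name, Object value) {
            items.put(name, value);
        }

        /** Returns the value stored under the given name, or null if absent. */
        Object get(String name) {
            return items.get(name);
        }

        /** Returns the names currently defined, in insertion order. */
        Set<String> names() {
            return items.keySet();
        }
    }

    public static void main(String[] args) {
        MetadataSet md = new MetadataSet();
        md.put("units", "Celsius");                   // string value
        md.put("naxes", 2);                           // boxed integer
        md.put("origin", new double[] {0.0, 0.0});    // complex value

        // The name is ordinary runtime data, e.g. typed into a text box
        // or passed on the command line; no field name is compiled in.
        String requested = args.length > 0 ? args[0] : "units";
        System.out.println(requested + " = " + md.get(requested));
    }
}
```

Because the name is an ordinary string, the lookup works for any name the user supplies. Reading an arbitrary field of an arbitrary object would instead require reflection or hand-written accessors for each field.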
The metadata name, of course, is a short identifier to represent some
concept; for instance, ``instrument'' might be used as the name for
the metadatum providing the name of the instrument used to make the
measurement. One caveat that may not be immediately obvious (but is
an issue of much study and discussion) is that the mapping of the
metadatum name to a value always assumes an agreement between
the application that created the metadatum and the application that
accesses it as to the conceptual meaning of the metadatum
name.
In general, there is no
way for a dataset to be completely self-describing in the absence of any
assumptions about what the data means. The obvious side-effect of
this situation is the inconsistent use of metadata between data
formats and applications that have been developed independently of each
other. The different systems might use different names to mean the
same thing (say, ``elevation'' and ``altitude'') and the same name to
mean different things (say ``elevation'' to mean altitude in feet
versus angle from the horizon in degrees).
The total set of mappings of metadatum names to definitions used by a
system is often called the system's schema. Identifying the schemata
used by two systems provides some hope of transporting data between
the systems. Doing so,
however, requires that the schemata be sufficiently well-defined to
allow a translation. This need has inspired format standards bodies to
create detailed ``lexicons'' listing accepted metadata names and
their meanings.
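As a rough sketch of what translating between two schemata might involve, the Java fragment below renames metadata according to a lookup table. The class name SchemaTranslationSketch, the translate method, and the sample names and values are all hypothetical and serve only as illustration. The sketch assumes the simplest case, in which two schemata use different names for the same concept with the same units.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Horizon, FITS, or HDF.
class SchemaTranslationSketch {

    /**
     * Copies the source metadata, renaming every metadatum whose name
     * appears in nameMap. Names without a mapping pass through unchanged.
     */
    static Map<String, Object> translate(Map<String, Object> source,
                                         Map<String, String> nameMap) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String targetName =
                nameMap.getOrDefault(entry.getKey(), entry.getKey());
            result.put(targetName, entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed agreement: the source schema's "elevation" means the
        // same thing as the target schema's "altitude".
        Map<String, String> nameMap = new HashMap<>();
        nameMap.put("elevation", "altitude");

        Map<String, Object> source = new LinkedHashMap<>();
        source.put("elevation", 1200.0);
        source.put("units", "Celsius");

        System.out.println(translate(source, nameMap));
    }
}
```

A renaming table like this handles only the case of different names for the same thing. When the same name carries different meanings or units in the two systems, the values themselves would also need converting, which is only possible if both schemata define their names precisely.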