
What are Metadata?

 

Horizon defines metadata simply to be data about data; more specifically, they are ancillary data that provide the information needed to interpret a primary set of data intelligently. For an image dataset containing temperature measurements, the metadata could include the measurement units (e.g. Celsius), the instruments used, the date and time of the measurements, and the parameters describing the domain or world coordinate system within which the image data reside. Including metadata as part of a dataset allows the dataset to be ``self-describing'', so that an application can adjust its handling of the dataset according to the values of the metadata. Of course, users must assume some convention for the meaning of the metadata.

Self-description is a critical feature when transporting data between visualization and analysis tools that have different models or conventions for handling data. The metadata can then serve as a guide for converting the data between those conventions. For this reason, data formats defined specifically for transporting scientific data, such as FITS and HDF, provide support for metadata.

Like the HDF and FITS formats, Horizon refines the definition of a metadatum to be a name-value pair, where the name is a string identifying the information contained in the metadatum and the value is the actual value. The value can be of any type, primitive (e.g. integer, floating-point, etc.) or complex (e.g. a Java Object). A collection of metadatum objects is, of course, a metadata set. Note that metadata names are conceptually different from the fields in a class definition when considered in the context of an object-oriented program: metadata should be arbitrarily accessible by name at runtime, which is not (in general) the case for class field names. For instance, it is not straightforward for an application to let the user request at runtime the value of any field of any arbitrary object by typing the object and field names into a text input box.
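To make the distinction between metadata names and class fields concrete, here is a minimal sketch of a metadata set as a collection of name-value pairs that can be queried by a name known only at runtime. The class `MetadataSketch` and its `put` and `get` methods are hypothetical names chosen for this illustration; they are not Horizon's Metadata class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT Horizon's Metadata class.
class MetadataSketch {
    // A metadata set: each entry is one metadatum (a name-value pair).
    private final Map<String, Object> items = new LinkedHashMap<>();

    // Store a value of any type under the given name.
    void put(String name, Object value) {
        items.put(name, value);
    }

    // Look up a value by a name chosen at runtime; returns null if absent.
    Object get(String name) {
        return items.get(name);
    }

    public static void main(String[] args) {
        MetadataSketch md = new MetadataSketch();
        md.put("units", "Celsius");               // a String value
        md.put("instrument", "radiometer");       // assumed example value
        md.put("origin", new double[] {0.0, 0.0}); // a complex (array) value

        // The requested name could come from a text input box;
        // here it is taken from the command line, defaulting to "units".
        String requested = args.length > 0 ? args[0] : "units";
        System.out.println(requested + " = " + md.get(requested));
    }
}
```

Because the name is an ordinary string key, no reflection or compile-time knowledge of the names is needed to answer such a request, which is the property that class fields generally lack.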

The metadata name, of course, is a short identifier that represents some concept; for instance, ``instrument'' might be used as the name for the metadatum providing the name of the instrument used to make the measurement. One caveat that may not be immediately obvious (but is an issue of much study and discussion) is that the mapping of a metadatum name to a value always assumes an agreement, between the application that created the metadatum and the application that accesses it, on the conceptual meaning of the metadatum name. In general, there is no way for a dataset to be completely self-describing in the absence of any assumptions about what the data mean. The obvious side effect of this situation is the inconsistent use of metadata among data formats and applications that have been developed independently of each other. Different systems might use different names to mean the same thing (say, ``elevation'' and ``altitude'') or the same name to mean different things (say, ``elevation'' to mean altitude in feet versus angle from the horizon in degrees).

The total set of mappings of metadatum names to definitions used by a system is often called the system's schema. Identifying the schemata used by two systems provides some hope of transporting data between the systems. Doing so, however, requires that the schemata be sufficiently well-defined to allow a translation. This need has inspired format standards bodies to create detailed ``lexicons'' listing accepted metadata names and their meanings.
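As a rough sketch of what translating between two schemata might involve, the example below applies a table of rules, each giving the target system's name for a metadatum and a conversion for its value. Everything here is an assumption made for illustration: the class `SchemaTranslationSketch`, the `Rule` type, the `translate` method, and the premise that the source system records ``elevation'' as altitude in feet while the target system expects ``altitude'' in meters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; not part of Horizon.
class SchemaTranslationSketch {
    // One translation rule: the name used by the target schema
    // and a conversion applied to the value.
    static final class Rule {
        final String targetName;
        final UnaryOperator<Object> convert;

        Rule(String targetName, UnaryOperator<Object> convert) {
            this.targetName = targetName;
            this.convert = convert;
        }
    }

    // Translate a metadata set from the source schema to the target schema.
    static Map<String, Object> translate(Map<String, Object> source,
                                         Map<String, Rule> rules) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : source.entrySet()) {
            Rule rule = rules.get(e.getKey());
            if (rule != null) {
                out.put(rule.targetName, rule.convert.apply(e.getValue()));
            }
            // Names with no rule are dropped: without an agreed meaning
            // there is no safe way to carry them across.
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Rule> rules = new LinkedHashMap<>();
        // Assumed: source "elevation" is altitude in feet; target wants meters.
        rules.put("elevation",
                  new Rule("altitude", v -> ((Number) v).doubleValue() * 0.3048));
        // Same name and meaning in both schemata: pass the value through.
        rules.put("instrument", new Rule("instrument", v -> v));

        Map<String, Object> source = new LinkedHashMap<>();
        source.put("elevation", 1000.0);
        source.put("instrument", "radiometer");
        source.put("comment", "no rule exists for this name");

        System.out.println(translate(source, rules));
    }
}
```

The rule table plays the role of a shared lexicon: the translation is only as reliable as the definitions behind each entry, and a name that neither schema defines precisely cannot be mapped at all.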



Ray Plante
Mon Aug 25 15:16:12 CDT 1997