

Part III
Intermediate Languages


Chapter 4
Intermediate Languages

by Ron K. Cytron

Most people who program computers with some regularity are comfortable writing programs in source languages such as Pascal or Java. Such languages offer extensible data and control abstractions that are conducive to algorithmic expression. However, most computers have no native comprehension of such high-level languages; instead, they rely on compilers to translate source programs into some target machine language, whose instructions typically operate on a more modest scale. Compilers and other programming-language translation tools bridge the “semantic gap” between high- and low-level program representations. For example, a compiler might accept Java programs as input and produce machine instructions for an Intel architecture.
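To make the semantic gap concrete, the following Python sketch translates a small high-level construct, an arithmetic expression tree, into instructions for a hypothetical stack machine. The node classes `Num` and `BinOp`, the opcodes `PUSH`, `ADD`, `SUB`, and `MUL`, and the `run` interpreter that stands in for hardware are all illustrative assumptions, not the design of any particular compiler.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical "high-level" source form: arithmetic expression trees.
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str          # one of "+", "-", "*"
    left: "Expr"
    right: "Expr"

Expr = Union[Num, BinOp]

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def compile_expr(e: Expr) -> list[str]:
    """Translate an expression tree into instructions for a hypothetical stack machine."""
    if isinstance(e, Num):
        return [f"PUSH {e.value}"]
    # Post-order: emit code for both operands, then the operator itself.
    return compile_expr(e.left) + compile_expr(e.right) + [OPCODES[e.op]]

def run(code: list[str]) -> int:
    """Execute the stack-machine instructions; this interpreter stands in for the hardware."""
    stack: list[int] = []
    for instr in code:
        if instr.startswith("PUSH "):
            stack.append(int(instr.split()[1]))
        else:
            b, a = stack.pop(), stack.pop()  # right operand is on top of the stack
            stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[instr])
    return stack.pop()
```

For example, `(2 + 3) * 4` compiles to `PUSH 2`, `PUSH 3`, `ADD`, `PUSH 4`, `MUL`: one source-level expression becomes five machine-level steps, each of which does far less.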

In fact, many computer application programs can be regarded as language translators, in the sense that they accept an input source language and produce an output target language. For this discussion, compilers remain the primary focus; other examples include

  Text processors, which accept text and formatting specifications and produce an image to be viewed or printed
  Query processors, which accept a high-level query language and emit SQL as an intermediate language
  Theorem provers, which accept a theorem’s specification and emit ML (meta-language) as an intermediate language

Although input-to-output translation is the primary, observable task of language translators, many such systems have well-defined intermediate points of arrival at which some intermediate language (IL) is produced. This chapter examines the nature of ILs and their importance in translator design, portability, cost, and efficiency.

4.1. Overview

The early C++ compilers did not produce machine code directly (Stroustrup, 1994). As shown in Figure 4.1, C++ programs were translated from source to standard C, and the resulting C program was then compiled to machine code.


Figure 4.1.  Using Cfront to translate C++ to C.

In fact, because C++ programs could use the standard C preprocessor, source programs were first translated by C’s preprocessor Cpp and then translated into standard C by Cfront. Thus, in the trek from C++ to machine code, there are two well-defined points of arrival:

  From the perspective of the C and C++ programming languages, which include preprocessor directives, the output of Cpp is an intermediate language; although no name is formally given to this “language,” it is the subset of C (or of C++) that contains no preprocessor directives. This directive-free IL simplifies construction of the rest of the compiler, which need not handle preprocessor directives at all.
  From the perspective of Cfront, standard C is an intermediate language, interposed between C++ and machine code.
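A toy stand-in for Cpp illustrates why a directive-free IL helps. This Python sketch is not how Cpp is actually implemented: it handles only object-like `#define NAME value` lines and rejects every other directive, so its output never contains a line beginning with `#` and any later stage can ignore directives entirely. The function name `preprocess` is hypothetical, and the substitution does not treat string literals or comments specially and supports neither function-like macros nor rescanning of replacement text.

```python
import re

def preprocess(source: str) -> str:
    """Toy stand-in for Cpp that understands only '#define NAME value'.

    Directive lines are consumed, and macro names elsewhere are replaced in a
    single pass (no rescanning), so the output contains no directives.
    """
    macros: dict[str, str] = {}
    out: list[str] = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            parts = stripped[1:].split(maxsplit=2)
            if len(parts) < 2 or parts[0] != "define":
                raise ValueError(f"unsupported directive: {stripped}")
            # '#define NAME' with no body defines NAME as empty text.
            macros[parts[1]] = parts[2] if len(parts) == 3 else ""
            continue
        # Replace each identifier that names a macro; leave other identifiers alone.
        out.append(re.sub(r"\b[A-Za-z_]\w*",
                          lambda m: macros.get(m.group(0), m.group(0)),
                          line))
    return "\n".join(out)
```

Given `#define N 10` followed by `int a[N];`, the output is just `int a[10];`, which a later stage can parse without knowing that macros ever existed.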

The use of Cfront in the translation of C++ was short-lived, but it nonetheless illustrates how an existing language can be recruited as an IL to facilitate prototyping a new language.

As another example, consider the steps by which LaTeX (the language in which this chapter was authored; Lamport, 1995) is translated into print, as shown in Figure 4.2.


Figure 4.2.  Translating LaTeX into print.

The LaTeX text-processing system does not produce print images directly; instead, LaTeX is translated into a more basic representation called TeX, which is in turn translated into a device-independent representation called dvi, which may undergo several further translations before arriving in print. Depending on how one views the usefulness of C in its own right, LaTeX’s use of TeX as an IL may appear quite different from C++’s use of C. Although TeX is well conceived as a typesetting tool, it lacks facilities for expressing documents in terms of their types and components; LaTeX offers such features, producing TeX as its output. In a now-standard use of ILs to enhance portability, TeX does not target any one printer but instead produces dvi. Thus, TeX need not undergo modification to accommodate a new kind of printer; instead, a new program is written to translate the (vastly simpler) dvi IL into the printer’s “instruction set.”
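The portability argument can be sketched in Python with a hypothetical device-independent IL standing in for dvi (real dvi is a binary format with a much richer instruction set). One front end, `typeset`, emits the IL once; each printer then needs only a small back end that translates that IL. The opcodes `moveto` and `text`, the abstract units, and both back ends are illustrative assumptions; `to_postscript_like` merely imitates PostScript’s `moveto` and `show` operators.

```python
from typing import Any

# Hypothetical device-independent IL: each instruction is a tuple (opcode, *operands).
Instr = tuple[Any, ...]

def typeset(words: list[str]) -> list[Instr]:
    """Front end: lay words out on one line, knowing nothing about any printer."""
    il: list[Instr] = []
    x = 0
    for w in words:
        il.append(("moveto", x, 0))
        il.append(("text", w))
        x += len(w) + 1  # advance by the word's width plus one space, in abstract units
    return il

def to_text_printer(il: list[Instr]) -> str:
    """Back end for a hypothetical character printer: fill a one-line buffer."""
    buf: list[str] = []
    col = 0
    for op, *args in il:
        if op == "moveto":
            col = args[0]
        elif op == "text":
            word = args[0]
            if len(buf) < col + len(word):
                buf.extend(" " * (col + len(word) - len(buf)))  # pad with blanks
            buf[col:col + len(word)] = word
            col += len(word)
    return "".join(buf)

def to_postscript_like(il: list[Instr], scale: int = 6) -> list[str]:
    """Back end emitting PostScript-style commands; 'scale' (points per unit) is assumed."""
    cmds: list[str] = []
    for op, *args in il:
        if op == "moveto":
            cmds.append(f"{args[0] * scale} {args[1] * scale} moveto")
        elif op == "text":
            cmds.append(f"({args[0]}) show")
    return cmds

il = typeset(["Hello", "IL"])
print(to_text_printer(il))  # Hello IL
print(to_postscript_like(il))
```

Supporting another printer means writing one more function that consumes the same IL; `typeset` is untouched, much as TeX need not change when a new printer appears.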

As seen with the preceding examples, use of ILs in a translation system carries some cost:

  The intermediate languages must be defined. Failure to define these languages accurately can have the same consequences as imprecise definitions of programming languages.
  Tools that process the intermediate forms must be constructed. Because such tools often operate beyond the user’s focal plane (e.g., Cpp), great care must be taken to make them as unobtrusive as possible. Most users of C or C++ are unaware that the Cpp preprocessor is invoked.
  Connections must be made between levels. For example, if errors occur in source inputs specified in C++ or LaTeX, the error messages should reference the original source lines and not the line numbers of an intermediate representation beyond the user’s view.
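One common way to connect levels is for each translation step to record where every line of its output came from; C, for instance, provides the standard `#line` directive for this purpose. The Python sketch below shows the idea with a hypothetical lowering pass, `lower`, that expands each source statement of the form `op arg` into two IL lines and keeps a parallel list of `Origin` records, which `report` then uses to phrase errors in terms of the user’s source. All names and the statement format are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Origin:
    file: str
    line: int  # 1-based line number in the user's source file

def lower(source: str, file: str) -> tuple[list[str], list[Origin]]:
    """Hypothetical lowering pass: each 'op arg' statement becomes two IL lines.

    Returns the IL lines together with a parallel list giving each IL line's origin.
    """
    il: list[str] = []
    origins: list[Origin] = []
    for n, stmt in enumerate(source.splitlines(), start=1):
        op, arg = stmt.split(maxsplit=1)
        for il_line in (f"load {arg}", f"call {op}"):
            il.append(il_line)
            origins.append(Origin(file, n))  # remember which source line produced it
    return il, origins

def report(il_line: int, message: str, origins: list[Origin]) -> str:
    """Phrase an error found at a 1-based IL line in terms of the original source."""
    o = origins[il_line - 1]
    return f"{o.file}:{o.line}: {message}"

il, origins = lower("print x\nprint y", "prog.src")
print(il)  # ['load x', 'call print', 'load y', 'call print']
print(report(3, "undefined variable 'y'", origins))  # prog.src:2: undefined variable 'y'
```

Although the error is detected at IL line 3, the user sees line 2 of `prog.src`, the line actually written.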

The extra steps associated with using an intermediate language raise justifiable concerns about efficiency: a system that uses ILs may not match the performance of a competing product that avoids ILs and takes a more “direct” approach. The benefits and costs of an IL must be analyzed and compared; whereas gratuitous levels of intermediate representation are unwise, thoughtful system designs include ILs to simplify the task at hand and to reduce the cost of adapting and maintaining the system. In support of this, the rest of this chapter examines some principles and case studies concerning the role of ILs in an effective programming-language translation system.

