Other advantages of using a between IL include the following:
- An IL allows various system components to interoperate by facilitating access to information about the program undergoing translation. For example, the IL may contain symbolic information such as variable names, variable types, and source line numbers; such information is useful to a debugger. Similarly, program development tools such as class browsers and performance profilers, which operate at different points in the software development cycle, can share and use program information through the IL.
- An IL simplifies testing of the system's components. Components that operate after IL construction can be tested by artificially creating IL text, without involving a front end.
- Considering the two designs shown in Figure 4.3, work that would otherwise be duplicated in a system's front and back ends can instead be performed in the compiler's middle end, provided that such work can be performed on the IL. (The term middle end has entered into common usage and refers to whatever can be pulled out of a compiler's front and back ends, which are language and target specific, respectively.) For this reason, the between IL pattern works best when most of the language translation effort can be accomplished on the IL itself, because this makes the source- and target-specific portions simpler.
- A carefully designed and publicly accessible intermediate language offers the opportunity for other software systems to interface with the IL-bearing product, either by accepting the product's IL as input for some other task or by acting as a surrogate provider of the IL. In the compiler-suite example, a vendor could sell program optimization methods by interfacing with the compiler's IL to analyze and transform the intermediate representation to obtain improved performance.
- In a research setting, the IL can simplify the pioneering and prototyping of new ideas by providing the necessary infrastructure. Consider a compiler writer who wishes to experiment with new methods for eliminating computational redundancy. The task of developing a complete compiler de novo is daunting; if the method can instead be attempted on a compiler's IL, then construction of front and back ends can be avoided. Moreover, if the system is multisource or multitarget, then deploying the optimization at the IL level can obtain benefits for multiple languages and multiple target platforms.
- The IL can make the compiler more portable, and numerous compilers have been developed using ILs for this reason (Chow & Ganapathi, 1983; Ottenstein, 1984):
| Source Language | Intermediate Language |
|---|---|
| Pascal | Pcode |
| Java | Java VM |
| Ada | Diana |
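To make the testing and middle-end points above concrete, here is a minimal sketch of an IL-level pass, assuming a hypothetical three-address IL text format. The `dest = op arg1 arg2` syntax and the name `fold_constants` are illustrative inventions, not taken from any of the systems named above. Because the pass reads and writes IL text, it can be exercised with hand-written IL, with no front end or back end in the loop, and it would apply unchanged to IL produced from any source language.

```python
from typing import Dict, List

# Hypothetical three-address IL (illustrative only), one instruction per line:
#   dest = add|sub|mul arg1 arg2
#   dest = copy arg
# Operands are integer literals or variable names.
FOLDABLE = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def is_int(token: str) -> bool:
    """Return True if the operand is an integer literal."""
    try:
        int(token)
        return True
    except ValueError:
        return False


def fold_constants(il_text: str) -> str:
    """Constant-fold straight-line IL text and return the rewritten IL text."""
    known: Dict[str, str] = {}  # variables currently bound to a literal value
    out: List[str] = []
    for line in il_text.strip().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        dest, rhs = (part.strip() for part in line.split("=", 1))
        op, *args = rhs.split()
        # Substitute operands whose values are already known constants.
        args = [known.get(a, a) for a in args]
        if op in FOLDABLE and all(is_int(a) for a in args):
            value = str(FOLDABLE[op](int(args[0]), int(args[1])))
            known[dest] = value
            out.append(f"{dest} = copy {value}")
        else:
            if op == "copy" and is_int(args[0]):
                known[dest] = args[0]
            else:
                known.pop(dest, None)  # dest no longer holds a known constant
            out.append(f"{dest} = {op} {' '.join(args)}")
    return "\n".join(out)


# A unit test for the pass needs only hand-written IL text, not a front end.
hand_written_il = """
t1 = add 2 3
t2 = mul t1 x
t3 = mul t1 4
y = add t2 t3
"""
print(fold_constants(hand_written_il))
# t1 = copy 5
# t2 = mul 5 x
# t3 = copy 20
# y = add t2 20
```

The sketch handles only straight-line code; a realistic middle-end optimizer would work over a control-flow graph, but the testing pattern, IL text in and IL text out, stays the same.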
We'll return to the issue of portability when we consider the Pascal Pcode system in greater detail.
In summary, intermediate languages play an important role in reducing the cost and complexity of translation systems. ILs work best when their role in a system can be considered with respect to present and future uses so that the IL design is appropriately general yet efficient. Finally, when an IL can be designed without a legacy of constraints from existing systems, there is typically much greater flexibility in defining the IL.
4.3. Principles of IL Design
Before examining some actual intermediate languages, it is worth considering some normative criteria for assessing the quality of an IL's design; these are expressed in the following list in terms of a between IL for a compiler suite, such as the one shown in Figure 4.3, but the principles can easily be extended to other IL situations:
- The IL should be a bona fide language and not just an aggregation of data structures. For example, IBM uses an intermediate language called XIL (O'Brien, O'Brien, Hopkins, Shepherd, & Unrau, 1995) in some compilers. Although those compilers are very effective, the XIL is not formally a language in the sense that programs are represented as strings of symbols. Instead, the front and back ends have architected interfaces for obtaining and shipping information. Consider the issue of alias information, which specifies when two variable names might reference the same object. Rules concerning aliasing are language specific. In XIL, such information is provided by the front end, not as a piece of text, but instead through a procedural interface for resolving specific alias queries. The result is an efficient system, but there is no string that fully represents the program at an intermediate level. Thus, one cannot transmit an IL form without bundling the front end and its internal structures that represent aliasing.
- The semantics of the IL should be cleanly defined and readily apparent. A good test of this criterion is the ease with which an interpreter can be written for the IL. A good example of such an IL is Pascal's Pcode, which we'll examine later in more detail. A worse example is GNU's RTL (Stallman, http://www.fsf.org):
People frequently have the idea of using RTL stored as text in a file as an interface between a language front end and the bulk of GNU CC. This idea is not feasible. GNU CC was designed to use RTL internally only. Correct RTL for a given program is very dependent on the particular target machine. And the RTL does not contain all the information about the program. (Section 16.18)
It turns out that GNU has another intermediate representation, but it is poorly documented (Stallman, http://www.fsf.org):
The proper way to interface GNU CC to a new language front end is with the tree data structure. There is no manual for this data structure, but it is described in the files tree.h and tree.def. (Section 16.18)
If GNU's ILs have these problems, why are the GNU compilers in such widespread use? They are popular because they are easily re-targeted even though they are not easily re-sourced. Admittedly, compilers are re-targeted more often than they are re-sourced.
- The IL's representation should not be overly verbose. Although some expansion is inevitable, the IL-to-source token ratio should be as low as possible. Compression of IL representation has grown in importance with the increase of program transmission on the World Wide Web. Moreover, vendors such as Microsoft often keep portions of their software in IL format to decrease the time needed to launch an application, because the IL format can be considerably more compact than native code.
- The IL should have a human-readable form because humans will inevitably want to examine the IL.
- The IL should be easily and cleanly extensible, although it is often difficult to predict the impact an unknown source, target, or language modification can have on the IL.
- The IL should be sufficiently general to represent the important aspects of multiple front-end languages and back-end targets.
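The interpreter test from the semantics criterion can be illustrated with a small sketch. The stack IL below is hypothetical: it is loosely in the spirit of Pcode but is not its actual instruction set, and the name `run` is an illustrative choice. Because each opcode has a one-line meaning, the whole interpreter fits in a single loop, and because the program is a plain string, the example also satisfies the bona fide language criterion.

```python
import operator
from typing import Dict, List

# Hypothetical stack IL, loosely in the spirit of Pcode but NOT its real
# instruction set. One instruction per line:
#   push N    push the integer literal N
#   load V    push the value of variable V
#   store V   pop the top of stack into variable V
#   add | sub | mul   pop right, pop left, push (left op right)
#   jz L      pop a value; jump to label L if it is zero
#   jmp L     jump unconditionally to label L
#   label L   marks a jump target; does nothing when executed
BINOPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run(il_text: str) -> Dict[str, int]:
    """Interpret IL text and return the final values of its variables."""
    program: List[List[str]] = [
        line.split() for line in il_text.splitlines() if line.strip()
    ]
    labels = {ins[1]: pc for pc, ins in enumerate(program) if ins[0] == "label"}
    stack: List[int] = []
    env: Dict[str, int] = {}
    pc = 0
    while pc < len(program):
        op, *arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(int(arg[0]))
        elif op == "load":
            stack.append(env[arg[0]])
        elif op == "store":
            env[arg[0]] = stack.pop()
        elif op in BINOPS:
            right, left = stack.pop(), stack.pop()
            stack.append(BINOPS[op](left, right))
        elif op == "jz":
            if stack.pop() == 0:
                pc = labels[arg[0]]
        elif op == "jmp":
            pc = labels[arg[0]]
        elif op != "label":
            raise ValueError(f"unknown opcode: {op}")
    return env


# Computes 5! by repeated multiplication.
FACTORIAL_5 = """
push 5
store n
push 1
store acc
label loop
load n
jz done
load acc
load n
mul
store acc
load n
push 1
sub
store n
jmp loop
label done
"""
print(run(FACTORIAL_5))  # {'n': 0, 'acc': 120}
```

Adding an opcode to this sketch means adding one branch to the loop, which also hints at what the extensibility criterion asks of a real IL.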
In summary, designing a good IL is truly an engineering endeavor: Utility and generality must be considered along with efficiency.