4.5. Cooperation Between Scheme and C
In designing a Scheme interpreter to be embedded in other applications, there are a number of interesting issues to consider. Guile makes a number of concessions in order to coexist comfortably with C code:
- Scheme implementations must optimize tail calls. The Scheme language places heavy requirements on function calls; for example, all loops are written in terms of recursive function calls. The language definition requires implementations to detect and optimize tail calls to allow iterative computations to execute in constant space. (Any thorough presentation of Scheme discusses this topic in more detail.)
There are a number of ways to meet this requirement, but most make it difficult to mix Scheme and C functions, a critical goal for Guile. Guile avoids the issue by having the interpreter optimize tail calls between Scheme functions but implement tail calls to and from C functions as ordinary C function calls. This arrangement allows developers to write new Scheme-visible C functions in the obvious way and simplifies interlanguage function calls.
- Scheme implementations must support call/cc. The notorious call-with-current-continuation function (also known as call/cc) allows the user to treat continuations as first-class objects. Because Guile allows C functions and Scheme functions to call each other freely, a Guile continuation may include both C and Scheme stack frames. For simplicity, Guiles implementation of call/cc copies the entire C stack into the heap; invoking a continuation copies the stack back from the heap and uses the longjmp function to reactivate it. This implementation has a number of drawbacks: It is certainly not guaranteed to work by the definition of the C language, and it makes call/cc rather slow. However, it has proved portable to a very wide variety of machines in practice, and the performance of call/cc does not seem to be critical to most Guile applications. The overwhelming advantage of this approach is that it places no special requirements on the code of Scheme-visible C functions.
- Scheme implementations must provide garbage collection. For a Scheme implementation to free the storage occupied by an object, it must prove that the object will never be accessed by the program in the future. If C functions are allowed to hold pointers to Scheme objects in local variables (a very desirable ability), then the implementation must be able to discover all pointers on the C stack to Scheme objects in the heap.
Many languages offering automatic storage management encounter this issue and use a variety of solutions; some interpreters use reference counting, whereas others require C functions to add pointers to their local variables to a linked list upon entry and remove them from the list before returning. In theory these approaches are precise and sufficient, but in practice they are unmaintainable; even expert programmers will forget to adjust reference counts appropriately or neglect to register a local variable with the collector.
To avoid these problems, Guile uses conservative marking to find pointers on the stack to Scheme objects. Guiles representation for Scheme objects allows the garbage collector to efficiently and correctly distinguish valid pointers to heap objects from non-pointers. At garbage-collection time, Guile scans the C stack word-by-word, looking for any values that could possibly be valid pointers into the heap, and then traces the possibly referenced objects using ordinary precise marking techniques.
There are a number of objections to this approach: The collector may misinterpret a non-pointer as a pointer and thus retain objects which could be freed, or the C compiler might perform an optimization that creates pointers stored in an unusual format. However, these problems do not seem to arise in practice. Conservative techniques are becoming widely accepted in the industry, and there is extensive literature on their performance.
In the spectrum between purely precise and purely conservative marking techniques, Guiles collector sits at the safer end because it uses precise marking for all pointers in the heap and requires conservative marking only for the stack and continuation objects. In practice, we have found it reliable.
|