Copyright © 1968, 1972, 2004 P.J. Brown, R.D. Eager
Permission is granted to copy and/or modify this document for private use only. Machine readable versions must not be placed on public web sites or FTP sites, or otherwise made generally accessible in an electronic form. Instead, please provide a link to the original document on the official ML/I web site.
The first edition of this manual was published by the Cambridge University Mathematical Laboratory under the somewhat verbose title Technical Memorandum 68/1: the use of ML/I in implementing a machine-independent language to bootstrap itself from machine to machine.
This second edition has been re-published at the University of Kent, and contains minor additions and corrections to the first edition.
Readers should note that there now exists an alternative and usually better way of implementing ML/I, which is described in the manual Implementing software using the LOWL language.
The bulk of the text has remained unchanged since the first edition, the
following being the only facilities that have been significantly
extended: READ, OUTPUTID, MDERPR, LAYCHAIN
(all concerned with startlines), SUBROUTINE (use of a stack for
return addresses), HETABLES (S-variables) and initialisation
(S-variables and GHSHPT). Minor textual changes have been avoided
where possible. For example semicolon is still used to terminate
operation macros (though most implementations now use newline instead),
~ is assumed to be the insert marker (rather than % as in
later manuals) and {NL} (rather than the layout keyword
NL) is used to stand for the newline character within structure
representations.
There have also been changes to the character set used in listings,
paper tapes, etc. due to the differences in the codes used by the
ICL 4130, where such items are now produced.
This edition has been converted to electronic form using Texinfo, and the opportunity taken to make some minor changes which do not, however, merit a new edition.
The text has been updated to reflect current versions and usage of ML/I.
For example, newline is now used to terminate operation macros, and
examples of structure representations etc. have been updated to reflect
this; also, % is assumed to be the insert marker. Source code in
L is now distributed in ASCII, and the opportunity has been taken to use
more meaningful characters for the `and' operator (now &), the
`or' operator (now |), and the character representing newline (now
$).
Some substantive errors (present even in the first edition) have been corrected, as well as transcription errors introduced in the second edition.
Indexes have also been added.
In order to make it easier to transfer ML/I from machine to machine most of the logic of ML/I has been coded in a `machine independent' language which has been specially designed for the purpose. This language is called L and is intended to have the following properties:
However, certain parts of the logic of ML/I are manifestly machine-dependent, and hence there is no point in trying to describe them in a machine-independent way. Machine-dependent operations include I/O, the hashing function and type conversion routines. Hence the logic of ML/I is divided into two parts as follows:
In a typical implementation of ML/I the MI-logic might map into 3000 orders and the MD-logic into 500.
The use of the language L enables any existing implementation of ML/I to be used to create an implementation for a new machine. The procedure for doing this is as follows:
The procedure outlined above can be used not only for ML/I itself but for any piece of software. Assume that it is desired to apply the procedure to another piece of software, which will be called S. Then a new language, M, is designed, which has similar properties to L except that it is for describing the logic of S rather than ML/I. In practice M may be much the same as L but whereas L has special statements for stacking and for following down chains, which are special operations heavily used in the logic of ML/I, M may have statements for performing operations peculiar to S. Typical such operations might be accessing the particular type of dictionary used in S or manipulating the particular type of list structure used in S.
In most cases once mapping macros to convert L into some object language had been written, it would only require a few extra macros and a few deletions of existing macros to derive a system of macros for mapping M into the same object language.
The efficiency of the macro-generated code depends on how much trouble has been taken in writing the mapping macros.
If efficient code is to be generated then these macros must be designed to recognise all sorts of special situations and, as a result, the macros will be larger and will take much longer to debug. However, even if little effort is made to perform optimisation the generated code will not normally be grossly inefficient; the inefficiency will normally be between 5 and 50 per cent. The reason why the generated code should be reasonably efficient is that L has been specially designed for describing the logic of ML/I, and the basic statements in L therefore correspond to the most heavily used operations in ML/I. Hence even if special cases are not optimised, the resultant code should be much better than the code that would result if the logic of ML/I had been described in some predefined high-level language because the facilities offered in the high-level language would be unlikely to correspond to the most heavily used operations of ML/I and so some of these operations would need to be described in a rather clumsy way (for example when using PL/I, algorithms involving string-manipulation can often only be expressed in a logically clumsy way and, even if the PL/I produced highly optimised code, the resultant code would still be much less efficient than would result if there had been special string manipulation operators tailored to the problem in hand).
To summarise, it is better to tailor the software-writing language to the software rather than vice versa.
Of course it is always possible to code the MI-logic of ML/I by hand for the object machine. This has the advantage that the resultant code should be more efficient, though the implementor would need to have a thorough knowledge of the working of ML/I to gain much in this direction. Against this, the following advantages can be claimed for converting the MI-logic by macro-generation.
It is impossible to give any firm estimate of the number and size of the mapping macros necessary to perform an L-map, since this depends so much on the object language. In general the higher level the object language, the fewer features of L that require complicated mapping macros. For example, arithmetic expressions in L require very little translation to be made into expressions in PL/I whereas extensive translation is necessary to map them into assembly language.
Another variable factor is the degree to which names of variables, labels and subroutines in L require translation. Often no translation is necessary, but FORTRAN, for example, would require the identifiers representing labels to be mapped into numbers.
Hence although each L-map will normally have a mapping macro corresponding to each statement in L, the number of mapping macros needed to deal with sub-components of L will vary considerably between L-maps. However if a figure is required for the time to perform an L-map, experience so far indicates an average time to perform an L-map into an assembly language of about six man weeks plus the time taken for the implementor to learn to use ML/I and the object language, if he does not know them already.
The remainder of this manual is devoted to a description of L and a description of the MD-logic of ML/I. Hints on the writing of an L-map are provided. It is hoped that the discussion will be of interest to three types of reader:
The characters used in L consist of the upper case letters
A-Z, the digits 0-9 and the following:
, = ' [ ] & | ( ) + - / *
together with the following characters which occur only within character string literals:
; $
In addition the layout characters space, tab and newline are used. The only significance of layout characters is:
Apart from this, layout characters are used redundantly to improve the layout of the logic. Redundant tabs and newlines often appear between statements and redundant spaces often occur adjacently to statement delimiters.
The notation used for describing the syntax of L is the same
notation as that used in the ML/I manual except that the brackets
<< and >> are used instead of [ and ], since
the latter occur naturally. To illustrate the notation, the syntax of
the ML/I MCGO statement would be written:
MCGO {arg 1} << (IF ) {arg 2} (= ) {arg } ? >>;
(UNLESS) (GR )
(etc.)
Within the description of the semantics of L the notation of
ML/I inserts is used, with % as the insert marker as in the
ML/I User's Manual. Hence %A1. refers to the first
argument, %A2. to the second, and so on. T1 is used to
represent the number of arguments and so %AT1. refers to the last
argument.
In order to aid the writing of an L-map, structure representations are specified for all the elements of L. However in many cases delimiter structures can be written in many possible ways, and hence the structure representations specified should be viewed as suggestions rather than as absolute requirements.
Identifiers are used in L for the names of variables, labels and subroutines. Each identifier is a sequence of between three and six characters, the first of which is a letter and the remainder of which are either letters or digits.
The MI-logic is divided into pieces called SECTIONs. These are
delimited by the SECTION and ENDSECT statements in the
following way:
SECTION name, subsidiary comment << statement *>> ENDSECT name
where the name following ENDSECT is the same as that following
SECTION. The name serves no other purpose than to identify the
SECTION. The SECTIONs of the logic and their names are as follows:
a. VARS.Variable declarations. b. INVALS.Initialisation before main logic is entered. c. MAIN.Main logic. d. MAINSUBS.Main subroutines. e. OPMACS.Operation macros. f. DEFSUBS.Subroutines for setting up definitions. g. ERR.Error routines. h. ENVPR.Logic to print out environment. i. MACNAMES.Operation macro names. j. DELS.Delimiters and keywords.
All statements in the VARS SECTION are declarative statements,
which are described in
Declarative Statements and Initial Values.
All statements in the MACNAMES and DELS SECTIONs,
which are called collectively the data SECTIONs are data-defining
statements, which are described in
The Data SECTIONs.
The remaining SECTIONs are called collectively the
program SECTIONs and contain only executable statements, which are
described in Executable Statements.
The main purpose of SECTIONs is to give some flexibility in the organisation of an L-map. The order of SECTIONs may be changed freely if desired and some SECTIONs may be hand-coded and some may even be totally ignored in some L-maps. Other possible uses of SECTIONs are:
There are, in addition to SECTION and ENDSECT, two other
layout statements, namely PRGSTART and PRGEND. Each of
these occurs once in the logic, PRGSTART at the very start and
PRGEND at the very end. Neither statement has any arguments. In
most L-maps these statements will either be deleted or be
converted into control statements for the object language.
The structure representations of the four layout statements are as follows:
SECTION, NL
ENDSECT NL
PRGSTART NL
PRGEND NL
L contains three types of comment, all of which occur freely throughout the logic. The types are:
/+ text +/
Headings always occupy a line by themselves.
// text //
Subsidiary comments of this form occur as arguments to the
SECTION, SUBROUTINE and BLOCKDEC statements. As
well as this, subsidiary comments can occur by themselves, in which case
they occupy a single line, though they may be preceded by tabs.
/- text -/
As an example of a statement prefix, an assignment statement on which arithmetic overflow may occur is written:
/- OVP -/ SET ...
In an L-map where it was desired to take special action on such an assignment statement,
/- OVP -/ SET
would be recognised as a macro name. The list of possible statement prefixes is given in Appendix A. However this list may be extended at any time, if some L-map requires certain statements to be identified. In order that new prefixes should not upset existing L-maps, each L-map should contain the skip definition:
MCSKIP / WITH - - WITH /
to delete all prefixes. This skip would, of course, be overridden for those prefixes it was desired to recognise. For example, if
/- OVP -/ SET
was a macro name, then this would naturally override the skip.
The delimiter structures for comments are:
/ WITH + + WITH /
/ WITH - - WITH /
/ WITH / / WITH /
Comments should always be defined as skips or straight-scan macros.
A basic decision to be taken on each L-map is whether the generated object language program is to be made to `look nice'. If not, all comments can be deleted. If so, comments must be mapped into object language comments and a fair amount of trouble must be taken with newlines, tabs and spaces in order that the format of the object code will look reasonable.
A program in L is similar to a program in most other programming languages; it consists of a sequence of statements, some of which may be labelled. In L a label is written:
[ identifier ]
Labels are applied to data-defining statements as well as executable statements. In the former case they are called data labels and in the latter case program labels.
Several statements in ML/I use the stacks. To understand the use of these it is necessary to consider the way ML/I makes use of storage.
When ML/I is loaded into a machine it may be loaded in one contiguous chunk or, provided the loader can take care of the necessary linkage, the MD-logic and the various SECTIONs of the MI-logic can be split off from one another. Some of them might even reside on backing storage. In addition to the storage required for its logic, ML/I requires a fairly large area of contiguous storage for workspace, which is used for its two stacks. These two stacks are the forwards stack, which starts at the beginning of workspace and works towards the end, and the backwards stack, which starts at the end and works towards the beginning (if the two stacks ever meet, a process is aborted for lack of storage). The two stacks are used to contain macro definitions, etc., and to preserve information in recursive situations.
In a batch processing environment the workspace will normally occupy all the free storage of the machine, but in a multi-programming environment the size of the workspace required, which depends mainly on the number and size of macro definitions, might be specified by the user.
This chapter describes the components that make up the arguments of statements in L. These arguments may consist of constants, variables or indirect addresses. In some cases these are combined to form arithmetic expressions. For most L-maps it will not be necessary to convert the form of variable names but macros will be needed for the following purposes:
Constants and variables occur in all SECTIONs of the logic, but indirect addresses and expressions only occur in the program SECTIONs.
The following data types, which apply to constants, variables and indirect addresses, occur in L: pointer, switch, character, number. These data types are described below.
INVOCT (the count of calls) and variables used
as line counts or in calculating the values of macro expressions. These
latter are potentially infinite. There is no point in extending the
precision of numerical data just to cater for these few variables as
overflow is unlikely and not important even if it does occur. Most
implementations ignore overflow, but, just in case it is desired to
detect it on some particular implementation, assignment statements on
which overflow is possible have been prefixed:
/- OVP -/
Numerical data is subject to the same operations as pointer data.
TRUE and FALSE are also used, but these
are represented as 1 and 0, respectively). Switch variables are subject
to the following operations: assignment, and'ing, or'ing, comparison for
identity, passing as argument.
Data types need only be differentiated on an L-map if they are stored in different ways. It makes an L-map very much easier if all data types are represented in the same way. This will normally be convenient on a word-oriented machine, but it would be rather wasteful, for instance on System/360 to represent switches or characters as full words. Failing complete identity of data types, the following equivalences make an L-map easier:
The first consideration is important as it obviates the need to examine the data types when generating code for arithmetic expressions.
It is very undesirable to represent characters in a `packed' form in such a way that they occupy less than the basic storage unit of the object machine, since this involves considerable problems with code generation, addressing and alignment.
Examples of how data types have been treated for various object languages are:
FIXED(15) BINARY STATIC
Variables in L are represented by identifiers. All variables
are represented are declared in the VARS SECTION. Variables may
be of type pointer, number or switch. There are no character variables.
The last two characters of the name of a variable represent its type as
follows:
SW- means switch, e.g.
MASKSW,INSW.PT- means pointer, e.g.
SPT,PRP2PT.CH- is not used.
Anything else means number, e.g. TYPE, IDLEN.
All variables are scalars and have global scope. There is never any need for storage to be assigned to variables dynamically.
Many constants in L of type number or switch are represented as integers. These integers are always in the range 0-9 except for one case noted in Hash-Tables and their Definition.
However in many cases the value or the representation of a constant in L will depend on the object language or on the object machine. These constants are represented in L by constant-defining macros, each of which will be replaced by the appropriate value during an L-map. The various constant-defining macros are described in the following Sections.
OF and length-defining macrosBefore describing the OF macro it is necessary to describe the
six length-defining macros, which act as subsidiary macros to the
OF macro. The length-defining macros all represent positive
integer constants as follows:
LPT = number of units of storage occupied by a pointer.
LNM = number of units of storage occupied by a number.
LSW = number of units of storage occupied by a switch.
LCH = number of units of storage occupied by a character.
LICH = 1/LCH. Special action is necessary if LCH is not 1.
LHV = number of units of storage occupied by the hash-table
(see Hash-Tables and their Definition).
Special action is necessary if any of these constants are not integers
(one solution may be to `devalue' the storage unit). The length-defining
macros only occur within the argument of the OF macro, but it
will usually be convenient to give them global scope.
To illustrate sample values of these constants, in the PDP-7
implementation the first five all had value one, whereas in the
System/360 implementation LPT and LNM had value four but
LSW and LCH had value one.
The specification of the OF macro is as follows:
Purpose
Designates numerical constants that are dependent on the amount of storage occupied by individual data types.
General Form
OF ( argument )
Structure Representation
OF WITHS ( )
Examples
OF(LPT)OF(2*LPT - LSW)
Restrictions
The argument is such that if all the length-defining macros are replaced by their numerical values, then what results is a macro expression involving only constants (this macro expression will in practice have a positive result).
Action
OFshould be replaced by the result of the macro expression occurring as its argument. For most L-maps the following macro should suffice:MCDEF OF WITHS ( ) AS <%%A1..>
Purpose
Designates a character constant.
General Form
'character(s)'
Structure Representation
' '
(if it is a macro, it should be a straight-scan macro, and the argument
should be referenced as %WB1.).
Examples
'A''$'' ''MCDEF'
Restrictions
Where quote is used in the program SECTIONs its argument is always a single character. However it may have any atom as an argument if it occurs in the data SECTIONs. The characters that can occur within the argument are the upper case letters A to Z, a space, or any of the following:, = ( ) + - / */ ; $Each character stands for itself except
$, which stands for newline.
Notes
- If the object language contains some facility for literal character constants then the quote macro should not require much work during an L-map. Only
$will need replacing and this can be achieved by:MCDEF <' WITH $ WITH '> AS < internal code for newline>at least within the program SECTIONs.
- The character set of an implementation is arbitrary except that it must contain all the characters that can occur within the argument to the quote macro (or their equivalents), together with the digits 0 to 9. Furthermore the character set must not contain `shift' characters, and the internal representation of characters may need to be changed if such characters normally exist.
AD and BLOCK macrosThe AD and BLOCK macros are used to represent constants of
type pointer. Since the BLOCK macro is only used within block
moving statements, its description is delayed until these statements are
described in Block Moving Statements.
The AD macro is described below.
Purpose
Designates address of data label.
General Form
AD( identifier )PT
Structure Representation
AD WITHS ( ) WITHS PT
Example
AD(KSPACE)PT
Restrictions
The identifier is the name of a data label.
Action
AD should be replaced by a literal representing the value of the
address.
Notes
In some L-maps AD may prove very difficult to encode.
It may help to build it into the expression evaluation macro
(see Arithmetic Expressions).
Some constants in L are represented by identifiers. This is
done either for mnemonic purposes (e.g. the constant TRUE) or
because the value of a constant will vary between different machines
(e.g. the constant STOPCODE). A complete list of such constants
follows, together with a description of what values should be used to
replace them.
There are two possible switch constants:
| a. | TRUE. | This has value one.
|
| b. | FALSE. | This has value zero.
|
There are two possible pointer constants:
| a. | ZEROPT. | This may have any value which is less
than or equal to any possible pointer value.
|
| b. | NULLPT. | This may have any value that does not
correspond to a possible value of a pointer.
|
In most L-maps both ZEROPT and NULLPT will have
value zero.
There is one possible character constant:
| a. | STOPCODE. | This is a special marker, not
representing any possible character, which uniquely identifies the end
of the source text. STOPCODE is used by the READ
statement, see The READ Statement.
|
The following constants represent the sizes of the blocks of variables
declared using the BLOCKDEC statement
(see Declarative Statements and Initial Values).
| a. | SDBSZ. | This represents the number of storage
units used by the SDB block.
|
| b. | OPDBSZ. | This represents the number of storage
units used by the OPDB block.
|
| a. | ALLSZ. | This represents the number of storage
units used by the ALL block.
|
| a. | EDBSZ. | This represents the number of storage
units used by the EDB block.
|
The following mnemonic markers are used to identify the various types of operation macro and insert:
| e. | OPMK. | This identifies an ordinary operation
macro and has value 0.
|
| f. | LOCMK. | This identifies a local NEC
macro and has value 1.
|
| g. | UINSMK. | This identifies an unprotected insert
and has value 2.
|
| h. | PINSMK. | This identifies a protected insert and
has value 3.
|
| i. | STRMK. | This identifies a straight-scan macro
and has value 4.
|
The following markers may follow the specification of a delimiter (in the case of this set of markers, as for the previous set, the implementor need not, in fact, be concerned with their meanings; he only needs to know what to replace them by).
| j. | ENDCHN. | This identifies a closing delimiter
and has value zero. It also serves as an end-of-chain marker.
|
| k. | EXCLMK. | This identifies an exclusive
delimiter.
|
| l. | WITHMK. | This is used in implementing the
WITH keyword in ML/I.
|
| m. | WTHSMK. | This is used in implementing the
WITHS keyword in ML/I.
|
| n. | SPCSMK. | This is used in implementing the
SPACES keyword in ML/I.
|
Four distinct values should be chosen for EXCLMK, WITHMK,
WTHSMK and SPCSMK. Each value should be a number larger in
absolute value than the largest possible workspace size
(see Use of Storage and Stacks).
Finally there are two constants dependent on the value N described in Chapter 6 of the ML/I User's Manual. The value of N is dependent on the width of listings but a typical value would be 30.
| o. | TEXMAX. | This has value 2N.
|
| p. | HTMAX. | This has value N-4.
|
In addition to constants and variables, statements in L may
involve indirect addresses. These are represented by the IND
macro, which is described below.
Purpose
Specifies indirect address.
General Form
IND ( arithmetic expression ) (CH)
(NM)
(PT)
(SW)
(Arithmetic expressions are defined in Arithmetic Expressions).
Structure Representation
IND WITH ( OPT ) WITH CH OR ) WITH NM OR ) WITH PT OR ) WITH SW ALL
Examples
IND(SPT)CHIND(HASHPT + TEMP - OF(LCH))SWIND(AD(KSPACE)PT)NM
Restrictions
The arithmetic expression must not itself contain an indirect address. The result of the arithmetic expression is a pointer.
Action
IND specifies an item of the data type given by the last two
letters, which is accessed indirectly via the pointer. This item will
lie in workspace or in the data SECTIONs.
Notes
The mapping macro forINDnormally requires to be carefully planned. It should only be called from within another macro and this macro should pass down to theINDmacro an indication as to what to do with the indirectly addressed item (e.g. add it, store into it, etc.). It is often convenient to buildINDinto the expression evaluation macro, which will be outlined later.
It is necessary at this stage to define a notation that will be needed in subsequent Sections to specify the form and type of arguments. In this notation an argument will be specified by writing:
form-type
where form consists of a sequence of one or more of the following letters:
V- for variable
C- for constant
I- for indirect address
and type consists of one or more of the letter pairs CH,
NM, PT, SW. These letter pairs represent the data
types. form and type indicate the various forms and types
an argument may take. For example VI-SW means a variable or
indirect address of type switch and C-NMPT means a constant of
type number or pointer.
Several statements in L allow arithmetic expressions as arguments. There is no explicit expression evaluation macro, but an artificial macro for this purpose will need to be set up as outlined below. The specifications of arithmetic expressions are as follows:
General Forms
<<I-NMPT ?>> <<(+) V-NMPT *?>> <<(+) C-NM >>
(-) (-)
where at least one constituent is not null.C-PT
Examples
-6IND(SPT)NM + IDLEN - 3- SKVAL + OF (LPT+LSW)SPT - ARGNO - SDBSZAD (KSPACE)PT
Action
Evaluate the arithmetic expression by the normal rules of arithmetic. The result of the expression will be a number or a pointer (adding or subtracting a number or a pointer yields a pointer, and subtracting two pointers yields a number).
Notes
Calls of the macro for expression evaluation must be artificially set up by other macro calls. Thus for example theSETmacro might call:EXPR %AT1.where
EXPRwas the expression evaluation macro. In this caseEXPRmight have the structure representation:OPT EXPR OR EXPR WITHS - ALL N1 OPT + N1 OR - N1 OR NL ALL(
EXPRwould need to be called on a subsequent pass to theSETmacro, or the technique for creating dynamic macro calls described in Chapter 7 of the ML/I User's Manual would need to be used.) It might be found desirable to build theINDandADmacros into theEXPRmacro by making theEXPRmacro recognise occurrences of these within its arguments.
This chapter describes the statements that occur in the program SECTIONs. These are divided into five categories as follows:
L contains two types of routine, namely subroutines and
linkroutines. These are declared using the SUBROUTINE and
LINKROUTINE statements and are both called using the CALL
statement. Declarations of subroutines or linkroutines are always global
and are hence never nested within other declarations. Declaration need
not precede use, although it would be possible to rearrange the logic of
ML/I to make this so.
The following Sections describe subroutines and linkroutines together
with how they are declared and how they return when their actions are
completed. After this the CALL statement will be described.
Subroutines in L either have a single parameter, which may be of type number, switch or pointer, or no parameter at all. In addition some subroutines require an exit label to be supplied with each call. An exit label specifies a label to which the subroutine is to go if some special condition arises.
The calls of subroutines always match their declarations in that if a subroutine has a parameter it is always called with an argument of the same type and if it is declared to have an exit label it is always called with one. A subroutine never changes the value of its parameter.
A return from a subroutine is accomplished by the RETURN FROM or
EXIT FROM statements. These two statements, together with the
SUBROUTINE statement, are described below.
SUBROUTINE statementPurpose
Declares a subroutine.
General Form
SUBROUTINE identifier << (PARPT) ?>> <<EXIT subsidiary comment ?>>
(PARNM)
(PARSW)
<< statement *>>
ENDSUB
Structure Representation
Preferably two separate macros.
SUBROUTINE OPT ( ) N1 OR N1 EXIT N2 OR N2 NL ALLENDSUB NL
Examples
The following are examples of first lines of subroutine declarations:
SUBROUTINE CMPARE (PARPT) EXIT //COMPARISON FAILS//SUBROUTINE NOARGS
Restrictions
Subroutines are never called recursively.
Action
Set%A1.as the name of a subroutine. At the start of the subroutine generate code to assign the argument (if any) to the parameter and, if necessary, preserve the exit label and the return address in order that they may be available to theRETURN FROMandEXIT FROMstatements.
Notes
- The last two letters of the parameter indicate its type.
PARPT,PARSWandPARNMare declared in theVARSSECTION of the logic just like other variables. They could in fact be equated to one another in an L-map where types were not differentiated, as the logic of ML/I is such that there is never more than one parameter in existence at any one time, i.e. if a subroutineAwith a parameter calls a subroutineBwith a parameter then the logic is such that it does not matter if the parameter ofAis clobbered as a result of the call ofB.- There are basically two possible techniques for preserving return addresses. Either each subroutine can have a unique storage location for preserving its return address or a stack of return addresses (which must be entirely separate from the other stacks) can be used. A problem with the latter approach is that some
GOTOstatements jump out of subroutines into the main logic without passing through the normal exit mechanism. To help solve this problem each label referenced by such aGOTOis preceded by a statement prefixCSS, which can be mapped into instructions to clear the subroutine stack.- The comment following
EXITis merely an aid to the reader of the logic.
RETURN FROM statementPurpose
Return from a subroutine.
General Form
RETURN FROM identifier
Structure Representation
RETURN WITHS FROM NL
Example
RETURN FROM CMPARE
Restrictions
RETURN FROMstatements always lie within a subroutine, the name of which is given by%A1.
Notes
Return from the subroutine to the point of call.
EXIT FROM statementPurpose
Returns from a subroutine to an exit label.
General Form
EXIT FROM identifier
Structure Representation
EXIT WITHS FROM NL
Example
EXIT FROM CMPARE
Restrictions
EXIT FROMstatements always lie within a subroutine, the name of which is given by%A1., and this subroutine has an exit label.
Action
Go to the program label supplied as an exit label in the call of the subroutine.
Notes
Exit labels may be implemented in many different ways. Two possibilities are:
- To treat the exit label as an extra argument of the subroutine.
- (when mapping into assembly language)
to code a statement of form:CALL SUB EXIT LABas
CALL SUB GO TO LABIn this case the
EXIT FROMstatement should cause a return to the point of call and theRETURN FROMstatement, within a subroutine with an exit label, should cause a return to the point of call plus a suitable offset to skip over theGO TOstatement.
The logic contains only one linkroutine, which is named STKARG.
The mechanism of linkroutines is included in L to cater for
recursion, and STKARG may be regarded as a recursive subroutine.
It has no parameter or exit label and is called by the statement:
CALL STKARGThe difference between linkroutines and subroutines is that in subroutines the preserving of the return address, if necessary, is the responsibility of an L-map whereas in a linkroutine the return address must be assigned to the variable
LINKPT when the routine
is called. LINKPT is one of the variables automatically preserved
by the logic in recursive situations, and hence the implementor does
not need to bother about the problem. A return from STKARG is
accomplished by the LINK BACK statement.
If the object language is an assembly language then STKARG will
normally be represented as a subroutine and will present few problems.
However in L-maps into high-level languages it will usually
not prove possible to represent STKARG as a subroutine because:
STKARG, unlike a subroutine, contains a jump to a point outside
itself, from where a jump back into STKARG subsequently occurs
(see The GO TO Statement).
To surmount these difficulties it may be necessary to represent
STKARG as a piece of labelled code which is `called' by setting
LINKPT as the return address (or as an index to a
switch, in the ALGOL sense, of possible return
addresses) and then performing the statement:
GO TO STKARG
Moreover it may be necessary to treat LINKPT as a special type
of pointer since it points at the program rather than the data.
LINKROUTINE statementPurpose
Declares a linkroutine.
LINKROUTINE identifier << statement >> ENDSUB
Structure Representation
AssumingENDSUBto be a separate macro, the structure representation of theLINKROUTINEstatement isLINKROUTINE NL
Example
LINKROUTINE STKARG . . . ENDSUB
Action
Set the identifier as a subroutine entry point or a label, as
appropriate, and generate code to store the return address in
LINKPT.
Notes
It is possible, by definingCALL STKARGas a macro, to setLINKPTat the point of call instead of when the linkroutine is entered.
LINK BACK statementPurpose
Returns from a linkroutine.
General Form
LINK BACK
Structure Representation
LINK WITHS BACK NL
Restrictions
LINK BACKoccurs only within the linkroutineSTKARG.
Action
Return to the address contained in LINKPT.
CALL statementPurpose
Calls a routine.
General Forms
CALLidentifier<< EXITprogram label?>>CALLidentifier(arithmetic expression)(NM)program label
(PT)
<< EXIT?>>CALLidentifier(VIC-SW)SW << EXITprogram label?>>
Structure Representation
CALL OPT ( ) N1 OR N1 EXIT N2 OR N2 NL ALL
Examples
CALL CMPARE ( IDPT + 1 )PT EXIT NOTFNDCALL NOARGS
Restrictions
The call must correspond to someSUBROUTINEorLINKROUTINEdeclaration in the MI-logic, or to a subroutine in the MD-logic.
Action
Call the routine. If it has an argument, load the value (not the address) of the argument into some fixed place (e.g. an accumulator) in order that it can be assigned to the corresponding parameter. If there is an exit label, make this available to the called routine.
Notes
In some L-maps it may be convenient to assign the argument to the corresponding parameter at the point of call, rather than to load it into an accumulator and then store this accumulator into the parameter when the routine is entered.
There are two compound statements in L, namely IF and
CHAIN FROM. These are described below.
IF statementPurpose
Conditional statement.
General Form
IFconditionTHENstatementIFconditionTHEN
<<statement*>>
END
Structure Representation
See later.
Examples
IF SPT = IDPT THEN GO TO ENDIDIF IND(SPT)CH NE 'A' THEN
. . .
END
Restrictions
See next Section for the form of a condition.
In form a) the statement is never anIForCHAIN FROMstatement. Statements of form b) are never nested within one another.
Action
Perform the statement(s) if the condition holds.
Notes
- It is probably best to consider
ENDas a separate macro fromIF.- In form a) the closing newline acts as a closing delimiter of both the
IFstatement and the statement followingTHEN. There are three possible ways of dealing with this:
- Turn form a) into form b) (or something of similar syntax) on a prepass.
- Treat newline as an exclusive delimiter of all statements except
IFandCHAIN FROM.- Make
IFa straight-scan macro and, when dealing with form a), insert the last argument in the following way:MCDEF ZZ AS <%WAT1. > ZZThe first approach is probably the simplest.
- Many L-maps will wish to recognise
THEN GO TOas a special case, in order to generate better code.
IF ConditionGeneral Form
The condition of anIFstatement can have the following form:
relationrelation<< &relation*>>relation<< |relation*>>where
&means `and' and|means `or' (note that `and' and `or' cannot both occur in the same condition).
A relation can have the following forms:
arithmetic expression (GR) VCI-PTNM
(GE)
(LE)
(= )
(NE)VI-SW (= ) VIC-SW
(NE)I-CH (= ) IC-CH
(NE)
GRmeans `greater than',GEmeans `greater than or equal to',LEmeans `less than or equal to', andNEmeans `not equal to'.
Examples
IF SPT - IDPT GE 6 | IND(PARPT)SW NE 3 THEN ...IF AAA = 1 & BSW NE 2 & CCC + DDD GR EEE THEN ...
Restrictions
In form a) of a relation the arithmetic expression never contains a
constant as a constituent, i.e. constants only appear after rather
than before a relational operator. If an indirect address appears after
the relational operator, this always takes the form
IND(V-PT)...
Action
Test whether the condition holds.
Notes
The data types to be compared can be found by examining the last two characters of the first argument.
IF StatementThe structure representation corresponding to the IF statement
will vary considerably between L-maps. If
THEN GO TO was to be recognised as a special case then
a prepass might perform the transformation:
MCDEF THEN WITHS GO WITHS TO AS THENGO MCSKIP D,THEN WITHS NL MCDEF THEN NL AS <<THEN> %A1. END >
In this case the delimiter structure corresponding to the IF and
END statements could consequently be:
IF N1 OPT GR OR GE OR = OR NE OR LE ALL OPT THEN WITH NL OR THENGO NLOR | N1 OR & N1 ALL
END NL
CHAIN FROM statementPurpose
Follows down a chain where the next link is given by its offset from the current link.
General Form
CHAIN FROM arithmetic expression EXIT program label << statement *?>> ENDCH
ENDCHis regarded as a separate statement in that it may be preceded by the placing of a program label.
Structure Representation
Two macros:
CHAIN WITHS FROM EXIT NLENDCH NL
Example
CHAIN FROM IDPT + OF(LNM) EXIT JC1
SET IND(CHANPT)NM = 0
ENDCH
Restrictions
The arithmetic expression yields a pointer as value. Calls of the
CHAIN FROM statement are never nested within one another.
Action
TakingCHAIN FROMas two macros as indicated in its suggested structure representation, the first macro is equivalent to:SET CHANPT = %A1. IF CHANPT = NULLPT THEN GO TO %A2. [Lab] SET CHLINK = IND(CHANPT)NMand the
ENDCHmacro is equivalent to:IF CHLINK NE ENDCHN THEN SET CHANPT = CHANPT + CHLINK GO TO Lab ENDwhere
Labis some generated label unique to the call ofCHAIN FROM, andENDCHNandNULLPTare the constant-defining macros described in Constants Represented by Identifiers.
L contains three statements for moving about contiguous blocks
of information. These statements are MOVE FROM, MSTACK
FROM and MUNSTACK FROM. Each statement requires three arguments
as follows:
(in the case of MUNSTACK FROM, argument b) is implicit). Argument
c) is always represented by an arithmetic expression, yielding a
positive (non-zero) numerical value, and each of arguments a) and b) may
be represented in either of two ways, depending on the area pointed at:
VARS SECTION then the argument
is specified by:
BLOCK ( name )
where the name is the name of a block of variables declared by
the BLOCKDEC statement
(see Declarative Statements and Initial Values).
BLOCK can be
regarded as a constant-defined macro of type pointer. It is only used
within arguments to block moving statements. Many L-maps will
treat BLOCK in exactly the same way as the AD macro
(see The AD and BLOCK macros).
The descriptions of the three block moving statements are written in
terms of loops using the local variables
LV1, LV2, LV3 and LV4.
However an L-map need not follow
the exact descriptions of these statements, provided that the overall
effect is not altered. In particular the variables
LV1, LV2, LV3 and LV4 need not be used. In
fact the whole purpose of the block moving statements is to allow
efficient code specially tailored to the object machine to be inserted.
MOVE FROM StatementPurpose
Moves a block of information from one place to another.
General Form
MOVE FROM arg A TO arg B LENG arg C << BACKWARDS ?>>
Structure Representation
MOVE WITHS FROM TO LENG OPT BACKWARDS N1 OR N1 NL ALL
Example
MOVE FROM BLOCK(SDB) TO IDPT + OF(LNM) LENG SDBSZ
Restrictions
See general description of block moving statements.
Action
- In the
BACKWARDScase, which is used only when two fields may overlap and upset a forwards move, the action is equivalent to:SET LV1 = %A1. SET LV2 = LV1 + %A3. - OF(LCH) SET LV3 = %A2. + %A3. - OF(LCH) [XX] SET IND(LV3)CH = IND(LV2)CH IF LV2 NE LV1 THEN SET LV2 = LV2 - OF(LCH) SET LV3 = LV3 - OF(LCH) GO TO XX END- If the
BACKWARDSoption is not specified, a forwards move is performed in the following way:SET LV2 = %A1. SET LV3 = %A2. SET LV1 = LV2 + %A3. - OF(LCH) [YY] SET IND(LV3)CH = IND(LV2)CH IF LV2 NE LV1 THEN SET LV2 = LV2 + OF(LCH) SET LV3 = LV3 + OF(LCH) GO TO YY END
MSTACK FROM StatementPurpose
Stacks a block of information.
General Form
MSTACK FROM arg A LENG arg C ON (FSTACK)
(BSTACK)
Structure Representation
MSTACK WITHS FROM LENG ON NL
Example
MSTACK FROM IDPT LENG IDLEN ON FSTACK
Restrictions
See general description of block moving statements.
Action
- If
%A3.isBSTACK, the action is equivalent to the statements:CALL DECLF ( %A2.)NM MOVE FROM %A1. TO LFPT LENG %A2.- If
%A3.isFSTACK, the action is equivalent to the statements:SET LV4 = FFPT CALL BUMPFF (%A2.)NM MOVE FROM %A1. TO LV4 LENG %A2.
MUNSTACK FROM StatementPurpose
Unstacks a block of information from the backwards stack.
General Form
MUNSTACK FROM arg A TO arg B LENG arg C FROM BSTACK
Structure Representation
MUNSTACK WITHS FROM TO LENG FROM WITHS BSTACK NL
Example
MUNSTACK FROM STAKPT TO BLOCK(EDB) LENG 3
Restrictions
See general description of block moving statements.
Action
The action is equivalent to the statements:MOVE FROM %A1. TO %A2. LENG %A3. SET LFPT = %A1. + %A3.(note that%A1.might beLFPTand hence the ordering of the two above statements should not be changed to try and improve efficiency.)
There are three statements in L that deal with I/O, namely
READ, OUTPUTID and PRTEXT. These statements deal
respectively with the input text, the output text and error messages. In
addition certain of the machine-dependent subroutines deal with the
production of error messages
(see Subroutines for Error Messages).
READ and
OUTPUTID each occur only once in the logic. However they have
been represented as statements in L rather than as
machine-dependent subroutines in order that, for reasons of efficiency,
they can be replaced by in-line code on an L-map (if the
object language has a macro facility, it may be convenient, for ease of
changing, to map I/O statements into object language macros).
ML/I has a single stream of input. However there may be several streams of output as follows:
Alternatives b) and c) are optional and need not be available on all implementations of ML/I.
READ StatementPurpose
Reads in the input text.
General Form
READ
Structure Representation
READ NL
Action
Read one character of input and translate it to internal code. If appropriate check that the character is legal and, if not, perform the action:CALL ERSICand replace the character by the error character for the implementation. Then place the character on the top of the forwards stack by performing the following action:
SET IND(FFPT)CH = character //UPDATE STACK POINTER// SET FFPT = FFPT + OF(LCH) //TEST FOR OVERFLOW// IF FFPT GE LFPT THEN GO TO ERLSOThe
READstatement should make each line of input end with the character `newline'. In the case of card input this character may need to be added to each line and in the case of paper tape input the pair of characters `carriage return, line feed' may need to be mapped into the single character `newline'.If appropriate the
READstatement should provide a listing of the input with accompanying line numbers. When the input is exhausted theREADstatement should return the character represented bySTOPCODE(see Character Constants). The logic is arranged so that if aSTOPCODEis returned then theREADstatement is not executed again but control goes to the labelMDHALT(see TheMDHALTLabel) instead. If the input is split up into several physical units (e.g. several separate paper tapes), theREADstatement should take care of the interface between different units.The
READroutine also has responsibility for taking action at the boundary between lines. At the start of each line, including the first, the following action should be performed:SET S2 = S2 + 1 IF LINECT = TLINCT THEN SET TLINCT = S2 SET LINECT = S2 IF S2 = 1 THEN insert startline character at beginning of linewhere
S1andS2are S-variables. This action should not be performed on the very last line, i.e. between the last newline and theSTOPCODEat the end. Any unused internal code can be chosen for startline. This is defined by theLAYCHAINstatement (see TheLAYCHAINstatement).To summarise, the action of the
READstatement should be to make the input text appear as a single sequence of characters, with the character `newline' at the end of each line, the characterSTOPCODEat the end of the text, and startline characters inserted where appropriate.
OUTPUTID StatementPurpose
Produces output.
General Form
OUTPUTID
Structure Representation
OUTPUTID NL
Action
Output a sequence of characters. The sequence is pointed at byIDPTand the number of characters in the sequence is given byIDLEN.IDLENis always greater than zero and can be arbitrarily large. The values ofIDPTandIDLENmay be clobbered by theOUTPUTIDstatement.Normally output should be sent to a device that can then be read back in by the job step that comes after macro processing. It may also be desirable to produce a listing of the output, on option.
OUTPUTIDshould, if necessary, translate characters to the code required by the external device. It may be necessary to take special action with the character `newline'. The markerSTOPCODEis never sent to theOUTPUTIDroutine. Instead it is the responsibility of the code atMDHALT(see TheMDHALTLabel) to finalise the output. Startline characters should be ignored.
PRTEXT StatementPurpose
Prints literal strings within error messages.
General Form
PRTEXT [ << character *>> ]
Structure Representation
PRTEXT WITHS [ N1 OPT $ N1 OR ] WITH NL ALL
Example
PRTEXT[$PROCESS ABORTED$]
Restrictions
The set of possible characters occurring within the argument to
PRTEXT is the same as for the quote macro
(see The Quote macro).
Action
Print the argument as a string of characters on the error message
listing. Each character stands for itself except $, which
represents a newline.
Notes
PRTEXTshould be mapped with a straight-scan macro. Arguments should be inserted using%WB1.etc. It is usually convenient to deal with$by treating it as a delimiter ofPRTEXTas suggested by the above structure representation.
A list of all the L statements used to perform assignment or branching operations follow. The statements are arranged in alphabetical order.
BACKSPACE StatementPurpose
Examines former value of a variable now lying on BSTACK.
BACKSPACEV-NMPTSWBACKSPACEV-NMGIVINGV-NMBACKSPACEV-PTGIVINGV-PTBACKSPACEV-SWGIVINGV-SW
Structure Representation
BACKSPACE OPT GIVING N1 OR N1 NL ALL
Examples
BACKSPACE SPTBACKSPACE SPT GIVING IDPT
Restrictions
%A1.is in the block that is calledSDB(seeBLOCKDECstatement of Declarative Statements and Initial Values).
Action
LetNbe the offset of%A1.from the start of theSDBblock (i.e. the number of units of storage occupied by the variables preceding%A1.in theSDBblock). Then the action taken for the respective forms is:
SET TEMPT = DBUGPT + NSET %A2. = IND(DBUGPT + N)NMSET %A2. = IND(DBUGPT + N)PTSETSW %A2. = IND(DBUGPT + N)SWIn cases b) to d)
TEMPTmay be clobbered.
CHARMATCH StatementPurpose
Multi-way conditional branch statement.
General Form
CHARMATCH V-PT <<, C-CH GOING program-label *>>
Structure Representation
CHARMATCH N1 OPT , GOING N1 OR NL ALL
Example
CHARMATCH IDPT, 'A' GOING INSA, 'B' GOING INSB
Action
The action is equivalent to:IF IND(%A1.)CH = %A2. THEN GOTO %A3. IF IND(%A1.)CH = %A4. THEN GOTO %A5. . . . IF IND(%A1.)CH = %AT1-1. THEN GOTO %AT1.It is quite possible for none of the above tests to hold.
GO TO StatementPurpose
Branch statement.
General Form
GO TO identifier
Structure Representation
GO WITHS TO NL
Example
GO TO BSNEXT
Restrictions
The identifier is a program label or a label in the MD-logic.
A GO TO statement never goes into a subroutine from
outside (it may, however, go out of a subroutine to a label outside and
it may go into the linkroutine from outside and vice versa).
Action
Branch to designated label.
SCALE StatementPurpose
Multiplies a variable by a constant.
General Forms
SCALEV-NMBYC-NMSCALEV-NMBYC-NMGIVINGV-NM
Structure Representation
SCALE BY OPT GIVING N1 OR N1 NL ALL
Example
SCALE MEVAL BY OF(LNM) GIVING ARGNO
Restrictions
%A2.is a small positive constant (not greater than2*LPT).
Action
Multiply%A1.by%A2., and assign the result to%A3.if%A3.exists; otherwise to%A1.
Notes
The constant will always be a call of theOFmacro (see TheOFand length-defining macros) and it may often have value 1, in which case the replacement text for form a) could be null.
SET StatementPurpose
Assigns a value to a number or a pointer.
General Form
SETVI-NMPT<<,V-NMPT*?>> =arithmetic expressionSETVI-NM=VI-SW
Structure Representation
SET N1 OPT , N1 OR = ALL NL
Examples
SET IND(SPT)NM, ARGLEN = IND(IDPT)NM+TEMP-6SET SKVAL = - SKVAL + 1SET TYPE = INSWSET IDLEN = IDLEN + IDPT - SPT
Restrictions
All the quantities to the left of the equals sign are of the same type. If any quantity occurs on both the left and the right of the equals sign then it must be the first quantity on the right, as illustrated by examples c) and d) above.
Action
Evaluate the arithmetic expression and assign its value to the variables and/or indirect address on the left of the equals sign.
SETSW StatementPurpose
Assigns a value to a switch.
General Forms
SETSWVI-SW<<,V-SW*?>> =VIC-SWSETSWVI-SW=VI-SW(&)VC-SW
(|)
Structure Representation
SETSW N1 OPT , N1 OR = ALL OPT | N2 OR & N2 OR N2 NL ALL
Examples
SETSW IND(TEMPT)SW, PARSW = 1 SETSW COPDSW = IND(INFOPT)SW & 1
Restrictions
If form b),%A1.cannot be the same as%A3.(e.g. the form:ASW = arg & ASWcannot occur).
Action
Evaluate the expression to the right of the equals sign (|means `inclusive or' and&means `and') and assign its value to the variable(s) and/or indirect address on the left.
STACK StatementPurpose
Stacks individual values.
General Form
STACK <<value(type) *>> ON (FSTACK)
(BSTACK)
where value is one of the following forms:
VCI-SWin which case type isSW.- arithmetic expression in which case type is
NMorPT.
Structure Representation
STACK N1 OPT ( ) N1 OR ON ALL NL
Example
STACK PARSW(SW) 3(NM) TRUE(SW) IDPT-1(PT) ON FSTACK
Restrictions
In theFSTACKcase, if the forwards stack pointer (FFPT) occurs within a value it occurs within the first value. In theBSTACKcase the backwards stack pointer (LFPT) is never used (these two restrictions make optimisation of multiple stacking operations possible, though care must be taken not to bumpFFPTuntil after the first value has been calculated).
Action
- If the last argument is
FSTACK, the following operation should be performed for each value and type:
- If type is
NMorPTthenSET IND(FFPT)type = value- If type is
SWthenSETSW IND(FFPT)SW = valuefollowed in each case by
CALL BUMPFF (N )NMwhere
Nis the value ofOF(L type)-- see TheOFand length-defining macros). Thus, for example:STACK IDPT - 1(PT) ON FSTACKis equivalent to:
SET IND(FFPT)PT = IDPT - 1 CALL BUMPFF(OF(LPT))NM- In the
BSTACKcase, the following operations should be performed for each value and type:CALL DECLF ( N )NMwhere N is the value of
OF(L type), followed by either:
SET IND(LFPT)type=valueorSETSW IND(LFPT)SW =valuedepending on type.
Notes
- In most L-maps it should be possible to optimise multiple stacking operations by bumping the stack pointer only once for the whole set of value(s) to be stacked. However the
STACKstatement occurs relatively infrequently so there is no need to worry too much about optimisation.- The values must be stacked in the order specified.
TEST StatementPurpose
Multi-way branch statement.
General Form
TEST V-NMSW GOING program label <<, program label *>>
Structure Representation
TEST TYPE GOING N1 OPT , N1 OR NL ALL
Example
TEST TYPE GOING SKIP, INS, WARN
Restrictions
The value of%A1.lies between 0 and%T1-2.(inclusive).
Action
The action is equivalent to:
IF %A1. = 0 THEN GO TO %A2.
IF %A1. = 1 THEN GO TO %A3.
. . .
IF %A1. = %T1-2. THEN GO TO %AT1.
UNSTACK StatementPurpose
Unstacks individual variables.
General Form
UNSTACK << V-NMPT ( (NM) ) *>> FROM BSTACK
(PT)
Structure Representation
UNSTACK N1 OPT ( ) N1 OR FROM WITHS BSTACK WITHS NL ALL
Example
UNSTACK DELPT(PT) MTCHPT(PT) MCHLIN(NM) FROM BSTACK
Restrictions
The parenthesised letters indicate the type of the preceding variable. The backwards stack pointer (LFPT) never occurs as an argument toUNSTACK.
Action
For each variable, starting with the first and ending with the last, the
action should be:
SET variable = IND (LFPT) (NM)
(PT)
SET LFPT = LFPT + OF( (LNM) )
(LPT)
The VARS SECTION contains all the declarative statements in
L. Every variable is declared exactly once. All variables have
global scope.
A glance at the VARS SECTION will show that some groups of
declarations are organised into BLOCKs. This means that all the
variables should be allocated contiguous storage (although any or all of
them can be