%\font\em=cmmi9
%\font\bf=cmbx9
\def\LEQU{=\kern-2pt=}

@* Guide to Algol-like C.
This is a style guide for writing programs in Algol-like C.
Algol-like C (hereafter referred to as AC) is based on the syntax of the
Algol languages (Algol~60 and Algol~68).
AC is translated into C.

Why AC?
AC was developed because Algol programs are more readable than C programs.
A compiler was not developed for this language because there are plenty of
good C compilers and AC is not that different from C.

Several things contribute to C being difficult to read.
(Even experienced hackers occasionally have lapses and misinterpret code.)
One is the use of lots of short symbols.
Curly braces, single and double operators, = and ==, are used to make the
language succinct, but at the cost of readability.
As in LISP, finding a matching brace or parenthesis that is lines or pages
away is difficult.
Indenting helps.
Keywords look just like identifiers.

Two of the symbols that are frequently confused are {\tt =} and {\tt ==}.
The former is assignment and the latter equality checking.
The argument for using these was put forth by Kernighan and Ritchie:
``Since assignment is about twice as frequent as equality testing in
typical C programs, it's appropriate that the operator be half as long.''

However, there is a 432~year precedent of using the symbol {\tt =} for
equality.
The original rationale for using {\tt =} for equality was given by
Robert Recorde in~1557.
The explanation was
{\it ``... to auoide the tediouse repetition of these woordes: is equalle to:
I will sette as I doe often in woorke vse, a paire of paralleles, or Gemowe
lines of one lengthe, thus: \LEQU, bicause noe .2.\ thynges, can be moare
equalle.''}
Granted, spelling is not the same, but = has served as equals in
Mathematics for centuries.

Fortran changed the meaning of {\tt =} to assignment and equality was
spelled {\tt .EQ.}.
Algol~60 reverted to {\tt =} for equality and used {\tt :=} for assignment.
This convention was followed by some of its successors: Algol~68, Pascal,
and Ada.
PL/I used {\tt =} for both assignment and equality so that one could write
confusing statements like \hbox{\tt a = b = c;}
(compare {\tt b} with {\tt c} and assign the result to {\tt a}).

Is it really worth using a notation that is contrary to that of the more
universally used mathematics, and thus guaranteed to confuse novices and
catch pros off-guard, in order to type {\tt =} instead of {\tt :=}?
I think adding a few characters to improve readability and understandability
is worth the time and effort.
There are over four centuries of mathematics using {\tt =} for equals.
Using {\tt =} to mean assignment is begging for confusing code.
Assignment is a relatively new concept and a new notation is warranted.
{\tt :=} has enough precedent to serve this purpose.

Consequently, I wrote a pre-processor for C called ac68 that uses {\tt :=}
for assignment, {\tt =} for equality, and has all the cumulative operators
in the style of Algol~68:
{\tt +:=}, {\tt -:=}, {\tt *:=}, {\tt \&:=}, {\tt <<:=}, etc.
Unfortunately, the C and dbx messages refer to the operators that get
generated.
Still, the code is a lot more readable.
Combined with Spidery WEB, the output is {\it very} readable.

The following sections illustrate how to use the language constructs in AC.
@c
@@/
@@/
@@/
@@/

@ Procedures and functions.
Procedures and functions begin with the word |PROC|, followed by the type
of the function, the name of the function, and then the list of arguments.
Procedures return the |VOID| type.
An example is the procedure |main|, which is required for all programs,
shown below.
Note that the C denotation for a pointer is replaced by the word |REF|,
which is short for {\bf REFERENCE}.
(Note that there is a delicate balance between making programs read as
English text and using cryptic symbols.
Abbreviations such as |PROC| for {\bf PROCEDURE} and |REF| for
{\bf REFERENCE} lie between being cryptic and being English.)
|STRING| is used instead of C's {\tt char *} to denote a string.
|BEGIN| and |END| are used to bracket the procedure text.

@=
PROC VOID main(argc, argv, env)   /* the count comes first, as in C */
   STRING REF argv;
   INT argc;
   STRING REF env;
BEGIN
END

@ Conditional statements.
The conditional statements are |IF| and |SWITCH|, shown below.

@=
@@/
@@/

@ The |IF| statement has three parts --- a condition (following the |IF|),
a then-part (following the |THEN|), and an optional else-part
(following the |ELSE|).
The |IF| statement ends with the |FI| keyword.
Statements are {\it separated} by semicolons.
An arbitrary number of statements may occur in the |THEN| and |ELSE| parts,
so it is not necessary to add braces when multiple statements are used.
(In essence, they are {\it always} there by virtue of these keywords, but
there is no performance penalty introduced by having them there.)
The |FI| keyword shows the end of the |IF| statement and helps when
nested-|IF| statements are used.

@=
IF condition THEN
   statement1;
   statement2
ELSE
   statement3;
   statement4
FI

IF condition THEN
   statement1;
   statement2
FI

IF condition1 THEN
   statements1
ELSE
   IF condition2 THEN
      statements2
   ELSE
      IF condition3 THEN
         statements3
      ELSE
         statements4
      FI
   FI
FI

@ There is an alternate way of writing nested-|IF| statements.
As can be seen above, the indenting of nested-|IF|s can quickly get
out of hand.
The |ELIF|-|THEN| keywords shown below keep all of the alternatives at one
level of indentation.
Note that only a single |FI| is needed to end the |ELIF|.

@c
IF condition1 THEN
   statements1
ELIF condition2 THEN
   statements2
ELIF condition3 THEN
   statements3
ELSE
   statements4
FI

@ The |SWITCH| statement is the other type of conditional statement.
The |IF| statement evaluates nested alternatives sequentially; the |SWITCH|
statement evaluates multiple alternatives concurrently.
In the example below the |SWITCH| evaluates |expression|, then checks
whether it is equal to |a| or |b|.
If it is equal to one of these values, the corresponding statements are
executed.
If not, the |DEFAULT| statements are executed.

@=
SWITCH expression IN
   CASE a:
      statements_a
      BREAK
   CASE b:
      statements_b
      BREAK
   DEFAULT:
      statements_d
      BREAK
NI

@ There are two types of looping statements --- |FOR| and |WHILE|.

@=
@@/
@@/

@ |FOR| loops have four parts --- a part that is executed at the beginning
of the loop, a part to check for continuation conditions, a part to specify
code to be executed at the end of every iteration, and the loop body, which
is executed during every iteration of the loop while the continuation
condition is |TRUE|.
Each of the first three parts may contain multiple expressions separated by
commas and may include assignment operators.
The loop body may contain an arbitrary number of statements separated by
semicolons.

@=
FOR a := initial_value, b := init_b
AS (a < terminal_value) AND expr_t
EXEC a INCR
DO
   statement1;
   statement2
OD

@ |WHILE| loops have two parts --- a conditional part that is evaluated
before every iteration and a loop body that is executed during every
iteration for which the conditional part is |TRUE|.
The loop body may contain an arbitrary number of statements separated by
semicolons.
The remainder of the loop body may be skipped by using the |CONTINUE|
statement.
The loop may be terminated by using the |BREAK| statement.

@=
WHILE condition1 DO
   statement1;
   IF cond2 THEN
      CONTINUE
   ELIF cond3 THEN
      BREAK
   FI;
   statement2
OD

@ This section describes the data types.
With the exception of |BOOL| and |STRING|, the data types shown below
are as in C.
The data type names are in upper case to distinguish them from the
variables, which are typically written in lower case.
Structure declarations that are part of type definitions require two
names --- one for the name of the structure and one for the name of the type.
A useful style for declaring these is to use the same name for both, but
change the capitalization.
Understandable names, which are often made up of multiple words, are
encouraged.
These words can be separated by capitalizing the first letter of each word,
as in |AMultiWordIdentifier|.
This can be combined with underscores, as in |A_Multi_Word_Identifier|.
(I find the latter more readable.)
The structure name is rarely used, so I capitalize the {\it last} letter of
each word, as in |A_multI_worD_identifieR|.
I capitalize the first letter of each word in field names in structures
and capitalize every letter in new types, as in |NEW_TYPE|.

@=
BEGIN
   INT i;                          /* integer */
   BOOL b;                         /* Boolean */
   CHAR c;                         /* character */
   CHAR REF pc, REF REF ppc;       /* a pointer to characters and a pointer to pointers to characters */
   STRING s := "a string of characters";
   FLOAT f;                        /* single-precision floating point */
   DOUBLE d;                       /* double-precision floating point */
   STRING foo := CONT INCR argv;   /* the next argument string */
   LONG INT li;                    /* long integer */
   CONST INT ci;                   /* a constant integer */
   VOLATILE LONG INT clock;        /* a volatile long integer */
   AUTO INT aa;                    /* an automatic integer */
   STATIC INT st_int;              /* a static integer */
   REGISTER DOUBLE dr;             /* a double-precision floating register */
   ENUM color {red, green, blue};  /* an enumerated type */
   EXTERN INT ei;                  /* an external integer */
   SHORT sh;                       /* a short integer */
   SIGNED si;                      /* a signed integer */
   UNSIGNED usi;                   /* an unsigned integer */
   TYPEDEF STRUCT strucT_namE {
      INT Integer_1, Another_Integer;
      STRING The_Name;
      STRUCT strucT_namE REF Next; /* link to a structure of the same type; the field name is illustrative */
   } Struct_Name, REF Struct_Ptr;
   UNION {INT Look; CHAR For_The;} Union_Label;
END

@ A few commonly used data values have names:
|TRUE|, |FALSE|, |NORMAL|, |ERROR|, |UNDEFINED|, and |NULL|.
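
@ The named values can be pictured with a short sketch in plain C.
This guide does not give their definitions, so every value chosen here is an
assumption, as are the helper names |checked_divide| and |succeeded|.
The keyword macros only approximate a handful of AC keywords with the C
preprocessor.
They are not how ac68 works, and since macros cannot provide {\tt :=},
the sketch keeps C's own operators.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

/* Assumed definitions: the guide names these values but does not define them. */
typedef int BOOL;
#define TRUE      1
#define FALSE     0
#define NORMAL    0      /* assumed status for success */
#define ERROR     (-1)   /* assumed status for failure */
#define UNDEFINED (-2)   /* assumed marker for "no meaningful value" */

/* Macro stand-ins for a few AC keywords (an approximation, not ac68 itself). */
#define IF    if (
#define THEN  ) {
#define ELSE  ;} else {
#define FI    ;}

/* Divide num by den, reporting the outcome through the named values. */
static int checked_divide(int num, int den, int *quotient)
{
    int status;
    assert(quotient != NULL);
    IF den == 0 THEN
        *quotient = UNDEFINED;
        status = ERROR
    ELSE
        *quotient = num / den;
        status = NORMAL
    FI
    return status;
}

/* Map a status code onto a BOOL. */
static BOOL succeeded(int status)
{
    return status == NORMAL ? TRUE : FALSE;
}
```

Under these assumptions, dividing 7 by 2 returns |NORMAL| and stores 3 in the
quotient, while a zero divisor returns |ERROR| and stores |UNDEFINED|.
Note that the last statement before |ELSE| and |FI| carries no semicolon,
matching the rule that statements are separated rather than terminated.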