Co is an interpreted language with dynamic typing which supports prototyping and in which coroutines are first-class objects. It has basic branching and looping constructs, and a minimal set of built in functions for I/O, math, and string manipulation.
print is a built in function which takes any number of expressions as arguments, evaluates them, and prints them to stdout with a space between each one.
In this example, we can see string literal, and arithmetic expressions.
The keyword var introduces a declaration
statement, which adds a named value to the current scope. In this
example the current scope is the global scope
(more about scopes
later). On the right hand side of the declaration is an expression,
which is evaluated when the declaration is executed, and whose result
ends up as the value of the variable.
You can also write multiple declarations, just as you can in C:
Co variables don't obviously have a type the way they do in a language like C-- you declare everything with var, rather than using int for integers and float for floating point numbers. But internally, Co has the following six types:
Opaque pointerfor passing back to built-in functions. For example, if you use the built in function open to open a file, it returns a variable of the handle type which you pass back to subsequent calls to read. Otherwise there would be no way of telling read which out of potentially many open files you wanted to read from. You can do very little with variables of the handle type except pass them back into the built in functions that are expecting them.
objects. This is what many languages (Python, for example) do. Generally this seems to make life easier for the Python programmer at the expense of the Python extender, which is probably the right way around.
Assignment changes the value of a variable that's already been declared. If you assign to a variable that hasn't been declared, it gets automatically declared for you. Note also that you can declare variables without initializer expressions, in which case they are initialized to the null value:
Autodeclarations are handy, especially for global variables in short scripts, but be careful-- remember that an assignment statement looks for the variable in ancestor scopes, and the variable is only autodeclared if it wasn't found in an ancestor scope. This is best illustrated by an example.
You can also assign to member expressions, which we'll explain later.
An integer literal is a string of digits 0-9 representing a decimal number. You can't do hex or octal at present.
A real literal is a string of digits 0-9 followed by a dot followed by some more digits. You can't do standard form at present, and you must have all the digits (i.e. you have to write 0.0, not .0 or 0.).
A string literal is some ascii characters in single or double quotes. You can use \t and \n for tabs and newlines. If you want double quotes in the string, use single quotes to delimit it (and vice versa).
You can't do object literals at the moment, although those would be good to add one day. JavaScript does them in a nice way.
if works by determining the truth
of
the condition you give it. This can be any expression, and there are
rules for what values of each of the six types count as true
:
Boolean expressions (ones with == or != in them) evaluate to integers, with the
value 1 for true
, and 0 for false
.
There's also a unary negation operator, !, just like C.
takenblock, the condition is evaluated again and the block repeated if the condition is still true. You cannot have an else clause after a while.
You can use the break statement within a loop to break out of the innermost loop.
So far we've seen how to write top-level
code and we could write
little scripts similar to bc scripts with no
functions.
Some languages (BBC Basic for example) make a distinction between
subroutines
(also called procedures
) and functions
.
The technical difference is that subroutines can have local variables, and
you can think of a function as a special case of a subroutine that has no
locals. In languages like C, using this terminology, what we call
functions
are actually in many cases subroutines.
Thinking more about what the words function
and procedure
mean in other contexts, functions are often things that we define
implicitly-- for example by declaring that the value of a function
f(x) is the value such that some condition(s) are true of
the input and output. Many functions that can be easily defined in this way
are extremely difficult to evaluate on a computer or at all.
Procedures on the other hand are more explicit and more imperative, taking
the form of instructions to machines to move and combine data. The fact
that some classes of function can be evaluated with procedures probably has
something to do with the Church-Turing thesis
, and is fundamental to
computing.
Much programming involves a mixture of these two things: defining
procedures that evaluate functions explicitly; and defining functions
implicitly in terms of each other (and sometimes in terms of themselves).
The balance between these two kinds of definition is different in different
programs and in different programming languages. Procedure definitions are
generally more likely to involve local variables than implicit
definitions, which is where the idea that a subroutine without locals
should be called a function
seems to come from.
As soon as a subroutine contains local variables, it has its own
scope, its own place to store and retrieve intermediate results.
The code in the body of the subroutine can take the form of commands to a
machine to manipulate the contents of this data area, as part of the
process of computing the value of the function. It's natural to call a
function that's defined this way a procedure
.
program specification. No known machine can execute those directly.
In languages like C, the local variables of a function exist on a stack.
When the function is called, an area of memory (called a stack
frame
) is allocated on this stack for the local variables, and the
variables are constructed
using the initializer expressions in the
function definition. When the function returns, the stack frame is
destroyed. If the function is ever called again, it's with a new stack
frame, in which all the locals are constructed again.
One can think of the stack frame as an object
in the sense that it's
an area of storage containing named variables. The lifetime of stack frame
objects
cannot be controlled independently by a C programmer-- it's
something that happens automatically as part of the mechanism of function
calls.
If a C programmer does need objects that persist across function calls, he
usually defines structures, and constructs his own structure instances in
heap memory. Objects of this kind have evolved into the objects of
object-oriented
programming.
The fact that in many languages (Python and C++, for example), constructors
have function-call syntax
reflects this. The difference between a
function call and a constructor call is that a function call constructs an
automatically allocated and de-allocated stack frame object, and a
constructor call constructs an object that persists for as long as it is
needed.
on the stack.
In Co, functions and objects are indistinguishable. This is best illustrated by example:
This program prints out 4 5
, not 4 6
as you might
expect. The definition statement defines an object f. The object is constructed by the first call to
it, which initializes its variable y with the
value 2. This value remains 2 the next time the function is called. The
variable y behaves rather like a static variable in C, but with an important
difference:
prints out 4 6
. In this example, each call
is actually a call
to a newly constructed clone
of the object f. Each clone has its own storage area, and its own
copy of the variable y. This is unlike static
objects in C, of which there is only one instance per process.
The object in the last example looks rather like a C function and can be used more or less like one. But consider this object:
In this example the Rectangle object is being used like a class, or, more strictly speaking, as a prototype.
There are a couple more language features of Co this example introduces which should be explained:
methods
Every object in Co has an associated scope which is basically a
dictionary of named variables. Clone an object, and you clone two things:
its scope, and the code in it (which you might call the body
of the
object). Any of the variables inside an object may have the object type, with their own scopes. All this is
cloned recursively.
If you assign something to a member that isn't in the object's own scope, then the object is extended with the new member. Ancestor scopes are never updated as a result of assignments to member expressions.
Using this feature, any Co object can be used as a dictionary of name-value pairs.
Many objects contain only a handful of members, but objects used as dictionaries like this may contain many thousands. The Co interpreter builds an efficient index to make lookup and insertion into such large objects a relatively efficient operation.
The parameters of an object become variables in its scope when it is called
(or constructed
-- they're the same thing in Co). This means we could
have written our Rectangle example more succinctly like this:
In other words, the actual parameters passed in width and height persist inside the object along with everything else.
object-orientedlanguages one often finds oneself writing constructors that simply copy their parameters into
thisor
self. This workaround reflects the fact that such languages are stack-oriented with objects
bolted on-- in effect, when you write self.width = width you are manually copying data from the stack object to the heap object.
Objects can be cloned at any time in their lifecycle. In this example, we cloned the Rectangle object before we had constructed it. In fact, we never constructed it, we only constructed the clones.
Definition statements are like declarations with initializers, in the sense that they are executed once. When a definition statement is executed, it creates a new object variable referred to by name in the scope in which it is executed. In this way, the definition of computeArea is no different from the initialization of a variable computeArea of object type, in which the initializer expression would correspond to something like a Lisp lambda expression or a JavaScript object literal.
If you clone before constructing, you get clones of the declaration and
definition statements ready to be executed. If you clone afterwards, you
get clones of the variables and their names. As we will see in the next
section, you can also clone an object half-way
through construction,
and this presents no problems.
Objects in Co can be used to implement subroutines (or procedures), as we
have seen; but in fact they belong to a more general class called
coroutines
(or generators).
This example of a generator prints out the series of numbers 0 to 3:
yield is just like return, except for one important difference: each time f is called, execution continues from after the last yield statement, with the state of all local variables the same as they were.
In this example, we have a yield statement last
thing in the object's body, and we yield the null
value to signify termination. This is a common technique in Co. When a
generator has yielded for the last time
in Co, it starts again from
the beginning (although initializations are not executed again, of course).
A slightly quirky feature of Co is that the yield value of a function is a variable in its scope just like any other, and the yield statement does two things: it sets the value of that variable, and it relinquishes control. Control is relinquished at the end of the body anyway, so the yield value, if not set at that point, retains the value it was last set to.
You can access the yield value by name if you like:
Note that you have to use the [] syntax for the
member expression (as shown in this example), rather than the dot
syntax as that would confuse the parser (since the unquoted word yield is a reserved word).
This example prints out the sequence 0 1 2 3 3 0 1:
What's happening here is that after the while loop in the body of f has finished, there is an implicit yield with the previous yield value at the body's closing brace. Note too that the declaration of i has been changed to an assignment (with implied automatic declaration). This way, when the generator starts again, the value of i becomes 0 again. Had we left that line as var i = 0;, the generator would have continued to yield the value 3 indefinitely for all future calls (since in this case the loop's entry condition would not have become true again).
Note that return is really just the same as yield, except for the one fact that when you call
an object again after it returned, it starts again from the beginning,
instead of resuming from the statement following yield. The return
value is still put into
the object's scope under the name yield
just as if you had used
yield.
You can use either yield or return on their own, without an expression. In this case the null value is yielded.
As mentioned briefly earlier, you can clone an object at any point; so you can call a generator a few times, and then clone it, and then start calling the clone. The clone will continue where the original object left off. You can call the original object again later, and it will carry on from where it was the last time it was called. This is somewhat similar behaviour to the fork system call in UNIX/POSIX C programming.
In fact, coroutines will be familiar to C programmers in the form of
threads
and processes
, although these are a special kind of
coroutine called asynchronous
coroutines.
One of the motivations behind the kind of generator-oriented
programming that Co provides is to make writing programs for multiple CPU
systems more natural. For example, in conventional languages it's common to
write a function that assembles a list of results of some kind. The list is
then passed to another function which does the next operation on each
element. The two functions could easily be designed to run in parallel
since they effectively form a pipeline
.
The current implementation of the Co interpreter is single-threaded, but it would be straightforward for objects to execute in parallel routinely, provided they were synchronized at yield points. This would make turning a serial program into a parallel program a simple switch in the interpreter, without requiring enormous redesign and refactoring of the program.
We've seen that the parameters you call an object with just become variables in that object's scope. If you call the same object again with new arguments, the values of those same variables are overwritten with the new ones.
If you call an object with fewer arguments than it has formal parameters, the null value is passed in all remaining parameters.
It's possible to capture
the values of the parameters to the
first call to an object, using the following simple trick:
All the subsequent calls to primes(); in this example pass no parameters, which means in fact that they set the max_ parameter in the called object's scope to the null value. But the body of primes isn't using max_, it's using max, which is initialized to the value that max_ had the first time the object was called (recall how var declarations work).
Many widely used languages provide some support for functions and objects, and it's common to simulate generators with objects-- you give the object a method called something like next() or step(), and call that repeatedly.
Of course generators aren't the only thing you need objects for: in stack-based languages you need to construct objects of some description any time you require state that persists across returning from function calls; and this is the style of programming we've all become used to. It works well in fact, because, due to the naturally recursive nature of many programs, the procedure instances required by many programs fit naturally into a stack pattern.
Some languages support generators too-- Python, for example, although
interestingly there, although the generator you define looks quite a lot
like a Co generator, calling it results in the construction of a generator
object with a next() method. This is a
clever way to introduce generators into what is basically a fairly
normal
call-stack and class based language.
The unusual thing about Co is that you don't necessarily make the decision about whether a definition is of a generator, function or constructor at the time you define it, but at the time you use it. JavaScript is somewhat similar in this regard, in that functions and constructors are the same. The difference is that in the body of a JavaScript function/constructor, you always have access to a persistent object (accessible from this), and a stack frame (which is just there in the normal way). Co has no stack, and always constructs a persistent object, which may of course be a clone. Co requires no this keyword.
Is this indistinguishability a good thing? Possibly not. I suspect that with most objects, how they have to be used is effectively established at definition time, in which case the language may as well reinforce this by providing separate keywords like class, function etc., as opposed to Co's general purpose define... But only time and the experience of writing a few Co programs will tell.
Co provides a fairly minimal set of built in functions intended to make it
possible to write real-world programs to see what it's like. Adding new
builtins is very easy, so one could use Co as a glue
language etc.
handle. If mode is not supplied,
ris assumed.
handle.
intrinsic array(more about that later) the substrings of s that result by splitting it at any of the characters in splitchars (which is a string). If no splitchars are given, split splits at whitespace. If null is passed in splitchars, the string is split at every character (rather like using / */ in perl).
In addition to these builtin functions, there are four builtin objects, stdin, stdout, stderr and argv.
Co regular expressions use the PCRE library. Regular expression syntax is pretty much like Perl or Python (as opposed to POSIX). match and compile both have an optional flags parameter, which is a string of single-character flags. The following flags are supported:
For more information, see man(3) pcreapi and the PCRE documentation.
If the first argument to print, or the second argument to write, starts with the % character, then it is interpreted as a format specifier that applies to all subsequent arguments of numeric and string types, although note that a precision, if specified, is ignored for arguments of the string type.
So, to print out a series of numbers with 3 decimal places of precision, you can use:
Outputs:
You can specify a width as well as, or instead of, a precision, by using, for example, %12.6 or %12 respectively.
Note that the \ character can be used to escape %, so use \% if you really need to print the % character as the first argument to print. \n and \t can also be used to print newlines and tabs (we've already used this in many of the examples).
Scopes in Co contain dictionaries of name/value pairs as we have seen. Every scope, including the global scope, also provides an array, accessible using the built in functions get, set, length, push, pop, shift, and head.
An object's intrinsic array can also be read and assigned to using [] expressions.
The program arguments (argv) are passed in the global array. So here is a sample implementation of echo in Co:
For convenience, the program args are also available as array members of the predefined global object argv.
If you try to read or write an array past its bounds, the interpreter aborts with an error message.
One thing we haven't mentioned yet is the parent keyword. Every scope contains a reference
to the object containing the object whose scope it is, named with the
special name parent
. This is necessary if, for example, you want to
write intrinsic array mutators:
Co provides an include statement that works just like C's #include directive. That is to say, it's as though you'd pasted in the other file at the point you include it.
The syntax is pretty straightforward:
You can provide any path, absolute or relative. If the file is not found at first, the interpreter tries prefixing the path with a list of paths defined by the enviroment variable COPATH, which is either a single directory, or a : separated list.
Why the bulldozer, and not a more intelligent import statement? To keep things simple. Co provides general-purpose objects that can be nested, and this should be adequate for maintaing modularity without the need for another system of namespaces etc. on top of it.
Code inside objects is only executed when the objects are called. This is another property that should make it possible to get away with nothing more fancy than include-- the Co programmer should have sufficient control over any of the potential side-effects of including.
It's probably good advice to avoid putting any code in the global scope other than definition statements in Co files intended to be used as re-usable modules.
Inheritance
inheritanceof class-based languages:
The way to think about this is to ask, when we clone the object Shape, what exactly do we get? This depends on when we clone it.
When we execute the code in an object (by calling it), any declaration or definition statements in there will add named values to the object's scope. Those will all get cloned when the object is cloned. The code in the object, including where to continue from, is also cloned.
In this example, we deliberately construct Shape before cloning it to make our Rectangle and two Circles, but we don't call either of its methods Rectangle or Circle until after cloning.
So, when we clone the Shape prototype we get an object containing a variable area with the null value, and a block of code containing three definition statements.
Then, in the case of rectangle, we execute only one of those definitions, to add to that clone the Rectangle object. After executing the definition, the rectangle object contains variables width and height, and the definition of computeArea, and also has access to the variables area and printArea since they are in its parent scope.
For circle and circle2, we simply instantiate the alternative specialization in just the same way, and use that.
It's interesting to compare this with the way languages like C++ implement inheritance, which they basically do with composition. In other words, in C++, Rectangle objects would contain Shape objects. Any method not found in the Rectangle object would get looked for in the Shape object inside it. But in Co, the Rectangle is inside the Shape, and looks upwards, using exactly the same mechanism that's used to find variables in wider scopes.
copiescode just by incrementing reference-counts, since all code in Co is read-only, there's no need to copy it. This keeps cloning fairly efficient.
The word recursion
in the context of programming brings to mind two
things:
These are two different angles on the same thing really (see thoughts above about functions and procedures).
Attention is often drawn to tail recursion
, which is a special case
of recursion. You can spot a tail-recursive procedure because the recursive
call is last thing in the body of the procedure.
It is well-known that tail recursion is equivalent to iteration, and much
is made of languages like Scheme that can implement tail recursion as
efficiently as iteration. So too does the Logo
interpreter on these pages. Some C compilers are able to achieve
this optimization for leaf
functions I believe as well.
The reason C compilers generally have a problem implementing tail recursion efficiently is that every time you make a function call in C, you construct a new stack object, even if you don't really need one.
Illuminatingly, in Co, an object that calls itself just is a loop-- remember, objects are generators, which means you carry on from where you left off the last time. When you get to the end you go round again. If you need full recursion, then you don't call yourself, but a clone of yourself. This is just like pushing a new stack frame in C.
Freeform
style where all whitespace is treated the same. Semicolons
between statements, and braces around blocks. Any expression may be used as
a statement (just like C), but note that assignment is a statement, not an
expression (so no embedded
assignments). None of the the pre- and
post- increment and decrement operators, but you get += etc.