Myrddin Tutorial
Myrddin is a simple modern programming language. It allows you to write clear, terse, and readable code with a powerful but comprehensible type system. The compiler infers types globally, checking your code without getting in your way. It is currently available on Linux, OSX, FreeBSD, OpenBSD, and Plan 9.
This tutorial will get a new user up to speed with Myrddin quickly. This tutorial comes in three parts. The first will discuss key concepts via several example programs, the second will cover parts of the language in more detail, and the third will give an idea of what libraries exist and how to use them. For deeper coverage, look at the language specification and the library reference manual.
We assume that you are already familiar with programming and have installed Myrddin on your machine, following the instructions on the Environment Setup page.
A Simple Program
A program begins running at the first line of the function named main, and proceeds line by line, executing statements one after the other. Each statement is ended by a newline or semicolon.
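A minimal sketch of such a program, assuming it is saved as hello.myr as suggested below:

```myrddin
use std

const main = {
	/* std.put does formatted output to standard output */
	std.put("hello world\n")
}
```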
Here, the first line of main invokes std.put. This function does formatted output. We pass it the string hello world, and it dutifully prints out:
hello world
The put function can also handle more complex formatting. The first argument to std.put can contain format specifiers ({}). Each one is replaced by the corresponding argument from the rest of the parameter list. Myrddin passes type information to the format function, which tries to produce reasonable output for all arguments.
For example,
std.put("{} + {} = {}\n", 2, 2, 5)
would output the string 2 + 2 = 5. Additional parameters for specifying the formatting can be passed between the { and }. These vary by type, and are fully documented in the library documentation.
The std.put function comes from the std library, loaded via use std on the first line of the program. Use statements import a library, allowing the program to access all of the functions and variables that the library provides.
In order to compile this program, save it into a file with the extension .myr. A good name for this program is hello.myr. Then, build it with mbld and run the resulting binary:
mbld -b hello hello.myr
./hello
There are other ways to invoke mbld, which will be covered later in this tutorial.
Another small program
This program computes factorials.
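A sketch consistent with the description that follows (the loop bounds and output format are illustrative choices), assuming it is saved as factorial.myr:

```myrddin
use std

const main = {
	/* this annotation is what fixes factorial's return type to int64 */
	var x : int64

	for var n = 1; n <= 10; n++
		x = factorial(n)
		std.put("{}! = {}\n", n, x)
	;;
}

const factorial = {n
	/* acc's type is inferred as int64 from the return value */
	var acc = 1

	/* stepping loop: init; test; incr */
	for var i = 2; i <= n; i++
		acc *= i
	;;
	-> acc
}
```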
As before, it can be compiled and run with the following commands:
mbld -b factorial factorial.myr
./factorial
Expressions are similar to those in other common programming languages, such as C, Java, or Python. A full table of operators is given in the second half of this document.
Declarations begin with the keyword var, const, or generic, followed by a list of variable names, optionally with types and initializers. Variable names are composed of the characters 'a-z', 'A-Z', '0-9', and '_'. The first character of a variable name must not be a digit.
If we want to provide a type for the variable, then the variable name can be followed by a ':', and then the type we want to declare. Providing the type explicitly is optional, because the compiler can usually infer the type on its own.
Functions in Myrddin follow the pattern outlined above, with no special syntax for declarations. Instead, we simply declare a const, and assign it a function literal expression. Function literal expressions are chunks of code with arguments and a body, and generally follow this form:
{arg, list
function
body
}
The argument list consists of a list of argument names. Like declarations, types can be added with :type, but are usually not needed. Like statements, the argument list is terminated with a line ending.
Functions are called using the function call operator, (). The number and types of the arguments passed must match the declared or inferred types of the function's parameters.
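As a small illustration (the name add is chosen only for this example), a function literal with annotated arguments can be bound to a const and then called:

```myrddin
use std

/* a function literal bound to a const; the argument types are optional */
const add = {a : int, b : int
	-> a + b
}

const main = {
	/* both arguments must be ints to match add's parameters */
	std.put("{}\n", add(2, 3))
}
```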
In our factorial program, the variable x is given the type int64. This means that when we call factorial(n) and store the result in x, the compiler concludes that the factorial function must return an int64. Because the factorial function returns the variable acc, acc must also have the type int64. Thus, the type of acc is fixed, in spite of the lack of an explicit type declaration. If we attempted to assign anything other than an int64 to acc, the compiler would reject the program.
For loops in Myrddin come in two forms: stepping form and iterator form. The loop used in the factorial function is a stepping loop.
Stepping for loops will be familiar to anyone who has used C. This type of loop has the form for init; test; incr; body ;;. The init expression is executed once, before the loop is entered. The test expression is evaluated at the start of each loop iteration, and the incr expression is run at the end of every loop iteration. The test expression must be a boolean expression, and the loop is exited when it evaluates to false.
Iterator loops have the form for pat : expr; body ;;. These loops operate on an iterable expression such as an array or a slice. Each time the loop runs, the next element of that iterable is stored into pat, and the body is run. This continues until all elements of the iterable are exhausted. The pat need not be a simple variable: it may be any pattern. Patterns are covered later in this tutorial.
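A short sketch of an iterator loop over a slice (the variable names are illustrative):

```myrddin
use std

const main = {
	var primes = [2, 3, 5, 7, 11]

	/* p takes the value of each element of the slice in turn */
	for p : primes[:]
		std.put("{}\n", p)
	;;
}
```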
Myrddin also has other common control flow statements. If statements are written as you'd expect:
if cond
thing()
;;
As usual, the condition is separated from the body of the if statement using a line ending or semicolon. The condition is a boolean typed expression which, if true, causes the body of the if statement to be executed. Otherwise, the body is skipped. If statements can also be extended with elif and else conditions.
if cond
thing()
elif othercond
otherthing()
elif moreconds
morethings()
else
fallback()
;;
While loops are also supported. These loops repeat as long as the condition on the while is true:
while cond
thing()
;;
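To see while, if, elif, and else working together, here is a small sketch using Euclid's algorithm; the function name gcd is chosen for illustration:

```myrddin
use std

/* Euclid's algorithm: repeat until the remainder is zero */
const gcd = {a : int, b : int
	var x = a, y = b, t

	while y != 0
		t = y
		y = x % y
		x = t
	;;
	-> x
}

const main = {
	var g = gcd(48, 18)

	if g == 1
		std.put("coprime\n")
	elif g % 2 == 0
		std.put("even gcd {}\n", g)	/* taken here: g is 6 */
	else
		std.put("odd gcd {}\n", g)
	;;
}
```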
The only other significant tool for controlling program flow is the match statement, covered below.
Pattern Matching
This is a simple example demonstrating pattern matching.
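A sketch of such a program, consistent with the explanation that follows:

```myrddin
use std

const main = {
	var v = 11

	/* patterns are tried top to bottom; the first match wins */
	match v
	| 7:	std.put("got 7\n")
	| 9:	std.put("got 9\n")
	| n:	std.put("got {}\n", n)	/* a free name matches anything and binds it */
	;;
}
```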
This program will output "got 11". Each pattern in the match statement is checked against the value in sequence, and the first one that matches has its body executed. Here, 7 and 9 are not equal to 11, so their bodies are not executed. However, a free name matches any value, so matching against n succeeds. Additionally, the free name captures the value that it is being matched against, meaning that in the expression std.put("got {}\n", n), the variable n evaluates to 11.
This kind of matching can be applied to more than just integers. If x was assigned the tuple (11, 33), then in the code below, the pattern (11, n) would match, and n would hold the value 33:
match x
...
| (11, n): std.put("got {}\n", n)
...
Pattern matches can descend into the structure of almost any type. Structures, arrays, strings, unions, and even values on the other end of pointers are fair game. Of these, matching on unions is likely to be the most common.
A union is a type that has two parts: A tag, and a body. The body is optional, but the tag is always present. We could define a union type as:
type u = union
`Bodyless
`Int int
`Pair (int, char)
;;
The word after the ` (backtick) is the tag. A union can only hold one of its variants at once. Unions are written out with the tag and body value, as in:
x = `Int 123
Once a value is in a union, the only way to extract it is by applying a pattern match to it. The tag is matched on to decide which variant of the union to extract, and the body is matched using the usual rules. For example:
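A sketch matching on the union u defined above; the function name describe is illustrative:

```myrddin
use std

type u = union
	`Bodyless
	`Int int
	`Pair (int, char)
;;

const describe = {x : u
	match x
	| `Bodyless:	std.put("no body\n")
	| `Int 0:	std.put("zero\n")		/* the body is matched as usual */
	| `Int n:	std.put("int {}\n", n)
	| `Pair (a, c):	std.put("pair {} {}\n", a, c)
	;;
}

const main = {
	describe(`Bodyless)
	describe(`Int 123)
	describe(`Pair (1, 'a'))
}
```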
In order for a match statement to compile, it must be exhaustive. This means that every possible value must be matched by at least one case. Additionally, each pattern must be useful. This means that a pattern must not be fully subsumed by earlier patterns.
Patterns also show up in iterator style for loops. In this context, only a single pattern is allowed, on the loop variable. If a value does not match the pattern, the loop body is skipped.
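A sketch of such a loop, consistent with the description that follows:

```myrddin
use std

const main = {
	var pairs = [(1, 1), (2, 2), (1, 3), (2, 4)]

	/* elements that do not match (1, x) are skipped */
	for (1, x) : pairs[:]
		std.put("x = {}\n", x)
	;;
}
```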
This program will only print x = 1 and x = 3, even though it is iterating over 4 values. This is because the pattern (1, x) only matches the values (1,1) and (1,3).
A Marginally Useful Program
This program behaves like the Unix wc program. You'll have to run it on your local machine: it does input and output, and therefore will fail when run in the playground.
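A sketch following the walkthrough below. bio.getc and `bio.Eof come from the text; bio.mkfile and bio.Rd, used to wrap std.In in a buffered reader, are assumptions about the bio API and should be checked against the library reference:

```myrddin
use std
use bio

const main = {
	var lines = 0, words = 0, chars = 0
	var inword = false
	var f

	/* buffered reader over standard input (bio.mkfile and bio.Rd assumed) */
	f = bio.mkfile(std.In, bio.Rd)
	while true
		match bio.getc(f)
		| `std.Err `bio.Eof:
			break
		| `std.Err _:
			std.fatal("wc: error reading input\n")
		| `std.Ok ' ':	inword = false
		| `std.Ok '\t':	inword = false
		| `std.Ok '\n':
			lines++
			inword = false
		| `std.Ok _:
			/* any other character is part of a word */
			if !inword
				words++
			;;
			inword = true
		;;
		chars++
	;;
	std.put("{}\t{}\t{}\n", lines, words, chars)
}
```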
This program is a state machine centered around a pattern match statement. It operates by keeping track of whether it's currently inside a word or not, and every time it flips into a word, it increments the number of words using a ++ expression.
We start off by initializing all of our counters to zero, and creating a buffered wrapper around the std.In input stream. This buffered reader is used to efficiently read and decode whole Unicode codepoints.
The main loop of the wc program matches over the result of bio.getc. The std.result type is generic, but for our purposes right now we can assume it is defined as:
type std.result = union
`Err bio.err
`Ok char
;;
A value of `std.Err `bio.Eof indicates that the reader has successfully reached the end of the file. Any other `std.Err value indicates that the reader has encountered an error reading the file. And a value of `std.Ok c indicates that a single character c was successfully read from the file.
Refer to the API documentation for the full details of what the buffered I/O library provides.
The main loop first checks for the end of the file, exiting the loop and printing the accumulated statistics if one is encountered. Then, it checks for errors, bailing out of the program with a failure if one is encountered. In all other cases it matches on the character that was encountered to count up the lines, words, and characters.
There are four patterns that match on the `std.Ok union tag. The first two match on spaces and tabs.
| `std.Ok ' ': inword = false
| `std.Ok '\t': inword = false
These patterns simply set the inword state variable to false. If we are in a word, this records that we have left the word. Otherwise, the state is unchanged.
The next pattern matches on `std.Ok '\n'. Here, in addition to recording the end of a word, the program increments the line count.
| `std.Ok '\n':
lines++
inword = false
And finally, the last case matches any character that was successfully read. Since this character is not a space character or newline, we define it to be a word character. If we are not currently in a word, then this must mark the start of a new word, so we increment the word count. Finally, the fact that the program is scanning along a word is recorded.
| `std.Ok c:
if !inword
words++
;;
inword = true
;;
The program then finishes the loop body by incrementing the total number of characters read so far, and reads the next character, starting the cycle over again.
Stacks
Here's a program that defines a stack. For simplicity, the stack is statically sized, holding at most 100 elements.
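A sketch of such a program, using the names fixstack, mkstk, stkpush, and stkpop that the discussion below refers to. Like the description, it performs no overflow checking:

```myrddin
use std

/* a statically sized stack holding at most 100 elements of type @a */
type fixstack(@a) = struct
	top : std.size
	data : @a[100]
;;

generic mkstk = {-> fixstack(@a)
	/* members not named in the literal, like data, are zeroed */
	-> [.top = 0]
}

generic stkpush = {s : fixstack(@a)#, val : @a
	s.data[s.top++] = val
}

generic stkpop = {s : fixstack(@a)# -> @a
	s.top--
	-> s.data[s.top]
}

const main = {
	var istk : fixstack(int)
	var sstk : fixstack(byte[:])

	istk = mkstk()
	stkpush(&istk, 1)
	stkpush(&istk, 2)
	std.put("{}\n", stkpop(&istk))	/* prints 2 */

	sstk = mkstk()
	stkpush(&sstk, "hello")
	std.put("{}\n", stkpop(&sstk))	/* prints hello */
}
```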
User-defined types are created using the type keyword. Type definitions may define new types based on existing ones, and may optionally take parameters. For example:
type flags = int32
type slice(@a) = @a[:]
The flags type is a definition based on the int32 type. This definition is a distinct type, and requires an explicit cast to be converted to an int32. The slice(@a) type is parameterized, taking a single type parameter @a. When this type is used, the type parameter must be passed in. This substitutes the type parameter on the right hand side, producing a new type.
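A small sketch of the explicit cast that a distinct type like flags requires:

```myrddin
use std

type flags = int32

const main = {
	var f : flags = 5
	var n : int32

	/* n = f would be rejected: flags and int32 are distinct types */
	n = (f : int32)
	std.put("{}\n", n)
}
```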
In the stack example, the type fixstack is generic. It gets specialized into fixstack(int) and fixstack(byte[:]) in the body of main. The int stack can only contain ints, as verified by the compiler when type checking. Similarly, the byte[:] stack can only contain byte[:].
The functions stkpush, stkpop, and mkstk are declared with the keyword generic. The generic keyword indicates that they may contain type parameters in their signatures. This means that when stkpush is called with a fixstack(int), the type @a is substituted with int. Similarly, when called with a fixstack(byte[:]), @a is substituted with byte[:]. Note that @a is substituted with the same type throughout the context, so if we defined a max function, we would not be able to mix arguments:
generic max = {a : @t::numeric, b : @t::numeric
if a > b
-> a
else
-> b
;;
}
max(1, 2) /* ok, @t is replaced with int */
max('x', 'y') /* ok, @t is replaced with char */
max('x', 2) /* error: @t wants to be both int and char */
In the max example, we also used traits to restrict the types passed to max, requiring them to be numeric. Traits are constraints on generic types, requiring the type passed to have certain attributes. Numeric is a trait built into the language, and is defined for integer, floating point, and character types. If a type has the numeric trait, it can be compared using relational operators (<, <=, >, >=). It can also have the usual numeric operators applied (+, -, *, /).
Turning Code into a Library
Often, code needs to be reused across multiple programs. This example shows how to put code into reusable libraries, available from a use statement.
use std

pkg stack =
	/* a statically sized stack holding at most 100 elements */
	type fixed(@a) = struct
		top : std.size
		data : @a[100]
	;;

	generic mk : (-> fixed(@a))
	generic push : (s : fixed(@a)#, val : @a -> void)
	generic pop : (s : fixed(@a)# -> @a)
;;

generic push = {s, val
	s.data[s.top++] = val
}

generic pop = {s
	s.top--
	-> s.data[s.top]
}

generic mk = {
	-> [.top=0]
}
The library code is based on the stack example above, but repackaged so that it can be used from multiple places. We removed the main function, and added a pkg section to declare the exports. The pkg section contains the data type that we are providing, and the function prototypes to expose in order to manipulate that type.
There were also a few stylistic changes. Because the fully qualified name of the functions (stack.funcname) must be used to refer to the library exports, the stk prefix is redundant. It has been removed, replacing, for example, stkpush() with push(). For the same reason, the type is named fixed, so that users refer to it as stack.fixed.
The package name is unrelated to the file name that we decide to save this code into, and as a general rule, packages consist of multiple files. However, this example is small enough that a single file suffices.
This library is built and installed with mbld. If the file that the code was in was named stk.myr, then we need to create a file named bld.proj in the same directory as stk.myr, containing the following:
lib stack =
stk.myr
;;
The lib clause produces a library named stack out of the files listed in the package. In our case, there is only one file. Running mbld will build the library, and mbld install will install it to a place where use statements in other code can find it. To use it, we might write a program similar to our previous one, but using this library. For brevity, main is shortened:
use std
use stack
const main = {
var istk : stack.fixed(int)
istk = stack.mk()
stack.push(&istk, 123)
std.put("{}\n", stack.pop(&istk))
}
If mbld install has been run, then the usual mbld -b main main.myr would produce a binary linked against the stack library that we just wrote.
Alternatively, main.myr may also be built with a bld.proj file. We can put this into a bld.proj file in the same directory as main.myr:
bin main =
main.myr
;;
There is one problem that separate bld.proj files and installed libraries do not address. We may want to ship the binaries and libraries as part of the same project, implying that we want to build them all together as a unit. To do this, we can put the two build targets into the same bld.proj, and add a dependency from main to the stack library, as below:
lib stack =
stk.myr
;;
bin main =
main.myr
lib stack
;;
Splitting code into multiple files is done in a similar way. Only two small changes are needed. First, because the files are being compiled into the same unit instead of into dependent libraries, the use statements have to be changed to the quoted form:
use std
use "stk"
const main = { ... }
Then, the bld.proj needs to be changed to put both files into a single unit:
bin stackdemo =
stk.myr
main.myr
;;
The distinction between quoted and unquoted use statements is how the packages are looked up. An unquoted use looks for a fully compiled and installed library with the requested name. A quoted use looks for a single .myr file and imports the definitions from that. The quoted form is used for dependencies within a single package, while the unquoted form is used for dependencies between different packages.
There's a lot more to mbld, and the full documentation is available in the mbld tutorial.
Printing Roman Numerals
This program uses traits to decide how to stringify integers. Traits are a powerful mechanism for attaching type-specific behavior to types, with the implementation selected at compile time.
They add a lot of expressiveness, but the overloading that they imply can heavily hurt readability. As a result, they are best used sparingly, and with care.
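A sketch of the program described below. The trait and function names come from the text; the strbuf helpers std.mksb, std.sbputs, and std.sbfin are assumptions about the standard library and should be checked against the API reference:

```myrddin
use std

/* types that know how to render themselves into a string buffer */
trait stringable @a =
	stringify : (buf : std.strbuf#, v : @a -> void)
;;

/* an integer that should be printed as a roman numeral */
type roman = int32

impl stringable roman =
	stringify = {buf, v
		var vals = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]
		var syms = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"]
		var n = (v : int32)

		/* greedily emit the largest symbol that still fits */
		for var i = 0; i < vals.len; i++
			while n >= vals[i]
				std.sbputs(buf, syms[i])
				n -= vals[i]
			;;
		;;
	}
;;

impl stringable int32 =
	stringify = {buf, v
		std.sbfmt(buf, "{}", v)
	}
;;

/* works for any type with a stringable impl */
generic string = {v : @a::stringable
	var buf = std.mksb()

	stringify(buf, v)
	-> std.sbfin(buf)
}

const main = {
	std.put("{}\n", string((1994 : roman)))	/* MCMXCIV */
	std.put("{}\n", string((1994 : int32)))	/* 1994 */
}
```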
This program begins by defining a trait stringable @a. The stringable trait requires implementations to provide a stringify function with a type (buf : std.strbuf#, v : @a -> void). This function will put a string version of the value v into the string buffer.
Next, a new type roman is defined. It's an integer, but we attach an implementation of the trait to it that will cause stringify to render it as a roman numeral.
The implementation follows.
Then, another impl of the same trait is defined to stringify int32 values. The int32 impl just uses std.sbfmt() to render the integer into the string buffer.
Finally, main uses the string function on the two types, demonstrating that the roman numeral value indeed gets formatted as a roman numeral, and the int32 gets formatted with boring old arabic numerals.
Traits are closely related to generics. However, instead of substituting the type within the body of a function, the types are used to look up a type-specific implementation when the program is compiled.
Command Line Arguments
This program implements the Unix echo program. When run on the command line, it will echo all of the arguments given to it.
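A sketch of such a program, consistent with the notes that follow on the type of args:

```myrddin
use std

/* the explicit annotation on args is required (see below) */
const main = {args : byte[:][:]
	/* args[0] is the program name, so start at 1 */
	for var i = 1; i < args.len; i++
		if i > 1
			std.put(" ")
		;;
		std.put("{}", args[i])
	;;
	std.put("\n")
}
```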
Arguments given on the command line are passed to Myrddin programs as the first argument to main. The type of the arguments is byte[:][:].
The first element of this slice is the program name, and the remaining elements are the arguments passed to the program.
This is the first program in this tutorial that needs an explicit type annotation. Because the operations on args can be applied to both a slice and an array, type inference has too little information to disambiguate the two cases. Therefore, the args parameter to main is annotated with a type.
By convention, options are flagged with a leading -. Flags which take no arguments can be grouped together, so that -a -b -c is equivalent to -abc.
For flags that do take arguments, the space between the flag and its argument is optional, so that -o arg is equivalent to -oarg. Option processing stops after the first -- seen in the input.
Following these rules yourself isn't difficult, but the standard library provides code that handles these cases for you.
The example program above is incomplete: according to POSIX, /bin/echo accepts a -n option which suppresses the final newline. For the sake of illustration, let's also extend it with a -p prefix argument, which adds a prefix to each value printed.
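A sketch of the extended program. std.optparse comes from the text; the field names of the description structure (.argdesc, .opts, .opt, .arg, .desc) and of the result (.opts, .args), and the assumption that the parsed args exclude the program name, are based on libstd but should be verified against the API reference:

```myrddin
use std

const main = {args : byte[:][:]
	var cmd, prefix, newline

	prefix = ""
	newline = true

	/* describe the accepted options (field names assumed) */
	cmd = std.optparse(args, &[
		.argdesc = "[args...]",
		.opts = [
			[.opt = 'n', .desc = "do not print the trailing newline"],
			[.opt = 'p', .arg = "prefix", .desc = "print prefix before each argument"]
		][:]
	])

	/* cmd.opts holds (option, value) pairs */
	for opt : cmd.opts
		match opt
		| ('n', _):	newline = false
		| ('p', p):	prefix = p
		| _:	std.fatal("echo: unexpected option\n")
		;;
	;;

	/* cmd.args holds the remaining non-option arguments */
	for var i = 0; i < cmd.args.len; i++
		if i > 0
			std.put(" ")
		;;
		std.put("{}{}", prefix, cmd.args[i])
	;;
	if newline
		std.put("\n")
	;;
}
```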
The std.optparse function takes two arguments. The first is the argument list to parse. The second is a pointer to an argument description structure. In this program, this is written out as a struct literal.
The argument description structure is used for two purposes. The primary purpose is describing to std.optparse what the command line should look like. The second purpose is producing a useful help message for the user.
The optparse function parses the command line into two data structures. The first is a slice of (char, byte[:]) pairs that contains the options and their values. The second is a slice of byte[:] that contains the non-option arguments.
Once the options are parsed, the program loops over them and processes them, storing the prefix and recording whether to print newlines.
This program only exercises a small portion of the command line parser. The API reference covers the rest of the capabilities in detail.
Declarations in Detail
Declarations come in three flavors: constant, variable, and generic declarations. Constant declarations are indicated with const. Variable declarations are indicated with var. Generic declarations are indicated with generic.
This keyword is followed by the variable name. The type follows, optionally. If the type is omitted, then it will be inferred. Finally, the initializer follows. In the case of consts, the initializer is mandatory. Otherwise, it can be omitted.
Here's an example of a fully specified declaration:
var x : int = 123
The type can be omitted, and left up to the type inference:
const y = 123
And, if the declaration is a var, then the initializer can also be omitted:
var z
Multiple declarations can be placed after a single keyword. Each type and initializer is independent.
var w, x = 123, y : char = 'a', z = "string"
Vars are mutable at runtime. The compiler prevents using them before they are initialized. If the address of a variable is passed to a function, the analysis assumes that they are being passed as an out parameter, and will be initialized by this function.
var a
f(a) /* illegal: used before defined */
g(&a) /* ok: assumption that g initializes a */
Consts are compile time constants, and are often placed in read only memory by the compiler. Consts must be initialized with an expression that is computable at compile time. Generics are closely related to constants, although their type may contain type variables.
Myrddin has no special syntax for declaring functions. Functions are simply declared by initializing a const or var with an anonymous function. For example, to declare a function that takes a single argument and returns it unmodified:
const id = {a
-> a
}
Because it is desirable to make mutual recursion convenient, functions may be declared in any order. But because there is no distinction between functions and variables, this means that variables may also be declared in any order. This leads to interesting effects, where it is possible to use a variable before it is declared.
const f = {
y = 123
-> y
var y
}
This is strongly discouraged, stylistically.
Literals in Detail
Many values can be written out directly in code, as literals. Integers, characters, strings, arrays, structs, and slices are all examples.
Ints
Integer literals are usually written out as decimal numbers. Integers can also be written out in hex, octal, or binary. These variants are specified with the prefixes 0x, 0o, or 0b, respectively. For example:
123 /* decimal 123 */
0x123 /* hex 123 (291 decimal) */
0o123 /* octal 123 (83 decimal) */
0b101 /* binary 101 (5 decimal) */
Integer literals have a generic type, and can therefore be assigned to any type with the integral and numeric traits. Integer suffixes can be used to restrict the type. The integer suffixes 'b', 's', 'i', and 'l' respectively indicate that the integer is a signed 8, 16, 32, or 64 bit integer. Adding a u suffix indicates that the integer is unsigned.
Floats
Floating point literals are written using decimal notation, separating the integer portion from the fractional portion with a period. Optionally, an exponent may be written using either an 'e' or an 'E'. For example:
0.5 /* 0.5 decimal */
1.0e2 /* 100.0 decimal */
Floating point literals have a generic type, and can be assigned to any other type with the floating and numeric traits.
Characters
Characters are quoted using single quotes. They represent a single Unicode codepoint. Most characters can be written directly, but some are either syntactically significant, or would combine with the quotes. As a result, the following escape sequences are recognized:
Escape | Meaning |
---|---|
\n | New line |
\r | Carriage return |
\t | Tab |
\b | Backspace |
\" | Double quote |
\' | Single quote |
\\ | Backslash |
\v | Vertical tab |
\0 | Null character |
\xDD | Hex byte. DD are two hex digits |
\u{codepoint} | Unicode codepoint |
The codepoint value for Unicode escapes is a hex encoded integer.
Strings
Strings are quoted using double quotes. They contain a byte slice, which is conventionally a UTF-8 encoded string. The language, however, enforces no such constraint on the contents of a string, and leaves the interpretation up to the libraries using it.
The escape codes allowed in strings are the same as those allowed in characters. Unicode escapes (\u{codepoint}) will be UTF-8 encoded. All other escape codes, including hex escapes, will be inserted into the byte sequence uninterpreted.
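A small sketch showing the difference between the two kinds of escape, measured by the length of the resulting byte slice:

```myrddin
use std

const main = {
	var u = "\u{e9}"	/* U+00E9, UTF-8 encoded as two bytes */
	var x = "\xe9"		/* a single raw byte, inserted uninterpreted */

	std.put("{} {}\n", u.len, x.len)	/* prints 2 1 */
}
```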
Arrays and Slices
Array literals are written as comma separated sequences of values enclosed in square brackets. Optionally, indexes can be given to the initialized values. If there are gaps in an indexed initializer sequence, then the missing values are zero initialized. For example:
/* packed 3 element array */
x = [1,2,3]
/* 74 element array, with x[0]==1, x[73] == 2 */
x = [0: 1, 73: 2]
There is no dedicated slice literal syntax in Myrddin, but slices can be taken off of array literals, giving a compact syntax that serves the purpose.
sl = [1,2,3][:]
Beware: array literals within functions are allocated on the stack, so a slice taken from one is only valid as long as the array literal is, which ends when the enclosing function returns.
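A sketch of the pitfall and one way around it, copying the elements to the heap with std.sldup (the function names bad and good are illustrative):

```myrddin
use std

/* WRONG: the array lives in bad's stack frame, so the returned slice dangles */
const bad = {
	-> [1, 2, 3][:]
}

/* OK: std.sldup copies the elements into heap memory owned by the caller */
const good = {
	-> std.sldup([1, 2, 3][:])
}

const main = {
	var sl = good()

	std.put("{}\n", sl.len)
	std.slfree(sl)	/* release the heap copy */
}
```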
Structs
Struct literals are written as comma separated sequences of initializers enclosed in square brackets. Initializers come in the form .membername = value. In order for the compiler to be able to tell apart a struct literal and an array literal, at least one initializer is needed. For example:
type example = struct
a : int
b : int
;;
var x : example
x = [.a=123]
If a member of a struct is not initialized by the literal, it is zeroed.
Unions
Unions are constructed by prefixing a value of the appropriate type with the union tag. If the union has no value for the tag, then the tag stands on its own as a constructor. For example:
uval = `Tag2 123
uval = `Tag1
Operators In Detail
This is the full list of operators in the Myrddin language, and what they do.
Precedence 11:
- x.name
The member lookup operator. Looks up a value from within a structure or pointer to structure, and evaluates to that value. As a special case, it also lets you get the length of a slice or array using the .len member.
- x++
The postincrement operator. This operator evaluates to the expression it is applied to and increments the value after the subexpression is evaluated. Multiple increments within the same expression are applied after the full expression is evaluated.
- x--
The postdecrement operator acts the same way as the postincrement operator, but with subtraction instead of addition.
- x#
The dereference operator loads a value through a pointer.
- x[e]
The index operator loads a value at an integer offset from an indexable type (an array or a slice). Pointers are not indexable.
- x[lo:hi]
The slice operator takes a view into another sliceable type. Slices may be taken off of arrays, other slices, or pointers. Taking slices off of pointers is essential for writing lower level code or binding with C, but it should be done with care, as there are no bounds checks.
When slicing an array or slice, the upper and lower bounds may be omitted. If the lower bound is omitted, it defaults to 0. If the upper bound is omitted, then it is replaced with the length of the value being sliced.
The lower bound is inclusive. The upper bound is exclusive. For example, if the array a contained [1,2,3,4], then the slice a[1:3] would contain [2,3].
- x(arg,list)
The function call operator calls a function with the given arguments. Arguments are evaluated before the call in left to right order.
Precedence 10:
- &x
The address-of operator takes the address of any value, evaluating to a pointer to that value.
- !x
The logical negation operator works on a boolean value, inverting it. Its functionality is quite Orwellian: True becomes false, and false becomes true.
- ~x
The bitwise negation operator inverts every bit in its integer traited argument.
- -x
The unary minus operator negates its operand.
- +x
The unary plus operator does nothing. It's present for symmetry with the unary minus.
- `Name x
The union constructor operator creates a new union with tag Name wrapping the value x.
Precedence 9:
- x << y
The left shift operator shifts x left by y bits. Shifting by more than the number of bits in x can lead to implementation-defined results, because different CPUs handle large shifts differently.
- x >> y
The right shift operator shifts x right by y bits. Shifting by more than the number of bits in x can likewise lead to implementation-defined results. If x is an unsigned integer, then the top bits of the result will be filled with zeros. Otherwise, the result will be sign extended.
Precedence 8:
- x * y
The multiplication operator multiplies two values using the appropriate arithmetic for the type. Two's complement arithmetic is used for signed integers. Unsigned arithmetic is used for unsigned integers. IEEE 754 arithmetic is used for floating point values.
- x / y
The division operator divides two values. Like multiplication, appropriate arithmetic for the type is applied.
- x % y
The modulo operator finds the value of x modulo y. Like multiplication, appropriate arithmetic for the type is applied.
Precedence 7:
- x + y
The addition operator adds two values using the appropriate kind of arithmetic.
- x - y
The subtraction operator subtracts two values using the appropriate kind of arithmetic.
Precedence 6:
- x & y
The bitwise and operator ANDs every bit in its integer traited arguments.
Precedence 5:
- x | y
The bitwise or operator ORs every bit in its integer traited arguments.
- x ^ y
The bitwise xor operator XORs every bit in its integer traited arguments.
Precedence 4:
- x == y
The equality operator checks if two operands are equal, evaluating to a boolean.
- x != y
The inequality operator checks if two operands are unequal, evaluating to a boolean.
- x > y
The greater-than operator checks if the numeric traited operands follow a greater-than relation, evaluating to a boolean.
- x >= y
The greater-than-or-equal operator checks if the numeric traited operands follow a greater-than-or-equal relation, evaluating to a boolean.
- x < y
The less-than operator checks if the numeric traited operands follow a less-than relation, evaluating to a boolean.
- x <= y
The less-than-or-equal operator checks if the numeric traited operands follow a less-than-or-equal relation, evaluating to a boolean.
Precedence 3:
- x && y
The logical and operator checks if both the left and right side of the operator evaluate to true. If the left side evaluates to false, then the right side is not evaluated.
Precedence 2:
- x || y
The logical or operator checks if either the left or the right side of the operator evaluates to true. If the left side evaluates to true, then the right side is not evaluated.
Precedence 1: Assignment Operators (Right associative)
- x = y
- Assignment
- x += y
- Fused add/assign
- x -= y
- Fused sub/assign
- x *= y
- Fused mul/assign
- x /= y
- Fused div/assign
- x %= y
- Fused mod/assign
- x |= y
- Fused or/assign
- x ^= y
- Fused xor/assign
- x &= y
- Fused and/assign
- x <<= y
- Fused shl/assign
- x >>= y
- Fused shr/assign
Precedence 0:
- -> x
- Return expression
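A short sketch exercising a few of the operators above: slicing bounds, signed and unsigned right shifts, and precedence:

```myrddin
use std

const main = {
	var a = [1, 2, 3, 4]
	var s = a[1:3]			/* lower bound inclusive, upper exclusive */
	var neg : int32 = -8
	var big : uint32 = 0x80000000

	std.put("{} {} {}\n", s[0], s[1], s.len)	/* 2 3 2 */
	std.put("{}\n", neg >> 1)	/* -4: signed values are sign extended */
	std.put("{}\n", big >> 31)	/* 1: unsigned values are zero filled */
	std.put("{}\n", 1 + 2 * 3)	/* 7: * (precedence 8) binds tighter than + (7) */
}
```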
Types In Detail
Primitive Types
Myrddin has a number of types built in. All of them are below:
- void
- A void. This is both a type and a value. It occupies no space, and can only ever hold the value `void`. The reason that it is a value is so that generic functions do not need to treat void specially.
- bool
- A boolean value, either `true` or `false`.
- byte
- 8 bit unsigned integer value. Similar to `uint8`, but typically used to denote plain data.
- int8, int16, int32, int64
- Signed N-bit two's complement integers.
- uint8, uint16, uint32, uint64
- unsigned N-bit integers.
- char
- Unicode codepoint
- flt32, flt64
- IEEE 754 floating point value
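The sketch below declares variables of several primitive types and prints the storage size of a few of them with `sizeof`. It is meant as a quick way to check the N-bit widths listed above. The variable names and values are arbitrary.

```myrddin
use std

const main = {
	var ok : bool = true
	var b : byte = 255		/* the largest byte value */
	var small : int8 = -5
	var big : uint64 = 1 << 40
	var c : char = 'λ'		/* a single Unicode codepoint */
	var f : flt64 = 0.5

	std.put("{} {} {} {} {} {}\n", ok, b, small, big, c, f)

	/* sizeof reports storage in bytes, so an N-bit type is N / 8 bytes */
	std.put("byte: {} bytes\n", sizeof(byte))
	std.put("int16: {} bytes\n", sizeof(int16))
	std.put("uint64: {} bytes\n", sizeof(uint64))
	std.put("flt32: {} bytes\n", sizeof(flt32))
}
```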
Constructed Types
You can create new types by applying modifiers to an existing type. Modifiers are written after the type they modify, and are listed below:
- # (pointer)
Creates a pointer to the underlying type, as in `int#`.
- [:] (slice)
Creates a slice of the underlying type, as in `int[:]`.
- [N] (array)
Creates an array of N elements of the underlying type, as in `int[4]`.
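Here is a small sketch that uses all three modifiers on `int`. It shows that a slice and a pointer both refer to the storage of the array they were taken from. The names `arr`, `sl`, and `p` are illustrative.

```myrddin
use std

const main = {
	var arr : int[4] = [1, 2, 3, 4]	/* array: 4 ints stored inline */
	var sl : int[:] = arr[1:3]		/* slice viewing arr[1] and arr[2] */
	var p : int# = &arr[0]			/* pointer to the first element */

	p# = 10		/* write through the pointer */
	sl[0] = 20	/* a slice shares storage with the array it views */
	std.put("{} {} {}\n", arr[0], arr[1], sl.len)
}
```

Both writes land in `arr`, so the program should report `arr[0]` as 10, `arr[1]` as 20, and a slice length of 2.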
Type Parameters and Traits
Type parameters are variables for types. They are substituted for concrete types by the compiler as part of the compilation process. They are written with a leading `@`, as in `@t`, and may specify traits:
@a
@a::trait
@a::(trait,list)
Generic types can be substituted with any type, and therefore, cannot rely on any internal details of the type. You cannot access members of a generic type, call functions that expect a more specific type, or do much else with it.
Traits relax this limitation by adding constraints on the type. For example, requiring the built-in trait `numeric` signals to the compiler that the numeric operators are available for this type, so they may be used.
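As a minimal sketch of a `numeric` constraint, the generic function below compares its arguments with `>`. That is only allowed because the trait guarantees the numeric operators exist. The name `larger` is hypothetical and is not part of libstd.

```myrddin
use std

/* hypothetical helper: returns the larger of two values of any numeric type */
generic larger = {a : @a::numeric, b : @a::numeric
	if a > b
		-> a
	;;
	-> b
}

const main = {
	std.put("{}\n", larger(3, 7))		/* instantiated with int */
	std.put("{}\n", larger(2.5, 1.5))	/* instantiated with a float type */
}
```

Removing `::numeric` from the parameters should make the compiler reject the `>` comparison.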
Users may also define traits:
trait foo @a =
	double : (x : @a -> @a)
;;
In this case, if I had a function that required a generic type with the `foo` trait, then I would be able to call the `double` function on it:
generic f = {x : @a::foo
	-> double(x)
}
Generic types are only allowed as part of a generic declaration, or within a parameterized type definition. If they are used outside of this context, the compiler will flag this as an error.
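To make `f` above callable, some type has to implement `foo`. The sketch below assumes an `impl` for `int` in which `double` multiplies by two, which is an arbitrary choice for illustration. It then calls `f` from `main`.

```myrddin
use std

trait foo @a =
	double : (x : @a -> @a)
;;

/* hypothetical implementation: foo for int, with double multiplying by two */
impl foo int =
	double = {x
		-> x * 2
	}
;;

generic f = {x : @a::foo
	-> double(x)
}

const main = {
	/* int implements foo, so f can be instantiated with int */
	std.put("{}\n", f(21))
}
```

Calling `f` with a type that has no `foo` implementation, such as a `byte[:]`, should be rejected at compile time.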
Only a handful of traits are built in. Apart from `iterable`, none of them can be implemented by user code in the current version of the language.
Trait | Summary | Implemented On
---|---|---
numeric | Supports common numeric operations: `+`, `-`, `*`, `/` | byte, char, int, int8, int32, int64, uint8, uint32, uint64, flt32, flt64
integral | Supports integer operators: `++`, `--`, `\|`, `&`, `^` | byte, char, int, int8, int32, int64, uint8, uint32, uint64
floating | Behaves like a float. Adds no operators, but indicates that fractional values will be preserved. | flt32, flt64
indexable | Supports the index operator. | @a[:], @a[N]
sliceable | Supports the slice operator. | @a[:], @a[N], @a#
function | Is a callable function. | Function types, such as `(arg : type, list : type -> ret)`
iterable | Can be iterated over. | @a[N], @a[:], user types
The iterable trait is declared as:
trait iterable @iterator -> @iteratedvalue =
	__iternext__ : (iterp : @iterator#, valp : @iteratedvalue# -> bool)
	__iterfin__ : (iterp : @iterator#, valp : @iteratedvalue# -> void)
;;
The `__iternext__` function takes a pointer to an iterator and a pointer to a value. If there are no values remaining, it should return false. If there are values remaining, it should return true.
When `__iternext__` returns true, it should store the next value through the value pointer. It should also update the iterator state so that the following call produces the next value.
The `__iterfin__` function should clean up any resources allocated by `__iternext__`.
For example, to implement a `byrange(lo, hi)` iterator, I might write:
type rangeiter = struct
	idx : int
	stop : int
;;
const byrange = {lo : int, hi : int -> rangeiter
	-> [.idx = lo, .stop = hi]
}
impl iterable rangeiter -> int =
	__iternext__ = {iterp, valp
		if iterp.idx == iterp.stop
			-> false
		;;
		valp# = iterp.idx++
		-> true
	}
	__iterfin__ = {iterp, valp
		/* nothing to clean up */
	}
;;
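The same pattern extends to iterators that carry more state. Below is a self-contained sketch of a hypothetical `bystep(lo, hi, step)` iterator, written in the style of `byrange`, together with a `for` loop that consumes it. `bystep` and `stepiter` are illustrative names, not libstd functions.

```myrddin
use std

/* hypothetical iterator state: next value, exclusive bound, and stride */
type stepiter = struct
	cur : int
	stop : int
	step : int
;;

const bystep = {lo : int, hi : int, step : int -> stepiter
	-> [.cur = lo, .stop = hi, .step = step]
}

impl iterable stepiter -> int =
	__iternext__ = {itp, valp
		/* stop once the next value would reach the bound */
		if itp.cur >= itp.stop
			-> false
		;;
		valp# = itp.cur
		itp.cur += itp.step
		-> true
	}
	__iterfin__ = {itp, valp
		/* nothing was allocated, so there is nothing to release */
	}
;;

const main = {
	for x : bystep(0, 10, 3)
		std.put("{}\n", x)	/* prints 0, 3, 6, 9 */
	;;
}
```

Using `>=` rather than `==` in `__iternext__` keeps the loop finite even when the stride overshoots `hi`.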
Named Types
Named types create a new type based on an existing one. The created type is a fresh type, and is not simply an alias. For example, the standard library needs a type that can index any array, even one that spans all of memory. Rather than hard-coding a particular integer type everywhere, it defines a new named type:
type size = int64
Named types can also take type parameters, which can be substituted into the defined type. For example, we may want to define a linked list. Since the algorithms for a linked list are identical regardless of what it contains, it makes sense to abstract the data structure over the contained type:
type list(@elt) = struct
	val : @elt
	next : list(@elt)#
;;
Type names live in a separate namespace from variable names. This means that a variable may share a name with a type without conflict.
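Because a named type is fresh rather than an alias, two types built on the same base are not interchangeable. In the sketch below, the hypothetical unit types `meters` and `seconds` both wrap `int`. Mixing them directly is expected to be a type error, so the code converts both to `int` with explicit casts.

```myrddin
use std

/* hypothetical unit types: both are based on int, but are distinct types */
type meters = int
type seconds = int

const speed = {d : meters, t : seconds
	/* d / t is expected to be rejected, so convert both to int first */
	-> (d : int) / (t : int)
}

const main = {
	var d : meters = 100
	var t : seconds = 20

	std.put("{} m/s\n", speed(d, t))
}
```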
Struct Types
Structs group variables together into a single unit. Unlike in many other languages, struct types are anonymous. The named type facility, introduced above, is typically used to name them.
struct
	len : int
	var2 : char
	var3 : byte[:]
;;
Each element in a struct is called a 'member', and is accessed with the `.` operator:
var s : struct
	val : int
;;
var x = s.val
The member operator also works on pointers to structs, implicitly dereferencing the pointer:
var sptr = &s
sptr.val = 123
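In practice a struct is usually named and filled in with a struct literal. The sketch below combines that with implicit dereferencing: the hypothetical helper `moveright` receives a `point#` and updates the caller's struct through it. The type and function names are illustrative.

```myrddin
use std

type point = struct
	x : int
	y : int
;;

/* hypothetical helper: shifts a point right, modifying it through a pointer */
const moveright = {p : point#, dx : int
	p.x += dx	/* p.x implicitly dereferences p */
}

const main = {
	var pt : point = [.x = 3, .y = 4]

	moveright(&pt, 7)
	std.put("({}, {})\n", pt.x, pt.y)
}
```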
Union Types
Union types are used to select between one of many alternatives contained in a value. They can be thought of as a tag and value pair. The tag is sometimes referred to as a constructor, because of its use in creating a union from a value. Like structs, unions are typically used in conjunction with a named type for convenience and readability.
union
	`Tag1
	`Tag2 int
	`Tag3 int
;;
Unions are constructed by prefixing a value of the appropriate type with the union tag. If the union has no value for the tag, then the tag stands on its own as a constructor.
uval = `Tag2 123
uval = `Tag1
Once a value is put into a union, extracting it requires checking the tag in a pattern match. This pattern match can come from either a match statement or a loop pattern:
for `std.Some val : iterable
	std.put("{}\n", val)
;;
or
match x
| `Foo:
| `Bar x:
;;
Types in unions may be repeated. Only the tag must be unique for each case.
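Here is a minimal sketch that brings the pieces together. It defines a named union, builds values with its tags, and extracts payloads with a `match`. The type `shape` and the function `area` are hypothetical. The `` `Rect `` case assumes a tuple payload, which is destructured directly in the pattern.

```myrddin
use std

type shape = union
	`Square int		/* side length */
	`Rect (int, int)	/* width, height */
	`Empty			/* a tag with no value */
;;

const area = {s : shape
	var a = 0
	match s
	| `Square w:	a = w * w
	| `Rect (w, h):	a = w * h
	| `Empty:	a = 0
	;;
	-> a
}

const main = {
	std.put("{} {} {}\n", area(`Square 3), area(`Rect (2, 5)), area(`Empty))
}
```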
Tuple Types
Tuples store a number of values, similar to structs, but each member is anonymous. Their main advantage is that they are syntactically lighter, and make it easy to assign or return multiple values at once. Tuple types are written out as a parenthesized sequence of types:
(int, char, byte[:])
They are created by parenthesizing a list of values:
tup = (1, '2', "three")
If the tuple has only one element, a trailing comma is required to distinguish the tuple from an expression that was parenthesized for precedence:
one_element_tuple = (1 + 1,)
Tuples can be destructured on assignment, with an lvalue tuple assigning to each member elementwise. For example, in the code below, `x`, `y`, and `z` will hold the first, second, and third elements of the tuple `tup`, respectively:
(x, y, z) = tup
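A common use is returning several results from one function. The sketch below has a hypothetical `divmod` that returns the quotient and remainder as a tuple. The caller destructures the result with the lvalue-tuple assignment described above.

```myrddin
use std

/* hypothetical helper: quotient and remainder returned together as a tuple */
const divmod = {a : int, b : int
	-> (a / b, a % b)
}

const main = {
	var q = 0
	var r = 0

	(q, r) = divmod(17, 5)		/* destructure the returned tuple */
	std.put("17 = 5 * {} + {}\n", q, r)
}
```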
Function Types
Function types are written as `(arg : type1, list : type2 -> ret)`. For example, `(x : int -> void)` denotes a function type with a single argument `x` and a `void` return type. Functions can also be variadic, meaning that they can take any number of arguments through a final parameter of type `...`.
For example, you could declare a put function as:
const put : (fmt : byte[:], args : ... -> void)
and call it as:
put("{}, {}\n", 123, 456)
The variadic arguments can be extracted and manipulated through the `std.va*` functions in libstd, which are documented in the library reference manual.
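Function types also let functions be passed around as values. In the sketch below, the hypothetical helper `mapinplace` takes a parameter of type `(x : int -> int)` and applies it to every element of a slice. The names `mapinplace` and `twice` are illustrative.

```myrddin
use std

/* hypothetical helper: replaces each element of xs with f(element) */
const mapinplace = {xs : int[:], f : (x : int -> int)
	for var i = 0; i < xs.len; i++
		xs[i] = f(xs[i])
	;;
}

const twice = {x : int
	-> x * 2
}

const main = {
	var a : int[3] = [1, 2, 3]

	mapinplace(a[:], twice)		/* pass a named function as a value */
	std.put("{} {} {}\n", a[0], a[1], a[2])
}
```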
Style
Myrddin is a simple language, and is best served with a sparse style. Cleverness, while sometimes useful, is often best avoided. Complicated features are best used sparingly. The code should be written to minimize surprise for the reader.
Avoid ceremony. Code should simply do what it says it does, without getters, factories, generators, or design patterns written for the sake of following best practices. Some ceremony can be useful, but often it is a solution in search of a problem.
Many powerful features are like salt. When used sparingly, they make programming palatable, but heavy use leads to unpleasant results. Use function pointers, traits, and generics if they fit the problem, but stop first and think if there is a simpler way to write the code.
Function pointers and traits are especially harmful to readability when used heavily. Both break the one-to-one relationship between a name and the code that it is mapped to, making it harder to build a mental model of the code.
Function names are ideally terse. The ideal function name is simply a verb, as in `verb()`. For the same reason that people don't expand words out to their full dictionary definitions, it's best to avoid expanding function names into full sentences. Sometimes a single verb isn't sufficiently expressive. Use your judgement here.
Comments are useful, but should explain why a decision was made, rather than explaining what the code does. For example, this comment is very useful:
/*
tricky: we need power of two alignment, so we allocate double the
needed size, chop off the unaligned ends, and waste the address
space. With 64 bits of address space, this waste should not be
an issue. On a 32 bit system this would be a bad idea, and we
may want to revisit this.
*/
p = getmem(Slabsz * 2)
s = (align((p : size), Slabsz) : slab#)
However, if we merely commented what we were doing, this would be a waste of space:
/* allocate 2 * Slabsz bytes of memory */
p = getmem(Slabsz * 2)
/* align the result to slabsz */
s = (align((p : size), Slabsz) : slab#)
Comments that fail to explain the reasoning behind a decision should be deleted.
Conventions
Functions, variables, and types are named with lowercase names. We prefer `oneword` names, but `snake_case` is also acceptable. Constants are named with `Initialupper` names.
Names should be as short as clarity allows. Local variables in small functions have all the context needed to make sense of them. Global variables may need longer names.
Use the standard result and option types. If a function may or may not produce a value, return `std.option(val)`. If a function can fail with an error, return `std.result(ret, err)`. If a function returns `void` on success, then the return type should be `std.result(void, errtype)`.
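As a sketch of these conventions, the hypothetical helpers below return `std.option(int)` for a lookup that may find nothing and `std.result(int, byte[:])` for an operation that can fail. Callers handle each case with `match`. The names `find` and `safediv` are illustrative and are not libstd functions.

```myrddin
use std

/* hypothetical helper: index of the first element equal to want, if any */
const find = {xs : int[:], want : int -> std.option(int)
	var i = 0
	for x : xs
		if x == want
			-> `std.Some i
		;;
		i++
	;;
	-> `std.None
}

/* hypothetical helper: integer division that reports division by zero */
const safediv = {a : int, b : int -> std.result(int, byte[:])
	if b == 0
		-> `std.Err "division by zero"
	;;
	-> `std.Ok (a / b)
}

const main = {
	var xs : int[3] = [3, 1, 4]

	match find(xs[:], 4)
	| `std.Some i:	std.put("found at index {}\n", i)
	| `std.None:	std.put("not found\n")
	;;

	match safediv(7, 0)
	| `std.Ok q:	std.put("quotient: {}\n", q)
	| `std.Err msg:	std.put("error: {}\n", msg)
	;;
}
```

Returning early from inside the loop keeps `find` flat, in line with the advice below to avoid deep nesting.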
Abstract lazily. While it makes sense to think about decoupling dependencies and slotting in multiple backends, it rarely pays to do the work before actually implementing that second backend. Hold off on abstraction until it is needed.
Keep lines short. Break up long, complex expressions into smaller ones. If necessary, use temporary variables for intermediate results. Overly long lines are difficult for eyes to track, so work to eliminate them. 60 characters of non-whitespace text is ideal.
Avoid deep nesting. It is better to return early than to nest conditionals. If matching patterns, often it is better to extract the match into a temporary variable than to nest another match.
Favour simplicity over efficiency until data suggests otherwise. If a fancy algorithm turns out to be warranted, a comment citing a reference that explains it in depth is a good idea.
Tabs are for indentation. Spaces are for surrounding operators, particularly low precedence operators.
Use block comments (/* and */). Line comments (// comments) are for commenting out code during development.
Name custom iterators `by<valuetype>`. For example, bio provides a line iterator and a char iterator for files. These are named, respectively, `bio.byline` and `bio.byfile`.
Break these rules when it makes sense. They are suggestions, not laws.
Getting Help and Contributing
The language is young, and many bugs still lurk in the libraries and compilers. Furthermore, many libraries still exist only in our minds and hearts, and would be made more usable through the act of implementation.
Most discussion is on IRC, in #myrddin on irc.eigenstate.org. You can join using your favorite client, or online via Kiwi IRC.
We are also responsive on the mailing list.