Myrddin: Strings

Summary: Strings

pkg std =
        /* string buffers */
        type strbuf = struct
        ;;

        const mksb  : (-> strbuf#)
        const mkbufsb   : (buf : byte[:] -> strbuf#)
        const sbfin : (sb : strbuf# -> byte[:])
        const sbfree    : (sb : strbuf# -> void)
        const sbpeek    : (sb : strbuf# -> byte[:])
        const sbputc    : (sb : strbuf#, v : char -> bool)
        const sbputs    : (sb : strbuf#, v : byte[:] -> bool)
        const sbputb    : (sb : strbuf#, v : byte -> bool)
        const sbtrim    : (sb : strbuf#, len : size -> void)

        /* string searching */
        const strfind   : (haystack : byte[:], needle : byte[:] -> option(size))
        const strrfind  : (haystack : byte[:], needle : byte[:] -> option(size))
        const strhas    : (haystack : byte[:], needle : byte[:] -> bool)
        const hasprefix : (s : byte[:], pre : byte[:] -> bool)
        const hassuffix : (s : byte[:], suff : byte[:] -> bool)

        /* C strings */
        const cstrlen   : (buf : byte[:] -> size)
        const cstrconv  : (buf : byte[:] -> byte[:])
        const cstrconvp : (p : byte# -> byte[:])

        /* tokenizing and splitting */
        const strsplit  : (s : byte[:], delim : byte[:] -> byte[:][:])
        const bstrsplit : (sp : byte[:][:], s : byte[:], delim : byte[:] -> byte[:][:])
        const strtok    : (s : byte[:] -> byte[:][:])
        const bstrtok   : (sp : byte[:][:], s : byte[:] -> byte[:][:])
    const bytok     : (str : byte[:] -> tokiter)

        /* string joining and stripping */
        const strcat    : (a : byte[:], b : byte[:] -> byte[:])
        const strjoin   : (strings : byte[:][:], delim : byte[:] -> byte[:])
        const strstrip  : (str : byte[:] -> byte[:])
        const strfstrip : (str : byte[:] -> byte[:])
        const strrstrip : (str : byte[:] -> byte[:])

        /* parsing numbers out of strings */
        generic intparsebase    : (s : byte[:], base : int -> option(@a)) :: integral,numeric @a
        generic intparse    : (s : byte[:]  -> option(@a)) :: integral,numeric @a
        generic charval : (c : char, base : int -> @a) :: integral,numeric @a
;;

Types

type strbuf = struct
;;

The strbuf type contains a string buffer under construction. It can operate in two modes: Allocated, and static. The allocated mode keeps track of buffer sizing, and grows it efficiently as the data is appended to it.

The static mode, on the other hand, has a fixed size buffer that is provided to it, and truncates values to fit the buffer if needed. This can be used when allocations are undesirable.

Functions: String buffers

const mksb  : (-> strbuf#)

Mksb creates an string buffer. The buffer returned is in allocated mode, and starts off with an empty string.

const mkbufsb   : (buf : byte[:] -> strbuf#)

Mkbufsb creates a fixed size string buffer, initialized with buf. The initial length of the string is empty, regardless of the contents of the buffer. Anything in it will be overwritten as appends happen.

const sbfin : (sb : strbuf# -> byte[:])

Sbfin finishes the string buffer. Any auxiliary resources allocated by the string buffer are released, and the final string that was constructed is returned. In dynamic mode, the string is heap allocated, and must be freed with slfree. Otherwise, it is simply a slice into the buffer passed in the mkbufsb call.

const sbfree    : (sb : strbuf# -> void)

Sbfree frees the string buffer sb, and all associated resources. The string under construction is discarded if the buffer is dynamically allocated.

const sbpeek    : (sb : strbuf# -> byte[:])

Sbpeek returns the portion of the string that has already been constructed by the string buffer, without freeing it. The returned string is only valid as long as the buffer is not modified.

const sbputc    : (sb : strbuf#, v : char -> bool)

Sbputc appends a single character to the string buffer, encoding it into utf8 before appending it to the buffer. If the buffer is fixed and the character will not fit, then it will be dropped in its entirety.

Returns: True if there was sufficient space to append the character, false otherwise.

const sbputs    : (sb : strbuf#, v : byte[:] -> bool)

Sbputs appends a string to the string buffer. If the buffer is a static buffer, and the string is too long to fit, as much of it as can possibly fit will be copied. This may truncate characters mid-way through.

Returns: True if there was sufficient space to append the character, false otherwise.

const sbputb    : (sb : strbuf#, v : byte -> bool)

Sbputs appends a single byte to the string buffer. If the buffer is a static buffer and the character does not fit, the buffer will remain unmodified

Returns: True if there was sufficient space to append the byte, false otherwise.

const sbtrim    : (sb : strbuf#, len : size -> void)

Truncates a string buffer to the length provided. If the length provided i longer than the size of the buffer, the length of the buffer remains unmodified.

Functions: Searching

const strfind   : (haystack : byte[:], needle : byte[:] -> option(size))
const strrfind  : (haystack : byte[:], needle : byte[:] -> option(size))

Strfind finds the first occurrence of the string needle within the string haystack.Strrfind is similar, but it finds the last occurrence of needle within haystack.

Returns: `std.Some index if needle is found within haystack, or `None otherwise.

const strhas    : (haystack : byte[:], needle : byte[:] -> bool)

Strhas returns true if the string needle is found within haystack.

const hasprefix : (s : byte[:], pre : byte[:] -> bool)

hasprefix returns true if the string pre is found at the start of s.

const hassuffix : (s : byte[:], suff : byte[:] -> bool)

hassuffix returns true if the string pre is found at the end of s.

 const strreplace   : (haystack : byte[:], needle : byte[:], repl : byte[:] -> byte[:])

Strreplace replaces all instances of needle in haystack, returning a newly allocated string.

Functions: Splitting and joining

const strsplit  : (s : byte[:], delim : byte[:] -> byte[:][:])
const strtok    : (s : byte[:] -> byte[:][:])

Strsplit and strtok will split a string into its components. Strsplit uses a string passed in as a delimiter, while strtok will use a variable amount of whitespace to split the string. They dynamically allocate the split vector, but not the elements within it.

If there are repeated delimiters, strsplit will include zero length strings between them. For example, std.strsplit("a<><>b<>c", "<>") will return ["a", "", "b", "c"].

const bstrsplit : (sp : byte[:][:], s : byte[:], delim : byte[:] -> byte[:][:])
const strtok    : (sp : byte[:][:], s : byte[:] -> byte[:][:])

The bstrsplit and bstrtok functions produce a split string similar to the strsplit versions above, but will fill the sp vector passed in, instead of allocating their own storage. If there are more splits to be made than the vector can hold, then the last element will contain the tail of the string.

const strcat    : (a : byte[:], b : byte[:] -> byte[:])

Strcat will concatenate two strings, returning a new buffer containing the string that was created.

const strjoin   : (strings : byte[:][:], delim : byte[:] -> byte[:])

Strcat will concatenate all strings in a list, returning a new buffer containing the string that was created. It will interpose the delimiter between them.

const strstrip  : (str : byte[:] -> byte[:])
const strfstrip : (str : byte[:] -> byte[:])
const strrstrip : (str : byte[:] -> byte[:])

Strstrip will remove all whitespace characters from both ends of the string. Strfstrip and strrstrip will strip all whitespace characters from the start or end of the of the string, respectively.

 const bytok    : (str : byte[:] -> tokiter)

Bytok iterates over a string token by token.