Who Doesn’t Want Optimization? (Part One)

When it comes to optimization, C is often regarded as the ultimate language. Simple and efficient, C encourages programmers to write customized, often lightweight, though sometimes obscure, routines as opposed to its feature-richer alternatives. But is C always better at efficiency than other mainstream alternatives?

Ever since its birth, C++ has frequently been the subject of criticism for being overly bloated and complicated. Some of its higher-level, fancier features such as virtual functions and RTTI, although nice to have otherwise, more often than not impose unwanted performance penalties. The new C++ 11 standard has introduced a slew of new features, among which many aim to improve the performance. This post is going to look at constexpr and how its uses can provide a kind of optimization that is difficult, if not impossible, to achieve in C.

Let’s start with a typical use case in compilers. A compiler requires a tokenizer to parse user inputs and weed out incorrect ones. This can be done by defining various categories for inputs and transitioning states that map to inputs that are of these categories and those in between. The technique presented here will be taking advantage of maximal munch and DFA.

In our tokenizer, each character input shifts the current state according to a state table and when it reaches a null state, it goes back one step and examines that state; if it’s valid state, return that input with the corresponding type. If not, the tokenizer should report the error. The following figure depicts the transitioning rules of this particular tokenizer.

Transitioning rules for the tokenizer

Our tokenizer would populate a state table according to the above rules and, beginning from the state Start, examine each character of input by the state table and transition into the next available state. For example, the string VERTEX should match the token type Keyword whereas @101 should match to ID.

The state table can be implemented fairly efficiently using a two dimensional array, as below.

‘\t’, ‘\n’, ‘  ‘ A ~ Z 0 ~ 9 @ everything else
Start Whitespace Keyword Number Pre-ID
Whitespace Whitespace
Keyword Keyword
Number Number
Pre-ID ID
ID ID

There are six states, listed in the left most column, and five category of inputs, listed in the top most row. Each of the cells in the rest of the table represent the next state corresponding to its left most and top most neighbour, the current state and input. An empty cell simply means the next state is null. The states can be represented by enum, as their implicit conversion to integer makes looking up on the table for next state very easy.

enum State {
// Non-terminals
    ST_NUL,
    ST_START,
    ST_PRE_ID,

// Terminals
    ST_KEYWORD,
    ST_ID,
    ST_NUMBER,
    ST_WHITESPACE,    // Equal to (number of states - 1)
};

The first thing the tokenizer needs to do upon starting up would be to populate the state table. One strategy may involve manually typing out the entire initializer list, something like the following.

// Table has ST_WHITESPACE number of rows,
// and has transition rules corresponding to ASCII.
const State Table[ST_WHITESPACE + 1][256] = {

// Next state for each character in ASCII from state ST_NUL.
    { 0 },    // ST_NUL always transitions into ST_NUL.

// Next state for each character in ASCII from state ST_START.
// The order of the values match the order of ASCII characters.
//
// The first value corresponds to the first ASCII character,
// which is a whitespace. Hence the value of 6 (ST_WHITESPACE).
//
// As anyone can see, this is quite ugly and hard to maintain.
    { 6, 0, 0, 0, /* after many 0's */, 5, 5, 5, /* ... */, 0, 0},

    // Initialize other states... PAINFUL!
};

Since ST_NUL transitions into itself regardless of input, we “only” need to type 5 * 256 entries to define our table capable of transitioning between only six states. Doesn’t sound too bad? There is a catch. You may need to do it more than once!

Consider if you would like to modify the tokenizer to recognize some extra symbols, such as braces and hash. This is useful as braces can be used to indicate the start and end of a function whereas hash can indicate the start of a comment. And so, our enum defining the state now looks like the following.

enum State {
// Non-terminals
    ST_NUL,
    ST_START,
    ST_PRE_ID,

// Terminals
    ST_KEYWORD,
    ST_ID,
    ST_NUMBER,
    ST_WHITESPACE,    // Equal to (number of states - 1)

    // More token types are good!
    ST_LBRACE,    // '{'
    ST_RBRACE,    // '}'
    ST_COMMENT,   // '#'
};

Three more states. So, just add 3 * 128 more entries to our table and then we can call it a day. Right?

Well, no. We need to change how other states respond to these newly added entries as well. ST_START previously would just transition to ST_NUL upon seeing a character of {, } or #. It can no longer do that. We need to make sure it transitions into ST_LBRACEST_RBRACE, or ST_COMMENT respectively. In ASCII table, { corresponds to a hexadecimal value of 0X5B, which is 133 in decimal. So, the 133th value in the second initialization block of our table needs to be changed to 7 from 0. Same needs to be done for the other states except ST_NUL. And don’t forget the extra 3 * 128 entries. And don’t forget next time you add or remove states, they will need to be modified accordingly as well. And don’t forget…

It’s clear that this approach is just not practical. So do we have to resort to populating the table at run time? In C or C++ 03, this would probably be the way to go. We take a hit in performance at program start and then won’t need to pay that price again until the next time the tokenizer starts. The initialization code may look something like this.

// Macros for convenience.
#define WHITESPACES     "\t\n\r "
#define LETTERS         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
#define DIGITS          "0123456789"

// Zero initialize everything at first.
State Table[ST_WHITESPACE + 1][128] = { 0 };

void setT (State from, char *str, State to) {
    for (unsigned int i = 0; i < strlen(str); ++i)
        Table[from][(unsigned)chars[i]] = to;
}

void initTable () {    // Call this first at run time.
    setT(ST_START,      WHITESPACES, ST_WHITESPACE);
    setT(ST_WHITESPACE, WHITESPACES, ST_WHITESPACE);
    setT(ST_START,      LETTERS,     ST_KEYWORD);
    setT(ST_KEYWORD,    LETTERS,     ST_KEYWORD);
    setT(ST_START,      DIGITS,      ST_NUMBER);
    setT(ST_NUMBER,     DIGITS,      ST_NUMBER);
    setT(ST_START,      "@",         ST_PRE_ID);
    setT(ST_PRE_ID,     DIGITS,      ST_ID);
    setT(ST_ID,         DIGITS,      ST_ID);
    setT(ST_START,      "{",         ST_LBRACE);
    setT(ST_START,      "}",         ST_RBRACE);
    setT(ST_START,      "#",         ST_COMMENT);

    // Everything after '#' is comment.
    for (unsigned int i = 0; i < 128; ++i)
        Table[ST_COMMENT][i] = ST_COMMENT;
}

For our tokenizer, the performance loss incurred by run time initialization is going to be negligible due to the table size being so small. However, what if instead of nine states there are several hundreds of states? And what if instead of accepting only ASCII input the tokenizer must accept accept unicode with size up to 32 bits? If that was the case, the performance hit at program start is going to be miserable. It also renders our first approach impractical due to the sheer number of entries one has to key in. In that scenario, one would be almost certain that a third party tool would be involved. Perhaps a ruby script generating the entries for the initializer list.

C++ 11 comes to the rescue.

This iteration of the standard saw the introduction of constexpr1. This feature allows certain computation to take place during compile time. The rest of this post will explain how to take of advantage of this feature to write code that was previously only possible to execute at run time, but now can be used to populate our state table at compile time!

Some restrictions exist on the use of constexpr. A function marked with the keyword must consist of a single return statement and may use only functions and global variables marked with constexpr. This means we cannot just label the above run time initialization code with constexpr and call it a day. We will need to work with the restrictions, especially the single return statement limitation.

So how much can you do with a return statement? Quite a bit. For all we need, constructing a function returning the next state when given the current state and input is surprisingly straightforward.

constexpr
State nextState (State from, char c)
{
// isSpace, isLetter, isNumber, isASCII defined elsewhere...
    return from == ST_START      && isSpace(c)  ? ST_WHITESPACE:
           from == ST_WHITESPACE && isSpace(c)  ? ST_WHITESPACE:
           from == ST_START      && isLetter(c) ? ST_KEYWORD:
           from == ST_KEYWORD    && isLetter(c) ? ST_KEYWORD:
           from == ST_START      && isNumber(c) ? ST_NUMBER:
           from == ST_NUMBER     && isNumber(c) ? ST_NUMBER:
           from == ST_START      && (c == '@')  ? ST_ID:
           from == ST_ID         && isNumber(c) ? ST_ID:
           from == ST_START      && (c == '{')  ? ST_LBRACE:
           from == ST_START      && (c == '}')  ? ST_RBRACE:
           from == ST_START      && (c == '#')  ? ST_COMMENT:
           from == ST_COMMENT    && isASCII(c)  ? ST_COMMENT:
           ST_NUL;
}

Above code does a ternary operation on the current state and input, testing if they match the transition rules. If they do, return the matched state; if not, proceed to the next comparison.  All calls that pass constexpr parameters to nextState will be resolved at compile-time. For example, we can populate an array of State‘s at compile-time, like the following.

// Transition rules that map every input in ASCII
// to the next state when the current state is ST_START.
constexpr
State transitionFromStart = {
    nextState(ST_START, 0),
    nextState(ST_START, 1),
    nextState(ST_START, 2),
    nextState(ST_START, 3),
    // ...
    nextState(ST_START, 127),
};

Even in such a simple case, 128 function calls are saved. The program simply loads pre-computed values into transitionFromStart when it starts up. Now, some may ask how this is any different from simply typing out the values by hand. Well, that is exactly what we want. The end result remains the same but the process is made much better. If we ever want to change the transition rules, simply modifying nextState is sufficient. It will correctly alter the content of transitionFromStart as well.

We still face the problem of having to manually type each pairing of state and ascii input. To populate the entire table, the same number of 9 * 128 entires are required. However, note that each function call is very similar to each other, perfect for some macros.

// Pass the State to INIT, and initalize
// every input with regards to this state.
#define INITIALIZE(S) { INIT(S,0) }

// Step through all 128 ASCII characters.
#define INIT(S, N) INIT_##N(S)
#define INIT_0(S) nextState((S), 0), INIT_1(S)
#define INIT_1(S) nextState((S), 1), INIT_2(S)
// ...
#define INIT_126(S) nextState((S), 126), INIT_127(S)
#define INIT_127(S) nextState((S), 127)

The above code can be generated very easily by another program. I did so by using a ruby script. When INITIALIZE is used, it will expand to calling nextState for all ascii characters based on the specified current state. So, now the initialization code for our state table can be written as follows.

constexpr State table[ST_WHITESPACE + 1][256] = {
    INITIALIZE(ST_NUL),
    INITIALIZE(ST_START),
    INITIALIZE(ST_ID),
    INITIALIZE(ST_NUMBER),
    INITIALIZE(ST_KEYWORD),
    INITIALIZE(ST_LBRACE),
    INITIALIZE(ST_RBRACE),
    INITIALIZE(ST_COLON),
    INITIALIZE(ST_SEMICOLON),
    INITIALIZE(ST_COMMENT),
    INITIALIZE(ST_WHITESPACE)
};

With the help of the macros, we have greatly reduced the amount of code. Later if we make a change to the transitioning rules, such as adding a new state. We simple add a new clause, INITIALIZE(ST_NEW_STATE), to the initializer list. This approach is much better than either the entering all values by hand (or a third party tool). It has the same performance and is much easier to maintain.

But is this good enough? Instead of having ugly initializer list, we now have a large number of clunky macros. If the tokenizer were to accept a wider range of input (such as extended ASCII), we would have to add more macros. Although using another program to generate the macro can be very trivial, it is still not as easy as the run-time initialization approach.

So, can we do better than that? Is it possible to avoid the macros all together and still have the compiler populate the state table at compile time?

In the next part of this post, I will focus on the varidiac template and std::initializer_list and how to use them to achieve the perfect blend of performance and simplicity.

[1] Alex Allain from Cprogramming.com has an excellent article on constexpr.
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s