C11’s _Generic Keyword: Macro Applications and Performance Impacts

Overview and macro building blocks

Generic programming in C is usually seen as painful, crufty, and verbose in comparison with languages such as C++ or Java.  Many C programmers opt instead for the non-generic approach of implementing polymorphism with void *, but this can carry a performance penalty, since the lack of precise type information at compile-time inhibits optimizations, and necessitates calling through function pointers (or worse, use of library calls like memcpy() and memcmp()).

One of the interesting features of the new C11 Standard is type-generic expressions, implemented using the _Generic keyword (for a nice introduction, see http://www.robertgamble.net/2012/01/c11-generic-selections.html).  By itself, this feature may not seem to add all that much convenience to the language, but combining it with some simple macro constructs makes it possible to write generic and fully type-safe data structures and functions, entirely in straight C.  Adding a new type becomes as simple as creating a new entry in a few lists – still not quite as convenient as in more modern languages, but close enough for C work.

Ideally, a generic idiom should provide for:

  • Type-independent programming and code reuse: data structures or functions are defined once and used across many types,
  • Type checking: attempts to mix generic objects or functions of the wrong type should yield compile-time errors,
  • No run-time overhead: defining separate implementations for each type allows the compiler to generate optimal code for each use case, and
  • Simple, readable syntax.

In this post we’ll explore a simple implementation of generics in C11 that provides all four of the above (with some room for programming language connoisseurs to quibble over the last bullet point).  All the source code and profiling output cited below is available on Bitbucket.  The implementation rests on the concepts of “typed struct” and “typed function,” defined by the macros:

Using the ## (concatenation) preprocessor operator, these macros simply append a type name to a given struct or function, separated by underscores.  For example, TYPED_STRUCT(list, int) expands to struct list_int, while TYPED_FN(list, int, add) becomes list_int_add.  These are just convenience definitions for use in header files where we’ll declare the typed variants for the structs or functions we’re creating.

A generic bounds-checked array

Consider the problem of implementing a generic, bounds-checked, fixed-size array of values — call it a chkd_array.  We’ll define this array’s implementation details once, and reuse that definition for any arbitrary type where it may be needed.  If clients attempt to read or write outside the array’s bounds, rather than using error codes (or segfaulting or scribbling some memory somewhere), we wish to return a “garbage” error value which is specified at the time the chkd_array is created.  To demonstrate how everything works with different C data types, we’ll implement the chkd_array for ints, floats, and a custom struct (called struct strct and consisting of a single int and float).

The following macros declare the chkd_array and some basic operations on it.

With these macros available, the statement

CHKD_ARRAY_DEC(int, int, "chkd_array_int")

would declare a struct chkd_array_int, along with function signatures like

chkd_array_int_set(struct chkd_array_int *self, unsigned idx, int *val);

A DESC_LIT string literal, unused in this particular macro, is included in its arguments so that we can return a descriptive name from the DESC typed function. We’ll use the DECLARE macro in a subsequent header file to generate code for a given list of types.

Next, we need the macros to define the implementations for these functions.  At the point where this next header file is #included, the client already should have specified the types for which it needs chkd_array definitions. For brevity’s sake I’ve included only the GET and DESC definitions here.

Note that we can’t define IS_EQUAL in the “master” CHKD_ARRAY_DEF macro, because the method body must differ depending on whether T is a simple data type, or a struct (for which the == operator is not defined).  The definition is split out into a separate macro to at least reuse the code across primitive types.

Using these declaration and definition macros, we could set up our chkd_arrays by writing out CHKD_ARRAY_DEC and CHKD_ARRAY_DEF for each type, but there’s a more concise (and less error-prone) way. With XX factory macros, we can create a single list of types and automatically expand the macros for each list entry. The header for the chkd_array client code defines the struct strct type and the XX lists of needed types.

Toward the end of this file, we finally make use of _Generic. Before that point, with all of the macros we had defined so far, we still couldn’t work with our new array in a truly generic way, because the function names all have a type spliced into them — chkd_array_int_get, chkd_array_float_get, etc. What we want is to call a single get function and leave it up to the compiler to figure out which function body to use based on the types being passed in. This is what _Generic gives us. Since the get function takes a pointer to a chkd_array of a given type, the GENERIC_ARRAY_PTR_FN_DEFINE macro can take those pointers along with a function name and select the appropriate typed function to use.

The rest of the chkd_array_factory_defs.h file uses the XX lists to define most of the function implementations, and creates a CHKD_ARRAY_GET macro which provides the type-independent get. It’s a simple matter to extend this example to the other chkd_array functions.

The final piece of the puzzle is the definition of is_equal for the local struct strct type, which it makes sense to place in the client’s .c file:

Now, if int_array is a pointer to a struct chkd_array_int, the statement

int *i_ptr = CHKD_ARRAY_GET(int_array, 3);

expands to a call to chkd_array_int_get automatically.  See the repo for a lot more examples of how this is implemented.  Once all these macros have been set up, I find the syntax for using them almost downright pleasant; for instance, here’s how new chkd_arrays are created:

These macros serve as good building blocks for generic functions in whichever files they are used.  (In the repo, have a look at the WRITE_VALS_DEF macro in chkd_array_demo.c for an example.)  And as a final boon, they are also type-safe — attempting to get a float pointer from a chkd_array_int generates the following error in clang:

Performance gains

Okay.  So, why bother with all this effort?  Setting up a generic data structure with this sort of preprocessor hackery is admittedly verbose and not a little tedious to write, and we could use void * (and a bit of added care not to mix up our types) to get the same result from our programs.

As it turns out, avoiding the use of void * and the associated function pointers (which, incidentally, also require more duplication of code) can reduce the number of instructions executed in our programs significantly.  To see this effect in action, I created a void * implementation which mirrors chkd_array and each of its functions, and made an identically ordered series of calls, randomized by type, to both implementations. Performance profiles were generated using Valgrind’s callgrind utility.  Here’s how the two variants stacked up:

valgrind-output-final

Here’s the Valgrind annotated output for the -O3 runs. The generic portion comprised the instructions counted in make_generic_calls, and half those in chkd_array_strct_is_equal (which is also used by the void * version), while the remainder were executed during the void * run.

With -O3 optimizations enabled, the generic functions gave us a healthy reduction in execution time, finishing in 64% of the time required for their void * counterparts, while executing only 57% as many CPU instructions.  The Valgrind output also hints that the compiler is able to inline the generic calls to the get, set, and is_equal methods, likely the source of most of the reduction in instruction count. Optimizations like these allow C++’s std::sort, a generic template implementation, to outperform naive library calls to C’s qsort.

Some drawbacks

While the syntax made possible with these generic macros is quite nice once they’re defined, it’s tedious to create header declarations and definitions for entire data structures or algorithms within preprocessor #define statements.  Every line except the last must be terminated with a backslash, and it’s impossible — at least, as far as I’m aware — to add comments to the source code within the #define.  Editors and IDEs offer much less in the way of syntax highlighting, and the clang-format.py tool, which I like to use to auto-format my C code in Vim, is also a little shaky when faced with macros using _Generic.

In addition to those issues, which are mostly unique to C, these generic techniques carry all the same disadvantages of C++ templates.  Code is duplicated for each generic function in each translation unit where it’s #included, which can lead to long compile times and bloated executables.  Debugging, at both compile-time and run-time, is made much more difficult by the need to trace a generated line of code back to the macro where it was expanded. This only gets harder as more levels of generic macros are nested within each other.

Nevertheless, if you’re looking for a way to add type safety or squeeze a little performance out of your C program, have access to C11’s _Generic or equivalent compiler extensions, and are willing to pay the cost in increased compile time and program size, this approach is worth considering.

Compiling with _Generic

clang has supported _Generic since its 3.0 release. GCC 4.9, due (hopefully) to be released in the first half of this year, will support it as well. In the meantime, those committed to compiling with GCC can still emulate the C11 feature using the P99_GENERIC macro.

Advertisements

6 thoughts on “C11’s _Generic Keyword: Macro Applications and Performance Impacts

  1. this is a great post. the performance implications of using void* casting vs _Generic are explained very well. the first article I have seen to do so.

    it would be nice if the standard adopts typeof() or even decltype() to solve these problems. but_Generic is better then nothing.

  2. a note about gcc. a good number the c99 and c11 features come from gnu-c extentions. gnu-c has typeof() and a host of other extensions. if you want to use typeof() in c code you can do so with gcc.

  3. I used a similar technique once to implement a generic vector in C99 but never found how _Generic could improve the implementation. It used yet another layer of macros and indirection, plus the equivalent of a vtable to make the whole thing work. It also used variadic macro dispatch to overload functions on the number of parameters. Care to have a look? From what I can see we could both benefit from that 🙂

    http://codereview.stackexchange.com/q/43809/15094

  4. I’m trying to create a _Generic function to read bits in different byte and bit orders, sounds like the perfect application right?

    What I’m struggling with tho, is that the deciding variable isn’t the type of object, but an enum variable saying if it should read as big endian or little endian, how the hell do I do this?

    I have an enum called ByteOrders that’s defined as

    extern enum ByteOrders {
    BigEndian = 0, // Big Endian
    LittleEndian = 1, // Little Endian
    } ByteOrders;

    #define ReadBits(ByteOrder, BitB, Bits2Read) \
    _Generic( (ByteOrder), BigEndian:ReadBitsBigEndian, LittleEndian:ReadBitsLittleEndian)(BitB, Bits2Read)

    The only obvious problem is that I discard the ByteOrder variable from the macro at the end for the actual function definition (which is precisely what I want, that way there’s no branching, and so the code becomes more easily inlinable)

    • Unfortunately I’m no C expert — this article was really the only deep dive into C macros I’ve ever done. The only idea I have is to make each byte order a separate enum, so that they have different types which _Generic can disambiguate.

      You might try asking Jens Gustedt, he wrote the P99_GENERIC macro I linked to in the article.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s