C11’s _Generic Keyword: Macro Applications and Performance Impacts

Overview and macro building blocks

Generic programming in C is usually seen as painful, crufty, and verbose in comparison with languages such as C++ or Java.  Many C programmers opt instead for the non-generic approach of implementing polymorphism with void *, but this can carry a performance penalty, since the lack of precise type information at compile-time inhibits optimizations, and necessitates calling through function pointers (or worse, use of library calls like memcpy() and memcmp()).

One of the interesting features of the new C11 Standard is type-generic expressions, implemented using the _Generic keyword (for a nice introduction, see http://www.robertgamble.net/2012/01/c11-generic-selections.html).  By itself, this feature may not seem to add all that much convenience to the language, but combining it with some simple macro constructs makes it possible to write generic and fully type-safe data structures and functions, entirely in straight C.  Adding a new type becomes as simple as creating a new entry in a few lists – still not quite as convenient as in more modern languages, but close enough for C work.

Ideally, a generic idiom should provide for:

  • Type-independent programming and code reuse: data structures or functions are defined once and used across many types,
  • Type checking: attempts to mix generic objects or functions of the wrong type should yield compile-time errors,
  • No run-time overhead: defining separate implementations for each type allows the compiler to generate optimal code for each use case, and
  • Simple, readable syntax.

In this post we’ll explore a simple implementation of generics in C11 that provides all four of the above (with some room for programming language connoisseurs to quibble over the last bullet point).  All the source code and profiling output cited below is available on Bitbucket.  The implementation rests on the concepts of “typed struct” and “typed function,” defined by the macros:

Using the ## (concatenation) preprocessor operator, these macros simply append a type name to a given struct or function, separated by underscores.  For example, TYPED_STRUCT(list, int) expands to struct list_int, while TYPED_FN(list, int, add) becomes list_int_add.  These are just convenience definitions for use in header files where we’ll declare the typed variants for the structs or functions we’re creating.

A generic bounds-checked array

Consider the problem of implementing a generic, bounds-checked, fixed-size array of values — call it a chkd_array.  We’ll define this array’s implementation details once, and reuse that definition for any arbitrary type where it may be needed.  If clients attempt to read or write outside the array’s bounds, rather than using error codes (or segfaulting or scribbling some memory somewhere), we wish to return a “garbage” error value which is specified at the time the chkd_array is created.  To demonstrate how everything works with different C data types, we’ll implement the chkd_array for ints, floats, and a custom struct (called struct strct and consisting of a single int and float).

The following macros declare the chkd_array and some basic operations on it.

With these macros available, the statement

CHKD_ARRAY_DEC(int, int, "chkd_array_int")

would declare a struct chkd_array_int, along with function signatures like

chkd_array_int_set(struct chkd_array_int *self, unsigned idx, int *val);

A DESC_LIT string literal, unused in this particular macro, is included in its arguments so that we can return a descriptive name from the DESC typed function. We’ll use the DECLARE macro in a subsequent header file to generate code for a given list of types.

Next, we need the macros to define the implementations for these functions.  At the point where this next header file is #included, the client already should have specified the types for which it needs chkd_array definitions. For brevity’s sake I’ve included only the GET and DESC definitions here.

Note that we can’t define IS_EQUAL in the “master” CHKD_ARRAY_DEF macro, because the method body must differ depending on whether T is a simple data type, or a struct (for which the == operator is not defined).  The definition is split out into a separate macro to at least reuse the code across primitive types.

Using these declaration and definition macros, we could set up our chkd_arrays by writing out CHKD_ARRAY_DEC and CHKD_ARRAY_DEF for each type, but there’s a more concise (and less error-prone) way. With XX factory macros, we can create a single list of types and automatically expand the macros for each list entry. The header for the chkd_array client code defines the struct strct type and the XX lists of needed types.

Toward the end of this file, we finally make use of _Generic. Before that point, with all of the macros we had defined so far, we still couldn’t work with our new array in a truly generic way, because the function names all have a type spliced into them — chkd_array_int_get, chkd_array_float_get, etc. What we want is to call a single get function and leave it up to the compiler to figure out which function body to use based on the types being passed in. This is what _Generic gives us. Since the get function takes a pointer to a chkd_array of a given type, the GENERIC_ARRAY_PTR_FN_DEFINE macro can take those pointers along with a function name and select the appropriate typed function to use.

The rest of the chkd_array_factory_defs.h file uses the XX lists to define most of the function implementations, and creates a CHKD_ARRAY_GET macro which provides the type-independent get. It’s a simple matter to extend this example to the other chkd_array functions.

The final piece of the puzzle is the definition of is_equal for the local struct strct type, which it makes sense to place in the client’s .c file:

Now, if int_array is a pointer to a struct chkd_array_int, the statement

int *i_ptr = CHKD_ARRAY_GET(int_array, 3);

expands to a call to chkd_array_int_get automatically.  See the repo for a lot more examples of how this is implemented.  Once all these macros have been set up, I find the syntax for using them almost downright pleasant; for instance, here’s how new chkd_arrays are created:

These macros serve as good building blocks for generic functions in whichever files they are used.  (In the repo, have a look at the WRITE_VALS_DEF macro in chkd_array_demo.c for an example.)  And as a final boon, they are also type-safe — attempting to get a float pointer from a chkd_array_int generates the following error in clang:

Performance gains

Okay.  So, why bother with all this effort?  Setting up a generic data structure with this sort of preprocessor hackery is admittedly verbose and not a little tedious to write, and we could use void * (and a bit of added care not to mix up our types) to get the same result from our programs.

As it turns out, avoiding the use of void * and the associated function pointers (which, incidentally, also require more duplication of code) can reduce the number of instructions executed in our programs significantly.  To see this effect in action, I created a void * implementation which mirrors chkd_array and each of its functions, and made an identically ordered series of calls, randomized by type, to both implementations. Performance profiles were generated using Valgrind’s callgrind utility.  Here’s how the two variants stacked up:

valgrind-output-final

Here’s the Valgrind annotated output for the -O3 runs. The generic portion comprised the instructions counted in make_generic_calls, and half those in chkd_array_strct_is_equal (which is also used by the void * version), while the remainder were executed during the void * run.

With -O3 optimizations enabled, the generic functions gave us a healthy reduction in execution time, finishing in 64% of the time required for their void * counterparts, while executing only 57% as many CPU instructions.  The Valgrind output also hints that the compiler is able to inline the generic calls to the get, set, and is_equal methods, likely the source of most of the reduction in instruction count. Optimizations like these allow C++’s std::sort, a generic template implementation, to outperform naive library calls to C’s qsort.

Some drawbacks

While the syntax made possible with these generic macros is quite nice once they’re defined, it’s tedious to create header declarations and definitions for entire data structures or algorithms within preprocessor #define statements.  Every line except the last must be terminated with a backslash, and it’s impossible — at least, as far as I’m aware — to add comments to the source code within the #define.  Editors and IDEs offer much less in the way of syntax highlighting, and the clang-format.py tool, which I like to use to auto-format my C code in Vim, is also a little shaky when faced with macros using _Generic.

In addition to those issues, which are mostly unique to C, these generic techniques carry all the same disadvantages of C++ templates.  Code is duplicated for each generic function in each translation unit where it’s #included, which can lead to long compile times and bloated executables.  Debugging, at both compile-time and run-time, is made much more difficult by the need to trace a generated line of code back to the macro where it was expanded. This only gets harder as more levels of generic macros are nested within each other.

Nevertheless, if you’re looking for a way to add type safety or squeeze a little performance out of your C program, have access to C11’s _Generic or equivalent compiler extensions, and are willing to pay the cost in increased compile time and program size, this approach is worth considering.

Compiling with _Generic

clang has supported _Generic since its 3.0 release. GCC 4.9, due (hopefully) to be released in the first half of this year, will support it as well. In the meantime, those committed to compiling with GCC can still emulate the C11 feature using the P99_GENERIC macro.

A Few Good Systems Programmers

Having recently learned a little C, and inspired by this hilarious take on systems programming by a Microsoft researcher, I was moved to write this riff on Colonel Jessup’s famous courtroom meltdown from A Few Good Men:

KAFFEE:  Colonel Jessup, did you make the off-by-one error?

JUDGE RANDOLPH:  You don’t have to answer that question!

COL. JESSUP:  I’ll answer the question!

[to KAFFEE]

COL. JESSUP:  You want answers?

KAFFEE:  I think I’m entitled to.

COL. JESSUP:  YOU WANT ANSWERS?

KAFFEE:  I WANT THE TRUTH!

COL. JESSUP:  YOU CAN’T HANDLE THE TRUTH!

[pauses]

COL. JESSUP:  Son, we use memory that has cells, and those cells have to be addressed by men with pointers. Who’s gonna do it? You? You, Lt. Weinburg? I have a greater responsibility than you could possibly fathom. You weep for LISP, and you curse manual memory management. You have that luxury. You have the luxury of not knowing what I know. That segfaults and memory corruption, while tragic, probably saves processor cycles. And my existence, while grotesque and incomprehensible to you, saves cycles. You don’t want the truth because deep down in places you don’t talk about on HackerNews, you want me on those pointers, you NEED me on those pointers. We use words like alignment, type-punning, endian-ness. We use these words as the backbone of a life spent debugging something. You use them as a punchline. I have neither the time nor the inclination to explain myself to a man who allocates and garbage collects under the blanket of the very performance that I provide, and then questions the manner in which I provide it. I would rather you just said thank you, and went on your way. Otherwise, I suggest you pick up a coredump, and load it into GDB. Either way, I don’t give a damn what you think you are entitled to.

KAFFEE:  Did you make the off-by-one error?

COL. JESSUP:  I did the job I–

KAFFEE:  DID YOU MAKE THE OFF-BY-ONE ERROR?

COL. JESSUP:  YOU’RE GODDAMN RIGHT I DID!