Overview and macro building blocks
Generic programming in C is usually seen as painful, crufty, and verbose in comparison with languages such as C++ or Java. Many C programmers opt instead for the non-generic approach of implementing polymorphism with void *
, but this can carry a performance penalty, since the lack of precise type information at compile-time inhibits optimizations, and necessitates calling through function pointers (or worse, use of library calls like memcpy()
and memcmp()
).
One of the interesting features of the new C11 Standard is type-generic expressions, implemented using the _Generic
keyword (for a nice introduction, see http://www.robertgamble.net/2012/01/c11-generic-selections.html). By itself, this feature may not seem to add all that much convenience to the language, but combining it with some simple macro constructs makes it possible to write generic and fully type-safe data structures and functions, entirely in straight C. Adding a new type becomes as simple as creating a new entry in a few lists – still not quite as convenient as in more modern languages, but close enough for C work.
Ideally, a generic idiom should provide for:
- Type-independent programming and code reuse: data structures or functions are defined once and used across many types,
- Type checking: attempts to mix generic objects or functions of the wrong type should yield compile-time errors,
- No run-time overhead: defining separate implementations for each type allows the compiler to generate optimal code for each use case, and
- Simple, readable syntax.
In this post we’ll explore a simple implementation of generics in C11 that provides all four of the above (with some room for programming language connoisseurs to quibble over the last bullet point). All the source code and profiling output cited below is available on Bitbucket. The implementation rests on the concepts of “typed struct” and “typed function,” defined by the macros:
Using the ##
(concatenation) preprocessor operator, these macros simply append a type name to a given struct or function, separated by underscores. For example, TYPED_STRUCT(list, int)
expands to struct list_int
, while TYPED_FN(list, int, add)
becomes list_int_add
. These are just convenience definitions for use in header files where we’ll declare the typed variants for the structs or functions we’re creating.
A generic bounds-checked array
Consider the problem of implementing a generic, bounds-checked, fixed-size array of values — call it a chkd_array
. We’ll define this array’s implementation details once, and reuse that definition for any arbitrary type where it may be needed. If clients attempt to read or write outside the array’s bounds, rather than using error codes (or segfaulting or scribbling some memory somewhere), we wish to return a “garbage” error value which is specified at the time the chkd_array
is created. To demonstrate how everything works with different C data types, we’ll implement the chkd_array
for int
s, float
s, and a custom struct (called struct strct
and consisting of a single int
and float
).
The following macros declare the chkd_array
and some basic operations on it.
With these macros available, the statement
CHKD_ARRAY_DEC(int, int, "chkd_array_int")
would declare a struct chkd_array_int
, along with function signatures like
chkd_array_int_set(struct chkd_array_int *self, unsigned idx, int *val);
A DESC_LIT
string literal, unused in this particular macro, is included in its arguments so that we can return a descriptive name from the DESC
typed function. We’ll use the DECLARE
macro in a subsequent header file to generate code for a given list of types.
Next, we need the macros to define the implementations for these functions. At the point where this next header file is #include
d, the client already should have specified the types for which it needs chkd_array
definitions. For brevity’s sake I’ve included only the GET
and DESC
definitions here.
Note that we can’t define IS_EQUAL
in the “master” CHKD_ARRAY_DEF
macro, because the method body must differ depending on whether T
is a simple data type, or a struct
(for which the ==
operator is not defined). The definition is split out into a separate macro to at least reuse the code across primitive types.
Using these declaration and definition macros, we could set up our chkd_array
s by writing out CHKD_ARRAY_DEC
and CHKD_ARRAY_DEF
for each type, but there’s a more concise (and less error-prone) way. With XX factory macros, we can create a single list of types and automatically expand the macros for each list entry. The header for the chkd_array
client code defines the struct strct
type and the XX
lists of needed types.
Toward the end of this file, we finally make use of _Generic
. Before that point, with all of the macros we had defined so far, we still couldn’t work with our new array in a truly generic way, because the function names all have a type spliced into them — chkd_array_int_get
, chkd_array_float_get
, etc. What we want is to call a single get
function and leave it up to the compiler to figure out which function body to use based on the types being passed in. This is what _Generic
gives us. Since the get
function takes a pointer to a chkd_array
of a given type, the GENERIC_ARRAY_PTR_FN_DEFINE
macro can take those pointers along with a function name and select the appropriate typed function to use.
The rest of the chkd_array_factory_defs.h
file uses the XX
lists to define most of the function implementations, and creates a CHKD_ARRAY_GET
macro which provides the type-independent get
. It’s a simple matter to extend this example to the other chkd_array
functions.
The final piece of the puzzle is the definition of is_equal
for the local struct strct
type, which it makes sense to place in the client’s .c
file:
Now, if int_array
is a pointer to a struct chkd_array_int
, the statement
int *i_ptr = CHKD_ARRAY_GET(int_array, 3);
expands to a call to chkd_array_int_get
automatically. See the repo for a lot more examples of how this is implemented. Once all these macros have been set up, I find the syntax for using them almost downright pleasant; for instance, here’s how new chkd_array
s are created:
These macros serve as good building blocks for generic functions in whichever files they are used. (In the repo, have a look at the WRITE_VALS_DEF macro in chkd_array_demo.c for an example.) And as a final boon, they are also type-safe — attempting to get a float
pointer from a chkd_array_int
generates the following error in clang
:
Performance gains
Okay. So, why bother with all this effort? Setting up a generic data structure with this sort of preprocessor hackery is admittedly verbose and not a little tedious to write, and we could use void *
(and a bit of added care not to mix up our types) to get the same result from our programs.
As it turns out, avoiding the use of void *
and the associated function pointers (which, incidentally, also require more duplication of code) can reduce the number of instructions executed in our programs significantly. To see this effect in action, I created a void *
implementation which mirrors chkd_array
and each of its functions, and made an identically ordered series of calls, randomized by type, to both implementations. Performance profiles were generated using Valgrind’s callgrind
utility. Here’s how the two variants stacked up:
Here’s the Valgrind annotated output for the -O3
runs. The generic portion comprised the instructions counted in make_generic_calls
, and half those in chkd_array_strct_is_equal
(which is also used by the void *
version), while the remainder were executed during the void *
run.
With -O3
optimizations enabled, the generic functions gave us a healthy reduction in execution time, finishing in 64% of the time required for their void *
counterparts, while executing only 57% as many CPU instructions. The Valgrind output also hints that the compiler is able to inline the generic calls to the get
, set
, and is_equal
methods, likely the source of most of the reduction in instruction count. Optimizations like these allow C++’s std::sort
, a generic template implementation, to outperform naive library calls to C’s qsort
.
Some drawbacks
While the syntax made possible with these generic macros is quite nice once they’re defined, it’s tedious to create header declarations and definitions for entire data structures or algorithms within preprocessor #define
statements. Every line except the last must be terminated with a backslash, and it’s impossible — at least, as far as I’m aware — to add comments to the source code within the #define
. Editors and IDEs offer much less in the way of syntax highlighting, and the clang-format.py
tool, which I like to use to auto-format my C code in Vim, is also a little shaky when faced with macros using _Generic
.
In addition to those issues, which are mostly unique to C, these generic techniques carry all the same disadvantages of C++ templates. Code is duplicated for each generic function in each translation unit where it’s #include
d, which can lead to long compile times and bloated executables. Debugging, at both compile-time and run-time, is made much more difficult by the need to trace a generated line of code back to the macro where it was expanded. This only gets harder as more levels of generic macros are nested within each other.
Nevertheless, if you’re looking for a way to add type safety or squeeze a little performance out of your C program, have access to C11’s _Generic
or equivalent compiler extensions, and are willing to pay the cost in increased compile time and program size, this approach is worth considering.
Compiling with _Generic
clang has supported _Generic
since its 3.0 release. GCC 4.9, due (hopefully) to be released in the first half of this year, will support it as well. In the meantime, those committed to compiling with GCC can still emulate the C11 feature using the P99_GENERIC macro.