c++: inlining code away
Boris Kolpackov
boris at kolpackov.net
Thu Jul 1 21:57:32 CDT 2004
Good day,
Sometimes you want the same line of source code to do something in
one case and completely disappear in the other. assert() is a good
example. The common way of achieving this in a C/C++ environment is
to use the preprocessor. With this approach, however, you are stuck
with the function call notation. For assert() that's exactly what we
need but what if we want code like
a + b
or even
cerr << "allocation " << size << " bytes" << endl;
to go away? We could write
if_debug (cerr << "allocation " << size << " bytes" << endl;)
or something along those lines but it doesn't look very appealing.
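For reference, the preprocessor approach could be sketched roughly as follows. This is only an illustration: the TRACE_ENABLED switch and the log_allocation helper are hypothetical names, and a std::ostringstream stands in for cerr so that the result can be inspected.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical switch: build with -DTRACE_ENABLED=1 to keep the statements.
#ifndef TRACE_ENABLED
#  define TRACE_ENABLED 0
#endif

#if TRACE_ENABLED
#  define if_debug(stmt) do { stmt } while (false)
#else
#  define if_debug(stmt) do { } while (false)  // the argument is discarded
#endif

// Returns whatever ended up in the log for one "allocation".
std::string
log_allocation (std::size_t size)
{
  std::ostringstream log; // stands in for cerr in this sketch

  if_debug (log << "allocation " << size << " bytes" << '\n';);

  (void) size; // avoid an unused-parameter warning when tracing is off
  return log.str ();
}
```

With the switch left at 0 the statement never reaches the compiler, so the returned string is empty. The price is the function call notation wrapped around ordinary code.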
To restate the problem: can we achieve conditional compilation for
operator-based language constructs like cerr << "hello"? The answer
could be a do-nothing inline function and a C++ compiler optimization.
But will this actually work? That's what we are going to test today.
Our example will be a piece of a tracing facility I've been playing with
lately. One of the requirements was to be able to turn tracing
completely off with zero overhead in the resulting code. Also, I didn't
want to impose any notational burden on the user, so I decided to provide
an interface similar to the one in the iostream library:
tout << "operator new (" << size << "): " << p;
Even though the code snippet above looks very innocent, there are quite
a lot of things going on under the hood. While the inner workings of
the tracing facility are not the topic of this essay, the number of
actions performed under the hood is quite relevant to our discussion.
Therefore, I am going to provide a quick overview of how everything
works. There are two main objects involved: a record and a stream.
The records are traced into the stream:
class record
{
  // ...
  template <typename x>
  record&
  operator<< (x const& arg);
};

class stream
{
  // ...
  stream&
  operator<< (record const& r);
};
Having these definitions we can write something like this:
stream tout;
record r;
r << "operator new (" << size << "): " << p;
tout << r;
Or even this:
tout << (record () << "operator new (" << size << "): " << p);
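A minimal, self-contained sketch of these two classes might look like the following. It is not the real facility: the assumption here is that a record merely accumulates formatted text and that a stream writes each record as one line to a std::ostream. The text() accessor and the trace_allocation helper are hypothetical.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Assumption: a record simply accumulates formatted text.
class record
{
public:
  template <typename x>
  record&
  operator<< (x const& arg)
  {
    text_ << arg;
    return *this;
  }

  std::string
  text () const
  {
    return text_.str ();
  }

private:
  std::ostringstream text_;
};

// Assumption: a stream writes every record as one line to a std::ostream.
class stream
{
public:
  explicit
  stream (std::ostream& os)
    : os_ (os)
  {
  }

  stream&
  operator<< (record const& r)
  {
    os_ << r.text () << '\n';
    return *this;
  }

private:
  std::ostream& os_;
};

// Traces one record into an in-memory sink and returns the sink's content.
std::string
trace_allocation (std::size_t size, void const* p)
{
  std::ostringstream sink;
  stream tout (sink);

  tout << (record () << "operator new (" << size << "): " << p);

  return sink.str ();
}
```

The call produces a single line that starts with "operator new (" followed by the size; how the pointer is printed depends on the standard library. Note that the user still has to spell out the temporary record () by hand.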
It is not exactly what we want, however. We would like the temporary
record to be automatically created for us:
class stream
{
  // ...
  class mediator
  {
  public:
    mediator (stream& s)
      : s_ (s)
    {
    }

    ~mediator ()
    {
      s_ << r_;
    }

    stream& s_;
    record r_;

    friend record&
    operator<< (mediator const& mc, char const* s)
    {
      mediator& m (const_cast<mediator&> (mc));
      return m.r_ << s;
    }

    template <typename x>
    friend record&
    operator<< (mediator const& mc, x const& arg)
    {
      mediator& m (const_cast<mediator&> (mc));
      return m.r_ << arg;
    }
  };
};
Do you see how this works? Let's start from a simple example and walk
through it step by step:
tout << "hello";
When the compiler sees this line, it must decide which operator<< to
call. Let's see what choices it has:
stream& stream::
operator << (record const& r);
This one doesn't work since "hello" is of type char const [6], not
record, and there is no conversion from char const[6] (or decayed
char const*) to the type record.
record&
operator<< (mediator const& mc, char const* s);
The second argument matches after decaying to char const*. The first
formal argument is of type mediator const&, and a stream can be implicitly
converted to the type mediator (see mediator::mediator (stream&)).
We've got the match. In order to make the call the compiler creates
a temporary of type mediator and passes a const reference to it as
the first actual argument. The generated code will be something
equivalent to this:
mediator m (tout);
operator<< (m, "hello");
And our original example
tout << "operator new (" << size << "): " << p;
will be turned into this:
mediator m (tout);
operator<< (m, "operator new (").operator<< (size).
operator<< ("): ").operator<< (p);
The innocent-looking piece of code turned out to do quite a lot:
the compiler has to create the temporary (with all the constructors)
and then call a number of functions, each of which depends on the return
value of the previous one. At the end of the full expression the
temporary is destroyed, and its destructor traces the record.
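The walk-through above can be tried out with a small self-contained sketch. It is not the actual tracing facility. The text() and rec() accessors, the std::ostream sink and the trace_allocation helper are assumptions made for illustration, and the two operators are declared at namespace scope rather than as in-class friends, a choice made here so that ordinary name lookup finds them when the left operand is a stream.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

class record
{
public:
  template <typename x>
  record&
  operator<< (x const& arg)
  {
    text_ << arg;
    return *this;
  }

  std::string
  text () const
  {
    return text_.str ();
  }

private:
  std::ostringstream text_;
};

class stream
{
public:
  explicit
  stream (std::ostream& os)
    : os_ (os)
  {
  }

  stream&
  operator<< (record const& r)
  {
    os_ << r.text () << '\n';
    return *this;
  }

  class mediator
  {
  public:
    mediator (stream& s)   // intentionally implicit
      : s_ (s)
    {
    }

    ~mediator ()           // runs at the end of the full expression
    {
      s_ << r_;
    }

    // The temporary is only reachable through a const reference, so
    // constness is cast away here (hypothetical accessor).
    record&
    rec () const
    {
      return const_cast<mediator*> (this)->r_;
    }

  private:
    stream& s_;
    record r_;
  };

private:
  std::ostream& os_;
};

inline record&
operator<< (stream::mediator const& mc, char const* s)
{
  return mc.rec () << s;
}

template <typename x>
record&
operator<< (stream::mediator const& mc, x const& arg)
{
  return mc.rec () << arg;
}

// Traces one line into an in-memory sink and returns the sink's content.
std::string
trace_allocation (std::size_t size, void const* p)
{
  std::ostringstream sink;
  stream tout (sink);

  tout << "operator new (" << size << "): " << p;  // mediator is created here

  return sink.str ();
}
```

By the time trace_allocation returns, the temporary mediator has been destroyed and its destructor has written the whole record to the sink as a single line.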
Now let's go back to our zero-overhead problem: if we provide
a do-nothing inlined implementation, will the compiler be able to
optimize the whole thing away?
Here is our zero-overhead implementation:
class record
{
  // ...
  template <typename x>
  record&
  operator<< (x const& arg)
  {
    return *this;
  }
};

class stream
{
  // ...
  class mediator
  {
  public:
    mediator (stream& s)
      : s_ (s)
    {
    }
    ~mediator ()
    {
    }
    stream& s_;
    record r_;

    friend record&
    operator<< (mediator const& mc, char const* s)
    {
      return const_cast<mediator&> (mc).r_;
    }
    template <typename x>
    friend record&
    operator<< (mediator const& mc, x const& arg)
    {
      return const_cast<mediator&> (mc).r_;
    }
  };
};
Even though it's a do-nothing implementation, we are still performing
some initializations and returning some values. Therefore, it's not quite
obvious that the compiler will be able to figure out that all those
actions don't produce anything.
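For readers who want to repeat the experiment, here is a self-contained variant of the do-nothing implementation that can be compiled on its own. It is a sketch under a few assumptions: the rec() accessor and the probe function are hypothetical, and the operators are declared at namespace scope rather than as in-class friends so that ordinary name lookup finds them. Compiling it with something like g++ -O2 -S and reading the output for probe is one way to check what your compiler does with it.

```cpp
#include <cstddef>

class record
{
public:
  template <typename x>
  record&
  operator<< (x const&)
  {
    return *this; // nothing is formatted or stored
  }
};

class stream
{
public:
  stream&
  operator<< (record const&)
  {
    return *this; // nothing is written
  }

  class mediator
  {
  public:
    mediator (stream& s)
      : s_ (s)
    {
    }

    ~mediator ()
    {
      // no flush in the do-nothing version
    }

    record&
    rec () const // hypothetical accessor
    {
      return const_cast<mediator*> (this)->r_;
    }

  private:
    stream& s_;
    record r_;
  };
};

inline record&
operator<< (stream::mediator const& mc, char const*)
{
  return mc.rec ();
}

template <typename x>
record&
operator<< (stream::mediator const& mc, x const&)
{
  return mc.rec ();
}

stream tout;

// Every call in this expression is inline and has no observable effect.
int
probe (std::size_t size, void* p)
{
  tout << "operator new (" << size << "): " << p;
  return 0;
}
```

Whether the whole statement disappears depends on the compiler and the optimization level, so it is worth checking the generated code rather than assuming it.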
Our test case will be a simple function whose assembly code we are
going to inspect:
stream tout;

int
bar (size_t size, void* p)
{
  tout << "operator new (" << size << "): " << p;
  return 0;
}
Here is the assembler code for this function when compiled by
g++ 3.4.0 with -O2:
.globl _Z3barmPv
.type _Z3barmPv, @function
xorl %eax, %eax
For comparison, here is the same function compiled with -g and no
optimization:
.globl _Z3barmPv
.type _Z3barmPv, @function
.loc 2 26 0
pushq %rbp
movq %rsp, %rbp
pushq %rbx
subq $72, %rsp
movq %rdi, -24(%rbp)
movq %rsi, -32(%rbp)
.loc 2 27 0
leaq -64(%rbp), %rdi
movl $tout, %esi
call _ZN4cult5trace6stream8mediatorC1ERS1_
leaq -64(%rbp), %rdi
movl $.LC1, %esi
call _ZN4cult5tracelsERKNS0_6stream8mediatorEPKc
movq %rax, %rdi
leaq -24(%rbp), %rsi
call _ZN4cult5trace6recordlsImEERS1_RKT_
movq %rax, %rdi
movl $.LC2, %esi
call _ZN4cult5trace6recordlsIA4_cEERS1_RKT_
movq %rax, %rdi
leaq -32(%rbp), %rsi
call _ZN4cult5trace6recordlsIPvEERS1_RKT_
jmp .L13
movq %rax, -72(%rbp)
movq -72(%rbp), %rbx
leaq -64(%rbp), %rdi
call _ZN4cult5trace6stream8mediatorD1Ev
movq %rbx, -72(%rbp)
movq -72(%rbp), %rdi
call _Unwind_Resume
leaq -64(%rbp), %rdi
call _ZN4cult5trace6stream8mediatorD1Ev
.loc 2 28 0
movl $0, %eax
.loc 2 29 0
addq $72, %rsp
popq %rbx
I also ran this test on Intel C++ with the same results. This shows
that contemporary compilers are smart enough to make the technique
of inlining code away practical. Keep in mind, however, that in order
for this technique to work, the compiler must be able to see
through function calls all the way down to elementary operations. In
particular, if you have a call to a non-inline function as part of your
expression, there is nothing the compiler can do about it except make
the call. To illustrate, consider this code fragment:
stream tout;

char const*
foo ();

int
bar (size_t size, void* p)
{
  tout << foo () << size << p;
  return 0;
}
When compiled by gcc 3.4.0 with -O2:
.globl _Z3barmPv
.type _Z3barmPv, @function
subq $40, %rsp
movq tout(%rip), %rax
movq $tout, (%rsp)
movq %rax, 8(%rsp)
movl tout+8(%rip), %eax
movl %eax, 16(%rsp)
call _Z3foov
xorl %eax, %eax
addq $40, %rsp
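The same effect can be observed without reading assembly. In the sketch below, null_stream is a simplified, hypothetical stand-in for the do-nothing facility, and foo is given a visible side effect (a call counter) purely so that the call can be detected.

```cpp
#include <cstddef>

// Hypothetical stand-in for the do-nothing tracing facility.
struct null_stream
{
};

template <typename x>
inline null_stream&
operator<< (null_stream& s, x const&)
{
  return s; // the argument is ignored
}

int calls = 0; // side effect that the compiler has to preserve

char const*
foo ()
{
  ++calls;
  return "foo";
}

int
bar (std::size_t size, void* p)
{
  null_stream tout;
  tout << foo () << size << p; // the operators vanish, the call does not
  return 0;
}
```

After one call to bar the counter is 1: the operands of << are evaluated before the do-nothing operators are invoked, so the call to foo () survives even though its result is discarded.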
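This is because a C/C++ compiler cannot make any assumptions about
an arbitrary function whose definition it cannot see: the call may
have side effects. Using GCC's function attributes we can specify
that our function is "pure", that is, it has no effects other than
its return value, and consequently can be called fewer times than
the program says: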
char const*
foo () __attribute__ ((pure));
With this hint GCC eliminates the call:
.globl _Z3barmPv
.type _Z3barmPv, @function
xorl %eax, %eax
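As a small illustration of what "called fewer times than the program says" permits, consider the sketch below. The TRACE_PURE macro, length and twice_length are hypothetical names. The macro expands to the GCC attribute only on compilers that define __GNUC__, and the attribute is placed on a separate declaration, as in the fragment above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper macro: expands to nothing on other compilers.
#if defined (__GNUC__)
#  define TRACE_PURE __attribute__ ((pure))
#else
#  define TRACE_PURE
#endif

// The declaration carries the hint. The function only reads memory and
// returns a value, so the promise of purity holds.
std::size_t
length (char const* s) TRACE_PURE;

std::size_t
length (char const* s)
{
  return std::strlen (s);
}

std::size_t
twice_length (char const* s)
{
  // The compiler is allowed to evaluate length (s) once and reuse the result.
  return length (s) + length (s);
}
```

The attribute is a promise made by the programmer, not something the compiler verifies. Putting it on a function that does have side effects can silently change the program's behavior.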
If you have made it this far, thank you for your time. Permission is
granted to copy, distribute and/or modify this document under the terms
of the GNU Free Documentation License, Version 1.2; with no Invariant
Sections, no Front-Cover Texts, and no Back-Cover Texts.