From boris at kolpackov.net Thu Jul 1 21:57:32 2004
From: boris at kolpackov.net (Boris Kolpackov)
Date: Thu Jul 1 22:06:54 2004
Subject: c++: inlining code away
Message-ID: <20040702025732.GA7516@kolpackov.net>

Good day,

Sometimes you want the same line of source code to do something in one
case and completely disappear in the other. assert() is a good example.
The common way of achieving this in a C/C++ environment is to use the
preprocessor. With this approach, however, you are stuck with the
function call notation. For assert() that's exactly what we need, but
what if we want code like

  a + b

or even

  cerr << "allocation " << size << " bytes" << endl;

to go away? We could write

  if_debug (cerr << "allocation " << size << " bytes" << endl;)

or something along those lines, but it doesn't look very appealing.

To restate the problem: can we achieve conditional compilation for
operator-based language constructs like cerr << "hello"? One answer
could be a do-nothing inline function combined with compiler
optimization. But will this actually work? That's what we are going to
test today.

Our example will be a piece of a tracing facility I've been playing
with lately. One of the requirements was to be able to turn tracing
completely off with zero overhead in the resulting code. Also, I didn't
want to impose any notational burden on the user, so I decided to
provide an interface similar to the one in the iostream library:

  tout << "operator new (" << size << "): " << p;

Even though the code snippet above looks very innocent, there are quite
a lot of things going on under the hood. While the inner workings of
the tracing facility are not the topic of this essay, the number of
actions performed under the hood is quite relevant to our discussion.
Therefore, I am going to provide a quick overview of how everything
works.

There are two main objects involved: a record and a stream. Records are
traced into the stream:

  class record
  {
  public:
    // ...
    template <typename x>
    record& operator<< (x const& arg);
  };

  class stream
  {
  public:
    // ...

    stream& operator<< (record const& r);
  };

Having these definitions we can write something like this:

  stream tout;
  record r;

  r << "operator new (" << size << "): " << p;
  tout << r;

Or even this:

  tout << (record () << "operator new (" << size << "): " << p);

It is not exactly what we want, however. We would like the temporary
record to be automatically created for us:

  class stream
  {
    // ...

  private:
    class mediator
    {
    public:
      mediator (stream& s) : s_ (s) {}
      ~mediator () { s_ << r_; }

      stream& s_;
      record r_;
    };

    friend record& operator<< (mediator const& mc, char const* s)
    {
      mediator& m (const_cast<mediator&> (mc));
      return m.r_ << s;
    }

    template <typename x>
    friend record& operator<< (mediator const& mc, x const& arg)
    {
      mediator& m (const_cast<mediator&> (mc));
      return m.r_ << arg;
    }
  };

Do you see how this works? Let's start from a simple example and walk
through it step-by-step:

  tout << "hello";

When the compiler sees this line, it must decide which operator<< to
call. Let's see what choices it has:

  stream& stream::operator<< (record const& r);

This one doesn't work since "hello" is of type char const[6], not
record, and there is no conversion from char const[6] (or the decayed
char const*) to the type record.

  record& operator<< (mediator const& mc, char const* s);

The second argument matches after decaying to char const*. The first
formal argument is of type mediator, and the stream can be implicitly
converted to the type mediator (see mediator::mediator (stream&)).
We've got a match. In order to make the call, the compiler creates a
temporary of type mediator and passes a const reference to it as the
first actual argument. Because a temporary cannot be bound to a
non-const reference, the operators take mediator const& and cast the
constness away inside. The generated code will be something equivalent
to this:

  {
    mediator m (tout);
    operator<< (m, "hello");
  }

And our original example

  tout << "operator new (" << size << "): " << p;

will be turned into this:

  {
    mediator m (tout);
    operator<< (m, "operator new (").operator<< (size).
      operator<< ("): ").operator<< (p);
  }

The innocent-looking piece of code turned out to do quite a lot: the
compiler has to create the temporary (with all the constructors) and
then call a number of functions, each of which depends on the return
value of the previous one.

Now let's go back to our zero-overhead problem: if we provide a
do-nothing inlined implementation, will the compiler be able to
optimize the whole thing away? Here is our zero-overhead
implementation:

  class record
  {
  public:
    // ...

    template <typename x>
    record& operator<< (x const& arg) { return *this; }
  };

  class stream
  {
    // ...

  private:
    class mediator
    {
    public:
      mediator (stream& s) : s_ (s) {}
      ~mediator () {}

      stream& s_;
      record r_;
    };

    friend record& operator<< (mediator const& mc, char const* s)
    {
      return const_cast<mediator&> (mc).r_;
    }

    template <typename x>
    friend record& operator<< (mediator const& mc, x const& arg)
    {
      return const_cast<mediator&> (mc).r_;
    }
  };

Even though it's a do-nothing implementation, we are still performing
some initializations and returning some values. Therefore, it's not
quite obvious that the compiler will be able to figure out that all
those actions don't produce anything.
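The fragments above can be assembled into one translation unit for
experimenting. What follows is only a sketch under a few assumptions of
mine: the template parameter is spelled <typename x>, the friend
operators return the mediator's r_ member so that the result really is
a record&, stream::operator<< (record const&) is given an empty body,
and trace_sample() is a hypothetical function that exists only to
exercise the operators. None of this is the real tracing facility.

```cpp
#include <cassert> // only needed if you want to assert on the result

// Do-nothing record: every insertion is an empty inline function.
class record
{
public:
  template <typename x>
  record& operator<< (x const&) { return *this; }
};

class stream
{
public:
  // Assumed empty body; a real stream would write the record out here.
  stream& operator<< (record const&) { return *this; }

private:
  class mediator
  {
  public:
    mediator (stream& s) : s_ (s) {}
    ~mediator () {} // nothing to flush when tracing is off

    stream& s_;
    record r_;
  };

  // Chosen when the first thing traced is a string literal.
  friend record& operator<< (mediator const& mc, char const*)
  {
    return const_cast<mediator&> (mc).r_;
  }

  // Chosen when the first thing traced is anything else.
  template <typename x>
  friend record& operator<< (mediator const& mc, x const&)
  {
    return const_cast<mediator&> (mc).r_;
  }
};

stream tout;

// Hypothetical helper, not from the original post: traces two values
// and returns its argument unchanged.
int trace_sample (int n)
{
  tout << "n = " << n << ", twice: " << 2 * n;
  return n;
}
```

Compiling this file with g++ -O2 -S and reading the assembler emitted
for trace_sample() is one way to repeat the check with your own
compiler.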
Our test case will be a simple function whose assembler code we are
going to inspect:

  stream tout;

  int bar (size_t size, void* p)
  {
    tout << "operator new (" << size << "): " << p;
    return 0;
  }

Here is the assembler code for this function when compiled by g++ 3.4.0
with -O2:

  .globl _Z3barmPv
          .type   _Z3barmPv, @function
  _Z3barmPv:
  .LFB1528:
  .L11:
          xorl    %eax, %eax
          ret

For comparison, here is the same function compiled with -g instead of
-O2:

  .globl _Z3barmPv
          .type   _Z3barmPv, @function
  _Z3barmPv:
  .LFB1496:
          .loc 2 26 0
          pushq   %rbp
  .LCFI6:
          movq    %rsp, %rbp
  .LCFI7:
          pushq   %rbx
  .LCFI8:
          subq    $72, %rsp
  .LCFI9:
          movq    %rdi, -24(%rbp)
          movq    %rsi, -32(%rbp)
  .LBB6:
          .loc 2 27 0
          leaq    -64(%rbp), %rdi
          movl    $tout, %esi
          call    _ZN4cult5trace6stream8mediatorC1ERS1_
          leaq    -64(%rbp), %rdi
          movl    $.LC1, %esi
          call    _ZN4cult5tracelsERKNS0_6stream8mediatorEPKc
          movq    %rax, %rdi
          leaq    -24(%rbp), %rsi
  .LEHB0:
          call    _ZN4cult5trace6recordlsImEERS1_RKT_
          movq    %rax, %rdi
          movl    $.LC2, %esi
          call    _ZN4cult5trace6recordlsIA4_cEERS1_RKT_
          movq    %rax, %rdi
          leaq    -32(%rbp), %rsi
          call    _ZN4cult5trace6recordlsIPvEERS1_RKT_
  .LEHE0:
          jmp     .L13
  .L16:
          movq    %rax, -72(%rbp)
  .L12:
          movq    -72(%rbp), %rbx
          leaq    -64(%rbp), %rdi
          call    _ZN4cult5trace6stream8mediatorD1Ev
          movq    %rbx, -72(%rbp)
  .L14:
          movq    -72(%rbp), %rdi
  .LEHB1:
          call    _Unwind_Resume
  .LEHE1:
  .L13:
          leaq    -64(%rbp), %rdi
          call    _ZN4cult5trace6stream8mediatorD1Ev
          .loc 2 28 0
          movl    $0, %eax
  .LBE6:
          .loc 2 29 0
          addq    $72, %rsp
          popq    %rbx
          leave
          ret

I also ran this test on Intel C++ with the same results. This shows
that contemporary compilers are smart enough to make the technique of
inlining code away practical.

Keep in mind, however, that in order for this technique to work, the
compiler must be able to see through the function calls all the way
down to elementary operations. In particular, if you have a call to a
non-inline function as part of your expression, there is nothing the
compiler can do about it except make the call.
To illustrate, consider this code fragment:

  stream tout;

  char const* foo ();

  int bar (size_t size, void* p)
  {
    tout << foo () << size << p;
    return 0;
  }

When compiled by gcc 3.4.0 with -O2:

  .globl _Z3barmPv
          .type   _Z3barmPv, @function
  _Z3barmPv:
  .LFB1527:
          subq    $40, %rsp
  .LCFI0:
          movq    tout(%rip), %rax
          movq    $tout, (%rsp)
          movq    %rax, 8(%rsp)
          movl    tout+8(%rip), %eax
          movl    %eax, 16(%rsp)
  .LEHB0:
          call    _Z3foov
  .LEHE0:
          xorl    %eax, %eax
          addq    $40, %rsp
          ret

This is because a C/C++ compiler cannot make any assumptions about a
function whose definition it cannot see: the call might have side
effects, so it has to stay. Using GCC's function attributes we can
specify that our function is "pure" and consequently can be called
fewer times than the program says:

  char const* foo () __attribute__ ((pure));

With this hint GCC eliminates the call:

  .globl _Z3barmPv
          .type   _Z3barmPv, @function
  _Z3barmPv:
  .LFB1528:
  .L11:
          xorl    %eax, %eax
          ret

If you have made it this far, thank you for your time.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts.