It is known that non-recursive makefile architectures have a higher degree of parallelism than recursive ones. It is not clear just how much higher. This document presents results of benchmarking recursive and non-recursive build systems based on GNU make. Both build systems were used to build the same project under the same conditions.
Description of the benchmark, the build systems and hardware are in the Benchmark section, results of the benchmark are in the Results section, and finally some thoughts and interpretation of the results are presented in the Comments sections.
I believe the benchmark to be accurate; the conditions and tasks performed by both build systems are the same except where recursive vs non-recursive design comes to play. For example, both build systems use so-called advanced auto-dependency generation technique. If you find any inaccuracy or unfairness towards one build system or the other, please let me know and I will try to fix it.
The benchmark is based on the
source code of CIDL compiler. According to sloccount
,
it contains 277 source code files of which 102 are translation units with
a total of 27,589 SLOC (source lines of code). The source code is split over
15 directories with the translation-units-per-directory ratio being about 7.
The build systems being tested are a traditional recursive build system distributed as part of Utility library and a non-recursive build system called build
If you want to reproduce the benchmark you will need to install a few
third-party libraries (described in CCF/Documentation/Build.html
in archive above) as well as the build systems.
The tests were run on four different machines which are described below.
noatime
option.
distcc
build over 100MB
ethernet. Localhost is X2 and DISTCC_HOSTS="X2/2 X4/10"
. Both
3/rest and 1/rest splits were also tested with the results worse that the
2/rest split.This test consists of running make
on an up-to-date build
and measures the time it takes make
to discover the build is
up-to-date. You can view this result as the amount of time between when you
start make
and when the first compilation command starts
execution.
build machine | recursive | non-recursive |
---|---|---|
P1 | 1.79 | 1.57 |
X2 | 5.51 (-j 3) | 0.70 |
X4 | 1.04 (-j 3) | 0.80 |
This test measures the time it takes to build everything from scratch.
machine | recursive | non-recursive |
---|---|---|
P1 | 597.34 | 598.46 |
machine | -j | recursive | non-recursive |
---|---|---|---|
X2 | -j 2 | 178.91 | 165.80 |
-j 3 | 174.16 | 152.81 | |
-j 4 | 178.92 | 152.77 | |
-j 5 | 179.90 | 152.06 | |
-j 6 | 207.48 | 192.39 |
machine | -j | recursive | non-recursive |
---|---|---|---|
X4 | -j 4 | 177.30 | 123.75 |
-j 5 | 176.49 | 113.34 | |
-j 6 | 162.59 | 103.93 | |
-j 7 | 169.03 | 101.87 | |
-j 8 | 168.71 | 104.87 | |
-j 9 | 173.49 | 109.18 | |
-j 10 | 177.26 | 113.28 |
machine | -j | recursive | non-recursive |
---|---|---|---|
D6 | -j 6 | 108.39 | 80.10 |
-j 7 | 111.12 | 79.71 | |
-j 8 | 112.87 | 75.52 | |
-j 9 | 120.62 | 71.29 | |
-j 10 | 130.90 | 67.85 | |
-j 11 | 114.18 | 73.01 | |
-j 12 | 126.97 | 71.56 |
In the up-to-date check both build systems perform pretty close except in case of X2. My only explanation to such a poor performance of the recursive build system is 2.6 kernel and/or reiserfs. I would also expect a bigger difference in up-to-date check times between recursive and non-recursive build systems in larger projects and in projects with lower translation-units-per-directory ratio.
The results of the build from scratch benchmark on P1 indirectly proves that both build systems were tested in equivalent conditions and perform the same tasks.
Not surprisingly, the non-recursive build outperforms the recursive one on every parallel configuration. On X2 the non-recursive build takes about 87% of the recursive time. On X4 that time drops to 63% and on D6 to 61%.
What's surprising is the speed-up of the recursive build when moved from X4
to D6 being 32% (33% for the non-recursive build) in comparison to the
speed-up when moved from X2 to X4 being just 7% (33% for the non-recursive
build). I tend to think it is not the recursive build that scales well, but
rather the non-recursive one that does not. The potential distcc
bottlenecks (e.g., the fact that X4 depends on X2 for preprocessed material
as well as on saving the output) may result in flow control and consequently
in the reduction of parallelism. This reduction in some sense evens the
chances of the recursive and non-recursive builds. Supporting evidence to this
explanation is the load of the two hosts. In the non-recursive build the load
fluctuates significantly the first 10 seconds or so and then both boxes become
uniformly loaded. The recursive build never loads either of the two boxes to
100%, however the fluctuations are not as big as in the first 10 seconds of
the non-recursive build. It would be interesting to see the results with
reversed roles, i.e., X4 being the localhost.
It is also interesting to note that the highest performance of the
non-recursive build is achieved about two -j
points higher
than of the recursive build in equivalent conditions.
Copyright © 2004 Boris Kolpackov.
Last updated on Aug 21 2004