I was recently drawn into another discussion about a claim that project Foo was faster than project Bar because Foo is written in C (or maybe C++) and Bar is written in Java. In my experience, as a long-time kernel programmer and as someone who often codes in C even when there are almost certainly better choices, such claims are practically always false. The speed at which a particular piece of code executes only has a significant effect if your program can find something else to do after that piece is done – in other words, if your program is CPU-bound and/or well parallelized. Most programs are neither. The great majority of programs fit into one or more of the following categories.
- I/O-bound. Completing a unit of work earlier just means waiting longer for the next block/message.
- Memory-bound. Completing a unit of work earlier just means more time spent thrashing the virtual-memory system.
- Synchronization-bound (i.e. non-parallel). Completing a unit of work earlier just means waiting longer for another thread to release a lock or signal an event – and for the subsequent context switch.
- Algorithm-bound. There’s plenty of other work to do, and the program can get to it immediately, but it’s wasted work because a better algorithm would have avoided it altogether. We did all learn in school why better algorithms matter more than micro-optimization, didn’t we?
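To put rough numbers on the categories above, here is a minimal sketch of the Amdahl's-law arithmetic behind them. The fractions and the 2x code speedup are made-up illustrative figures, not measurements, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(cpu_fraction: float, code_speedup: float) -> float:
    """Amdahl's law: only the CPU-bound share of the run time gets faster."""
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if code_speedup <= 0:
        raise ValueError("code_speedup must be positive")
    waiting = 1.0 - cpu_fraction             # I/O, paging, locks: unchanged
    computing = cpu_fraction / code_speedup  # the part a faster language helps
    return 1.0 / (waiting + computing)


# Hypothetical program that spends 10% of its time computing.
print(f"{overall_speedup(0.10, 2.0):.3f}")  # 1.053
# Hypothetical program that spends 95% of its time computing.
print(f"{overall_speedup(0.95, 2.0):.3f}")  # 1.905
```

Under these assumed numbers, doubling the speed of the code buys about 5% overall for the mostly-waiting program and almost the full factor of two for the CPU-bound one.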
If you look at this excellent list of performance problems based on real-world observation, you’ll see that most of the problems mentioned (except #5) fit this characterization and wouldn’t be solved by using a different language. It’s possible to run many synchronization-bound programs on one piece of hardware, with or without virtualization, but the fewer resources these programs share the more likely it becomes that you’ll just become memory-bound instead. On the flip side, if a program is purely disk-bound or memory-bound then you can obtain more of those resources by distributing work across many machines, but if you don’t know how to implement distributed systems well you’ll probably just become network-bound or synchronization-bound. In fact, the class of programs that exhibit high sensitivity to network latency – a combination of I/O-boundedness and synchronization-boundedness – is large and growing.
So, you have a program that uses efficient algorithms with a well-parallelized implementation, and it’s neither I/O-bound nor memory-bound. Will it be faster in C? Yes, it very well might. It might also be faster in Fortran, which is why many continue to use it for scientific computation but that hardly makes it a good choice for more general use. Everyone thinks they’re writing the most performance-critical code in the world, but in reality maybe one in twenty programmers are writing code where anything short of the most egregious bloat and carelessness will affect the performance of the system overall. (Unfortunately, egregious bloat and carelessness are quite common.) There are good reasons for many of those one in twenty to be writing their code in C, but even then most of the reasons might not be straight-line performance. JIT code can be quite competitive with statically compiled code, and even better in many cases, once it has warmed up, but performance-critical code often has to be not only fast but predictable. GC pauses, JIT delays, and unpredictable context-switch behavior all make such languages unsuitable for truly performance-critical tasks, and many of those effects remain in the runtime libraries or frameworks/idioms even when the code is compiled. Similarly, performance-critical code often needs to interact closely with other code that’s already written in C, and avoiding “impedance mismatches” is important. Most importantly, almost all programmers need to be concerned with making their code run well on multiple processors. I’d even argue that the main reason kernel code tends to be efficient is not because it’s written in C but because it’s written with parallelism and reentrancy in mind, by people who understand those issues. A lot of code is faster not because it’s written in C but for the same reasons that it’s written in C. It’s common cause, not cause and effect. 
The most common cause of all is that C code tends to be written by people who have actually lived outside the Java reality-distortion bubble and been forced to learn how to write efficient code (which they could then do in Java but no longer care to).
For those other nineteen out of twenty programmers who are not implementing kernels or embedded systems or those few pieces of user-level infrastructure such as web servers (web applications don’t count) where these concerns matter, the focus should be on programmer productivity, not machine cycles. “Horizontal scalability” might seem like a euphemism for “throw more hardware at it” and I’ve been conditioned to abhor that as much as anyone, but hyper-optimization is only a reasonable alternative when you have a long time to do it. Especially at startups, VC-funded or otherwise, you probably won’t. Focus on stability and features first, scalability and manageability second, per-unit performance last of all, because if you don’t take care of the first two nobody will care about the third. If you’re bogged down chasing memory leaks or implementing data/control structures that already exist in other languages instead of on better algorithms or new features, you’re spending your time on the wrong things. Writing code in C(++) won’t magically make it faster where it counts, across a whole multi-processor (and possibly multi-node) system, and even if it did that might be missing the point. Compare results, not approaches.
Hey Reddit users, if you want to try something less than two years old, how about today’s post? Thanks!
I agree completely. In fact, at this point I’ve decided that even “performance critical” code can be written in (gasp!) Java, and I believe there are real productivity advantages to doing so. However, I think there is one other reason that C (and to a lesser extent, C++) programs end up being faster: The expensive things are painful, so programmers try to avoid them. For example, dynamic memory allocation is a pain, so you avoid doing it. In Java, on the other hand, creating many objects is typical, even when caching and re-using objects can *sometimes* be significantly faster. In higher level languages, the performance cost and programmer cost are not as strongly aligned as in C.
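The allocation point can be sketched even in a high-level language. The example below is only an illustration, written in Python rather than Java or C: it reads a stream either by allocating a fresh bytes object per chunk or by reusing one preallocated buffer through `readinto`. The function names and the chunk size are assumptions made for the example, and whether reuse is actually faster depends on the workload.

```python
import io

CHUNK = 64 * 1024  # illustrative chunk size, not a tuned value


def checksum_allocating(stream) -> int:
    """Allocates a new bytes object for every chunk that is read."""
    total = 0
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            return total
        total = (total + sum(chunk)) & 0xFFFFFFFF


def checksum_reusing(stream) -> int:
    """Reads every chunk into the same preallocated buffer."""
    buf = bytearray(CHUNK)
    view = memoryview(buf)  # slicing a memoryview does not copy the data
    total = 0
    while True:
        n = stream.readinto(buf)
        if not n:
            return total
        total = (total + sum(view[:n])) & 0xFFFFFFFF


data = bytes(range(256)) * 1000
print(checksum_allocating(io.BytesIO(data)) == checksum_reusing(io.BytesIO(data)))
```

Both functions compute the same result; the second version is the "cache and re-use" style that is easy to forget when allocation costs the programmer nothing.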
I agree with the observation that the importance of low-level performance is diminishing in a world where you are waiting for network/database/… I/O anyway. Given that there are usually far safer (or more appropriate) languages around for a given task, it’s often worth trading (invisible) performance for safety. Personally, I dislike C++ somewhat, although I require it almost daily for CPU-bound work. My compromise is using Prolog/Ruby/Python for most code, and C++ where necessary for performance.
And yes, experience does matter. There’s even a lot of code that does unnecessary copying, because the author never learned about (const) reference passing.
Cliff Click has an excellent discussion on this topic http://www.azulsystems.com/blog/cliff-click/2009-09-06-java-vs-c-performanceagain .
$ time ./hello # compiled with a C compiler
Hello
real 0m0.004s
user 0m0.000s
sys 0m0.002s
$ time ./hello # compiled with a C++ compiler
Hello
real 0m0.011s
user 0m0.002s
sys 0m0.005s
$ time java hello
Hello
real 0m0.166s
user 0m0.124s
sys 0m0.028s
The problem is that in practice, it does make a difference. Yes it’s not cause-and-effect, it’s shared cause, but what does it really matter ? As a general indication programs written in C and C++ will be faster. Why ? Cause they’re not written by idiots.
The concentration of idiots writing C (/C++) code is a lot less than among java/php/visual basic/c# … programmers. This combined with the fact that idiots outnumber every other group of human beings means that there are exponentially more idiotically slow java/php/vb/c# applications than c/c++ applications.
This is not to say that there aren’t a lot more great java programmers than there are great C/C++ programmers. But while great C/C++ programmers are maybe 1% of all C/C++ programmers, great java programmers are 0.001% of all java programmers. There are even great vb programmers. That means that, say 1% of all C code you’ll encounter will be intelligently written, while java code … *shudder*, and let’s just not talk about the chances of finding quality vb code.
But C/C++ are hardly alone in this. Languages like, say F#, clojure, and to a lesser extent python also have a lot less idiotic programs, because the only reason you’d even know a language like that is if you’re interested in programming, not just doing it for a quick and easy job (or worse : you were forced into it by your employer, which is true for close to 80% of java programmers at my previous job).
Additionally C is a bad language. Writing truly large software in C is an exceptional accomplishment, while large java programs are a dime a dozen. Now obviously small software runs faster than large overly generalized software, nothing strange about that.
Interesting, Evan. So what you’re saying is that in C/C++ dynamic memory management is annoying, so you avoid it… (perhaps relying on the stack more? RAII seems to imply this)…
Honestly tho, I think if you have a good reference counting smart pointer (like the kind coming in C++0x (or like the kind I’ve been using since 1999)) it’s really not hard at all…
Every programming language choice for any project has a cost. For Java, there’s the unknown internals of the language that could cause performance issues – Java isn’t as ‘down to the metal’ as C is, so knowing what the hardware is doing is difficult from a Java programmer’s point of view. However, Java is cheap when it comes to programmer hours. A huge multi-threaded app can be written by a single developer in a few days, so there’s a cost associated to the man hours involved.
Also, most programs are somewhat CPU-bound, somewhat I/O-bound and somewhat memory-bound, etc … The larger the app, the more of these issues need to be handled by any app in any language. The OS will help you in limited ways by read-aheads and write-caching and pre-fetching instructions, but the app eventually will have to wait for *something* and be inefficient.
My general rule is to write the first version of everything in a language that you can prototype quickly in. Throw it together. Then, rewrite the pieces that need to be high-performance in C or whatever you want to write them in, and tweak the small, rewritten bits for I/O if you need to (increase read/write buffers for your tiny app, so it’s resource-friendly). When it’s done, you’ll have a system that’s ‘efficient enough’, uses the most appropriate tool for each task, and is pretty good with resources in the end. You can’t ever hope for an *ultimate* solution, but you can get something that’s ‘good enough’ and that’s usually what makes the cut.
Just my $0.02.
Also, people who manage to write non-trivial software in C generally know what they’re doing.
What will happen is that the application will be written in a language that provides a high level of abstraction – like Python or Ruby. It is then deployed and profiled. The worst-performing code and the most heavily used code are identified. These are then optimized and rewritten in a lower-level language like C. Both Python and Ruby actively support the mixing and matching that results.
The programmer of tomorrow will be a polyglot that is able to use the right language at the right abstraction !
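The "deploy, profile, then optimize" workflow described in this comment might start out like the following sketch, which uses the standard-library `cProfile` and `pstats` modules. The functions `slow_part`, `fast_part`, and `app` are hypothetical stand-ins for a real application.

```python
import cProfile
import io
import pstats


def slow_part(n: int) -> int:
    # Deliberately naive nested loop: the hotspot we expect to find.
    total = 0
    for i in range(n):
        for j in range(50):
            total += i * j
    return total


def fast_part(n: int) -> int:
    # The built-in sum already runs its loop inside the interpreter's C code.
    return sum(range(n))


def app() -> int:
    return slow_part(20_000) + fast_part(20_000)


profiler = cProfile.Profile()
profiler.enable()
result = app()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In the printed report `slow_part` should dominate the cumulative time, which makes it the candidate for a rewrite in a lower-level language; `fast_part` would not be worth the effort.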
Absolutely. The distributed version control system Mercurial is written in slow Python, but is very explicit in using high-performing IO operations, thus giving the entire app speed similar to C. Likewise the Varnish web cache is explicitly knowledgeable about how virtual memory works, and thus runs rings around Squid. (I think they’re both in C.)
http://www.linuxinsight.com/ols2006_towards_a_better_scm_revlog_and_mercurial.html
http://deserialized.com/reverse-proxy-performance-varnish-vs-squid-part-2/
I agree completely.
No. You fail at programming.
The fact it’s not CPU bound is actually the biggest reason *why* Java is slower. Java is exceedingly memory intensive – it’s in fact the SECOND WORST language in alioth language shootout for memory INCLUDING THE SCRIPTING LANGUAGES.
This memory consumption doesn’t just hit you on RAM, it hits your cache, your I/O, your heap, your load times, and your CPU pipeline. This is why Java programs fail on benchmarks. So long as Java continues to insist on creating lookup tables for bloody everything, it will not be capable of running as fast as other languages. You shouldn’t need to instantiate a class to write hello world.
Simple advice that has always helped me: 1) Write the program in a HL (i.e. “slow”) language as best possible. 2) Identify troublesome CPU-intensive inner loops. 3) Use your language’s FFI of choice and replace the inner loop with a couple of lines of native (C or otherwise) code.
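Step 3 can be sketched with Python's standard `ctypes` FFI. This is only an illustration of the mechanics: the C library's `strlen` stands in for a real hot inner loop, `load_libc` and `byte_length` are hypothetical helper names, and the code assumes `ctypes.util.find_library("c")` can locate the C library (it falls back to pure Python when it cannot, for example on Windows).

```python
import ctypes
import ctypes.util


def load_libc():
    """Return a handle to the C library, or None if it cannot be located."""
    name = ctypes.util.find_library("c")
    if name is None:
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        return None


_libc = load_libc()
if _libc is not None:
    # Declare the C signature: size_t strlen(const char *s);
    _libc.strlen.argtypes = [ctypes.c_char_p]
    _libc.strlen.restype = ctypes.c_size_t


def byte_length(data: bytes) -> int:
    """Length up to the first NUL byte, computed by C strlen when available."""
    if _libc is not None:
        return _libc.strlen(data)
    # Pure-Python fallback with the same semantics.
    end = data.find(b"\0")
    return len(data) if end < 0 else end


print(byte_length(b"hello"))  # 5
```

Each call across the FFI boundary has its own overhead, so this only pays off when the native function does a substantial amount of work per call, which is why the advice is to move the whole inner loop rather than individual operations.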
Memory-bound problems can very often run faster in C than in other languages, because you have control over HOW the memory is accessed. i.e. you can structure your memory access to minimize cache misses. Most people regard this as a micro-optimization, but it’s not uncommon (literally) to achieve 5-10x performance improvements by careful cache usage.
“it’s faster because it’s C” true that. but JavaScript is faster.
Java < C < JavaScript
Java can be as mission-critical fast as it needs to get. Sure it can hog memory, but that’s a one-time thing in most cases, which is NOT a problem as uptime is more of the worry. It also depends on how well you want to know Java. Sure, in C you can write a hella fast hello world without knowing much of anything, but honestly are you any better off? The benchmarks of course show worse Java on small problems, but writing Hello World isn’t a problem. If it is, you need to take the next step in your career.
One issue with Java that matters a lot for general purpose programs is their start-up latency.
Java may be as fast as C at solving a large differential equation, but Java-applications “feel” a lot slower than their C/C++ counterparts.
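Start-up latency of this kind is easy to measure for yourself. The sketch below times how long it takes to launch a process and wait for it to exit; `startup_seconds` is a hypothetical helper, and the example only times the current Python interpreter because that is the one program guaranteed to be present.

```python
import subprocess
import sys
import time


def startup_seconds(argv: list[str], runs: int = 5) -> float:
    """Best-of-N wall-clock time to start a process and wait for it to exit."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


# Time the current interpreter running an empty program.
elapsed = startup_seconds([sys.executable, "-c", "pass"])
print(f"{elapsed * 1000:.1f} ms")
```

Passing a different command line, such as a compiled hello-world binary or a `java` invocation that exists on your machine, gives a like-for-like comparison of process start-up cost alone.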
Interesting post. Not everyone will agree though.
I predict a programming language flame war.
Most high performance financial trading systems are written in C.
If you ever get the chance to hear a presentation from the LMax team I would recommend it – their ultra high performance trading engine is written in Java by assembler programmers. (http://bit.ly/6yjtP3)
@Ransom Java doesn’t hog memory if it never allocates it in the first place ;-)
You’re making the pretty big assumption that the system being programmed for has things like virtual memory and a kernel. There’s a whole world of embedded programming out there where energy efficiency and mass-scale cost concerns push CPU speed down to single-digit megahertz or lower. Saying things like “Most importantly, almost all programmers need to be concerned with making their code run well on multiple processors.” is ridiculous. Maybe almost all programmers working in desktop or web applications have to worry about multiple cores, but those of us pushing out all the tiny brains in the electronic gadgets that infest a modern life have a very different set of constraints.
Nope, sorry Nic, but even the majority of embedded programmers do need to worry about parallelism. I know there are still lots of extremely simple microcontrollers out there that might not even have things like multiply and divide instructions, but shipping lots of units is not the same as employing lots of programmers. The majority of embedded-system programmers operate in a milieu not too far from that of commodity-system programmers, on systems that do have multiple processors. A pretty good number of them have been dealing with multi-processor systems, including heterogeneous multi-processor systems, since well before such things were common in the mainstream. I made no assumptions about things like virtual memory and kernels, and even explicitly called out embedded systems as an example of where using C makes sense. Please don’t contribute to the stereotype of embedded-systems programmers as ostriches who reflexively look down their noses at anything more modern than the 4004. I’ve known too many highly skilled and creative embedded-system folks over the years who would consider your perspective just as narrow and unrepresentative as you seem to find mine.
Yet another programmer who is not able to code in C or C++ and trying to find an excuse to stay comfy with java
Dan, fails on benchmarks? No. While I’m no huge Java fan myself, notice how those same benchmarks show Java’s speed over even languages like C/C++, because Java can allocate and GC memory 10x faster than malloc and free. It’s a trade-off.
You are all just proving his point. Solid work folks.
I want to point out that on a desktop, the sooner the program is done, the sooner another program can use the cpu. I don’t care the slightest bit that your program is fast enough on your desktop if it makes movies skip on mine.
That’s not to say that “write it in C” is the answer, for all the reasons you already described. But performance still matters, even if the program isn’t CPU-bound.
@Nikolai,
If that was your interpretation, you may want to retake any sort of class that teaches graphs.
http://shootout.alioth.debian.org/u32q/benchmark.php?test=all&lang=java&lang2=gcc
What are your thoughts about Objective-C as an extension of C rather than C++?
I’ve coded C++ since it was invented and C for a large number of years prior to that. I *can* write very low level code in C and ASM, but I tend to do most things in Java these days because there’s very little that I do that requires machine level access. This includes writing OpenGL games in Java which need to perform well to be any good.
Any informed programmer can write fast code in Java.
I agree that a larger percentage of C programmers are probably well informed about how the machine works. Likewise, there are probably more clueful Java programmers than Javascript or Visual Basic programmers. But the central issue is not which language you use, but having programmers who understand performance. If there’s some part of your code which cannot be done fast in whatever language you use (this has never happened for me in Java, BTW) you can usually plug in something coded in a lower-level language (via a DLL, for example).
There are a couple of areas where I think Java is less appropriate to use than C: anything where the startup cost of Java is an issue, like a command-line application that needs to run many times a second, or a very memory-constrained device. Fortunately, most meaningful apps on home computers have neither of those concerns.
@Nikolai, what’s your point? malloc / free aren’t the only way to allocate memory in C
Well, it’s not that C is automatically “faster”, it’s that C gives you enough flexibility to do the things you need to do in order to make things fast. Other languages all have various pitfalls and limitations. There are things you simply can’t write fast in Java, and there are other things that are really tough to get to run fast in Java.
However, C is a pretty awful and poorly designed language. It is quite possible to have languages that have the same flexibility and performance characteristics of C without its problems. Unfortunately, C has driven them all to extinction over the last couple of decades.
The relative merits of Java vs C/C++ are not so simple. It is not simply a language issue, but also an issue of the accompanying runtime environment. One could write an interpreter or a JIT for an intermediate language to run C programs. Likewise, one can use static compilation for Java. There is also the issue of safety.
There is a tradeoff between using GC and not. A properly written GC provides a safer programming environment and faster development time in exchange for some overhead at runtime. The amount of runtime overhead depends on how well the code is written. A long-running program written in C that really needs to use malloc and free is likely to fail at some point due to fragmentation, unless perhaps the programmer is extremely careful about object allocation. Fortunately, the computational complexity of many long-running applications is such that dynamic memory allocation is not really required. For example, operating systems do not do much data processing; this is left to application code.
Garbage collection technology has improved in the last 20 years to the point where it is possible to build a deterministic garbage collector. In fact, our JamaicaVM is being used in time-critical systems today along with statically compiled Java code. JIT is just the wrong solution for many embedded applications, not only because of indeterminism, but also due to memory bloat. It requires carrying around a compiler in every application. That does not mean a JIT does not have its place. It is great to have when there is a need to load code dynamically.
Performance is not the only issue for embedded systems. For many systems correctness is at least as important. Not only do GC and strong typing improve robustness for complex applications, the syntax and semantics of Java are more conducive to formal analysis. Complex, multithreaded programs are difficult to test thoroughly. Techniques such as data flow analysis and deductive reasoning can now be used on Java programs to help ensure program correctness. There were some analysis tools for Ada, but nothing like what one can now do with Java.
Writing formal analysis tools (as opposed to most of the informal ones now on the market) for C and C++ is for masochists! The preprocessor always causes trouble. In fact, the use of a preprocessor, instead of including macro expansion and compiler directives in the language itself, was probably the biggest mistake in C, which has now propagated to C++. The other nightmare is the unrestricted nature of pointer manipulation, which makes writing a correct GC for C, as well as more advanced analysis, next to impossible.
I have written large programs in a few different languages and I must say Scheme is still a favorite of mine. There used to be a dialect called T that had an amazingly good compiler for Sparc and 68K, often beating C code for raw performance. Unfortunately, the GC was not so good. Now, to take T and add static typing and a modern GC would be fun. Java has many warts, but given what is commercially available and viable, it is a good choice for all but the smallest devices where the essential task has the computational complexity of a finite state machine. As compiler and linker technology improves, the performance of Java code will equal and often surpass C/C++. We had these same performance arguments about C vs assembler, and assembler lost.
The vast majority of free-software applications that I use are written in C or C++, because although Java is easier to program, it is a memory and CPU hog. The only apps I have used that were written in Java are Azureus and javac. Javac is orders of magnitude slower than “jikes”, a fast Java compiler written in C++.
I recently compared perl and C for a simple task, reading a CSV file. The C code was more than 10 times faster, even though the perl code was using a CSV reader library written in C. I am 100% sure that Java will be similarly much slower than C for this task and every other task that involves computation not just waiting.
People who say compiled C code is better than well-written assembler are smoking the same dope as the person who wrote this article. Hand-written assembler is much much faster than the junk your C compiler spits out. The C compiler usually doesn’t even know if two objects are aliased or not. And equivalent C programs run faster than Java programs.
Excellent article. This reminds me of that ridiculous drama at work where a piece of high-level management software was to be developed from scratch and the dinosaurs in the team argued that it would be best done in C++(!) while even a moderately experienced programmer such as I could instantly recognize that it would be best to implement it in Java, since it was definitely an I/O- and memory-bound application and implementing it in C++ would be a veritable nightmare. Logic did not seem to apply and the seniors began to unnecessarily use their seniority card. I was almost jaded by the whole experience when surprisingly upper management (who do have technical backgrounds) intervened and literally grilled the Technological Luddites over the Eternal Fire of Logic. End result? We decided to go with Java. This has almost entirely restored my confidence in the industry!
Thanks for writing this terse, sensible, and precise blog. Looking forward to reading more of your writing.
Code is like bricks in a building. The important thing is how you stack your bricks, not what kind of brick you are using. Fast is great only if fast can happen on time and within budget. Otherwise the only person you are going to impress is yourself.
For those who remember RT11, RSTS et al., it is pretty obvious that the ability to design efficient and economical code evaporated with the advent of MSDOS. Both DEC and Acorn machines handled interrupts MUCH better than the ‘PC’, and THAT was the main problem one used to encounter when trying to use a ‘PC’ for anything ‘industrial’. If one DOES have a process that needs to be ‘nippy’ then quite often the only real way forward is to allocate (or even produce) a separate bit of hardware to do the time-critical bits.
Yes, C/C++ is faster than Java, but it really depends on how the program is written. Java is a platform, which means a Java application is not machine code; the compiled application is bytecode for that platform. This means a Java application has more limited options for direct micro-optimization. In short, if you are familiar with the compiler and optimizer, it is possible (but extremely difficult) to work out which CPU instructions the compiler will produce after optimization, and thus to write code in a way that helps it produce faster code. In Java that is not possible: not only does the final code run on the Java platform, which adds instructions for the processor to read the code and decide what to do (while CPU machine code has a hardware mechanism for that), but dynamic memory and the garbage collector require even more time.
So writing a program in Java is faster than writing it in C/C++, but the program itself will be faster when it is written in C/C++.
As some folks have mentioned, a career C programmer usually understands lower-level programming better than your typical Java developer. Couple that with the additional software layer added by the JVM, and compiled C code should be at least a little faster. Java doesn’t have to be faster than C, though. The critical advantage of Java is its cross-platform support. All of this reiterates the old saying “right tool for the job”.
The more abstraction, the slower the speed. Moreover, Java has to run inside a virtual machine! It will never match the speed of a C program, but there are other features of Java that C can never match. I guess it is all usage based. You just need to know to use what you need when you need it.
well, calm down everyone. ‘I do this’ I do that’ ‘ I think the other’. blah blah blah.
we all do different stuff in different ways, with different aptitudes and preferences.
Programming is NOT a religion . Dogma is not healthy.
A lot of boilercode libraries written in C/C++ share some distinct features
- feature complete – for every need you can find a really robust C library .
- fairly good, optimized code – even pedestrian C code runs on computer nowadays (out of order execution)
- cross-platform !! and still the same performance
- deterministic memory usage patterns
- usable in nearly all programming languages via wrappers
- app startup within milliseconds (shared libraries)
- code shared between processes for .so and .dll
Usual java app
- third party code, normally not the same quality as cross-platform C code . Sometimes not feature-complete.
- only runs on one platform (the java platform)
- a lot of runtime-dependent code (“string coding”) like loading of spring application contexts gives the JIT compiler a hard time and can only be resolved at runtime very very late.
- as working set of data increases, hidden memory leaks start to pop up (third party libraries).
- start up times take 100-1000 times longer than C++
- no code shared between processes for jars
- no optimization for cpu data caches
- no real platform specific APIs, only compromises
I have seen some really great Java solutions – no doubt. Since most of the time we are doing web programming these days, Java is a great solution. It’s also fast. Eclipse is nice.
But the core C/C++ libraries available on the platform or on cross-platform level are rock solid.
Argh, not this debate again. My few cents are:
Asm to C is a small loss of performance and control with a large gain in productivity.
The jump from C to C++ is an odd one because it can mean no loss of control or speed and a gain in productivity. BUT it’s a wide language, and following the path of productivity starts to cost control and performance. This is partly because leaving the metal so far away leaves people ignorant of it; the same people hate C++ because they don’t understand some of the problems they hit. But another part of the problem is that the language is so wide and expressive that it can be hard for even seasoned programmers to know exactly what all the code is doing just by reading it.
From C++ to Java/.NET seems to me to be a disaster, as ignorance of how computers and operating systems work is rife, and the control is greatly reduced. Add to that the memory costs of a JIT’ed, memory-managed environment, and, well, the results are out there to see. Worse than just being slow themselves, they slow the whole system down as there is less memory to go around.
Script languages are different because no one expects them to be fast, just expressive, so people don’t do anything speed-critical in them. They tend to be relatively light on memory as they are interpreted and use just reference-counted memory management.
So for me, the best environment is Python and C. One for expressiveness, one for speed/control. Unfortunately, for most of my career it has been C++ that puts bread on the table.
Well written !
I’d like to emphasize the need for algorithm optimization. Using a so-called fast language with weak data structures may lead to a prototype that fails miserably on real data. Using a strong structure or a data container written as a class is a real help here.
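A small sketch of what "weak" versus "strong" data structures can mean in practice: the two hypothetical functions below return the same result, but the first scans a list for every lookup while the second builds a hash set once. The function names and the sample data are made up for illustration.

```python
def common_ids_list(a: list[int], b: list[int]) -> list[int]:
    """O(len(a) * len(b)): every `in` test scans the list b."""
    return [x for x in a if x in b]


def common_ids_set(a: list[int], b: list[int]) -> list[int]:
    """O(len(a) + len(b)) on average: every `in` test is a hash lookup."""
    lookup = set(b)
    return [x for x in a if x in lookup]


a = list(range(0, 2_000, 2))  # even numbers below 2000
b = list(range(0, 2_000, 3))  # multiples of three below 2000
print(common_ids_list(a, b) == common_ids_set(a, b))  # True
```

On small test inputs the difference is invisible; it only shows up when the real data arrives, and no choice of implementation language fixes the quadratic version.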
I had this with a developer emphasizing the don’t-optimize-early strategy. This is OK for the loop constants, but using an N^3 algorithm is not covered by this rule.
The tests took 20 seconds with 1000 entries. This is OK for a prototype.
The real data set was 100 thousand entries. Since the real machine was 10 times faster than the test machine and the imports could be split into 20000-entry packets, we got away with a black eye.
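The arithmetic behind that black eye can be worked through as follows. This is a back-of-the-envelope sketch that assumes the run time scales exactly with N^3, that the five packets are fully independent, and that the "10 times faster" machine speeds everything up uniformly; `scaled_seconds` is a hypothetical helper.

```python
def scaled_seconds(base_seconds: float, base_n: int, n: int,
                   exponent: int = 3, machine_speedup: float = 1.0) -> float:
    """Extrapolate run time for n entries from a measurement at base_n entries."""
    return base_seconds * (n / base_n) ** exponent / machine_speedup


# Measured: 20 seconds for 1000 entries on the test machine.
full = scaled_seconds(20, 1_000, 100_000, machine_speedup=10)
packets = 5 * scaled_seconds(20, 1_000, 20_000, machine_speedup=10)

print(f"one pass over 100,000 entries: {full / 86_400:.1f} days")  # 23.1 days
print(f"five packets of 20,000 entries: {packets / 3_600:.1f} hours")  # 22.2 hours
```

Under those assumptions the unsplit import would have taken about three weeks even on the faster machine, while the split version finishes in under a day.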
Try writing an image editor in C# or Java. Simple algorithms to fill a closed area with a certain color take much more time in JIT-compiled software than in precompiled software. Any system that relies heavily on data structures (lots of linked lists, stacks, arrays, binary trees, etc.) will take a performance hit with JIT-compiled software.
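For readers who have not written one, the kind of area-fill algorithm meant here looks roughly like the sketch below: a breadth-first, 4-connected flood fill over a small grid of color values. It illustrates the algorithm only and says nothing about JIT versus precompiled speed; the grid representation and the name `flood_fill` are assumptions for the example.

```python
from collections import deque


def flood_fill(grid: list[list[int]], row: int, col: int, new_color: int) -> int:
    """Recolor the 4-connected region containing (row, col) in place.

    Returns the number of cells that were changed.
    """
    old_color = grid[row][col]
    if old_color == new_color:
        return 0
    grid[row][col] = new_color
    changed = 1
    pending = deque([(row, col)])
    while pending:
        r, c = pending.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
            if inside and grid[nr][nc] == old_color:
                grid[nr][nc] = new_color  # mark before queuing to avoid revisits
                changed += 1
                pending.append((nr, nc))
    return changed


image = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
print(flood_fill(image, 0, 0, 7))  # 5
```

The explicit queue avoids deep recursion on large areas, and each cell is visited at most once, so the work is linear in the size of the filled region.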
The one thorough comparison of languages I saw (Prechelt, 2000), although somewhat dated, reflected just that: C, C++, and Java were, in terms of medians, quite comparable with respect to performance, with Java somewhat higher in terms of memory consumption (probably not true anymore). The area where Java outpaced C, C++, Perl, Python, and TCL was development time: Java took significantly longer to turn specs into code than any other language Prechelt tested, in particular C and the script languages. Java’s top times were actually off the chart, they were so far out. Even with those times left out, Java still took much longer than C, and marginally longer than C++.
A different perspective for you, from a corporate UNIX systems administrator. I manage your hardware, OS, and environment.
When I see a Java process consuming 400mb of RAM and chewing CPUs like there is no tomorrow, almost without exception, I know two things: 1] It isn’t productively using the resources it has, *AND* 2] The programmer responsible for the mess has absolutely no idea how to make it behave better. That, and usually they have more than one Java process running and behaving badly.
When I see compiled C code consuming 400mb of RAM and chewing CPUs like there is no tomorrow, almost without exception, I know two things: either 1] the process is doing a great deal of real work, *OR* 2] the programmer responsible for the mess will be able to make significant headway in cleaning up their act.
I don’t know if this is the result of the language itself, the kinds of people those languages attract, or the ability to debug the performance/resource problems inside the language. All I know is that if there is a resource hog, and it is running in Java, it is no longer a programming issue. It becomes a hardware issue of adding more CPU and memory to fix what cannot be done in code.
You can’t just assume “a program” is I/O bound or CPU bound. You have to look at individual parts of the program, and most programs have parts that are CPU bound… that is, they really do work faster when you use them on a faster computer… or, to look at it another way… if you use a slower computer, you can tell it’s slower even if most of the time any given program is waiting on you.
When your code is in Java, or C#, or any other p-code based language, even if it’s got a really good just-in-time interpreter, it’s like you’re running it on a computer 2-5 times slower. And you CAN tell. Just as you can tell when you use a computer that’s got a 5 year old Celeron instead of your usual Core Duo.
Sure, you might be able to make it faster by using a better algorithm, but when you write it in Java you’re probably going to use the same algorithm. Otherwise, you know, why didn’t you use that better but trickier to implement algorithm in C?
I really am shocked at all the comments I’m seeing here about C being a poorly designed language. It has flaws but it is small and compact and easy to keep everything about C syntax straight in your head.
As for Java and C++ the languages are overflowing with hideous complexity that boggles many programmers who attempt to build something fast and efficient. Java is too slow and uses too much memory, too verbose, inflicts type erasure on those who use generics, etc. C++ is so complex that everyone is forced to use a subset of the language or ignore the safe subsets at their peril. In short, C is flawed but superior.
Excellent post. I especially like the categories you’ve come up with: cpu, I/O, memory, synchronization and algorithm. It would be interesting to see how problems tend to distribute in these categories within any given industry.
The good thing about a statement like “It’s faster because it’s C” is that it lets you know, instantly, that whoever uttered it is either not terribly thoughtful or not being serious.
It’s hard to be so honest with yourself that you can recognize that there are many cases where all the skills you’ve spent the past X years honing are not all that important in many contexts. Who’s happy to admit that they’re one of the 19 in 20 programmers for whom productivity greatly trumps execution speed when they’d like to be one of the 1 in 20 programmers where it matters?
@Dan Thanks for the tip on reading graphs! You are teh best!
Take a look at benchmarks that remove the startup time and see:
http://www.w3sys.com/pages.meta/benchmarks.html
Java fares very well against C++.
But I am looking at your graph, too. While the C/C++ runs nominally faster for mathematical benchmarks such as the alioth benchmarks, Java fares better against the C/C++ code than almost any other language in the list. And when you move away from mathematical benchmarks to more typical throughput/transaction-oriented applications you will see Java kick C/C++’s ass because its memory model is flat (which means it is a memory pig, too). It’s a trade off. C++ optimizes for ALL routes, which means compromises. Java has the benefit of JIT and HotSpot which can optimize at runtime for the ACTUAL route.
Total nonsense. Interpreting bytecodes via an app written in C or C++ (i.e. the JVM) won’t be as fast as the same app written directly in C or C++.
That said, I can understand the confusion since most app slowness I’ve ever seen is due to unoptimized algorithms, whether that be by design or implementation flaws. A good algorithm written in Java can easily beat a poor one written in C/C++. Java has had a really good (optimized) library for more years than C++, upon which to build programs.
Static compilation of Java to native code is possible without losing compatibility, provided of course the runtime includes a JIT or interpreter for handling classes that were not known to the static compiler. Statically compiled Java ranks very well against C/C++: http://www.stefankrause.net/wp/?p=9 (Disclosure: The link points to an independent third-party comparison that includes a product from the vendor I work for.)
@horsh All you’re testing there is the start-up cost of the app, which for most non-trivial apps is irrelevant. Java clearly loses that race because it has to load the JVM. It’s pretty easy to come up with a contrived example that shows C/C++ being slower than Java; example below. My point is: choose the language most appropriate for what you’re doing. Every language has its downfalls, and if you insist one language is better than another in every case you’re not looking at the problem objectively.
#include <stdio.h>

int main(int argc, char** argv){
    /* Accumulate and repeatedly scale a running total in long double. */
    long double total = 0.0;
    for (int i = 0; i < 10000; i++) {
        for (int j = 0; j < 10000; j++) {
            total += i * j;
            total *= .12345;
        }
    }
    printf("Total : %LF\n", total);
}
(Same loop in Java, omitted)
#time ./a.out.c (gcc 4.4.3)
Total : 14080608.383443
real 0m1.018s
user 0m1.016s
sys 0m0.000s
#time ./a.out.cpp (g++ 4.4.3)
Total : 14080608.383443
real 0m1.039s
user 0m1.032s
sys 0m0.004s
#time java Test (java 1.6.0_20)
Total : 1.4080608383442638E7
real 0m0.559s
user 0m0.488s
sys 0m0.028s
As I recall, in terms of performance, the choice of algorithm is the most important.
As for why C is faster than java or C++:
- Simpler language. That makes it easier for a compiler writer to understand it and thus to add optimizations to the resulting machine code.
- Unlike Java and .NET, it runs natively. Even with just-in-time compilation, VMs are slower.
- Less indirection.
- Less memory allocations/de-allocations. Mostly because the programmer needs to keep track of this.
- Closer to “what you code is what you get”, making the programmer aware of what’s going on.
- It is very easy to figure out what the compiler will produce at the assembler level.
- Pointers are not hidden or disguised.
- Clear separation between code and data.
- No exception mechanism. Exceptions take time, and are too often misused. In C, a programmer tends to check the return values of functions, thus more aware of possible errors.
- VERY easy to convert functions to assembler if you need more performance.
Overall, I think that the most important thing C provides is awareness of what your program is doing (what you code is what you get) and of all the things that can go wrong with your program. However, it does so at the expense of the programmer’s convenience.
I agree with horsh and Sam Watkins. ( you guys must be actual programmers!)
For the rest of you guys: try to differentiate between what your mind makes up and what is real…
If this was categorically the case, then we wouldn’t care about CPU-bound code. How often do you see a person running a P90 because, heck, who cares about CPU-bound code? No, people get faster CPUs for this very reason. And since Intel stopped making CPUs faster, the entire industry has gone topsy-turvy trying to figure out the best way to make software more parallel – for the very reason that there’s so much CPU-bound work to be done.
I think that’s drastically oversimplifying the situation there, Kenneth. Have you heard the aphorism that programs will expand to fill the memory available? Same with CPU speed. Processors get faster, programmers get sloppier – not just in their choice of language but in other ways as well. There’s part of the reason why people keep getting faster machines. Another is that they’re increasing capabilities other than CPU speed, such as memory or I/O. It’s still true that the average machine spends the vast majority of its time waiting for the user, which is easily seen by looking at the percentage of the time that frequency scaling and sleep modes are able to kick in – and that’s been shown for both desktops and servers BTW. Whenever that’s the case, you’re not CPU bound. Machines with one tenth of the CPU power and one tenth of the memory of what you and I have on our desktops *should* be quite fast enough for many tasks, and were in their day. It’s nice to have the extra horsepower for when you do want to transcode video or something, but the program doing that is only one of dozens that you use consciously (and hundreds more that most users consider part of the OS). Then there’s virtualization, but that’s a whole different discussion.
Just to be clear, because others also seem determined to twist what I’ve been saying to suit their own ends, I’ve never said that CPU-bound code is so rare that normal users never encounter it, nor that people shouldn’t use C. What I’m saying is that the great majority of code that people write isn’t CPU-bound, and that writing all of an application in C is often inappropriate. People should attempt to discover *empirically*, via profiling, which parts of their programs really are CPU-bound and optimize *only those parts* in C – or even in assembler if need be. I’ve done plenty of both. In another window right now, I’m working on a big chunk of code that’s in C because it has to push a lot of data in a lot of directions and I’ve dealt with enough of the other bottlenecks that now it’s CPU-bound. I’m doing it because I had a demonstrable need, not out of habit. Technikhil almost hit the nail on the head saying that the programmer of tomorrow will be a polyglot able to use the right language at the right abstraction, except that I’d say even the programmer of today should be that way. Optimize the hell out of that inner video-transcoding library/module, by all means, but that’s no excuse for writing even the UI in the same program that way.
I’ve used C, C++, and Java. Don’t use a hammer on a screw. The reason we have different tools is that we face different problems at different times. All in all, I prefer to program in C under linux or on a bare micro, but for a really big job, C++ may be a better tool. Do you have to squeeze the last few percent out of the CPU cycles and RAM, or do you have an impossible schedule to meet? Are you trying to do video analytics at a high frame rate on mega-pixel security cameras? Maybe Java isn’t the best bet here. (That’s what I’m working on now in C.) A child with a hammer views everything as a nail, an expert has the correct tool for every job. Assess what your constraints are likely to be, and select the correct tool.
@Nikolai – Why are you pointing to such out-of-date measurements when Dan has shown you up-to-date measurements of larger programming tasks?
Note that for the more mathematical benchmarks n-body and spectral-norm, Java is closer to the C measurements.
Note that for the less mathematical benchmarks regex-dna and reverse-complement, Java is further from the C measurements.
In my experience C++ is faster overall, but more flexible languages (Java, C#) allow you to refactor for speed with minimum effort. So it is not unlikely that software written in C/C++ will end up slower than an equivalent written in a higher-level language, because the latter can adapt faster to the changes that will come.
Designing a non trivial piece of software having optimization in mind from day 1 is a difficult and dangerous task.
I personally like a lot the F# approach: your bricks are not the object/classes but rather the functions/features and a function is easier to optimize for speed/multicore later or even port to a different language if really needed.
It’s been said by a few but I think it’s worth stating again:
Many times in these discussions the fact that Java runs on a JVM is overlooked. Having a fully performant (even in critical environments) Java application is not just a matter of writing good Java code (highly parallel, concurrent, etc.) but also of *tuning* the JVM running that app for the specific performance criteria you have. Each situation is different (available memory, disk arrays, etc.) and the JVM needs to be tweaked to get the best out of it. Furthermore, several JVM brands exist, each designed to meet different performance requirements.
I believe if you’re choosing a language solely on the premise that it will allow you to write faster code, you have bigger problems to worry about! Nearly all the modern languages are Turing complete, so they can all describe solutions to the same set of problems; one of the major criteria for choosing a language should be how efficiently it expresses the solution to your problem. That, and what you have to work with (existing infrastructure, tools, technology, skills). To say “I’m a programmer and I only write programs in C cos it’s the best” is the same as finding a carpenter who uses sandpaper to make an entire dining table – use the tool best suited for the situation, end of.
oelewapperke : i totally agree with you! great quote!
why does everyone say that software development is more “productive” using java?
where is the empirical evidence?
Most of you fail to recognize this blog’s plot – entirely.
There are the simple-minded, who call “$ time java hello” and say “yeah, java is slow”, not recognizing they’re also measuring JVM startup, and not recognizing that well below a second is actually _fast enough_ to start a program that just gives some output, for almost any user there is on the planet.
There are people going on about how much memory Java consumes or how slow it generally is, ignoring that memory and speed are usually a question of the JVM, and not recognizing that Java is actually what drives Android, obviously _memory efficient and fast enough_ to handle the limits of mobile phones.
There are people venting over garbage-collectors, not recognizing that there are actually garbage collector libraries for C/C++ which greatly improve the speed over the standard malloc and free calls (most of them fail to recognize why that is, well, they better dive into more detail about garbage collectors).
The bottom line is:
It doesn’t matter if executing code written in $language is faster than code written in $language2, as long as it works well enough for the user. The user doesn’t care if he has to wait 0.01 seconds more to get his result. As long as the user gets what he needs, it is more important that the code is widely understood, maintainable (easy to read and modify), and reasonably fast to write while not leaving too much room for error (like doing memory management by hand), and that the program can be deployed easily to a large number of systems.
Using $language may gain you a few percent of extra speed, but most programs suffer more from wrong algorithms and bad scalability due to bad design.
Plus: It is entirely okay to trade some memory for speed and vice versa, as soon as one of them becomes a problem. So writing a program in $language just for performance reasons hardly ever makes sense.
So do yourselves a favour and use the languages and APIs that serve you best to get the user what he wants – as opposed to what you think is cool. Really, putting effort into saving 2 microseconds on displaying a table is not cool, it is stupid. Likewise, choosing a language for the same reason is something people should be fired for.
Kind regards
I once knew a colleague who quipped ‘writing in C is like building a staircase without a stair rail – one slip and you are in trouble’
You need to try out the magnificent programming language Erlang: http://en.wikipedia.org/wiki/Erlang_(programming_language)
I trust it is not too late to recall the famous quip:
“C++ is so buggy that there’s even a bug in the name”.
Can any C++ coders find it?
It’s funny how the more things change, the more they stay the same. It used to be said that hand coded assembler was faster than C. I generally get 90-99% of what can be done in assembler from standard C. And with C you get portability, architecture independence and it’s a lot easier to drop in a new algorithm or library.
> Processors get faster, programmers get sloppier…
The so-called “May’s Law”, or the old adage “a job swells to fill the time given it”.
> The average machine spends the vast majority of its time waiting for the user, which is easily seen by looking at the percentage of the time that frequency scaling and sleep modes are able to kick in – and that’s been shown for both desktops and servers BTW.
If you look at the innards of a CPU, even when running optimized code, the computational units are usually idle. Look at the ALU in a CPU and compare it to some user code profile. It’s easy to see that it’s not fully utilized or even close (except for highly specialized code that generally runs for short periods of time). Take that all the way through the system at every level and you will see that there is a lot of downtime, even when things seem very busy.
Truth be told, if all code were fully optimized, we would probably still be using 16 bit machines that execute directly out of main memory…
Cheers,
– Rick ;^)
This discussion is quite similar to some of the NoSQL vs. relational database debate: if you hide the details some folks will always hang themselves, but if you expose out all the details it is hard to be productive.
And then occasionally we hit inflection points.
I find it hard to believe that there are still people who try to argue that Java or any other memory-managed language is faster or more efficient than C, when most of them are actually implemented in C. Put plainly, if your managed language excels in any way, that’s pretty much because of C and a team of clever C programmers. Java, C#, or whatever managed language you use is just a sloppy framework designed to take load off programmers and let them focus on productivity. If what you care about is productivity and time to market, then a managed language is probably better for you and/or your business. But please, don’t insult skilled C programmers: if the quality of the algorithm is the same in both C and Java, of course it will be faster in C.
It’s not as simple as you make it out to be, anonme. In a modern JVM most running code is compiled, only just in time instead of all up front. That means the main difference is going to be the quality of the generated code. Java is sometimes easier to optimize for, due to the lower incidence of pointer aliasing which is a major optimization-breaker for a lot of C code. Also, a JIT compiler can take advantage of run-time information to optimize based on which functions or code paths are invoked more often, so it can often avoid instruction-pipeline bubbles better than its traditional counterpart. (The old MIPS C compilers used to do this too, BTW, but hardly anyone has since.) For these reasons and others, it’s not at all accurate to say that the C version must be faster. People who use Java do sacrifice rapid startup and predictability, and many of them do give up a lot of performance because of idioms and frameworks that have grown around the language, but those are all different things. Besides, the main point of the article is that the efficiency of the code while it’s running is often less important than how often it ends up not running at all because it’s waiting for I/O or because it’s unnecessarily single-threaded, etc. A good programmer can write fast code in Java, and a mediocre one can write slow code in C.
Wake me up when c/c++ can handle 60M (18 nsec mean latency) inter-thread messages with all benefits of java language and ecosystem…
http://lmax-exchange.github.com/disruptor/
Been using Java on critical financial apps… no problem so far. It excels in every context over the legacy C/C++ realtime services.
Java hogs memory? Never heard of off-heap techniques?
Jeff, that post about SEDA pattern years ago. Disruptor is the SEDA done right!