Saturday, December 6, 2008

Punk Rock Programming

Having had the privilege to experience a couple of days' worth of temporary tinnitus due to attending Marky Ramone's Blitzkrieg show (picture courtesy of rockXpress.ro), I've been thinking about a conversation I had a few years ago with an ex-colleague about C and C++. Why? I'll tell you in a minute, but first: if you don't know who the Ramones are, give them a listen right now.

Now, my then colleague expressed his reservations about the continued use of the C programming language. He argued that C is now mostly a subset of C++, and that C would therefore be better off put away somewhere and forgotten. No more separate committees for ANSI C and ISO C++, resources spent more efficiently, and so on. The argument has some justice to it, and I'm sure the hardcore geek readers' eyes are gleaming with anticipation for a rainy weekend with a hot technical debate over whether compiling with one compiler for both programming styles should generate exception handlers for the low-level code by default, but that's really not what I want to talk about.

What I want to talk about is the human component of using a tool such as a guitar, or a programming language. Programmers tend to believe the myth that efficiency has to be pursued at all costs, and that by organizing the code, and indeed the workplace, in a certain "One True (TM)" manner all will be well with the world. Incidentally, so did the Nazis.

But there's a big difference in the overall philosophy of most of the people who prefer C vs. the axioms of the people who prefer C++.

C++ is more like classical music. If you want to be a classical music composer, or simply a competent classical musical instrument player, you need to train a lot for dexterity, to master all the modes, to understand how intervals and modes relate to chords, to study the work of the great composers, to learn sight-reading music, and so on. So you have your Mozart, your Mahler, and your Yehudi Menuhin.

But what if someone simply enjoys music, and wants to have fun with it? Make the best of what one can learn in a short time with limited resources, and express all there is to express with that. After all, my high school psychology manual used to define one's level of intelligence as being directly related to one's capacity to do more with less. So then, you have your Ramones, your Nirvana, your Stooges, your Mudhoney. Two-note chords, at most five distinct chords in a song. Cheap one-pickup guitars. But you react to it. Things get done. You're moved by it, not necessarily in the same directions as classical music would have taken you, but in directions that are complementary to classical music, and using the same sensory building blocks. We have art.

The snobs may not agree with the previous paragraph, but the Ramones and the Stooges are already considered classics, and if you think simple and direct is bad and you're still not convinced, I invite you to reflect on the way the blues is played: 7th chords, only three chords used in a song, and those three are always I-IV-V. And the snob-friendly Martin Scorsese has gone out of his way to produce a dedicated documentary.

C gets things done. It's almost assembly language, but then again it's not. It's direct access to your computer. And you can learn it from a book of fewer than 300 pages ("K&R"), as opposed to "The C++ Programming Language", which is several times as thick.

Both languages have their place, and as long as they are actively supported by their respective committees, and there are people who prefer to go and see a Ramones tribute band over seeing Tosca, this will always be the case.

Saturday, October 11, 2008

The GPS Coordinates of Jung's Tower

As technology workers, we're all too familiar with pro-technology arguments. However, Donald Knuth, author of "The Art of Computer Programming", and king of the geeks, states that "he has been a happy man ever since January 1, 1990, when he no longer had an email address".

C. G. Jung, not a big hacker himself, but still the father of analytical psychology (aptly, also known as "Jungian psychology"), has this to tell us in his autobiographical project "Memories, Dreams, Reflections", in a chapter called "The Tower":

"Our souls, as well as our bodies, are composed of individual elements which were all already present in the ranks of our ancestors. The 'newness' in the individual psyche is an endlessly varied recombination of age-old components. Body and soul therefore have an intensely historical character, and find no proper place in what is new, in things that have just come into being. That is to say, our ancestral components are only partly at home in such things. We are very far from having finished completely with the Middle Ages, classical antiquity, and primitivity, as our modern psyches pretend. Nevertheless, we have plunged down a cataract of progress which sweeps us on into the future with ever wilder violence the farther it takes us from our roots. Once the past has been breached, it is usually annihilated, and there is no stopping the forward motion. But it is precisely the loss of connection with the past, our uprootedness, which has given rise to the 'discontents' of civilization and to such a flurry and haste that we live more in the future and its chimerical promises of a golden age than in the present, with which our whole evolutionary background has not yet caught up. We rush impetuously into novelty, driven by a mounting sense of insufficiency, dissatisfaction, and restlessness. We no longer live on what we have, but on promises, no longer in the light of the present day, but in the darkness of the future, which, we expect, will at last bring the proper sunrise. We refuse to recognize that everything better is purchased at the price of something worse; that, for example, the hope of greater freedom is canceled out by increased enslavement to the state, not to speak of the terrible perils to which the most brilliant discoveries of science expose us. 
The less we understand of what our fathers and forefathers sought, the less we understand ourselves, and thus we help with all our might to rob the individual of his roots and his guiding instincts, so that he becomes a particle in the mass, ruled only by what Nietzsche called the spirit of gravity.

Reforms by advances, that is, by new methods or gadgets, are of course impressive at first, but in the long run they are dubious and in any case dearly paid for. They by no means increase the contentment or happiness of people on the whole. Mostly, they are deceptive sweetenings of existence, like speedier communications which unpleasantly accelerate the tempo of life, and leave us with less time than ever before. Omnis festinatio ex parte diaboli est - all haste is of the devil, as the old masters used to say.

Reforms by retrogressions, on the other hand, are as a rule less expensive and in addition more lasting, for they return to the simpler, tried and tested ways of the past and make the sparsest use of newspapers, radio, television, and supposedly timesaving innovations."

My point? No such thing...

Saturday, July 19, 2008

Code Reuse: It Would Be a Very Good Idea

As you must have noticed, the title of this article paraphrases a Gandhi quote: when asked what he thought of Western civilization, he famously replied "I think it would be a very good idea".

Code reuse in the software industry is an elusive beast. Words such as "reinventing the wheel" are often being paraded around by people across departments even remotely connected to computer programming, and everyone seems to be thoroughly convinced about the perils of re-writing code that's already been written, to solve problems that have already been solved.
Unless, of course, we are the ones who need to reuse code for our own project.

It comes as no surprise to most seasoned software developers that code reuse almost never happens. That is not to say they endorse the situation. But it's just a fact of life. Why? That's a tough question to answer.

In today's fast-paced IT industry, software developers are being hired right out of high school, or in their junior college years. They come from all possible backgrounds, and most of them have not properly studied their fields for the full time required to master their tools. They tend to do whatever learning they can on the job, and then float towards management positions as soon as they've mastered enough to finally be of proper use.

A programming language such as C++ takes years to master. Add some time for learning how to use a version control system, some build tools, a couple of non-trivial source code editors, the quirks of a supporting OS, basic network administration, some APIs, and for properly studying a specific business domain, and you've got a good chunk of your life spent working to master your field. Garbage-collected, user-friendlier languages such as Java or C# take somewhat less time to learn, but still require considerable effort to master.

To be able to reuse code properly one needs to:
  1. have enough experience to know where the project will be going, and anticipate usable solutions. For example, your application might only need to be running on Linux systems for now, but why not be able to compile it on FreeBSD in the future? Making that decision upfront will save development time in the future, and impose some constraints on the supporting software frameworks, very often with little or no overhead in actual development.
  2. know as many tools (programming languages, build tools, operating systems, UML tools, etc.) as possible, or at least know about them, and be willing to properly study them if the need arises.
  3. anticipate maintenance costs, and project risks. For example, it might pay off to use a stable open-source library maintained by other people if your team cannot afford to maintain its own custom version. Your team can find bugs and send patches to the maintainers anyway, so debugging is still possible, but the bulk of the work is done by somebody else. That way, you will also have the benefit of other library users' experience - they will also find bugs and send patches, so by the next version there'll be more bugs fixed than there would have been if you were the only library users. This approach also pays off if your team has little or no resources available for testing this piece of software - you'll be testing it a bit by using it, and so will its other users: all in all a lot more testing than you could have done.
  4. have an open mind and realistic expectations about your team's competency level and schedule. This is usually very hard to do, because of the human factor: we all think we are the most competent, and since we are the only ones who really know our project, we are the only ones who deserve to write the code, and custom code is the best, and we don't need all that bloat that somebody else's library will bring into the project anyway. Sometimes, that is indeed the case, but for a large percentage of projects, it is absolutely not the case. We tend to soon discover that "that bloat" was there for good reason, and now we either need to switch to that implementation, or write our own "bloat" to compensate for the latest project changes. Obviously, your team really is the best - you're in the former category. Just keep this guideline in mind for your next job, though...
The conclusion? It almost always pays off to reuse code, and not only code from your own company. Not reusing company code is only excusable if it's very bad code, and even then that's a lame excuse. So, if in doubt, reuse. Find a mature-enough product, look it up and see what support options you can use (forums, mailing lists, live technical support, knowledge bases, etc.), and start using it. You'll have more time to roll out your software on budget and on schedule, you'll avoid grunt work, you'll make the users happy, you'll make the reused software's maintainers happy, and in the end you'll make yourself happy. To continue with the paraphrases, "The Hitchhiker's Guide to the Galaxy" tells us: if you don't know enough about some seemingly cumbersome and unfriendly technology, DON'T PANIC. Extend some food on a stick towards it and see if it doesn't actually want to be your friend.

Thursday, July 3, 2008

A Memetic View on Project Management

Having recently discovered Jon Whitty, I've stumbled upon his critique of project management, with reference to the PMBoK:

"This paper makes no claim of uncovering a secret or hidden message, but it does profess to decipher the memetic code of project management (PM) to reveal the real reason why PM is so prevalent.

Why is PM so prevalent? Even though the discipline of PM is ubiquitous in Western society it exhibits many inexplicable and contradictory aspects. The prevalence of PM continues to increase across all business sectors and all geographical regions, with companies suggesting that projects are a vital contributor to future business success, and that projects are the key enabler of business change. PM is also consuming more of corporate training budgets than ever before. An increasing amount of Universities are also delivering PM courses at undergraduate and postgraduate level, and at least one corporation is supporting the teaching of PM at high-schools. The bulk of such training and teaching is modeled on the Project Management Institute (PMI®) Guide to the Project Management Body of Knowledge (PMBOK Guide).
However, despite the prevalence of PM, organizations report that project failure is commonplace and that the delivery of projects to cost, time and benefits is not improving. It is therefore valid to ask why PM is so widely and commonly occurring, accepted, and practiced, when it still fails to live up to expectations."

Whether the gentleman is right or not, I will not venture to say. What I will say is this: his answer will certainly surprise you, and most likely completely change the way you've been thinking about the issue.

Wednesday, June 25, 2008

C++ System Programming Rule: Always Join() Threads Referencing External Objects

Let's assume you want to instantiate an object of class C in function f(). Function f() then proceeds to create a new thread, and passes a reference to this object to the new thread. The new thread uses the object, and then exits, but f() does not join() the new thread (that is, it does not wait and make sure that the new thread has finished running before f() itself exits). Since a snippet of pseudocode is worth a thousand words:
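// Runs in the new thread; c refers to an object owned by f().
void thread_fn(const C& c)
{
    c.perform();
}

void f()
{
    C c;
    // system::create_thread() is fictional: it starts a thread running thread_fn(c).
    system::create_thread(thread_fn, c);
    // No join() here: c is destroyed when f() returns, possibly before thread_fn() has run.
}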
Creating a thread and then not caring about when it stops seems to especially make sense with UNIX detached threads. But it almost never makes sense to use detached threads with C++. Because C++ programs tend to use objects. And C++ objects get destroyed.

Let's consider the following scenario:

1. f() creates the C instance c. f() then proceeds to create a new thread with the fictional system::create_thread() function. We will assume that this call also starts the new thread, and that the new thread runs the thread_fn() function.

2. system::create_thread() has now returned. The new thread has been created, but thread_fn() is not yet running in this new thread. f() then exits, and in doing so destroys the C instance, c, the object that had been passed by reference to the new thread function.

3. thread_fn() starts running, but it now has a reference to an already destroyed object.

What's the consequence of this? Strictly speaking, the behavior is undefined; in practice, the program will most likely crash. If perform() is a virtual member function that C overrides from an abstract base class, your program will most likely abort with an error along the lines of "pure virtual function call". That is actually what happens most of the time, and I've seen many senior developers scratching their heads in disbelief at the prospect of having been able to instantiate a C++ abstract base class, which is what the problem appears to be at first sight.

So remember, join() is your friend. Do not allow your main thread to exit until you've join()ed all of the threads it has created. It's actually a good rule of thumb, regardless of the programming language you're using, but it especially applies to C++, where stack objects get destroyed when they go out of scope.
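As a minimal sketch of the rule, here is the earlier example with the join() added. It assumes the standard C++11 std::thread in place of the fictional system::create_thread(), and the body of class C (a simple call counter) is a hypothetical stand-in invented only so that the effect can be observed:

```cpp
#include <atomic>
#include <functional>
#include <thread>

// Hypothetical stand-in for the class C from the text.
class C {
public:
    void perform() const { ++calls_; }
    int calls() const { return calls_.load(); }
private:
    mutable std::atomic<int> calls_{0};
};

void thread_fn(const C& c)
{
    c.perform();
}

// Returns the number of perform() calls seen by c, to make the result visible.
int f()
{
    C c;
    // std::cref passes a reference; without it std::thread would try to copy c.
    std::thread worker(thread_fn, std::cref(c));
    // Wait for the worker to finish before c goes out of scope.
    worker.join();
    return c.calls();
}
```

Calling f() should return only after thread_fn() has finished with c, so the reference never outlives the object.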

Friday, June 13, 2008

Open Source vs. Closed Source Projects: A Career Choice

Open source does not necessarily mean free software. My favourite CORBA ORB, Orbacus, is not free, but it's open source - that is, you also get the source code with the precompiled stuff. Conversely, CREdit, a lightweight, highly customizable text editor, is free software, but you can't get the source code for it.

This article is aimed at technical people who do not own their own software development company. One of the logical fallacies I've encountered most often in the way programmers think of their career path is that they tend to take the standpoint of the employer. And while it certainly is possible for one to advance one's career to where one can enjoy the dividends of one's labour beyond receiving one's wage, this is, more often than not, and for most of us, not the case.

So, without further ado, working in open source projects is better because:
  1. it will improve your skills as a craftsman of software. It's a widely known fact that, all other things being equal, we perform better when we know that our peers are watching our creative process. Therefore, you will tend to write better code knowing that your peers will actually see it. With more competent people being able to see it, you will receive bug reports and patches faster, so you'll be able to learn from your mistakes faster and with less debugging.
  2. you will have visibility. Whereas in most closed source software development companies your work will be the work of "<insert your company name here> <insert your project name here> team", in open source projects your name as maintainer will most likely appear in the headers of all of the files you've authored. Thus, your abilities and contributions to the project will be self-evident.
  3. you will make competence-based contacts within the community, thus creating opportunities for meaningful future work, hopefully even skipping repetitive, boring steps such as job interviews (since your technical competency has already been proven).
  4. companies are bought and sold, become bankrupt, or simply decide to shut down whole departments, because of apparently insignificant top management changes. License permitting, an open source project can continue its life independently of the parent company's demise, so you'll be more likely to be able to continue your work regardless of the life span of your previous employer's business.
Happy job hunting!

Thursday, May 29, 2008

C/C++ System Programming Rule: If You Must Write Single-Threaded Network Servers, Always Prefer Poll() to Select()

Let's assume you're going to write a C/C++ network server application, and you need to decide how to implement the code responsible for accepting connections from clients, and serving the clients simultaneously after accepting the request. Typically, you'd do this in one of two ways:
  1. Write a multithreaded application, with one dedicated thread blocking in accept(). Once a connection has been established, take the new client socket handle, pass it to a worker thread and resume waiting for a new connection in accept().
  2. Use a single thread for accept()ing and processing client requests. If you look this method up, you'll find that most examples illustrate using select().
While conceptually there's nothing wrong with using select(), there's a serious problem with its implementation that makes it unsuitable for real-world server applications that need to scale well. The problem is this:

"Macro: int FD_SETSIZE

The value of this macro is the maximum number of file descriptors that a fd_set object can hold information about. On systems with a fixed maximum number, FD_SETSIZE is at least that number. On some systems, including GNU, there is no absolute limit on the number of descriptors open, but this macro still has a constant value which controls the number of bits in an fd_set; if you get a file descriptor with a value as high as FD_SETSIZE, you cannot put that descriptor into an fd_set."

On my Slackware Linux system, FD_SETSIZE is #defined as 1024 in /usr/include/bits/typesizes.h. However, it's plausible that your application might want to be able to handle more than 1024 socket descriptors at one time. And once you get a socket descriptor with the value 1024 or higher, put it in your fd_set and give that to select(), all bets are off.

Some systems will do the right thing if you #define your own FD_SETSIZE, but that's not cross-platform, and even worse, it's very error-prone. What about libraries you might be linking against, that have already been compiled using the default FD_SETSIZE value?

You might think that an application with more than 1024 sockets open at the same time is probably not a good idea, and you'd probably be right. However, you don't even need to have more than 1024 socket handles active simultaneously. All you need to do to get to "there be dragons" territory is to have one socket handle with a value of FD_SETSIZE (1024 here) or higher and try to use that in a select() fd_set.
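If you are stuck with select() for now, one defensive option is to refuse any descriptor that does not fit before it ever reaches FD_SET(). The sketch below is only an illustration; fits_in_fd_set() and checked_fd_set() are hypothetical helper names, not part of any system API:

```cpp
#include <sys/select.h>

// Hypothetical helper: true only if fd can legally be stored in an fd_set.
bool fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

// Hypothetical wrapper around FD_SET() that rejects out-of-range descriptors
// instead of writing past the end of the set.
bool checked_fd_set(int fd, fd_set* set)
{
    if (!fits_in_fd_set(fd)) {
        return false;
    }
    FD_SET(fd, set);
    return true;
}
```

A caller that gets false back would typically close the connection or hand the descriptor to a different mechanism, rather than pass it to select().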

The alternative? Meet your new friend poll().