Saturday, July 30, 2011

Rule 0 - Do It !

So, I've already listed some rules about programming (do I need to recall you that the most important one is don't trust the maxims) but I recently come up with a new one, some kind of rule 0, the rule above the rules. So, the rule is pretty simple:
Do It !
Simple isn't it ?

So what do I mean by Do It ? There are several topic where this rule can apply, I'll cover what seems to be the most important to me: programming learning, fixing bugs and rewriting/refactoring.

For the beginner:
Learning programming (or even learning a new programming language) can be separate in two parts: theoretical principles, syntax and structure studying and practicing.

Many beginner falls into the over-studying symptom: they spent a lot of time trying to understand inner concepts and to recall every specific syntax construction before practicing. But when it comes to code, they just do the same mistakes and face the same issues as if they haven't spend a lot of time to understand how it works.

Why ? Just think of it: if I explain to you how it is to ride a bicycle, I can explain you a lot of things, even show various videos and schema, but until you get yourself on a real bike and fall face to the ground, you won't be able to ride it.

Programming is no exception. You need to practice in order to learn. A damn good example are pointers. Pointers are a mandatory concept in many languages and probably in many traditional  algorithms, so a programmer can not avoid pointers manipulations. When facing it for the first time, they appears as strange entities with complex usage and semantics: they are value by them selves but they also seems to be containers, you can obtain a pointer from various operations (referencing a variable, allocating memory … ) All of this with various syntax a special cases (the worst of all is to begin to learn pointers with different languages using different syntax, say Pascal and C … )

The fact is simply that, you need to practice to learn. I've test a lot of way to explain pointers concepts to various students and never find a better way than practicing. The usual box and arrow diagram always lake important point. Explaining the idea of pointer as array indices for the big array that is memory won' help more. But a bunch of pointer and pointer's arithmetic exercises will always do the job.

Practicing break out another beginner's symptom: the fear of the difficulty. When facing a new kind of problems, measuring the real difficulty of solving it is quite hard. Trying to solve it in your head or with some useless schema on a sheet of paper won't help you, you should give it a try. A good example is writing a memory allocator (a pretty good exercise to understand pointers and memory management), if you had never try it before, you can't figure out how one should do it. But, as you begin to code, you'll find that there's no mystery nor magic and (if you stay on the simple path, of course, good generic and efficient allocators are not so simple) you'll see that the task is really affordable.

So, to summarize: don't waste your time over-thinking your code, just code, do it !

Bug fixing:
When building a projects there's always two possible tasks: continue to the next missing features and fix the existing ones. So should I prefer push the project to the some fixed bounds (achieving a functionality goal) or should I fix issues in what was already written.

My advice is to never postpone bug fix, even if it's a minor one. Bugs, incomplete cases and things like that are bombs awaiting to blow you all works down. There are several reason not to postpone corrections:

  1. It's simpler to fix something when the code is still fresh in your head
  2. The bug fix may induce deeper modifications than it seems
  3. Building a new part of a project on rotten bases will always end up in a big crash
You can extend this idea to real bug fixes: that is, you should not let ugly workaround sit in your base code. Workaround are far pernicious than bug in the sense that they hide the error without (most of the time) really fixing it.

To illustrate this part I'll give a personal example: I'm actually working on an experimental language (mostly like a C variant but with a cleaner syntax and some syntactic sugar), the project began when it appears that we can't escape the use of the C language for writing kernels. We give a try to a lot of languages and face up various issues, mainly because modern languages are oriented toward userland programming, not kernel programming. After a long and trollish discussion on our mailing list, I came up with idea of rewriting the C syntax and extend it with various comfort syntactic sugar and high levels extensions.

So, I lay down a quick version of an AST (Abstract Syntax Tree) for a kind of C language, write up a pretty printer for it and then a parser. When it was clear that the idea works (and facing enthusiastic messages on the list), I go the next step and write down type checking. At that point, the original AST was not completely suited in the sense that it was an immutable value (I was using OCaml for the project) but I found a way to circumvent the difficulties and continue along this axis.

We then begin to write code generation (thanks to the fact that we were writing a compiler-to-compiler rather than a real compiler this wasn't so hard in the first place.) But as the project evolves, my badly designed AST give us more and more difficulties. And then, one day (after almost a year of code), it appears to me that if we want to push the project to its end, I must change my data structure. So, how to do it ? the task seems unaffordable since the AST was the skeleton of the project.

I take out my favorite text editor, allow myself five minutes of reflexion, and decide that if I can't rewrite parsing and typing within a few days, I'll give up and try to fix the original code. Guess what ? A week later, 90% of the project was rewritten and some parts was even more advanced than in the original version ! Of course, the first two days was a total rush, I came up writing 5000 lines of code in 48 hours ! But, thanks to the new design and to experiences gathered on the previous work, most of the task was straightforward.

Two weeks later (yes, this a really recent story ;), our compiler is now more functional than the previous version !

What did I learn ? First that I'm still capable of writing very big amount of code in few days (this good for my self esteem ;) Second, that rewriting is far simpler than it may seems at first glance. There are several reason you should give a try to rewriting:
  1. Previous work wasn't a waste of time, it points you out where difficulties are
  2. A better design and data structure will greatly simplify the rest of the code
  3. If you fail, you can still go back to the previous work
  4. There's always be external part that can be reused
  5. If you feel the need to do it, things gonna be worst if you postpone it !
Of course, rewriting is not always the only solution: this example is quite extreme in a sense that the whole project was based on the part that I want to change, and many things inside the project was to be rewritten.

Refactoring is, most of the time, a better way to fix bad design. The main idea is to rewrite your code chunks by chunks but still preserving the state of the project. There's a lot of patterns to do that (you can find good books on the subject, Marc Espie, which is a person I consider as a real expert programmer, pointed me out with the book: "Refactoring" by Martin Fowler, you should probably read it, I will as soon as I find times for it.) 

So, my new rule, Do It !, is all about going straight to the point rather than wasting your time in useless preparation or postponing important fix or rewrite. There's no better way to do something than actually doing it !

Saturday, July 9, 2011

Functional Programming ?

Do you know functional programming ?

Functional programming (FP) is an interesting subject for a lot of reasons. Those of us that have used (and still used) functional programming languages  (FPL) often pointed out two disappointing facts:

  1. Lots of programmers doesn't really understand what is it
  2. Despite the fact that FP and FPL are pleasant to use but also quite efficient, it is still not largely used

Am I a FP fan ? the quick answer is "no more", the longest one is more complicated. I have used OCaml for now more than 10 years, I've found it very useful and pleasant many time. But, I've also found a lot of unsatisfactory points. It is hard to really explain why, but on bigger project, the absence of constraints on global program structure often leads to unreadable code, some syntax choices are annoying ... Other FPL haven't convince me yet ...

So, here are my humble opinion (how I hate this expression ... )

First, I'll attempt to define FP and what should be an FPL. You should right now forget the erroneous idea that FP is about recursion and recursive functions !
FP arise from early days of formal languages (before computers in fact), exactly λ-calculus is (at least for me) the starting point of FP. So what is the main characteristic of λ-calculus ? Function as first class value ! In fact, functions are the only value in λ-calculus !
First class value are language entities that can be passed (and returned) to (by) functions. So, λ-calculus can directly manipulate functions, this is what we call higher-order. So, the first requirement of a FPL is higher-order.

λ-calculus, being a mathematical model, have no notion of mutable entities and side effects (in fact mutable and side effects can be expressed in pure functional semantics.) Thus, languages inspired by λ-calculus try to avoid these notions to match the original model, but it's a matter of style. 

Another aspect of FP and FPL is the notion of everything evaluate to a value: we should always return something and there's no difference between statement and expression as in imperative languages. This is strongly linked with the absence of side-effect: you don't modify a state, you return a new value.

What about loops in FP ? The fact that in FP we prefer recursion over loops, is a consequences of the everything is an expression: a loop does not return a value. But, FOR-loops (i.e. bounded iterations) can exist without mutable entities and thus fit in the model (we can somehow handle the loop returns nothing, since in FPL nothing can be something) on the other hand, integrating WHILE-loops in the semantics of a pure FPL is far more difficult (we need to express a changing state.)

So, to summarize: an FPL is language with higher-order where everything is expression and where we prefer to limit mutable entities and side-effect. Everything else is comes from those facts (partial evaluation and curryfied functions, recursion rather than loop and even lazy evaluation.)

Is there good FPL ? This is hard and dangerous question. I tend to prefer strongly typed FPL, and thus my list of good FPL will be limited to the ML family (and derived languages.) My favorite is OCaml for various reason: I learnt FP with it, it is not too purist (in OCaml you have imperative aspects and even objects), it has a good module approach (functors ... ) and resulting code works well. I found Haskell interesting but too extremist to my taste. Lisp and scheme derivatives haven't convince me (but again, I prefer typed languages ... )

Interesting FPL features: except for pure FP aspects, many FPL have interesting features.:

  • Pattern Matching: this probably the most impressive features, it lets you describe complex cases and extract values. Combined with variants, it offers the best way to traverse abstract syntax trees.
  • Module systems: to organize your code, module are very useful and some FPL provides very powerful framework. For example, the module system of OCaml lets you define nested modules, independant module interfaces and the powerful functors (function over modules.)
  • Lazy evaluation: lazy values is mix between functions and expressions, as functions it is not evaluated when defined, but as expressions, it is evaluated only once. This lets you define expressions that will be evaluated only when needed but then resulting value is memoized. A good
    example is infinite generated lists: elements of the lists are generated only when traversed, but then you don't need to compute it for each access.

Now, how about the lack of users of FPL ? Since, I'm myself a disappointed user of FPL, I can understand the relatively low audience of FPL.

First of all, most FPL are research projects and as such are never quite yet finished: the language changes too often, priority is put on new features rather than production necessity ... Thus, it is hard to develop a long and stable project using an FPL.
On the side of the languages them selves, there are also some disturbing points. Even compiled FPL have an original syntax designed for interpretors, thus program written with those syntax are often badly organized. For example, there's no explicit entry point in OCaml, you must carefully organize your code so that the execution flow go directly where you want.

From a syntaxic point of view, language history has shown that language with explicitly delimited blocks and instructions have a better impact than too permissive one. Most FPL are syntactically permissive languages.

So, I think that despite being very interesting programming languages, FPL are not a good choice for practical reason. These reasons are not tied to the FP model but to the implementations of this model. If we want to see FPL regarded as usable as other main languages some day, we need a more stable language with something like a C syntax.