Booleans are boring. They can be true, they can be false, and that's all there is to say, right? Well... Yes and no.

### Selling the skin of a bear

Many programming languages provide some form of short-circuit evaluation when booleans are involved. The reasoning goes like this: When the first operand of an 'and' operation is false, we do not have to evaluate the second operand any more. We already know that the full answer will be false, so we can save ourselves some time and just ignore the second part of the expression. A similar trick can be applied to the 'or' operator.

This is not just a performance optimization. It is frequently used for writing very elegant and compact validation code. You know the drill:

if(n > 0 && total/n > 100) ...

The idea is that we want to calculate an average (the total divided by the number of entries), but we want to avoid division by zero. The first operand, left of the '&&', validates the value of n. The second operand is only executed if n is strictly positive, so we know that our division is safe.

This behavior is unique for booleans. Other types such as integers do not provide any short-circuiting operations:

amount := 0 total_price := amount * calculate_item_price()

The first operand of the multiplication is zero. We know that this will make the total price equal to zero as well, but this does not stop the program from fully evaluating the second operand. This can cause a lot of unnecessary silicon cycles, and even a modern compiler may have a hard time optimizing the call away (e.g. when the calculation has side-effects).

Short-circuiting is a form of lazy evaluation. If you think of the 'and' operator as a function, you could write it like this:

fn and(b1 : bool, b2 : bool) : bool good_deal := and(shoot_bear(), sell_skin())

But unlike normal function calls, this one does not evaluate all its operands before making the call. It postpones the evaluation of 'sell_skin()' until after it knows the exact outcome of 'shoot_bear()'. In languages without short-circuiting (and without support for lazy evaluation), you can only do this with an explicit 'if' or other control statement, not with a function call.

In the automatic migration tools we create at Anubex, a lot of effort goes into transforming or simulating such special behaviors.

### Small booleans, big booleans, booleans in boxes

In theory, a boolean can be stored in a single bit. This is as small and efficient as it gets. Unfortunately, reality is not that simple.

Consider COBOL. It does not have explicit support for booleans. Instead, you use a kind of enumeration instead, with only 2 allowed values:

77 my-boolean pic x. 88 true value "Y". 88 false value "N".

Basically, this stores a boolean as a Y/N character, which takes up more than 1 bit (depending on the platform, it could be anywhere between 4 and 32 bits).

In most modern programming languages, each value in memory must be addressable. This means that each value must be stored in at least a separate byte (because bytes are the smallest addressable units of memory). Padding can add even more storage (e.g. between a 1-byte boolean and a 4-byte integer), so you may end up losing 32 bits for only a single bit of value.

In languages like C# and Java, booleans are value types. They are copied around whenever you pass them as an argument or return them from a function. To pass a boolean by reference, you have to "box" it: Store it in an object (which serves as a kind of box) and then pass that object around by reference.

This consumes even more memory, because every object has additional data on board. Note that boxed booleans can evaluate to 3 values: true, false, and null.

### The name is implied

We all know our relational operators such as '>' (greater than) and '!=' (not equal to). They take two numbers (or strings, or other comparable items) and return a boolean result. Some legacy languages like COBOL allow you to write these operators in an abbreviated form. Readable or not, in COBOL you can replace this:

SCORE > 50 AND SCORE < 80

with this:

SCORE > 50 AND < 80

Shorter code, same result. In the second comparison, the name of the variable is implied. This seems like an innocent feature of the language, but it spells trouble for those who intend to develop COBOL parsers. Traditional parsing mechanisms (as embodied in popular tools such as Lex & Yacc) fail to "understand" the abbreviated syntax. They require help during parsing, or they require a post-processing step which "fixes" the syntax tree by injecting the variable name at the right place. For people who build migration tools for legacy languages, this is a classic challenge.

### A bit of algebra: the algebra of bits

Truth is essential, especially in mathematics. The vast body of mathematical axioms and inference rules has only 1 purpose: To derive only true statements. The "value" of any statement is either true or false, and mathematicians don't like the false ones. So just as in programming, booleans are at the foundation of math.

And yes, there is an entire field of mathematics dedicated to the study of booleans. We can only scratch the surface here, of course. Let's write 0 for false, and 1 for true. We define our two familiar binary operations by using truth tables:

and 0 1 0 0 0 1 0 1

or 0 1 0 0 1 1 1 1

These are familiar to programmers (except that they do not short-circuit in any way). What may be less familiar, is that these operations are best understood in terms of ordering: zero is less than one. Think of it this way: The 'and' operator actually takes the /minimum/ of its operands. If either of the parts is 0, the minimum is also 0. Similarly, the 'or' operator takes the /maximum/.

Even the mysterious "flipped-letter" operators ∀ (forall) and ∃ (exists) are basically just an extension of this idea. ∀ applies to a set of booleans, and takes their minimum. The result is 1, only if all the individual values are 1.

So the statement

∀ x: x > 5

means that you loop across all the elements, apply the relational operator, and then take the minimum over all the resulting boolean values. The ∃ operator gives us the maximum, which equals 1 if any of the individual values is 1.

Another operator is implication, often written as a double-arrow:

=> 0 1 0 1 1 1 0 1

In high school, I had a lot of trouble memorizing this truth table, because it feels counter-intuitive. Why should falsehood imply truth? It becomes a lot easier when you think of this operator as standing for "less than or equal to". Indeed, 0 <= 1. (It's ironic that we write this operator in exactly the opposite direction as the original.) The ordering of the bit values makes it easier to memorize the table. It also leads to a new interpretation: when X implies Y, this means that X is somehow "less true" or "less certain" than Y.

Math keeps leading us to ever greater truths.

### Migrations & More

Some programming languages allow you to use integers as booleans. Some treat 0 as false, and any non-zero value as true. But there are languages, believe it or not, in which all even values are false, and all odd values are true.

Other languages treat an empty string as false, but some also treat the strings "0" or "false" as false (but not necessarily the string "False"). These subtle differences can cause quite a bit of pain for developers.

When migrating code from one language to another, all of these difficulties and more need to be taken into account.

Only a well-tested tool suite can do that.