Bitwise Operators and Specification-compliant Integer Overflow Optimizations

If you use the upcoming UInteger32.isEven or isOdd methods, you’ll notice that it uses a bitwise AND operation. The reason, as described in a previous post, is because it improves performance.

However, while this is straightforward for all other integer wrapper classes, UInteger32 is an exception. According to ECMAScript 3 11.10:

The production A : A @B, where @ is one of the bitwise operators in the productions above, is evaluated as follows:

  1. Evaluate A.
  2. Call GetValue(Result(1)).
  3. Evaluate B.
  4. Call GetValue(Result(3)).
  5. Call ToInt32(Result(2)).
  6. Call ToInt32(Result(4)).
  7. Apply the bitwise operator @ to Result(5) and Result(6). The result is a signed 32 bit integer.
  8. Return Result(7).

The operands (and, thus, the result type) for the bitwise AND operation in ECMAScript are converted to 32-bit signed integers. System.UInteger32 represents an unsigned 32-bit integer.

This is inconvenient because we’d obviously have to fall back to the slower modulus operation for isEven/isOdd on UInteger32. Unless…

((Math.pow(2, 32) + 1) >>> 0 | 0) & 1 // true
((Math.pow(2, 32) + 2) >>> 0 | 0) & 1 // false

We can take advantage of overflow behavior. (Note: Since we’re able to get the correct result by leveraging overflow behavior, we actually don’t perform the extraneous zero-fill right shift as illustrated in the example.)

This is completely safe because:

A) Mathematically, in base 2, the last bit will always be 1 for odd numbers and 0 for even numbers… no matter how big the number is.

B) Bitwise AND will compare both bits in equal-length binary forms. Thus, no matter how big the number is, when you AND against 1, it will always be 0000001 (or zero-padded to whatever length is needed). Therefore, all the preceding bits don’t matter because they will always be ANDed against a zero bit. The only bit that matters is the trailing bit; see A for why this will always work.

Design Patterns: The Template Method Pattern and Final Methods

C# got it wrong. Java got it right.

C# doesn’t let you “seal” methods that aren’t overridden. Java allows you to mark any method as “final” – whether it’s an overridden method or not. This subtle detail has interesting implications.

The Gang of Four’s “Design Patterns: Elements of Reusable Object-Oriented Software” was published in 1994—many years before C# was first released and just two years before Java was released. (Although, most of the code in the book was presented in C++.) The book presents the “Template Method” behavioral pattern. Template methods define the “framework” of an algorithm and leave subclasses to define the behavior. Unfortunately, if you cannot mark a template method as ‘final’ (or ‘sealed’ in C#), it enables subclasses to—potentially by accident—change the behavior of the template method.

Here’s a template method in action, written in JS++, to demonstrate how this might be undesirable:

abstract class Model
{
	abstract public bool validate();
	abstract protected void saveTemplate(IDatabase db);
	public final void save(IDatabase db) {
		if (!this.validate()) {
			throw new ValidationException();
		}

		this.saveTemplate(db);
	}
}

In the above class, the template method (“save”) is marked as ‘final’. Without ‘final’, an overriding method (or overwriting method, to use JS++ terminology) can potentially forget to perform business object validation before altering the database. This can lead to potential bugs and security threats.

Beginning with the next release of JS++, you will be able to mark any method as ‘final’.

Design Notes: Why isn’t System.Array.length an ‘unsigned int’?

It makes sense, doesn’t it? Array sizes should be unsigned because they can never be negative. Yet, JS++ chose to make System.Array<T>.length return a signed 32-bit integer (int). We’ve discussed this internally, and the underlying reasons are not so simple.

JavaScript

The most important reason is that this is a bug in JavaScript and ECMAScript 3’s original design. ECMAScript 3 15.4 specifies that Array#length is an unsigned 32-bit integer. However, it gets a bit tricky when you view these method signatures:

  • T[] Array#slice(int start) (ES3 15.4.4.10)
  • T[] Array#slice(int start, int end) (ES3 15.4.4.10)
  • T[] Array#splice(int startIndex) (ES3 15.4.4.12)
  • T[] splice(int startIndex, int deleteCount) (ES3 15.4.4.12)
  • T[] splice(int startIndex, int deleteCount, ...T replaceElements) (ES3 15.4.4.12)
  • int Array#indexOf(T element) (ES5 15.4.4.14)
  • int Array#indexOf(T element, int startingIndex) (ES5 15.4.4.14)
  • int Array#lastIndexOf(T element) (ES5 15.4.4.15)
  • int Array#lastIndexOf(T element, int endingIndex) (ES5 15.4.4.15)

All of the above deal with array indexes as signed 32-bit integers even though the specification clearly states array lengths are unsigned. Specifically, if we indexed arrays using unsigned int, we would break JavaScript’s indexOf and lastIndexOf (because they return -1 when the element is not found). This gets further complicated because Array#push and Array#unshift, which return Array#length, return unsigned 32-bit integers.

Just know that I brought the proposal forward internally for indexing arrays as unsigned int, but I shut down my own proposal after the self-realization that it would break indexOf and lastIndexOf — it was just unacceptable.

In other words, we were handicapped by JavaScript in our design (as we often are).

Java and C#

A lot of website backends are written in Java, C#, PHP, and – nowadays – JavaScript. JavaScript and PHP are dynamically-typed, so you don’t have to worry about signed/unsigned, but this brings me to Java and C#.

Java doesn’t have unsigned integer types. I actually feel like this can be a good design decision in some ways. It makes reverse array iteration intuitive and obvious: just flip the logic for forward random-access iteration around. Likewise, in C#, List<T>.Count returns a signed integer (32-bit). Just as in Java, reverse iteration with a for loop is just flipping the logic around.

With signed integers, you don’t have to worry about integer overflow. If you perform forward iteration with:

for (int i = 0; i < list.Count; ++i);

Then, intuitively, reverse iteration might look like:

for (int i = list.Count - 1; i >= 0; --i);

Of course, this won't work for C/C++ because, on the final iteration, you get integer overflow.

Once again, in dynamic languages like JavaScript, you don't even have to worry about such things. It was all abstracted away by dynamic typing.

Reverse Array Iteration

Reverse array iteration over unsigned types becomes non-trivial. Anyone that has done this in C/C++ will know what I mean. The correct way to do it is to do it in a way that takes integer overflow into account. In C and C++, array sizes are unsigned, and C doesn't have C++ reverse iterators. Here's the code in C:

int arr[3] = { 1, 2, 3 };
size_t len = sizeof(arr)/sizeof(arr[0]);

for (size_t i = len; i --> 0;) {
    printf("%d\n", arr[i]);
}

So you initialize to the length of the array (without subtracting 1) and i --> 0 is better formatted as (i--) > 0. Thus, inside the loop body, you will only access - at most - length - 1 and it will count down until zero.

However, this isn't intuitive unless you come from a C/C++ background, and most C/C++ programmers are not web developers.

Conclusion

Reverse iteration in for loops may or may not be intuitive for you I didn't want users tearing their hair out over a basic programming exercise of iterating over an array backwards. Coupled with the fact that ECMAScript 3's original design was buggy, it only made sense to use int instead of unsigned int to avoid breaking old code from JavaScript.

Oh, and int is just so much more pleasant to type than unsigned int with casts everywhere.

Scaling JS++: Abstraction, Performance, and Readability

Sometimes, people will argue that C++ scales better than C for development teams and large projects because it has classes. While code organization is certainly helpful, Linus Torvalds will argue it is unnecessary because there are other ways of achieving code organization – such as prefixing function names with a “namespace.” JS++ is influenced a lot by C++ and Bjarne Stroustrup’s philosophies. While most people point to classes as the reason for C++’s success and scalability, there is a more subtle reason it scales so well: readability.

The C++ STL provides a level of abstraction without sacrificing performance. Stroustrup said he wanted std::vector to be as fast as C arrays. You can implement these details yourself, but why not just use std::vector?

While working on the JS++ Standard Library, Stroustrup’s philosophies have once again come into play.

As an example, think about converting a JavaScript number from decimal (base 10) to hexadecimal (base 16). Can you think – off the top of your head – how it would be done? There is a certain pleasure derived when you can read through someone else’s code as clearly and effortlessly as you might be able to read a book. We enforce these standards internally at Onux, and I’m going to reveal how it influences the design of JS++.

Getting back to the example, this is how it’s done in JavaScript:

var x = 97;
x.toString(16);

I’ve argued that we need to deprecate JavaScript’s Number.prototype.toString(base). Instead, we are proposing a toBase(unsigned int base) method to convert from base 10 (decimal) to a specified arbitrary base. Consider the readability of the following functions:

x.toString(16);
x.toBase(16);

Both functions do the same thing, but I’ve argued that no programmer will ever have to look up what toBase(16) means versus toString(16). The beauty of a compiled language is that we have full control over optimization. Of course, in JavaScript, you can do this:

function toBase(number, base) {
    return number.toString(base);
}

All else being equal, and you care about performance, wouldn’t you much rather have this?

x.toString(base);

In JS++, we can perform this exact optimization in a process known as “function inlining”. Here’s an example from Wikipedia. In other words, even though JS++’s toBase(16) is clearer than toString(16), there is no loss of performance because toBase just gets compiled to toString with no additional function calls; the toBase method basically doesn’t even exist in the final generated code.

We can take this one step further. Converting from base 10 (decimal) to base 16 (hexadecimal) is quite common. Why not provide a Standard Library function to do this?

char x = `a`;
x.toHex();

In this case, toHex() is just an alias for toBase(16), which is in turn an alias for toString(16). With each layer of abstraction, we gain more clarity and readability in our code. This makes it easy for others to work on our code, it makes it easy to scale development for teams, and – most of all – each layer of abstraction results in no performance loss. In fact, toHex() just gets compiled to toString(16).

Think for a moment. You’re tired, groggy, and unmotivated. Would you rather read code that looks like toString(16) or toHex()? This readability gain has only been focused on 1-3 lines of code so far. What we want to achieve with JS++ will expand across your entire code base.

When I meet programmers and they say they have problems with “spaghetti code”, I almost always guess (correctly) that they’re working at a JavaScript house. Classes – by themselves – won’t stop developers from writing unreadable code. We need abstraction… without sacrificing performance. This is what’s coming with the JS++ Standard Library. Here’s to a better future!

Compiler Architecture: GLR Parsing and Disambiguation

Today I want to talk about GLR parsing and the internals of the JS++ parser.

The Problem

In JS++, there is the potential for code to be “ambiguous”. For instance, consider the following example:

Foo<bar> baz;

There are two interpretations for the above statement:

1. A comparison operation: Foo is less than bar is greater than baz.
2. A variable declaration with type Foo<bar> (where Foo is a generic type with type argument bar)

Since JS++ is a superset of the JavaScript programming language, we would naturally expect the first case since JS++ inherited this from JavaScript. However, in order to achieve a concise syntax for generic types, we also need to consider how we can enable the second case.

Continue reading “Compiler Architecture: GLR Parsing and Disambiguation”

Optional Types: ‘null’ without NullPointerException

Optional (nullable) types will be coming to JS++. This usually stokes fears of null pointer dereferencing. For example, in Java, this is known as a “NullPointerException”.

In fact, Turing Award winner Tony Hoare calls it his billion-dollar mistake:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn’t resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

However, it’s possible to have nulls in a language without ever having NullPointerExceptions.

void doStuff(Foo foo) {
    foo.bar(); // Error, potentially null
}

In the above example, foo can be either an instance of Foo or null at runtime. This creates the potential for dereferencing a null pointer. This is an easy fix though – we just check that foo is not null:

void doStuff(Foo foo) {
    if (foo != null) {
        foo.bar();
    }
}

By checking that foo is not null, we can guarantee foo is at least an instance of class Foo – thus making it very straightforward for us to check that the bar method exists. This is a process known as type refinement. What’s interesting is that the compiler is able to actually ensure you’ve made the check before you ever run your program. It achieves this by examining the control and data flow. This is cutting-edge research known as flow-sensitive type systems.

In addition, by examining the control and data flow, if we can ascertain that null was never passed into the doStuff function, type refinement would not need to occur, and you would not need to check for null. Thus, this code – with no null check – will still pass a compile:

void doStuff(Foo foo) {
    foo.bar();
}

doStuff(new Foo());

On the other hand, as soon as you introduce null, you need to check for null too:

// Attempt 1
void doStuff(Foo foo) {
    foo.bar(); // Error, 'foo' can be 'null'
}

doStuff(new Foo()); // ok
doStuff(null);      // now we need a null check
// Attempt 2
void doStuff(Foo foo) {
    if (foo != null) { // do the 'null' check so the code compiles
        foo.bar();
    }
}

doStuff(new Foo()); // ok
doStuff(null);      // ok

null is a first-class type in the JS++ type system so, combined with JS++’s sound type system, JS++ is able to categorically determine that you’ve checked foo is non-null without ever executing any code.

Design Notes: void and undefined

JS++ replaces the “void” keyword as a type annotation rather than being like JavaScript which dictates that “void” is a keyword used to evaluate an expression and always returns undefined. This doesn’t serve much practical purpose. You can accomplish the same effect in many other ways:

// Original
var a = void foo();

// Equivalent to:
var b = (foo(), undefined);
var c = (function(){ foo(); })();
// ... etc

Furthermore, “void” is very rarely used in real-world JavaScript. This is why I always say it takes an experienced JavaScript programmer to design a JavaScript superset. It takes someone with deep experience in JavaScript to identify this in order to make such a breaking change with confidence. If you’re introducing a breaking change, make sure it’s just breaking the 1% and not the 99%.

The enlightened few will use void 0 in place of undefined because the ECMAScript 3 standard defines undefined as a global property that can be defined!

JS++, on the other hand, makes undefined a keyword. void is then used as a type which represents both null and undefined.

Finally, there have been informal plans for a JS to JS++ translator. This will handle those few instances where void 0 is used to mean undefined and such. These cases are easy to parse and identify, and it makes introducing such a breaking change fathomable.

Design Notes: ‘const’

JS++ does not implement a const keyword. This was a design decision.

For instance, consider the following ECMAScript 6 code:

const x = {};
// Adding a new property, not "constant" at all
x.foo = 1;
// Changing the value of an existing property; once again, not "constant"
x.foo = 2;

x.foo; // 2

The variable x is declared as a constant using the const keyword. However, it very clearly is not constant (e.g. mathematical constants, which are typically our first exposure to constants). Where references are concerned, const does not actually create a “constant” and should thus be regarded as either a misnomer, and, in a worst case, should be disallowed as a matter of best practice in order to discourage semantic confusion and – consequently – bugs.

The situation is bad enough that Brendan Eich, the creator of JavaScript, even retweeted a blog post about this confusion:

es6-const

Instead, JS++ uses the final keyword. final prevents any further assignments to a variable; in other words, this is the last and “final” assignment. Thus, while the reference can be modified, the actual variable is still referring to that same reference. It was a final reference, and the reference we were referring to cannot be changed. Simple semantics and functionally identical, but it can go a long way in complex projects.

You never know if you’ll have a team member that doesn’t know the nuances of const, and – going back to semantic confusion – naturally assumes it works a certain way without looking it up and researching it. This is a common problem with JavaScript, where a lot of developers assumed it would be like C/C++ or Java and, thus, did not spend too much (if any) time learning it properly. Personally, I was guilty of much of the same in my early days. const being introduced into the official language specification did not take this behavioral issue into account and risks exacerbating the problem. (ASIDE: Issues like this, while minor and seemingly innocuous on the surface, can create exponentially many problems in larger applications and forced the hand for JS++ to break away from the ECMAScript specification – among a host of other problems.)

Yes, languages like C++ get away with using const. However, it can be argued that it’s so deeply embedded into the language in special ways it almost forces programmers to read the documentation and learn about it in depth (see: “const correctness” in C++). When you see:

const int *const ptr = &x;

You begin to realize that there is more to const in C++ than you may have figured out through a pure dictionary definition. This is not so when you see this:

const x = {};

The latter example (in JavaScript) does not encourage you to read the documentation or best practices. When we make assumptions that we “just know,” it can be dangerous.

We made a design decision based on experience and observation in the hope that final will be better because it doesn’t necessarily encourage the “I just know what it does” attitude. When it does, they should already be familiar with how it works from Java – a correct, consistent, non-confusing, and – most importantly – safe expectation.