JS++ 0.9.0: Efficient Compile Time Analysis of Out-of-Bounds Errors

I promised a breakthrough for our next release.

We are proud to announce JS++ efficiently analyzes and prevents out-of-bounds errors. An out-of-bounds error occurs when you attempt to access a container element that doesn’t exist in the container. For example, if an array has only three elements, accessing the tenth element is a runtime error.

In C, you risk buffer overflows. In C++, you risk buffer overflows and exceptions. In Java and C#, you get an exception at runtime. If exceptions are uncaught, the application terminates. If segmentation faults occur, the application terminates. In the case of buffer overflows, you open your application to a variety of exploits.

As we will show, we can perform out-of-bounds analysis with only a ±1-2ms (milliseconds) overhead on complex projects. There is virtually no effect on compile times with our invention.

Out-of-bounds errors have plagued computer science and programming for decades. Detecting these errors at compile time has ranged from slow to impossible, depending on the language design. With that said, let’s first explore the problems which influenced the design.

Problems

Basic Cases to Handle

In all of the following cases, you cannot predict the value at compile time:

import System;

int[] arr = [ 1, 2, 3 ];

Console.log(arr[Math.random(1, 100)]);
Console.log(arr[getUserInput()]);
Console.log(arr[getValueFromFile()]);
Console.log(arr[API.getTweetLimit()]);

JS++ doesn’t stop at array indexes. Array indexes are limited to numeric values. What about arbitrary string keys on System.Dictionary<T>? Yes, we handle these too:

import System;

auto dict = new Dictionary<string>();

Console.log(dict[Math.random(1, 100).toString()]);
Console.log(dict[getUserInput()]);
Console.log(dict[getTextFromFile()]);
Console.log(dict[API.getTwitterUsername()]);

These are the basic cases. It gets more complex with branching logic:

import System;

Dictionary<string> dict = {
    "1":  "a",
    "10": "b"
};

bool yes() {
    return Math.random(0, 100) > 50;
}

if (yes()) {
    dict["20"] = "c";
}

string key = Math.random(0, 100).toString();
if (dict.contains(key)) {
    Console.log(dict[key]);
}
else {
    Console.log(dict[key + "0"]);
}

These are the very basic cases. There are more… a lot more. All the corner cases you need to explore are outside the scope of this announcement.

Compile Times Must Be Fast

Efficiency is the key. We can’t announce 30% faster compile times in the previous release and simultaneously promise a breakthrough that will cause compile times to explode exponentially.

Clearly, following every branch, virtual function call, external function call, and then some would not be a realistic proposal.

First, let’s look at a basic benchmark so we know what we’re comparing against. In the last release, 0.8.10, I measured “Hello World” compile times. With all of the analyses we added in 0.9.0 (the latest release), how much did compile times for “Hello World” increase? A little under two (2) milliseconds:

| Version | Total Time |
|---|---|
| JS++ 0.8.10 | 72.6ms |
| JS++ 0.9.0 | 74.2ms |

(Lower is better)

The test system is the exact same as the one we used to measure compile times for 0.8.10:

Intel Core i7-4790K
32GB DDR3 RAM
Samsung 960 EVO M.2 SSD
Debian Linux 9

However, “Hello World” is not a perfect benchmark. How long does it take to compile real-world projects with thousands of lines of code that make lots of array and dictionary accesses? Here are three projects before we introduced compile-time analysis of out-of-bounds errors:

Compile times for 0.8.10 – before out-of-bounds checking
| Line Count | Source Files Count | Total Time |
|---|---|---|
| 1,137 lines | 27 files | 124.8ms |
| 4,210 lines | 42 files | 164.4ms |
| 6,019 lines | 72 files | 224.6ms |

(Lower is better)

Here are compile times after we introduced analysis of out-of-bounds errors:

Compile times for 0.9.0 – detection of out-of-bounds at compile time
| Line Count | Source Files Count | Total Time |
|---|---|---|
| 1,140 lines | 27 files | 124.4ms |
| 4,148 lines | 41 files | 165.4ms |
| 5,942 lines | 71 files | 224.2ms |

(Lower is better)

There’s a slight change in line and file counts due to the inclusion of a ‘Base64’ library, which – during the 0.9.0 refactoring – I removed and replaced with the Standard Library’s System.Encoding.Base64. (The code is the exact same.)

The above projects include both frontend and backend code. They include lots of modules, classes, arrays, dictionaries, and other complexities. I’ve included source file counts to account for disk I/O.

It can be observed that there is virtually no performance penalty for dealing with out-of-bounds errors at compile time. The results are within ±1ms (milliseconds).
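The per-project differences can be checked directly from the two tables above. Here is a minimal JavaScript sketch; the variable names are ours and only the timings come from the tables:

```javascript
// Total compile times in milliseconds, copied from the two tables above.
const before = [124.8, 164.4, 224.6]; // JS++ 0.8.10
const after  = [124.4, 165.4, 224.2]; // JS++ 0.9.0

// Difference per project, rounded to one decimal place to avoid
// floating-point noise.
const deltas = before.map((t, i) => Number((after[i] - t).toFixed(1)));

console.log(deltas); // -0.4, 1, -0.4
```

Two of the three projects compiled marginally faster and one compiled 1ms slower, which is within measurement noise.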

Nullable Types are a Problem

Expressing nullability is important in computer programming. For example, a file might have a creation date and last access time. For a new file, there may never have been a “last access time”; thus, it might be ideal to use a nullable data type in this case.

Nullable types are a solved problem in other languages. We considered having Array<T> return T?, but there would be issues with that as presented by Anton Rapetov, our lead compiler engineer:

int[] intArr = [ 1, 2 ];
int? intEl2 = intArr[2];
if (intEl2 == null) {
    Console.log("Definitely out of bounds");
}

int?[] nullIntArr = [ 1, null ];
int? nullIntEl2 = nullIntArr[2];
if (nullIntEl2 == null) {
    Console.log("Might be out of bounds, might just be an access of a null element");
}
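The same ambiguity can be reproduced in plain JavaScript. The sketch below assumes a hypothetical helper, `getOrNull`, that mimics an array access returning T? by yielding null for an out-of-bounds index. It is not part of JS++ or its Standard Library.

```javascript
// Hypothetical helper: models an array access that returns `T?`,
// i.e. null when the index is out of bounds.
function getOrNull(arr, index) {
    return index >= 0 && index < arr.length ? arr[index] : null;
}

const nullIntArr = [1, null];

const stored  = getOrNull(nullIntArr, 1); // element exists and is null
const missing = getOrNull(nullIntArr, 2); // element does not exist

// The caller sees the same value in both cases.
console.log(stored === missing); // true
```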

Usability

Even if returning nullable types worked, there would be significant usability issues. For example, the following is common code:

int[] arr = [ 1, 2, 3 ];
for (int i = 0, len = arr.length; i < len; ++i) {
    arr[i]++;
    // or
    arr[i] += 1;
}

In the above code, it's clear an out-of-bounds access can never occur. Nonetheless, if an array access returns T?, type conversions would be necessary before the ++ or += 1 operations can occur so we aren't adding to a null value. We need a way to avoid making the user do this for common operations. In fact, for common operations, we want you to be able to write the code exactly as you would above.

Exceptions

In a statically-typed programming language, exceptions allow us to return T for an Array<T> without compromising correctness on an out-of-bounds access. For example, if we declare an Array<int>, a 'pop' method can only return a value of type int or throw an exception. If an exception is thrown, it is of no concern to the type checker. At compile time, it would not be possible to determine whether or not the exception will be thrown. Yet, an uncaught exception will result in premature program termination at runtime.

Here's an example of how exceptions might be implemented for a container in JS++:

class Array<T>
{
    var data = [];

    T pop() {
        if (this.data.length > 0) {
            return this.data.pop(); // remove and return the last element
        }
        else {
            throw new OutOfBoundsException("Array is empty.");
        }
    }
}

By using exceptions, we never sacrifice type checker performance or compile times. Bounds checking is still performed at runtime, but the dark side of exceptions is that they can terminate the application if uncaught.
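For comparison, here is roughly how the exception approach looks from the caller's side, sketched in plain JavaScript. The class name `CheckedStack` is hypothetical, and the built-in `RangeError` stands in for the OutOfBoundsException above.

```javascript
// Hypothetical container that throws on an out-of-bounds 'pop'.
class CheckedStack {
    constructor(items) {
        this.data = items.slice(); // copy so the caller's array is untouched
    }

    pop() {
        if (this.data.length === 0) {
            throw new RangeError("Array is empty.");
        }
        return this.data.pop();
    }
}

const stack = new CheckedStack([42]);
const first = stack.pop(); // 42

let message = "no error";
try {
    stack.pop(); // the container is now empty
} catch (e) {
    // Without this handler, the program would terminate here.
    message = e.message;
}

console.log(first, message); // 42 Array is empty.
```

Nothing in the type of `pop` tells the caller that the handler is needed, which is the weakness discussed above.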

If we avoid throwing exceptions, and just let JavaScript return undefined, we'd be walking into TypeScript territory and just letting our type system become unsound because it would be "practical." While you might convert undefined to int as zero, there aren't always sensible default values for all JS++ types (e.g. a callback type or a non-nullable class Foo with no default constructor). Speaking of default values...

Default Initialization

Facebook discovered a problem with C++ maps and default initialization in their code that can lead to bugs:

std::unordered_map<std::string, int> settings{};

// ...

std::cout << "Timeout: " << settings["timeout"] << std::endl;

In the above code, simply printing the value of "timeout" inserts the missing key and value-initializes it to zero, because operator[] on a C++ map creates an entry for any key that is not already present. This led us to conclude that default initialization of missing keys would not be a solution. Default initialization of a map of word counts to zero for missing words is innocuous, but accidentally initializing a timeout or a price to zero can cause far more severe bugs.

The Breakthrough: null vs undefined

We wanted to have nullable types in the language. We wanted programmers to be able to express the following:

class Person
{
    string firstName = "";
    string? middleName = null;
    string lastName = "";
}

As of the latest release (0.9.0), the above code will work because we've introduced nullable types.

However, I want to revisit the earlier example on nullable types. When we decided to move forward with nullable types, a suggestion was brought up to return T? from array accesses. This example was given as a counter-argument:

int[] intArr = [ 1, 2 ];
int? intEl2 = intArr[2];
if (intEl2 == null) {
    Console.log("Definitely out of bounds");
}

int?[] nullIntArr = [ 1, null ];
int? nullIntEl2 = nullIntArr[2];
if (nullIntEl2 == null) {
    Console.log("Might be out of bounds, might just be an access of a null element");
}

We keep a record of all our meetings. While we didn't explicitly discuss undefined at all, I was in a hurry and summarized our meeting as:

>>>> * There's a problem differentiating between 'null' and
>>>> 'undefined':
>>>>
>>>> ```
>>>> Foo?[] arr = [new Foo(), null];
>>>> auto el1 = arr[1];
>>>> auto el2 = arr[2];
>>>> ```
>>>>
>>>> el1 has type Foo?
>>>> el2 has type Foo?
>>>>
>>>> el1 has value null // el1 exists but is null
>>>> el2 has value null // el2 does NOT exist but is also null
>>>>
>>>> To deal with this, we can add a `hasIndex(int i)` method to
>>>> containers.

Subconsciously, this led to the realization that all we had to do was differentiate between null and undefined in our type system.

Introducing Existent Types

In JavaScript, null means that a value exists but is an "empty value," and undefined means no value exists at all. A basic example is here:

var x = null;
var y;

console.log(x); // null
console.log(y); // undefined

This illustrates the basic concept; unfortunately, JavaScript is inconsistent:

var x = null;
var y;
var z = undefined;

console.log(x); // null
console.log(y); // undefined
console.log(z); // undefined

JS++ has different semantics. First of all, in JS++, all variables must be initialized; therefore, you can't have a variable reference return undefined... ever. Secondly, null means "empty value," but undefined in JS++ means "out-of-bounds error."

JS++ introduces existent types, which use the + syntax, to describe container accesses:

int[] arr = [ 7, 8, 9 ];

int+ x = arr[0];
int+ y = arr[1000];

We can think of existent types as the "bounds-checked type." I'm a big believer in simplicity. Rather than trying to calculate whether the container access is within-bounds or out-of-bounds at compile time, we delay this check to runtime via the code generator. Existent types are not purely a type checking innovation. The type provides guidance to the code generator to generate code such as the following:

int[] arr = [ 7, 8, 9 ];

int+ x = 0 < arr.length ? arr[0] : undefined;
int+ y = 1000 < arr.length ? arr[1000] : undefined;

We don't actually generate code this way, but it helps illustrate the concept for developers coming from backgrounds in C, C++, C#, Java, etc.
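As a rough JavaScript illustration of the same idea (again, not the code JS++ actually emits), a bounds-checked access could be wrapped in a helper. The name `checkedGet` is hypothetical.

```javascript
// Hypothetical helper: returns the element for a within-bounds index
// and undefined for anything else.
function checkedGet(arr, index) {
    const inBounds = Number.isInteger(index) && index >= 0 && index < arr.length;
    return inBounds ? arr[index] : undefined;
}

const arr = [7, 8, 9];

const x = checkedGet(arr, 0);    // 7
const y = checkedGet(arr, 1000); // undefined

console.log(x, y);
```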

By default, int+ and int are not compatible types. I'll start by introducing the "safe default operator":

int[] arr = [ 7, 8, 9 ];

int+ x = arr[0];
int+ y = arr[1000];

int a = x ?? 0;
int b = y ?? 1;

The "safe default operator" will check if the left-hand side is undefined. If the value is undefined, the evaluated value of the right-hand side of the ?? operator is returned. Otherwise, the left-hand side is returned. In the case of the example above, 'a' will have the value of 7 because 'x' was within-bounds. 'b' will have the value of 1 because 'y' was out-of-bounds, and, thus, the alternative value provided to the ?? operator was used.
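The behavior described above can be modeled in plain JavaScript. The `safeDefault` function is a hypothetical stand-in for the JS++ ?? operator applied to an existent type. It is a sketch of the semantics, not of the generated code.

```javascript
// Hypothetical model of '??' on an existent type (T+):
// use the fallback only when the value is undefined (out of bounds).
function safeDefault(value, fallback) {
    return value === undefined ? fallback : value;
}

const arr = [7, 8, 9];

const a = safeDefault(arr[0], 0);    // 7: within bounds
const b = safeDefault(arr[1000], 1); // 1: out of bounds

console.log(a, b); // 7 1
```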

T+ cannot be the element type

The problem with JavaScript is that you can have an array of undefined values:

var arr = [ undefined, undefined, undefined ];

In the above case, JavaScript would not be able to differentiate between a within-bounds undefined and an out-of-bounds undefined. In JS++, an existent type cannot be the element type of an array or other container:

int+[] arr = []; // ERROR

[ ERROR ] JSPPE5204: Existent type `int+' cannot be used as the element type for arrays

Therefore, the invention of existent types cannot be retroactively applied to JavaScript.
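The JavaScript ambiguity mentioned above can be seen directly. Reading a stored undefined and reading past the end of the array produce the same value:

```javascript
const arr = [undefined, undefined, undefined];

const withinBounds = arr[0];    // element exists, value is undefined
const outOfBounds  = arr[1000]; // element does not exist

// The values alone cannot tell the two accesses apart.
console.log(withinBounds === outOfBounds); // true
```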

If you want to represent an array element as having an "empty" value, you have to use nullable types...

Nullable Types + Existent Types

The following describes the basic syntax for the nullable and existent types being introduced in version 0.9.0:

int a = 1;  // 'int' only
int? b = 1; // 'int' or 'null'
int+ c = 1; // 'int' or 'undefined'

However, sometimes we want an array element to contain the "empty" value. In this case, we can combine nullable types with existent types using the following syntax:

int?+ d = 1; // 'int' or 'null' or 'undefined'

In this way, JS++ doesn't have the ambiguity of an undefined value that can be a within-bounds access and also an out-of-bounds access.

Usage with Dictionaries

Existent types can also be used with System.Dictionary<T>. We just introduced how nullable and existent types can be combined so let's use the combination:

import System;

Dictionary<bool?> inviteeDecisions = {
    "Roger": true,
    "Anton": true,
    "James": null, // James is undecided
    "Qin": false
};

bool?+ isJamesAttending = inviteeDecisions["James"]; // 'null'
bool?+ isBryceAttending = inviteeDecisions["Bryce"]; // 'undefined'

In the above code, we use the ?+ syntax to combine nullable and existent types. We're throwing a party, and we want to keep track of the decisions of our invitees. If the invitee's decision is true, he's coming to the party. If the invitee's decision is false, he won't be attending. If the invitee's decision is null, he is undecided. Finally, if the invitee's decision evaluates to undefined, he was not actually invited.
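To make the four outcomes concrete, here is a plain JavaScript sketch using a Map, whose get method returns undefined for a missing key. The `describeDecision` function is hypothetical and only mirrors the interpretation given above.

```javascript
const inviteeDecisions = new Map([
    ["Roger", true],
    ["Anton", true],
    ["James", null], // James is undecided
    ["Qin", false],
]);

// Hypothetical helper mapping the four possible states to a description.
function describeDecision(name) {
    const decision = inviteeDecisions.get(name);
    if (decision === undefined) return "not invited";
    if (decision === null) return "undecided";
    return decision ? "attending" : "not attending";
}

console.log(describeDecision("Roger")); // attending
console.log(describeDecision("Qin"));   // not attending
console.log(describeDecision("James")); // undecided
console.log(describeDecision("Bryce")); // not invited
```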

Naturally, the operators that apply to nullable types and existent types (such as the ?? safe default operator) also apply to the combined ?+ type. Code will just be generated to check for both null and undefined when using the combined ?+ type.
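JavaScript's own ?? operator (nullish coalescing) happens to behave like the combined check described here: it falls back on both null and undefined, but not on other falsy values such as false. Whether JS++ compiles to it is not something this article states, so the sketch below only illustrates the semantics.

```javascript
const undecided  = null;      // models a stored null
const notInvited = undefined; // models a missing key
const declined   = false;     // a real value that happens to be falsy

const r1 = undecided ?? "no answer";  // "no answer"
const r2 = notInvited ?? "no answer"; // "no answer"
const r3 = declined ?? "no answer";   // false is kept as-is

console.log(r1, r2, r3);
```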

Beyond arrays and dictionaries, existent types can be applied to the other Standard Library containers (such as Stack<T> and Queue<T>) and even user-defined containers.

Safe Navigation Operator

Besides not being able to differentiate a within-bounds undefined from an out-of-bounds undefined, JavaScript suffers from another problem:

var arr = [ 1 ];
console.log( arr[1000].toString() );
console.log( "This will never get logged." );

The above code will never reach line 3. The reason is that arr[1000] evaluates to undefined, and you can't call the toString() method on undefined, so you get a runtime 'TypeError'. In JS++, this isn't a problem because the compiler detects your attempt to use the . operator and suggests that you use the ?. safe navigation operator instead:

import System;

int[] arr = [ 1 ];
Console.log( arr[1000].toString() );
Console.log( "This will eventually get logged." );

[ ERROR ] JSPPE5200: The '.' operator cannot be used for nullable and existent types (`int+'). Please use the '?.' safe navigation operator instead at line 4 char 13

If we refactor, we'll discover that, unlike the ?? safe default operator, ?. can return undefined and evaluates to an existent type T+:

import System;

int[] arr = [ 1 ];
Console.log( arr[1000]?.toString() );
Console.log( "This will eventually get logged." );

[ ERROR ] JSPPE5024: No overload for `System.Console.log' matching signature `System.Console.log(string+)' at line 4 char 0

So one possible fix is to provide a default value:

import System;

int[] arr = [ 1 ];
Console.log( arr[1000]?.toString() ?? "out of bounds" );
Console.log( "This will eventually get logged." );

It finally compiles, and we get the following output:

out of bounds
This will eventually get logged.

No crashes and no exceptions can occur.
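For readers more familiar with JavaScript, the same pattern can be written with optional chaining and nullish coalescing (both part of ECMAScript 2020). The difference is that JavaScript does not force you to write it, whereas JS++ rejects the unchecked version at compile time.

```javascript
const arr = [1];

// arr[1000] is undefined, so '?.' short-circuits instead of throwing,
// and '??' supplies the fallback string.
const missingText = arr[1000]?.toString() ?? "out of bounds";

// A within-bounds access goes through toString() as usual.
const presentText = arr[0]?.toString() ?? "out of bounds";

console.log(missingText); // out of bounds
console.log(presentText); // 1
```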

Inspecting 'undefined'

Oftentimes, when you encounter an out-of-bounds error, you might want to skip to the next iteration over the container or return from a function. Essentially, you want to "skip" code that was written for within-bounds accesses. In JS++, it's as simple as comparing against the undefined value:

import System;

int[] arr = [ 1 ];

for (int i = 0; i < 10; ++i) {
    int+ element = arr[i];
    if (element == undefined) {
        continue;
    }

    int x = (int) element;

    Console.log(x + 1);
    Console.log(x + 2);
    Console.log(x + 3);
}

The C-style cast to int is safe because we already checked for and skipped out-of-bounds accesses. We can also use the safe default operator instead in the code above.

Finally, our output:

2
3
4

This allows us to elegantly write large chunks of code for within-bounds accesses while skipping, returning, or just ignoring out-of-bounds accesses. We can even log the out-of-bounds error to stderr by using System.Console.error.
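Here is a plain JavaScript sketch of the same skip pattern, this time also reporting each out-of-bounds index through console.error, which plays the role of System.Console.error in this illustration. The `results` and `skipped` arrays are our additions to make the outcome easy to inspect.

```javascript
const arr = [1];
const results = [];
const skipped = [];

for (let i = 0; i < 3; ++i) {
    const element = arr[i];
    if (element === undefined) {
        console.error("Out-of-bounds access at index " + i);
        skipped.push(i);
        continue; // skip the within-bounds code below
    }

    results.push(element + 1, element + 2, element + 3);
}

console.log(results); // 2, 3, 4
console.log(skipped); // 1, 2
```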

Downloads

We're providing download links for the latest release (0.9.0) and the previous version (0.8.10). We want you to be able to verify our claims and benchmarks.

JS++ 0.9.0 (latest) – includes out-of-bounds checking
| Platform | Download Link |
|---|---|
| Windows | Download (32- and 64-bit) |
| Mac OS X | Download (32- and 64-bit) |
| Linux | Download (32-bit), Download (64-bit) |

JS++ 0.8.10 – before out-of-bounds checking
| Platform | Download Link |
|---|---|
| Windows | Download (32- and 64-bit) |
| Mac OS X | Download (64-bit) |
| Linux | Download (32-bit), Download (64-bit) |

What's Next?

Our first priority is to manage engineering complexity. We have to refactor our tests, and none of this will show up for you, the user. As I write this, I don't know what to expect. Existent types can bring demand for JS++, but we don't have the resources to manage this demand. Instead, we have to stay disciplined in sticking to our own internal schedules to ensure the long-term success of JS++. We listen to user input, but we don't (and can't) follow hype and trends. JS++ over the next 25 years will be more important than JS++ over the next 25 days. I point to Haskell as an example: it's a programming language that is well thought-out and has persisted for 29 years.

We have users who have followed us for years, and we thank all of them for giving us the motivation to persist. If you're willing to be patient and watch JS++ evolve, I urge you to join our email list. The sign-up form for our email list can be found by scrolling to the bottom of this page.

Final Words

Existent types were co-invented by me and Anton Rapetov (lead compiler engineer for JS++).

We solved compile-time analysis of out-of-bounds errors via traditional nominal typing. Thus, there is no performance difference between JS++ checking whether int can be assigned to string and checking whether int+ can be assigned to string. This explains the ±1ms difference in compile times for compile-time out-of-bounds analysis.
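To give a feel for why this is cheap, consider a toy model of a nominal assignability check. It is purely illustrative and assumes nothing about the real JS++ compiler internals. The `parseType` and `isAssignable` functions are hypothetical, and the widening rules (T may be assigned to T?, T+ or T?+) are our assumption based on the declarations shown earlier.

```javascript
// Toy model: a type name such as "int", "int?", "int+" or "int?+" is split
// into a base name and two flags. Malformed names are not handled.
function parseType(name) {
    const match = /^([A-Za-z_]\w*)(\?)?(\+)?$/.exec(name);
    return {
        base: match[1],
        nullable: Boolean(match[2]),
        existent: Boolean(match[3]),
    };
}

// Assignability is a fixed handful of comparisons: same base name, and the
// target must allow every "extra" value (null, undefined) the source allows.
// No branches, calls, or index values in the program are analyzed.
function isAssignable(fromName, toName) {
    const from = parseType(fromName);
    const to = parseType(toName);
    return from.base === to.base
        && (!from.nullable || to.nullable)
        && (!from.existent || to.existent);
}

console.log(isAssignable("int", "string"));  // false
console.log(isAssignable("int+", "string")); // false
console.log(isAssignable("int+", "int"));    // false
console.log(isAssignable("int", "int+"));    // true
```

In this model the int+ case costs the same as the int case, which is the point made above.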

We place heavy emphasis on compile times because we know long compile times hurt developer productivity.

When existent types are used correctly, you should never get premature or unexpected program termination.

There is a full tutorial on nullable and existent types available here.