Nullable and Existent Types

By Roger Poon

JS++ Designer and Project Lead

In the previous chapter, we explored arrays and dictionaries. Arrays can access elements by index, and dictionaries can access elements by string key. However, what if we access an array index that lies beyond the last element of the array? What if we attempt to access a non-existent dictionary key?

    import System;

    int[] arr = [ 1, 2, 3 ];
    arr[1000]; // ?

    Dictionary<int> dict = { "a": 1 };
    dict["b"]; // ?
    

The idea of accessing non-existent elements of containers, such as arrays and dictionaries, led to the invention of "existent types." Existent types are exclusive to JS++ and were co-invented by me and Anton Rapetov.

In languages preceding JS++, such as C++, Java, and C#, accessing a non-existent element can result in program termination from segmentation faults or uncaught exceptions. In JS++, your program can't crash or exit prematurely from an "out-of-bounds" access, and this is checked by the compiler. The checking incurs almost no compile-time overhead; in fact, we've shown that existent types can result in only a ± 1ms (millisecond) difference in compile times for complex projects.

Existent Types

An existent type describes whether a container access is within-bounds or out-of-bounds. Here's a basic example:

        int[] arr = [ 1, 2, 3 ];
        int+ x = arr[0];    // within-bounds
        int+ y = arr[1000]; // out-of-bounds
        

Existent types use the + type annotation syntax. Existent types are also known as "bounds-checked types." To aid your understanding of existent types, here's how code for existent types might be generated:

        int[] arr = [ 1, 2, 3 ];
        int+ x = 0 < arr.length ? arr[0] : undefined;       // within-bounds
        int+ y = 1000 < arr.length ? arr[1000] : undefined; // out-of-bounds
        

In other words, existent types don't just stop at the type checker. When existent types are encountered, code will be generated to perform bounds checking. Bounds checking means that - at runtime - the container access will be checked to ensure that it is "within-bounds" and not attempting to access a non-existent element. If an out-of-bounds access occurs, the variable with the existent type will be assigned a value of undefined.

In our example above, x is within-bounds and will have the value of the first element of arr (1). Meanwhile, y is out-of-bounds because the index 1000 is larger than the array's size of three elements. Thus, y has a value of undefined.
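
The runtime check described above can also be modeled as a small helper in plain JavaScript. This is only a sketch: checkedGet is a hypothetical name, it is not actual JS++ compiler output, and the extra test for negative or non-integer indexes is an assumption added for illustration.

```javascript
// Hypothetical helper that models a bounds-checked container access.
// This is NOT actual JS++ compiler output.
function checkedGet(arr, index) {
    // Assumption: an index is within-bounds only if it is a
    // non-negative integer smaller than the array's length.
    const withinBounds =
        Number.isInteger(index) && index >= 0 && index < arr.length;
    return withinBounds ? arr[index] : undefined;
}

const arr = [1, 2, 3];
const x = checkedGet(arr, 0);    // within-bounds
const y = checkedGet(arr, 1000); // out-of-bounds

console.log(x); // 1
console.log(y); // undefined
```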

'null' vs. 'undefined'

An existent type cannot be the element type for a container such as an array. It's a compile-time error if you try:

        int+[] arr = [ 1, 2, undefined, 3 ];
        

[ ERROR ] JSPPE5204: Existent type `int+' cannot be used as the element type for arrays at line 1 char 0

In order to understand this concept, we have to understand some JavaScript. In principle, JavaScript differentiates between two values: null and undefined. null means the value exists but is an "empty value" while undefined means no value exists at all. This is most easily understood via variable declarations:

        var a = null;
        var b;

        console.log(a); // null
        console.log(b); // undefined
        

In the above JavaScript code, the variable 'a' was declared and initialized to null (a value exists but represents an "empty" value). Meanwhile, the variable 'b' was declared but not initialized. Thus, b has the value 'undefined' (no value exists at all).

This sounds fine at first. However, JavaScript doesn't actually apply this rule in practice, because undefined can also be assigned explicitly, just like any other value:

        var a = null;
        var b;
        var c = undefined;

        console.log(a); // null
        console.log(b); // undefined
        console.log(c); // undefined
        

JS++ has different semantics. We want you to be able to express "empty" values. For example, a file that was just created might have a creation date, but it won't necessarily have a "last access" date. JS++ allows you to express "empty" values with nullable types. In JS++, null means "empty" value, but undefined only means "out-of-bounds access." This distinction is important to understand if you want to understand why JS++ does not allow existent types to be the element type. In JavaScript, you can have an array of undefined values:

        var arr = [ undefined, undefined, undefined ];
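
To see the ambiguity concretely, the following plain JavaScript sketch reads one stored undefined and one missing index and compares them. The isWithinBounds helper is a hypothetical name introduced only for this illustration.

```javascript
// Plain JavaScript: a stored undefined and a missing index read the same.
const arr = [undefined, undefined, undefined];

const stored = arr[1];    // within-bounds: the element itself is undefined
const missing = arr[100]; // out-of-bounds: no element exists at all

console.log(stored === missing); // true

// Telling the two cases apart requires a separate, manual check.
// isWithinBounds is a hypothetical helper, not a built-in.
function isWithinBounds(array, index) {
    return Number.isInteger(index) && index >= 0 && index < array.length;
}

console.log(isWithinBounds(arr, 1));   // true
console.log(isWithinBounds(arr, 100)); // false
```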
        

Thus, JavaScript is unable to distinguish a within-bounds 'undefined' value from an out-of-bounds 'undefined' value. JS++ does not have this problem. However, if you want to express emptiness...

Nullable Types

Nullable types allow you to declare that data can also have the value null:

        int? x = 1;
        x = null; // OK
        

Note that a nullable type cannot be assigned the value 'undefined'. Besides this, nullable types don't suffer the restrictions of existent types and should be used when emptiness needs to be expressed. For example, the following array can contain int values and empty (null) values:

        int?[] arr = [ 1, null, 2, null, 3 ];
        

However, the above array now poses a new problem: what happens if we access an out-of-bounds element?

The Combined Nullable + Existent Type

When we have an array of nullable-type elements, out-of-bounds accesses are still represented with existent types. JS++ allows us to combine nullable and existent types using the ?+ syntax:

        int?[] arr = [ 1, null, 2, null, 3 ];

        int?+ x = arr[0];   // 1, within-bounds
        int?+ y = arr[1];   // null, within-bounds
        int?+ z = arr[100]; // undefined, out-of-bounds
        

The following illustrates the possible type combinations for nullable and existent types:

        int a = 1;   // 'int' only
        int? b = 1;  // 'int' or 'null'
        int+ c = 1;  // 'int' or 'undefined'
        int?+ d = 1; // 'int' or 'null' or 'undefined'
        

So far, we've only discussed arrays. However, nullable and existent types can also be used for dictionaries and other containers. Here's a creative example using System.Dictionary<T>:

        import System;

        Dictionary<bool?> inviteeDecisions = {
            "Roger": true,
            "Anton": true,
            "James": null, // James is undecided
            "Qin": false
        };
         
        bool?+ isJamesAttending = inviteeDecisions["James"]; // 'null'
        bool?+ isBryceAttending = inviteeDecisions["Bryce"]; // 'undefined'
        

In the above code, we use the ?+ syntax to combine nullable and existent types. We're throwing a party, and we want to keep track of the decisions of our invitees. If the invitee's decision is 'true', he's coming to the party. If the invitee's decision is 'false', he won't be attending. If the invitee's decision is 'null', he is undecided. Finally, if the invitee's decision evaluates to 'undefined', he was not actually invited.
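
The four outcomes can be spelled out as a small decision function. The sketch below is a JavaScript model of the idea rather than JS++ code: it uses a Map in place of System.Dictionary<T>, and describeDecision is a hypothetical helper.

```javascript
// JavaScript model of the invitee example. A Map stands in for the
// JS++ dictionary; Map.get returns undefined for a missing key.
const inviteeDecisions = new Map([
    ["Roger", true],
    ["Anton", true],
    ["James", null], // James is undecided
    ["Qin", false]
]);

// describeDecision is a hypothetical helper for this illustration.
function describeDecision(decisions, name) {
    const decision = decisions.get(name);
    if (decision === undefined) return "not invited";   // key does not exist
    if (decision === null) return "undecided";          // key exists, empty value
    return decision ? "attending" : "not attending";    // key exists, bool value
}

console.log(describeDecision(inviteeDecisions, "Roger")); // attending
console.log(describeDecision(inviteeDecisions, "Qin"));   // not attending
console.log(describeDecision(inviteeDecisions, "James")); // undecided
console.log(describeDecision(inviteeDecisions, "Bryce")); // not invited
```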

Safe Default Operator

By default, there is no automatic conversion from an existent type T+ to T. Concretely, you cannot assign a value of type int+ to int without getting an error:

        int[] arr = [ 1, 2, 3 ];
        int+ x = arr[0];    // within-bounds
        int+ y = arr[1000]; // out-of-bounds

        int a = x;
        

[ ERROR ] JSPPE5206: Cannot convert existent type (`int+') to `int'. To manually convert and avoid exceptions, try using the safe default operator '??'. You can also perform an explicit cast to `int', but it can cause a runtime 'System.Exceptions.CastException' at line 5 char 8

We will explore later in this chapter how to cast correctly (to avoid runtime exceptions... despite what the error message says), but, first, let's explore the better option: the safe default operator ??:

a ?? b

The safe default operator ?? will start by evaluating the expression on its left-hand side (a). If the expression on the left-hand side evaluates to undefined (or null for nullable types), then the evaluated value on the right-hand side of the operator (b) is returned. If the left-hand side does not evaluate to undefined (or null for nullable types), then the evaluated value of the left-hand side (a) is returned.

In simpler terms, the safe default operator (??) allows you to provide an alternative value if an out-of-bounds access occurred (or if a null value is encountered for nullable types). Let's change our code above to use the safe default operator:

        import System;

        int[] arr = [ 1, 2, 3 ];
        int+ x = arr[0];    // within-bounds
        int+ y = arr[1000]; // out-of-bounds

        int a = x ?? 0;
        int b = y ?? 0;

        Console.log(a);
        Console.log(b);
        

On Windows, right-click the file and select "Execute with JS++". On Mac or Linux, run the following command in your terminal:

> js++ --execute test.jspp

You should see the following output:

1
0

Observe that our within-bounds access (x) was successfully converted from int+ to int and retained its value: the first element of arr (1). Meanwhile, our out-of-bounds access (y) was also converted, but we used the "alternative value" (the right-hand side) of the ?? safe default operator. Thus, y was also converted to int but with a value of zero.

When using the safe default operator, the right-hand side will usually be the default value for the type you want to convert to: for example, zero (0) for int, an empty string ("") for string, and false for bool. JS++ does not provide default values for you because conversions involving existent types should be handled on a case-by-case basis; for example, accidentally default initializing a timeout or price value to zero will incur different bug severities compared to default initializing a word count to zero for a missing word. Requiring a default value via the safe default operator is by design. It would have been a simple change for the design team to add an automatic conversion from T+ to T to make existent types less verbose, but we didn't want to open up your code to bugs.

The following table describes what the safe default operator checks for each type:

Type                      | Checks for...
Nullable (?)              | null
Existent (+)              | undefined
Nullable + Existent (?+)  | null and undefined
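
The rule for a ?? b can be written out by hand. The sketch below is plain JavaScript, not JS++: safeDefault is a hypothetical helper, and unlike JS++ (where the table above says the check depends on the static type) it always treats both null and undefined as "missing." JavaScript's own ?? operator (ES2020) behaves the same way and is shown for comparison.

```javascript
// safeDefault is a hypothetical helper that spells out the rule for a ?? b.
// Note: as a function it evaluates `right` eagerly; it only models
// which value is returned.
function safeDefault(left, right) {
    return (left === undefined || left === null) ? right : left;
}

const arr = [1, 2, 3];
const a = safeDefault(arr[0], 0);    // within-bounds: keeps the element
const b = safeDefault(arr[1000], 0); // out-of-bounds: uses the alternative
const c = safeDefault(null, 0);      // empty value: uses the alternative

console.log(a); // 1
console.log(b); // 0
console.log(c); // 0

// JavaScript's built-in ?? follows the same rule.
console.log(arr[1000] ?? 0); // 0

// Unlike ||, a legitimate zero is kept rather than replaced.
const zeros = [0];
console.log(zeros[0] ?? 42); // 0
console.log(zeros[0] || 42); // 42
```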

Safe Navigation Operator

Once again, in order to understand JS++, we have to understand JavaScript. Besides being unable to distinguish a within-bounds undefined from an out-of-bounds undefined, JavaScript also suffers from another problem when it comes to arrays and other containers:

        var arr = [ 1 ];
        console.log( arr[1000].toString() ); // this will crash
        console.log( "This will never get logged." );
        

In the above code, we try to access the element at index 1000 of 'arr' via arr[1000]. However, the 'arr' array only has one element. Since JavaScript has no compile-time checking step, it cannot catch this error before the code runs the way JS++ does. Thus, the method call of 'toString' on the non-existent element, arr[1000], escapes notice. Finally, when the script gets executed, it crashes with a TypeError. Line 3 will not get executed.

Now, let's try to convert the above code to JS++. Start by changing the type of the array from 'var' to 'int[]', and use 'Console.log' (via the 'System' module) rather than 'console.log':

        import System;

        int[] arr = [ 1 ];
        Console.log( arr[1000].toString() );
        Console.log( "This will eventually get logged." );
        

Notice that I also changed the string from "This will never get logged" to "This will eventually get logged." You'll also notice we still attempt the same out-of-bounds access on the array by trying to access the element at index 1000 even though the array contains only one element. Try to compile. You'll get this error:

[ ERROR ] JSPPE5200: The '.' operator cannot be used for nullable and existent types (`int+'). Please use the '?.' safe navigation operator instead at line 4 char 13

The error message is very descriptive and tells us exactly the location where the error occurred and suggests a fix. Let's try that fix:

        import System;

        int[] arr = [ 1 ];
        Console.log( arr[1000]?.toString() );
        Console.log( "This will eventually get logged." );
        

You'll notice that we are now using the safe navigation operator '?.'. The safe navigation operator will check if the left-hand side is undefined (or null for nullable types); if it is, the safe navigation operator will evaluate to undefined (or null for nullable types). Otherwise, if the object exists, the safe navigation operator will try to access the member of the object. In this case, we are trying to access the 'toString' member if 'arr[1000]' exists. Since 'arr[1000]' does not exist, 'undefined' will be returned. Thus, since either the object member is accessed or 'undefined' is returned, we can understand that - for existent types - the safe navigation operator '?.' will return a type of T+ (an existent type).

Here's a table describing the return types for the safe navigation operator:

Type                      | Returns...
Nullable (?)              | T?
Existent (+)              | T+
Nullable + Existent (?+)  | T?+

However, since '?.' will return an existent type, we have a problem if we try to compile with only the '.' changed to '?.':

[ ERROR ] JSPPE5024: No overload for `System.Console.log' matching signature `System.Console.log(string+)' at line 4 char 0

'System.Console.log' will accept an argument of type 'string' but not 'string+'. This is a common error you'll have to deal with when using existent types so I wanted to make sure it was covered in the tutorial. While there are multiple ways to deal with converting 'string+' to 'string' - as you'll discover when you finish reading this full chapter - we'll use the operator that we've already covered: the safe default operator '??'.

        import System;

        int[] arr = [ 1 ];
        Console.log( arr[1000]?.toString() ?? "out of bounds" );
        Console.log( "This will eventually get logged." );
        

The output should look like this:

out of bounds
This will eventually get logged.

Unlike JavaScript, as long as you are using internal types, JS++ cannot crash or throw exceptions from an access on an undefined value. When existent types are used correctly, you should never get premature or unexpected program termination.

Let's also try a within-bounds access using the safe navigation operator:

        import System;

        int[] arr = [ 1 ];
        Console.log( arr[0]?.toString() ?? "out of bounds" );
        Console.log( arr[1000]?.toString() ?? "out of bounds" );
        Console.log( "This will eventually get logged." );
        

The final output should look like this:

1
out of bounds
This will eventually get logged.
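
For comparison, modern JavaScript (ES2020) has operators with the same spelling, ?. and ??. The sketch below is plain JavaScript, not JS++, and shows both the unguarded crash and the guarded version. The key difference is that JavaScript never requires the guard, whereas the JS++ compiler rejects the unguarded '.' as shown in the error above.

```javascript
const arr = [1];

// Unguarded access: JavaScript throws a TypeError at runtime.
let crashed = false;
try {
    arr[1000].toString();
} catch (e) {
    crashed = e instanceof TypeError;
}
console.log(crashed); // true

// Guarded access: ?. yields undefined instead of throwing,
// and ?? then supplies the alternative value.
const inBounds = arr[0]?.toString() ?? "out of bounds";
const outOfBounds = arr[1000]?.toString() ?? "out of bounds";

console.log(inBounds);    // 1
console.log(outOfBounds); // out of bounds
```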

Inspecting 'undefined'

It's not uncommon to write code inside a loop or function that assumes only within-bounds accesses occur. Oftentimes, rather than desiring an exception that can terminate the program, we'd rather just "skip" complex logic when we detect an out-of-bounds error. We can do that in JS++ with an 'if' statement that checks for 'undefined':

        import System;

        int[] arr = [ 1 ];
         
        for (int i = 0; i < 10; ++i) {
            int+ element = arr[i];
            if (element == undefined) {
                continue;
            }
         
            int x = element ?? 0; // the ?? 0 path never gets followed
         
            Console.log(x + 1);
            Console.log(x + 2);
            Console.log(x + 3);
        }
        

In the code above, we simply skip the iteration if an out-of-bounds access was detected. The rest of the code, after the 'if' statement, operates with the assumption that we are only dealing with within-bounds values. The output will be:

2
3
4

Notice I marked a line with a comment:

int x = element ?? 0; // the ?? 0 path never gets followed

The ?? 0 path never gets followed because we've already checked that the 'element' variable does not equal 'undefined'. Alternatively, we can cast...

Casting Unsafely

In JS++, some type conversions cannot be proven to be safe by the compiler and need to be performed explicitly. Consider the following example:

        import System;

        int x = 1;
        byte y = x;
        

[ ERROR ] JSPPE5016: Cannot convert `int' to `byte'. A cast is available at line 4 char 9

Since we know the value one (1) is within the range of the 'byte' data type (0-255), we can provide an explicit cast to make the error go away:

        import System;

        int x = 1;
        byte y = (byte) x;
        

The syntax for a type cast is:

(type) expression

Let's re-visit a previous existent types example where we received an error suggesting to use either the safe default operator or a cast. We decided to go with the safe default operator because it was... safe. The safe default operator can never result in runtime program termination. In using existent types, the safe default operator will suffice for the vast majority of cases. However, there are times when you might know the cast is safe or might not want to supply a default value.

Here's our previous example:

        int[] arr = [ 1, 2, 3 ];
        int+ x = arr[0];    // within-bounds
        int+ y = arr[1000]; // out-of-bounds

        int a = x;
        

[ ERROR ] JSPPE5206: Cannot convert existent type (`int+') to `int'. To manually convert and avoid exceptions, try using the safe default operator '??'. You can also perform an explicit cast to `int', but it can cause a runtime 'System.Exceptions.CastException' at line 5 char 8

Let's first experiment by casting the variable 'x' from 'int+' to 'int':

        import System;

        int[] arr = [ 1, 2, 3 ];
        int+ x = arr[0];    // within-bounds
        int+ y = arr[1000]; // out-of-bounds

        int a = (int) x;
        Console.log(a);
        

There should be no problems, and you should see this output:

1

However, let's try casting the variable 'y' (which made an out-of-bounds access) from 'int+' to 'int':

        import System;

        int[] arr = [ 1, 2, 3 ];
        int+ x = arr[0];    // within-bounds
        int+ y = arr[1000]; // out-of-bounds

        int a = (int) y;
        Console.log(a);
        

Execution Error: System.Exceptions.CastException: Failed to cast `undefined' to `int'

An incorrect cast led to a runtime error. Let's re-visit the original error message from before we started casting, paying particular attention to its final sentence:

JSPPE5206: Cannot convert existent type (`int+') to `int'. To manually convert and avoid exceptions, try using the safe default operator '??'. You can also perform an explicit cast to `int', but it can cause a runtime 'System.Exceptions.CastException'

Now you can see why we started with and recommended the safe default operator: it cannot cause runtime errors. Now you can also see why the error message warns you against the explicit cast. However, there is a way to use casts safely.

Casting Safely

Now that we've learned what unsafe casting looks like, we can learn how to cast safely and correctly when using existent types. The rule is very simple: check that you don't have an out-of-bounds access before you perform a cast. Here's an example:

        int[] arr = [ 1, 2, 3 ];
        int+ x = arr[0];    // within-bounds
        int+ y = arr[1000]; // out-of-bounds

        if (x != undefined) {
            int a = (int) x;
        }
        if (y != undefined) {
            int b = (int) y;
        }
        

Now you should never get runtime errors if you want to "cast away" the existent type.
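
The difference between the unsafe and safe patterns can be modeled in plain JavaScript. This is a sketch under assumptions: castToInt is a hypothetical helper that stands in for the (int) cast, and it throws a generic TypeError rather than the real System.Exceptions.CastException.

```javascript
// castToInt is a hypothetical stand-in for casting int+ to int:
// it fails at runtime when handed undefined.
function castToInt(value) {
    if (value === undefined) {
        throw new TypeError("Failed to cast `undefined' to `int'");
    }
    return value;
}

const arr = [1, 2, 3];
const x = arr[0];    // within-bounds
const y = arr[1000]; // out-of-bounds (undefined)

// Unsafe: casting without checking first can fail at runtime.
let failed = false;
try {
    castToInt(y);
} catch (e) {
    failed = true;
}
console.log(failed); // true

// Safe: check for undefined before casting, so the cast cannot fail.
const results = [];
if (x !== undefined) {
    results.push(castToInt(x));
}
if (y !== undefined) {
    results.push(castToInt(y)); // never reached for y
}
console.log(results.length); // 1
console.log(results[0]);     // 1
```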

Once again, we want to stress: When existent types are used correctly, you should never get premature or unexpected program termination.

Before we announced existent types, we tested the feasibility and usability of existent types on 11,000 lines of code. Our findings were that you should almost never need to cast, and the safe default operator '??' will handle most cases. In our refactoring of the 11,000 lines of code, we only had one or two instances where casts were needed, and, in those instances, the casts were performed safely by checking for 'undefined' first.

Here's the relevant original code (before existent types were introduced):

        FeedItem item = this.queue.pop();

        Crawler _this = this;
        auto tokenizer = new TextTokenizer();
        auto docProcessor = new DocumentProcessor();
        string url = item.url;
        Date pagePublishedDate = item.date;
        

And here's the refactoring:

        FeedItem+ item = this.queue.pop();
        if (item == undefined) {
            return;
        }
        FeedItem feedItem = (FeedItem) item;

        Crawler _this = this;
        auto tokenizer = new TextTokenizer();
        auto docProcessor = new DocumentProcessor();
        string url = feedItem.url;
        Date pagePublishedDate = feedItem.date;
        

Alternatively, we could refactor without any casts by combining the safe navigation and safe default operators:

        FeedItem+ item = this.queue.pop();
        if (item == undefined) {
            return;
        }

        Crawler _this = this;
        auto tokenizer = new TextTokenizer();
        auto docProcessor = new DocumentProcessor();
        string url = item?.url ?? "";
        Date+ pagePublishedDate = item?.date;
        if (pagePublishedDate == undefined) {
            return;
        }
        

Notice in our code that we are applying the same concepts we've been teaching in this chapter: checking for 'undefined', "skipping" code (e.g. with the 'return' or 'continue' statements), casting safely, etc.

If the code from our test project is confusing, it's because we haven't taught classes and user-defined types yet. We'll get to that starting in the next chapter. For readers coming to JS++ from languages that have classes, the above code should illustrate further how to use existent types. Casts are the one corner case with existent types that can cause runtime exceptions, but, if casts are used correctly, this should never happen.

Nevertheless, in the vast majority of cases, the safe default operator '??' should suffice and should be favored over casts.

Intuition and User Experience (UX)

In C++, there is operator[] and .at(). The former does not perform bounds checking; the latter performs bounds checking and throws a std::out_of_range exception on an out-of-bounds access. This design exists in C++ because C++ programmers often have an intuitive sense of whether they might perform an out-of-bounds access and don't want to pay the performance penalty for bounds checking when they are sure they won't.

... And it's not just C++ programmers. Most programmers have a sense of intuition beyond the compiler's knowledge. Intuition was one of the fundamentals behind JS++ "type guarantees," and JS++ introduced a sound, gradual type system that is fault-tolerant and scales for complex projects. Likewise, for containers, we have a sense of intuition:

        import System;

        int[] arr = [ 1, 2, 3 ];
        for (int i = 0, len = arr.length; i < len; ++i) {
            arr[i]++;
        }
        

A programmer only needs to look at the above code to know that an out-of-bounds error will never occur.

Why is intuition important? Because, for existent types, it solves major usability issues that would have prevented existent types from becoming practical. The above increment code is actually valid JS++ code with existent types. For common operations (such as ++ and +=), code will be generated so that an error value (undefined) will be returned if the operation occurred on an out-of-bounds element. If the operation occurred for a within-bounds element, it will succeed.

Back to intuition. If you intuitively believe you might make an out-of-bounds access, you can customize how you want to handle the error by first checking for the error value occurring:

        import System;

        int[] arr = [ 1, 2, 3 ];
        for (int i = 0, len = arr.length; i < len; ++i) {
            int+ x = arr[i]++;
            if (x != undefined) {
                Console.log("Success!");
            }
            else {
                Console.error("Increment on out-of-bounds element.");
                continue;
            }

            // ...
        }
        

I explicitly added the continue statement even though it seems unnecessary. Oftentimes, we don't actually want our program to potentially terminate with an IndexOutOfBoundsException or the like. When we're iterating an array, usually we can just "skip" to the next iteration (e.g. via continue) if an error condition occurred. This is much more eloquently described with if/else than try/catch.

Effectively, you get a NOP ("no operation") instead of an exception for ++, +=, and assignment (=) if the operation is performed on an out-of-bounds element. Semantically, a NOP remains correct, and it's also better than an exception that can terminate the program. We could have designed this more "safely," but we chose user experience (UX). If you intuitively fear there might be an error condition, check for undefined at runtime.
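
To make the NOP behavior concrete, here is a sketch in plain JavaScript. The first half shows what unchecked JavaScript does on an out-of-bounds ++. The second half uses checkedPostIncrement, a hypothetical helper that models the behavior described above; it is not JS++ compiler output, and it assumes the usual postfix semantics (the old value is returned on success).

```javascript
// Plain JavaScript: ++ on a missing index silently corrupts the array.
const plain = [1, 2, 3];
plain[5]++; // undefined becomes NaN, and NaN is stored at index 5
console.log(plain.length);           // 6
console.log(Number.isNaN(plain[5])); // true

// Hypothetical model of the NOP behavior: an out-of-bounds increment
// changes nothing and evaluates to undefined.
function checkedPostIncrement(arr, index) {
    const withinBounds =
        Number.isInteger(index) && index >= 0 && index < arr.length;
    if (!withinBounds) {
        return undefined; // NOP: the array is left untouched
    }
    return arr[index]++;  // postfix: returns the old value
}

const arr = [1, 2, 3];
const ok = checkedPostIncrement(arr, 0);  // within-bounds
const bad = checkedPostIncrement(arr, 5); // out-of-bounds

console.log(ok);         // 1
console.log(arr[0]);     // 2
console.log(bad);        // undefined
console.log(arr.length); // 3
```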