Design Notes: Why isn’t System.Array.length an ‘unsigned int’?

It makes sense, doesn’t it? Array sizes should be unsigned because they can never be negative. Yet, JS++ chose to make System.Array<T>.length return a signed 32-bit integer (int). We’ve discussed this internally, and the underlying reasons are not so simple.

JavaScript

The most important reason is that this is a bug in JavaScript and ECMAScript 3’s original design. ECMAScript 3 15.4 specifies that Array#length is an unsigned 32-bit integer. However, it gets a bit tricky when you view these method signatures:

  • T[] Array#slice(int start) (ES3 15.4.4.10)
  • T[] Array#slice(int start, int end) (ES3 15.4.4.10)
  • T[] Array#splice(int startIndex) (ES3 15.4.4.12)
  • T[] Array#splice(int startIndex, int deleteCount) (ES3 15.4.4.12)
  • T[] Array#splice(int startIndex, int deleteCount, ...T replaceElements) (ES3 15.4.4.12)
  • int Array#indexOf(T element) (ES5 15.4.4.14)
  • int Array#indexOf(T element, int startingIndex) (ES5 15.4.4.14)
  • int Array#lastIndexOf(T element) (ES5 15.4.4.15)
  • int Array#lastIndexOf(T element, int endingIndex) (ES5 15.4.4.15)

All of the above deal with array indexes as signed 32-bit integers even though the specification clearly states array lengths are unsigned. Specifically, if we indexed arrays using unsigned int, we would break JavaScript’s indexOf and lastIndexOf (because they return -1 when the element is not found). This gets further complicated because Array#push and Array#unshift, which return Array#length, return unsigned 32-bit integers.
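To make the indexOf problem concrete, here is a minimal sketch in C. The helper names index_of and index_of_unsigned are hypothetical; they only mimic the "return -1 when not found" convention and are not part of JavaScript or JS++. The second function shows what happens when that result is stored in an unsigned index type.

```c
#include <limits.h>  /* UINT_MAX */
#include <stddef.h>  /* size_t */

/* Hypothetical helper mimicking indexOf: returns the index of `needle`,
   or -1 when it is not found. */
int index_of(const int *arr, size_t len, int needle) {
    for (size_t i = 0; i < len; ++i) {
        if (arr[i] == needle) {
            return (int)i;
        }
    }
    return -1;
}

/* Same lookup, but the result is converted to an unsigned index type.
   The -1 sentinel becomes UINT_MAX, which no longer looks like "not found". */
unsigned int index_of_unsigned(const int *arr, size_t len, int needle) {
    return (unsigned int)index_of(arr, len, needle);
}
```

Assuming a 32-bit unsigned int, a failed lookup through index_of_unsigned yields 4294967295, so any code that checks for a negative result silently stops working.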

I actually proposed internally that arrays be indexed with unsigned int, but I shut down my own proposal once I realized it would break indexOf and lastIndexOf. That was simply unacceptable.

In other words, we were handicapped by JavaScript in our design (as we often are).

Java and C#

A lot of website backends are written in Java, C#, PHP, and – nowadays – JavaScript. JavaScript and PHP are dynamically-typed, so you don’t have to worry about signed/unsigned, but this brings me to Java and C#.

Java doesn’t have unsigned integer types. I actually feel like this can be a good design decision in some ways. It makes reverse array iteration intuitive and obvious: just flip the logic for forward random-access iteration around. Likewise, in C#, List<T>.Count returns a signed integer (32-bit). Just as in Java, reverse iteration with a for loop is just flipping the logic around.

With signed integers, you don’t have to worry about the loop counter wrapping around when it drops below zero. If you perform forward iteration with:

for (int i = 0; i < list.Count; ++i);

Then, intuitively, reverse iteration might look like:

for (int i = list.Count - 1; i >= 0; --i);

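Of course, this won't work in C/C++ when the index is an unsigned type such as size_t: i >= 0 is always true, and decrementing i from 0 wraps around to the largest representable value instead of -1, so the loop condition never fails and the next access is out of bounds.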
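The difference between the two index types can be sketched in C as a single decrement step. The function names step_down and step_down_signed are made up for illustration.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* SIZE_MAX */

/* One "--i" step of a reverse loop with an unsigned index.
   Unsigned arithmetic is modular, so 0 - 1 wraps to SIZE_MAX. */
size_t step_down(size_t i) {
    return i - 1;
}

/* The same step with a signed index simply goes negative,
   which is what lets the test "i >= 0" end the loop. */
int step_down_signed(int i) {
    return i - 1;
}
```

Calling step_down(0) gives SIZE_MAX rather than -1, which is why the flipped-around loop condition can never become false for an unsigned counter.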

Once again, in dynamic languages like JavaScript, you don't even have to worry about such things. It was all abstracted away by dynamic typing.

Reverse Array Iteration

Reverse array iteration over unsigned types becomes non-trivial. Anyone who has done this in C/C++ will know what I mean. The correct way to do it has to account for the index wrapping around at zero. In C and C++, array sizes are unsigned (size_t), and C doesn't have C++'s reverse iterators. Here's the code in C:

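#include <stdio.h>

int main(void) {
    int arr[3] = { 1, 2, 3 };
    size_t len = sizeof(arr) / sizeof(arr[0]);

    /* i --> 0 tests i against zero, then decrements it before the body runs */
    for (size_t i = len; i --> 0;) {
        printf("%d\n", arr[i]);
    }
    return 0;
}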

So you initialize i to the length of the array (without subtracting 1), and i --> 0 is better formatted as (i--) > 0: the condition compares the current value of i against zero and then decrements it. Thus, inside the loop body, i is at most length - 1, and it counts down to zero.
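For comparison, here is another common way to write the same loop with an unsigned index. It is a sketch, not something the JS++ design relies on: the index runs from the length down to 1 and the element is read at i - 1, so the index never has to go below zero. The function name reverse_copy is hypothetical.

```c
#include <stddef.h>  /* size_t */

/* Copies `src` into `dst` in reverse order using an unsigned index.
   i runs from len down to 1; the element is read at i - 1, so the
   index is never decremented past zero. */
void reverse_copy(const int *src, int *dst, size_t len) {
    size_t out = 0;
    for (size_t i = len; i > 0; --i) {
        dst[out++] = src[i - 1];
    }
}
```

Either form works, but both require the reader to notice the off-by-one adjustment, which is exactly the kind of detail a signed index avoids.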

However, this isn't intuitive unless you come from a C/C++ background, and most C/C++ programmers are not web developers.

Conclusion

Reverse iteration in for loops may or may not be intuitive for you, but I didn't want users tearing their hair out over a basic programming exercise like iterating over an array backwards. Coupled with the fact that ECMAScript 3's original design was buggy, it only made sense to use int instead of unsigned int to avoid breaking old code from JavaScript.

Oh, and int is just so much more pleasant to type than unsigned int with casts everywhere.