For Loops: Patterns and Low-Level Tour

I wanted the challenge of taking an elementary programming concept (‘for’ loops) and writing an article for advanced programmers. All of the patterns discussed today are applicable to high-level languages that have ‘for’ loops, but we will examine ‘for’ loops at a low level to discover why things are the way they are. By “low level,” I mean compilers and hardware. Examples are in C, C++, x86 assembly, and, of course, JS++. In addition to covering for loop patterns for everyday programming, I’m going to cover for loops from the perspective of optimizing compilers.

Length/Size Caching

Compilers may not automatically cache your loop length or array size just because the array you're looping over does not change. In JS++:

int[] arr = [ 1, 2, 3 ];

for (int i = 0; i < arr.length; ++i) {
    // ...
}

// is not the same as:

for (int i = 0, len = arr.length; i < len; ++i) {
    // ...
}

Now, for more advanced users, I'm going to show you why. We consider C/C++ optimizing compilers to be the state of the art. As of this writing, the latest gcc 11.2 (and clang 13) do not perform this optimization for this basic case:

// Assuming 'v' is const std::vector<int> &
for (size_t i = 0; i != v.size(); ++i) {
    printf("%zu\n", i);
}

... with special emphasis on the type being a const reference.

At -O3, the generated loop recomputes the size on every iteration before the comparison:

.L3:
        mov     rsi, rbx
        mov     edi, OFFSET FLAT:.LC0
        xor     eax, eax
        add     rbx, 1
        call    printf
        mov     rax, QWORD PTR [rbp+8] # occurs
        sub     rax, QWORD PTR [rbp+0] # every
        sar     rax, 2                 # iteration
        cmp     rbx, rax
        jne     .L3

I'm using printf instead of std::cout here because the generated assembly code is easier to read; printf also doesn't require any exception bookkeeping.

Now, if we cache the size:

for (size_t i = 0, sz = v.size(); i != sz; ++i) {
    printf("%zu\n", i);
}
        push    rbp
        push    rbx
        sub     rsp, 8
        mov     rbp, QWORD PTR [rdi+8] # before
        sub     rbp, QWORD PTR [rdi]   # the
        sar     rbp, 2                 # loop
        je      .L1
        xor     ebx, ebx
.L3:
        mov     rsi, rbx
        mov     edi, OFFSET FLAT:.LC0
        xor     eax, eax
        add     rbx, 1
        call    printf
        cmp     rbx, rbp
        jne     .L3

The assembly code subtracts the start pointer from the end pointer (the end pointer sits at a +8 byte offset because both members are 64-bit pointers). If the result is zero, there are zero elements, so we jump to the basic block immediately following the for loop. SAR rbp, 2 is equivalent to >> 2 (division by 4, sizeof(int)). (pointer_end - pointer_start) / sizeof(T) gives us the number of elements. We can confirm in the libstdc++ source code:

struct _Vector_impl_data
{
    pointer _M_start;
    pointer _M_finish;
    // ...
};
      /**  Returns the number of elements in the %vector.  */
      _GLIBCXX_NODISCARD _GLIBCXX20_CONSTEXPR
      size_type
      size() const _GLIBCXX_NOEXCEPT
      { return size_type(this->_M_impl._M_finish - this->_M_impl._M_start); }

The problem, from the compiler's perspective, is the side effect: printf is an opaque call, so the optimizer must assume the vector could be modified and cannot cache the loop length. The fix is simple: call only pure functions inside the loop if you want a compiler to apply this optimization automatically. The alternative is to cache the size yourself whenever no mutations occur inside the loop.

One more reason compilers may not perform this optimization, which is important to JS++ but maybe not for C/C++, is compile times. The analysis required to prove the loop length does not change can be expensive.

There are a lot of moving parts: the language, the compiler, the CPU cache architecture, and so on. Whether or not you need this optimization at all should depend on your benchmarks. If the size is in the L1 CPU cache, it would make no practical difference unless you needed to shave 3-4 cycles per iteration (e.g. high-frequency trading). The key takeaway for general developers is that you cannot assume the compiler will do this for you—even when it seems obvious that the size or loop length never changes.

Unsigned Reverse Iteration

I'm often surprised by how many programmers don't know how to manually loop in reverse when the index type is unsigned. Yes, C++ has rbegin/rend/crbegin/crend, but I'm talking about manual reverse iteration.

First, let's see what does not work. I'll use the JS++ syntax because, in my opinion, it's the most readable for a general audience:

int[] arr = [ /* ... */ ];
for (unsigned int i = (unsigned int) arr.length - 1;
     i > 0;
     i--)
{
    // incorrect
}

The above loop never visits the first element (index 0).

"No problem," you say to yourself. "I'll just compare against -1."

Wait. You're going to compare an unsigned (non-negative) value against -1? Two things can happen. In C and C++, the -1 will wrap. If you're comparing a 64-bit number, -1 becomes 0xFFFFFFFFFFFFFFFF. Your loop condition will never be true (because i will never be "greater than" UINT64_MAX), and the optimizing compiler will simply eliminate the loop. In JS/JS++, your i counter variable will wrap, and you'll get an infinite loop. (The difference, then, is that C/C++ will wrap the RHS, while JS++ will wrap the LHS. JS++ does this for type system soundness reasons, which extend beyond the scope of for loops.)
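To make the failure concrete, here is a minimal C++ sketch (illustrative only) of the "compare against -1" attempt; after the usual arithmetic conversions, the condition can never be true:

#include <cstdint>
#include <cstdio>

int main() {
    uint64_t length = 3;
    // 'i > -1' converts the -1 to UINT64_MAX, so the condition is never
    // true, the body never runs, and an optimizing compiler will
    // typically delete the loop entirely.
    for (uint64_t i = length - 1; i > (uint64_t) -1; i--) {
        printf("%llu\n", (unsigned long long) i); // never executes
    }
    return 0;
}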

The loop above would be perfectly fine if the container size type were signed; that is why I presented it as the "intuitive" (but incorrect) approach to unsigned reverse iteration. Instead, the proper way to do unsigned reverse iteration looks something like this:

for (unsigned int i = arr.length; i --> 0; ) {
    // ...
}

There's a curiously-titled Stack Overflow thread on this:

What is the "-->" operator in C/C++?

The above code can be rewritten as:

for (unsigned int i = arr.length; (i--) > 0; ) {
    // ...
}

You should prefer the latter—for readability.
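For C++ readers, the same reverse-iteration pattern with an unsigned size type looks like this (a minimal sketch):

#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> arr = { 1, 2, 3 };
    // i starts at size() and is decremented *before* the body runs,
    // so the loop visits indexes 2, 1, 0 and stops cleanly at zero.
    for (std::size_t i = arr.size(); i-- > 0; ) {
        printf("%d\n", arr[i]);
    }
    return 0;
}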

Interestingly, because unsigned reverse iteration is so deceptively unintuitive, making all JS++ container size types signed was raised as a consideration. I was in the unsigned camp, but, besides signed being easier for users, what pushed us toward signed size types is that JS++ is a superset of JavaScript (ECMAScript 3). If you want to see a bug in the design and specification of a very popular language (JavaScript), please read Design Notes: Why isn’t System.Array.length an ‘unsigned int’? As far as I know, JS++ was the first to uncover this design bug.

The common misconception with signed container size types is that you have to pay for two bounds checks instead of one to check for negative index accesses:

if (i < 0 || i >= arr.length) {
    // out of bounds
}

However, besides being more beginner-friendly, signed container size types are actually just as efficient as unsigned:

if ((unsigned int) i >= (unsigned int) arr.length) {
    // out of bounds
}

If the index is a negative value, it will wrap (and, thus, will always be greater than the length property cast to unsigned because length is limited to the signed System.Integer32.MAX_VALUE). In other words, the "negative index check" is redundant. The casts do not result in additional instructions.
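Here is a small C++ sketch (illustrative) of that single-comparison bounds check; a negative index wraps to a huge unsigned value and fails the same test as a too-large index:

#include <cstdio>

int main() {
    int length = 3;                  // signed container size
    int indexes[] = { -1, 0, 2, 3 };

    for (int i : indexes) {
        // One unsigned comparison covers both failure cases:
        // (unsigned int) -1 wraps to 4294967295, which is >= 3.
        bool outOfBounds = (unsigned int) i >= (unsigned int) length;
        printf("%2d -> %s\n", i, outOfBounds ? "out of bounds" : "ok");
    }
    return 0;
}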

Labeled break/continue

This brings me to an example that will not work in C or C++, because those languages do not have labeled break/continue statements.

Somebody actually emailed me to remark on how impressed he was with the quality of the JS++ documentation. He spent all these years programming and never knew about labeled break/continue in JavaScript! The following will work in JS++, JS, and Java. I'll just take it directly from the JS++ documentation:

outerLoop: for (int x = 1; x <= 5; ++x) {
    innerLoop: for (int y = 1; y <= 5; ++y) {
        break outerLoop;
    }
}

Notice that we labeled the loops (outerLoop and innerLoop) and provided the outerLoop label to the break statement. By referring to outerLoop in the break statement, we were able to exit the outermost loop; without the label, break would only have exited the innermost loop (the closest loop enclosing the break statement).

Source: Label Statement (JS++ Documentation)

One reason we might want to break out of an outer loop is if we are searching a jagged array. Once the element is found, we break out of both loops by breaking out of the outer loop.
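In C or C++, which lack labeled break, the usual equivalent for this jagged-array search is a goto that jumps past both loops. A minimal sketch (the data and the needle value are just illustrative):

#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<std::vector<int>> jagged = { { 1, 2 }, { 3, 4, 5 }, { 6 } };
    int needle = 4;

    for (std::size_t i = 0; i < jagged.size(); ++i) {
        for (std::size_t j = 0; j < jagged[i].size(); ++j) {
            if (jagged[i][j] == needle) {
                printf("found at [%zu][%zu]\n", i, j);
                goto done; // plays the role of 'break outerLoop'
            }
        }
    }
done:
    return 0;
}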

Looping in Row-Major Order

Given the following C code, would it take more CPU cycles to loop the columns first or the rows first?

const int rows = 64, cols = 64;
int matrix[rows][cols];

In the code above, assume int requires 32 bits of storage space (4 bytes). Furthermore, we'll assume a cache line size of 64 bytes. We can also be assured the arrays are contiguous. From the C11 language specification, 6.2.5 Types:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type.

For visualization, treat each cell as consuming 256 bytes of memory (sizeof(int) ✕ 64 cols ✕ 1 row). The matrix will look like this in memory:

[ 0 ][ 1 ][ 2 ][ 3 ][ 4 ] … [ 63 ]

Notice the "shape" of the matrix in memory is not a square, as we might expect in mathematics for a 64 x 64 matrix. Instead, each cell represents one row containing 64 columns consuming 256 bytes each.

Equipped with this visualization, let's first examine looping columns first:

for (int i = 0; i < cols; i++) {
    for (int j = 0; j < rows; j++) {
        // ...
    }
}

This code has an access pattern that looks like this:

(c0, r0) → (c0, r1) → (c0, r2) → … → (c0, r63)
(c1, r0) → (c1, r1) → (c1, r2) → … → (c1, r63)
(c2, r0) → (c2, r1) → (c2, r2) → … → (c2, r63)
…
(c63, r0) → (c63, r1) → (c63, r2) → … → (c63, r63)

I've prefixed row indexes with a lowercase "r" (r0, r1, r2, and so on) and column indexes with a lowercase "c" (c0, c1, and so on). There's a problem here though: we're jumping around in memory.

Each "row" is actually divided into 64 columns, stored contiguously. You can imagine the storage as 4096 (64 x 64) columns stored contiguously in memory.

In the first iteration of the loop, we access the first column. We enter the innermost loop, which iterates the rows. We're at c0, r0. On the next iteration of the innermost loop, we're at c0, r1. Then we're at c0, r2, and so forth. When the innermost loop finishes, we start at c1, r0.

We have a cache line size of 64 bytes. At c0, r0, we can cache the first 16 columns of r0 (64 bytes / sizeof(int) = 16). Suppose we have an L1 cache hit with a cost of ~4 cycles and a cache miss of ~100 cycles. Thus, a cache miss would be roughly 25x slower. We have the first 16 columns cached (c0, c1, c2, c3, and so on for r0), but the innermost loop immediately jumps to r1. Thus, instead of getting the data we need from the cache, we have to fetch the data from DRAM... costing us 100 cycles. We have to pay for this penalty 64 times in the innermost loop.

Clearly, this would not be efficient. It is better to just access the contiguous data in the order which it is stored:

for (int i = 0; i < rows; i++) {
    for (int j = 0; j < cols; j++) {
        // ...
    }
}

This code's access pattern looks like this:

(r0, c0) → (r0, c1) → (r0, c2) → … → (r0, c63)
(r1, c0) → (r1, c1) → (r1, c2) → … → (r1, c63)
…

Notice in the visualization that all of the columns of row 0 are accessed first. We do not move on to row 1 until all the columns of row 0 are accessed. This is no coincidence: it results in far fewer cache misses. We pay for one DRAM fetch (~100 cycles) for the first 16 columns, the following accesses come from SRAM (the cache, ~4 cycles each), and we only pay for three more DRAM fetches to cover the full 64 columns. In the end, this cache-friendly code costs us roughly 10x fewer cycles in the innermost loop.
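Putting it together, here is a minimal C++ sketch with the array accesses spelled out (the loop bodies above were elided); the first nest strides down columns, while the second walks each row in the order it is stored:

#include <cstdio>

int main() {
    const int rows = 64, cols = 64;
    static int matrix[rows][cols]; // static storage, zero-initialized

    long long sum = 0;

    // Cache-unfriendly: columns first; consecutive accesses are 256 bytes apart
    for (int i = 0; i < cols; i++)
        for (int j = 0; j < rows; j++)
            sum += matrix[j][i];   // row j, column i

    // Cache-friendly: rows first; accesses follow the storage order
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            sum += matrix[i][j];   // row i, column j

    printf("%lld\n", sum);
    return 0;
}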

Loop-Invariant Code Motion

I had a family member studying computer science at a top 10 CS school. The homework assignment presented a multiple choice question, and she was asked, "Does this code do anything?" The choices were yes/no.

int main(void) {
    int x;
    for (int i = 0; i < 10; i++) x = 2;
    return 0;
}

The correct answer is "no." Loop-invariant code motion might sound intimidating or arcane, but we can break it into its components: 1) loop invariant, and 2) code motion. First, the compiler identifies "loop invariants"—code that does not change across loop iterations. The assignment x = 2 does not change with each iteration. (If the assignment were x = i + 1 instead, it would not qualify as a loop invariant.) The second half of the optimization, "code motion," permits us to move that invariant code out of the loop—if it is safe to do so. I'm emphasizing "if safe" because, if the loop condition were never true (e.g. if we changed the condition to i != 0), x = 2 should never occur.

And that, in a nutshell, is loop-invariant code motion: moving code that doesn't change outside of the loop. The code should now look like this, in high-level code:

int main(void) {
    int x;
    x = 2; // moved outside loop
    for (int i = 0; i < 10; i++);
    return 0;
}

However, we can dive deeper into modern compilers. After loop-invariant code motion, the compiler can perform a second pass: dead code elimination. The for loop is no longer doing anything, and it gets eliminated. The variable x is written to but never read; thus, it can also be eliminated. The final program, in high-level code:

int main(void) {
    return 0;
}

At a low level, if we compile the dead code with gcc -S -O2, we indeed get the minimal program:

int main(void) {
    int x;
    x = 2; // moved outside loop
    for (int i = 0; i < 10; i++);
    return 0;
}
main:
        xor     eax, eax
        ret

In the end, my family member reported back that the correct answer was indeed "no." I asked if she gave my detailed explanation. She said, "No, because the professor would know I cheated." (?)

There's a question relating to loop-invariant code motion (via gcc -fmove-loop-invariants) on Stack Overflow dating back 5 years, but the only answer is incorrect. The proper answer is buried in the gcc documentation:

Most optimizations are only enabled if an -O level is set on the command line. Otherwise they are disabled, even if individual optimization flags are specified.

Source: GCC Manual 3.10 Options That Control Optimization

The DCE Metaprogramming Pattern: Writing JavaScript Libraries using JS++

One of the primary concerns when writing JavaScript is cross-browser compatibility. As web browsers continue to evolve and change quickly, it is important for your code to keep up with this rapid change and to provide a consistent user experience across all web browsers and devices. However, while JavaScript libraries like Modernizr provide a web GUI for you to manually select and de-select the components you need, experience has shown it is better if this process is seamless and automatic—or, rather, the compiler or build tool should automatically know which components you do or do not need. This automatic process is known as dead code elimination (DCE).

In theory, DCE performs best with static typing and static structure. Since JS++ is the first sound gradually typed language, it is possible for DCE to perform optimally, such as in identifying which function overloads are necessary. This is generally not possible with other JavaScript supersets or with JavaScript itself. In particular, given that JavaScript most commonly executes via JIT engines like V8 (in Google Chrome), DCE provides advantages by reducing parse times, analysis times, and compile times for JIT environments; thus, page load times are reduced and responsiveness improves. Beyond JIT execution, avoiding the execution of irrelevant operations reduces program running time, and smaller program sizes allow websites to load faster by reducing network payloads.

In this article, we will explore writing a JavaScript library via JS++, which supports DCE, and subsequently show you how that library can be used by JavaScript developers with no knowledge of JS++. In addition to “automatic” and seamless DCE, which is built into the JS++ language, we will also explore “programmable DCE”—a metaprogramming pattern unique to JS++.

What is Dead Code Elimination (DCE)?

Dead code elimination (DCE) means that if you don’t use a module, class, or function, it will not be compiled into your final output.

Here’s a simple example:

void A() {}
void B() {}

A();

In the above example, only the function A gets called. The function B never gets called and never gets executed. Therefore, it is safe for the compiler to not include B in the final compiled output.

A real-world example of the need for dead code elimination in JavaScript (but not JS++) is jQuery. You have to include the entire jQuery library even if you only use one function.

In JS++, it is sufficient to just write the code and compile it. Dead code elimination in JS++ is seamless and automatic—it is enabled by default. The JS++ compiler is able to determine that the function B is never used, and it will not compile the code for function B. In most cases, if all you want is dead code elimination, this is all you need to know; but, for cross-browser and cross-device/mobile development, it is useful to explore more sophisticated DCE.

Programmable DCE: Mobile & Library Development

Arguably, the most important reason to understand DCE is library development. We will explore library development for mobile devices by example. At the time of this writing, the HTML5 Vibration API is not supported on the iPhone. In a very basic example, we will develop a small library that allows the user of the library to specify which phones he wants to support (iPhone or Android), and the library will provide notifications to the end user based on the features supported by the requested device(s). We will name this library: Notify.

class Notify
{
}

Save the file as Notify.jspp.

Next, let’s define the implementation:

import Externals.DOM;

class Notify
{
    private static void vibrate() {
        window.navigator.vibrate(2000);
    }
    private static void infobox() {
    	var el = document.createElement("div");
    	el.style.border = "1px solid #000";
    	el.innerText = el.textContent = "You have a new notification.";

    	document.body.appendChild(el);
    }
}

The first line, which imports the Externals.DOM module, allows us to use the JavaScript DOM API.

The method vibrate does exactly what the method name suggests: it will make the phone vibrate (for supported devices).

Finally, the infobox method creates a DIV element and inserts it into the DOM.

Both methods are private and static. The reason is that these methods are platform-specific implementation details. We only want to expose to the library user the choice between iPhone notifications and Android notifications. Here's how we expose this choice:

import Externals.DOM;

class Notify
{
    private static void()? iphoneNotify = null;
    private static void()? androidNotify = null;

    public Notify(int platforms) {
    }
	
    public static property int IPHONE() {
    	iphoneNotify = infobox;
    	return 1 << 0;
    }
    public static property int ANDROID() {
    	androidNotify = vibrate;
    	return 1 << 1;
    }
    
    private static void vibrate() {
        window.navigator.vibrate(2000);
    }
    private static void infobox() {
    	var el = document.createElement("div");
    	el.style.border = "1px solid #000";
    	el.innerText = el.textContent = "You have a new notification.";

    	document.body.appendChild(el);
    }
}

Here we are defining two getter methods: IPHONE and ANDROID. These two getter methods will allow the user to specify which platforms he wants to support. In order to specify the desired platforms, we instantiate the library like so:

new Notify(Notify.IPHONE | Notify.ANDROID); // iPhone *and* Android support
new Notify(Notify.IPHONE);  // iPhone support only
new Notify(Notify.ANDROID); // Android support only

You can try compiling with the variations in the instantiation and confirm that, indeed, only the specified platform code is compiled into the final output. In essence, we get “programmable DCE.” Furthermore, rather than specifying an int return type on the getter methods, one can define an enum to create a more specific type, but this is left as an exercise for the reader.

While the example we’ve explored is very basic, in real-world applications with complex dependency graphs, a library user can experience significant reductions in code size.

Exporting to JavaScript

While JavaScript cannot support DCE—and especially not the advanced DCE patterns of JS++—we can still “pre-DCE” our code before shipping it to JavaScript users. In order to do this, we should first wrap our class in a module. In Notify.jspp:

module NotifyLib
{
    class Notify
    {
        // ...
    }
}

In JS++, there is a toExternal design pattern for exposing JS++ code and libraries to JavaScript users. We need to define a `toExternal` method in our class:

module NotifyLib
{
	class Notify
	{
		// ...

		public function toExternal() {
			void() send;
			
			if (null != iphoneNotify) {
				send = iphoneNotify ?? dummy;
			}
			else if (null != androidNotify) {
				send = androidNotify ?? dummy;
			}
			else {
				send = dummy;
			}
			
			return {
				send: void() {
					send();
				}
			};
		}
		
		private static void dummy() {
			/* INTENTIONALLY EMPTY */
		}
	}
}

Notice how, in the toExternal method, we are transitioning from static to dynamic programming. We use if statements to determine, at runtime, which method to execute. In statically-typed programming languages, one would normally have the compiler resolve the method(s) to call. The purpose of the toExternal design pattern in JS++ is to facilitate complex transitions between the static and dynamic worlds.

Next, we create three files:

  1. Notify.iPhone.jspp
  2. Notify.Android.jspp
  3. Notify.All.jspp

In this tutorial, we will implement Notify.iPhone.jspp, and the other files are left as an exercise for the reader.

In Notify.iPhone.jspp:

import NotifyLib;
import Externals.JS;

auto notify = new Notify(Notify.IPHONE);
global.Notify = notify.toExternal();

First, we import the NotifyLib library. We also import Externals.JS, which defines all JavaScript (ECMAScript 3) symbols as external (such as `Math`, `Array`, `Object`, and so forth). However, Externals.JS does define one symbol that is not in the ECMAScript 3 specification: global. It gives us universal access to JavaScript’s global object, and this non-standard object was added for convenience so that JS++ users would not need to learn all the edge cases that come with trying to access JavaScript’s global scope (such as window being a DOM API object that is not defined in Node.js). Once we are able to access JavaScript’s global scope, we just export our JS++ library to it by converting it to the `external` type (via calling the toExternal method).

Compile Notify.iPhone.jspp:

> js++ Notify.iPhone.jspp Notify.jspp -o Notify.iPhone.js

Now, it should be straightforward to use the `Notify` library you developed completely in JS++ from plain JavaScript:

<!DOCTYPE html>
<html>
<head>
<title>Notify</title>
</head>
<body>
<script type="text/javascript" src="Notify.iPhone.js"></script>
<script type="text/javascript">
Notify.send();
</script>
</body>
</html>

The example for the iPhone above will insert a DOM notification on page load. (Note that Android devices will vibrate instead, but, for security reasons, user interaction is required before the vibration will trigger on the phone. Keep this security restriction in mind when compiling Notify.Android.js.) As you compile the remaining files, you will observe that the file sizes are very different—reflecting how only the code for the specified platforms is shipped.

Conclusion

In this article, we have learned several advanced techniques unique to JS++, from programmable DCE to exporting an entire library to JavaScript. It should be clear JS++ is a powerful language, but, since its libraries can be used in JavaScript, there is no “vendor lock-in.”

JS++ 0.9.2: ‘final’ variables and macOS Catalina 64-bit Defaults

JS++ 0.9.2 is now available for download and features final variables and fields. Additionally, due to Apple’s decision to stop supporting 32-bit applications beginning with macOS Catalina, we have changed the default binary to the 64-bit compiler for Mac.

final variables can now be used:

import System;

final int x = 1;
Console.log(x);

final can also be applied to fields:

import System;

class Foo
{
    public static final int bar = 1;
    public int baz = 2;
}

Console.log(Foo.bar);
Console.log((new Foo).baz);

The final keyword, when applied to a class or method, had already been implemented in previous versions.

macOS Catalina (released Oct 7, 2019) has ended support for 32-bit applications. Previously, JS++ distributed a 32-bit build of the compiler as the default for universality. Going forward, we will be distributing the 64-bit build as the default for macOS. If you still need the 32-bit binary, it is included with all releases going forward as the js++-x32 binary. All guides and tutorials have been updated.

Compiler Software Engineering Methods

In our last release, I announced that JS++ core is down to nine (9) minor bugs after 3.5 years of engineering.

Here is the data for our lines of code:

Language        Files    Blank     Comment    Code
C++             511      28,449    9,478      120,915
C/C++ Header    578      25,427    30,498     110,327
C               1        11,333    18,124     57,047
JS++                                          21,118
Python          35       862       401        2,817

Data collected May 6, 2019 9:50AM PDT

In total, JS++ core consists of over 400,000 lines of code constructed over 3.5 years of engineering. This article discusses the software engineering methods behind JS++ that allow us to deliver high-quality, reliable software. Specifically, this article may be of particular interest because we will focus on a software engineering method for real-world compiler engineering; compilers sit at the foundation (bottom) of a technology stack, and a single compiler bug can make or break a software project. Thus, when high reliability and high correctness are needed in software, what can we do?


JS++ 0.9.1: Bug Fixes

The JS++ core is now down to 9 low-priority bugs after 3.5 years of engineering and 400k+ lines of source code thanks to our lead engineer, Anton Rapetov.

Building on top of our release of “existent types” to prevent out-of-bounds errors, we’ve had enough time for testing and feedback to confirm that existent types work, and they will remain a cornerstone of JS++ going forward.

This latest release, version 0.9.1, focuses primarily on bug fixes. We value disciplined engineering, and we wanted to pay back technical debts. However, despite the bug fix theme, there are a few notable features and announcements. I’ll highlight what’s new or changed, and the list of bug fixes will be at the end.

Console.trace

While we’ve had Console.log for a long time, it doesn’t include critical information on where the log message originated from. You might find the new Console.trace method more useful:

filename.js++:3:0: Now we have the original file location, line number, and column number for our logs!

Block Scoping

We’ve also had block scoping semantics for a long time. JS++ brings you high-performance block scoping — even in ECMAScript 3 environments.

In the latest version, block scoping is finalized and all corner cases should be covered. Specifically, we fixed code generation corner cases of lowering to the function scoping semantics of ECMAScript 3.

(We are keenly aware of the block scoping available via ECMAScript 3+ ‘catch’ blocks, an approach we rejected as far back as the very first release of JS++ for being too costly in terms of performance. This is a gem of the ECMAScript specification that few people know about, and it highlights why you should trust our knowledge and experience – rather than choosing Microsoft just on brand name – going into the future.)

Existent Types

In the last version, 0.9.0, we announced existent types to prevent runtime out-of-bounds errors. In this release (0.9.1), we are doubling down on existent types and revising the rules based on experience, feedback, and re-design:

  • If type T has an implicit conversion to external, T+ also has an implicit conversion to external
  • The safe default operator (??) has a higher operator precedence. See the operator precedence documentation.
  • Relax getter/setter type rules for nullable and existent types
  • Disallow void+ and any usages of void except as a return type
  • Forbid external as a common type for int and string for the safe default operator (??). Thanks to user @h-h in the JS++ Community Chat for reporting this bug.
  • Fix error message when upcasting/downcasting in the context of nullable/existent types to make it clearer

Nested Generics with Generic Parameters

In previous releases, we had left nested generics (with generic parameters) unimplemented. Previously, the following code worked:

Array<Array<int>> arr1; // nested, non-parametric
Array<T> arr2; // non-nested, parametric

The latest version now enables nested generics with generic parameters:

Array<Array<T>> foo; // nested *and* parametric

Bug Fixes

Finally, here is a list of all the other bug fixes. There are a lot. The latest 0.9.1 release brings JS++ software quality to its highest level yet by addressing technical debt rather than delivering new features.

Fixed:

  • ‘foreach’ looping over external “array-like objects”
  • SyntaxError in generated code for generic functions/casting
  • Error message for super() on base class
  • Duplicated error messages
  • Line terminators in strings generated via escape sequences
  • Fix segfault when casting away nullable type of non-static field
  • Don’t raise a redundant error of accessing a parameter that was shadowed
  • Raise a special error “Re-declaration of parameter”, if a parameter was shadowed
  • Allow hoisting of static class fields across different classes in the same module
  • Do parameter and arity checking on new
  • Wrong line number for dot operator
  • Inaccurate error message for generic instanceof
  • Only one error message for method overriding when inheriting from generic class with non-existent argument

Conclusion

Software quality at JS++ has always remained high, and that’s a testament to the ability of our lead engineer, Anton Rapetov, and our engineering approach. In more than 3.5 years of engineering, we currently have fewer than 10 open bug reports in the core compiler.

Tips & Tricks: Object-oriented Sorting in JS++ with IComparable<T>

JS++ makes object-oriented sorting easy with the IComparable<T> interface and the Comparison enumeration for type-safe (and readable) comparisons.

Here’s the code. (Don’t worry; I’ll dissect it.)

import System;

class Employee : IComparable<Employee>
{
    private string firstName;
    private string lastName;

    public Employee(string firstName, string lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    public Comparison compare(Employee that) {
        // Sort by employee surname
        return this.lastName.compare(that.lastName);
    }

    public override string toString() {
    	return this.firstName + " " + this.lastName;
    }
}

Employee zig  = new Employee("Zig", "Ziglar");
Employee john = new Employee("John", "Smith");
Employee abe  = new Employee("Abe", "Lincoln");

Employee[] employees = [ zig, john, abe ];
employees.sort();
Console.log(employees.join(", "));

// Output:
// Abe Lincoln, John Smith, Zig Ziglar

This is beautiful, object-oriented code. All of the custom sorting logic is one line of code. Let’s break down how that happens step-by-step.

1. Implement IComparable<T>

The first step is to implement the IComparable<T> interface. The interface provides only one method to implement: compare.

compare must return a Comparison enumeration value. As we can see from the documentation, Comparison has three possible values: LESS_THAN, GREATER_THAN, and EQUAL. While Java/C# use -1, 0, and 1, JS++ gives you type-safe and readable comparisons.

IComparable<T> and Comparison form the basis for custom sorting.

2. Determine how to sort

We want to sort Employee objects based on the employee’s last name. In order to do this, we want to compare strings and sort in alphabetical order. While we can do this manually, the JS++ Standard Library already provides these comparisons for us for primitive types.

All primitive types in JS++ are auto-boxed. (Don’t worry, it gets optimized away.) In addition, all primitive types implement IComparable<T> (which provides the compare method).

Thus, since all primitive types provide the compare method, sorting is as easy as this one line of code:

return this.lastName.compare(that.lastName);

This is calling the System.String.compare method, which compares strings lexicographically (in alphabetical order). (Likewise, if you wanted to compare by employee ID number, you might declare an unsigned int and use System.UInteger32.compare.)

Thus, our sorting code and implementation of IComparable<T>.compare is just:

public Comparison compare(Employee that) {
    // Sort by employee surname
    return this.lastName.compare(that.lastName);
}

3. Define toString() Behavior

In addition, we want to be able to easily visualize our sorted arrays. Therefore, we should define how our Employee class looks when converted to a string so we can easily call System.Console.log on it.

JS++ internal types use a “unified type system” where everything inherits from System.Object. If we look at the System.Object.toString documentation, we can see that System.Object.toString is a virtual method based on its signature:

public virtual string toString()

We override it with this code:

public override string toString() {
    return this.firstName + " " + this.lastName;
}

Thus, whenever we want a string representation of our Employee object, we will get the employee’s first name followed by his last name. This will help us visualize our sorted employees.

4. Instantiate some Employees

The next lines of code instantiate the Employee class three times and insert the resulting objects into an array:

Employee zig  = new Employee("Zig", "Ziglar");
Employee john = new Employee("John", "Smith");
Employee abe  = new Employee("Abe", "Lincoln");

Employee[] employees = [ zig, john, abe ];

Currently, the array is unsorted, and “Zig Ziglar” will be the first element.

5. Sort the Array

Sorting is as simple as one line of code:

employees.sort();

It’s just one line of code because we implemented IComparable<T>. Instead of implementing IComparable<T>, we could have also used the other overload of Array.sort, which expects a callback:

employees.sort(Comparison(Employee a, Employee b) {
    return a.lastName.compare(b.lastName);
});

The callback allows flexibility; for example, you may choose to sort by employee first name in some cases.

Implementing IComparable<T> simply provides a default sort so you can use System.Array.sort without a callback. These are the signatures for the System.Array.sort overloads:

public T[] sort() where T: IComparable<T>
public T[] sort(Comparison(T element1, T element2) comparator)

Thus, if you do not provide a callback, you are using the overload that expects a class implementing IComparable<T>. If you try to sort objects whose respective classes do not implement the IComparable interface, you’ll receive an error:

[  ERROR  ] JSPPE5056: `System.Array.sort()' can only sort classes implementing 'IComparable'. Please implement 'IComparable' for `Employee' or use 'System.Array.sort(Comparison(T element1, T element2) comparator)' at line 23 char 0 at test.js++

6. Print the Result

The final step is to just print the result:

Console.log(employees.join(", "));

Et voila!

(The toString method we implemented earlier will get called for each element that gets joined. Thus, you get a readable output.)

JS++ 0.9.0: Efficient Compile Time Analysis of Out-of-Bounds Errors

I promised a breakthrough for our next release.

We are proud to announce JS++ efficiently analyzes and prevents out-of-bounds errors. An out-of-bounds error occurs when you attempt to access a container element that doesn’t exist in the container. For example, if an array has only three elements, accessing the tenth element is a runtime error.

In C, you risk buffer overflows. In C++, you risk buffer overflows and exceptions. In Java and C#, you get an exception at runtime. If exceptions are uncaught, the application terminates. If segmentation faults occur, the application terminates. In the case of buffer overflows, you open your application to a variety of exploits.

As we will show, we can perform out-of-bounds analysis with only a ±1-2ms (milliseconds) overhead on complex projects. There is virtually no effect on compile times with our invention.

Out-of-bounds errors have plagued computer science and programming for decades. Detecting these errors at compile time has ranged from slow to impossible, depending on the language design. With that said, let’s first explore the problems which influenced the design.

Problems

Basic Cases to Handle

In all of the following cases, you cannot predict the value at compile time:

import System;

int[] arr = [ 1, 2, 3 ];

Console.log(arr[Math.random(1, 100)]);
Console.log(arr[getUserInput()]);
Console.log(arr[getValueFromFile()]);
Console.log(arr[API.getTweetLimit()]);

JS++ doesn’t stop at array indexes. Array indexes are limited to numeric values. What about arbitrary string keys on System.Dictionary<T>? Yes, we handle these too:

import System;

auto dict = new Dictionary<string>();

Console.log(dict[Math.random(1, 100).toString()]);
Console.log(dict[getUserInput()]);
Console.log(dict[getTextFromFile()]);
Console.log(dict[API.getTwitterUsername()]);

These are the basic cases. It gets more complex with branching logic:

import System;

Dictionary<string> dict = {
    "1":  "a",
    "10": "b"
};

bool yes() {
    return Math.random(0, 100) > 50;
}

if (yes()) {
    dict["20"] = "c";
}

string key = Math.random(0, 100).toString();
if (dict.contains(key)) {
    Console.log(dict[key]);
}
else {
    Console.log(dict[key + "0"]);
}

These are the very basic cases. There are more… a lot more. All the corner cases you need to explore are outside the scope of this announcement.

Compile Times Must Be Fast

Efficiency is the key. We can’t announce 30% faster compile times in the previous release and simultaneously promise a breakthrough that will cause compile times to explode exponentially.

Clearly, following every branch, virtual function call, external function call, and then some would not be a realistic proposal.

First, let’s look at a basic benchmark so we know what we’re comparing against. In the last release, 0.8.10, I measured “Hello World” compile times. With all of the analyses we added in 0.9.0 (the latest release), how much did it increase compile times for “Hello World”? A little under two (2) milliseconds:

Version        Total Time
JS++ 0.8.10    72.6ms
JS++ 0.9.0     74.2ms
(Lower is better)

The test system is the exact same as the one we used to measure compile times for 0.8.10:

Intel Core i7-4790k
32gb DDR3 RAM
Samsung 960 EVO M.2 SSD
Debian Linux 9

However, “Hello World” is not a perfect benchmark. How long does it take to compile real-world projects with thousands of lines of code that make lots of array and dictionary accesses? Here are three projects before we introduced compile-time analysis of out-of-bounds errors:

Compile times for 0.8.10 – before out-of-bounds checking

Line Count     Source Files    Total Time
1,137 lines    27 files        124.8ms
4,210 lines    42 files        164.4ms
6,019 lines    72 files        224.6ms
(Lower is better)

Here are compile times after we introduced analysis of out-of-bounds errors:

Compile times for 0.9.0 – detection of out-of-bounds at compile time

Line Count     Source Files    Total Time
1,140 lines    27 files        124.4ms
4,148 lines    41 files        165.4ms
5,942 lines    71 files        224.2ms
(Lower is better)

There’s a slight change in line and file counts due to the inclusion of a ‘Base64’ library, which – during the 0.9.0 refactoring – I removed and replaced with the Standard Library’s System.Encoding.Base64. (The code is the exact same.)

The above projects include both frontend and backend code. They include lots of modules, classes, arrays, dictionaries, and other complexities. I’ve included source file counts to account for disk I/O.

It can be observed that there is virtually no performance penalty for dealing with out-of-bounds errors at compile time. The results are within ±1ms (milliseconds).

Nullable Types are a Problem

Expressing nullability is important in computer programming. For example, a file might have a creation date and last access time. For a new file, there may never have been a “last access time”; thus, it might be ideal to use a nullable data type in this case.

Nullable types are a solved problem in other languages. We considered having Array<T> return T?, but, as Anton Rapetov, our lead compiler engineer, pointed out, there would be issues with that:

int[] intArr = [ 1, 2 ];
int? intEl2 = intArr[2];
if (intEl2 == null) {
    Console.log("Definitely out of bounds");
}

int?[] nullIntArr = [ 1, null ];
int? nullIntEl2 = nullIntArr[2];
if (nullIntEl2 == null) {
    Console.log("Might be out of bounds, might just be an access of a null element");
}

Usability

Even if returning nullable types worked, there would be significant usability issues. For example, the following is common code:

int[] arr = [ 1, 2, 3 ];
for (int i = 0, len = arr.length; i < len; ++i) {
    arr[i]++;
    // or
    arr[i] += 1;
}

In the above code, it's clear an out-of-bounds access can never occur. Nonetheless, if an array access returns T?, type conversions would be necessary before the ++ or += 1 operations can occur so we aren't adding to a null value. We need a way to avoid making the user do this for common operations. In fact, for common operations, we want you to be able to write the code exactly as you would above.

Exceptions

In a statically-typed programming language, exceptions allow us to return T for an Array<T> without compromising correctness on an out-of-bounds access. For example, if we declare an Array<int>, a 'pop' method can only return a value of type int or throw an exception. If an exception is thrown, it is of no concern to the type checker. At compile time, it would not be possible to determine whether or not the exception will be thrown. Yet, an uncaught exception will result in premature program termination at runtime.

Here's an example of how exceptions might be implemented for a container in JS++:

class Array<T>
{
    var data = [];

    T pop() {
        if (this.data.length > 0) {
            return this.data.pop(); // remove and return the last element
        }
        else {
            throw new OutOfBoundsException("Array is empty.");
        }
    }
}

By using exceptions, we never sacrifice type checker performance or compile times. Bounds checking is still performed, but the dark side of exceptions is that they can terminate the application if uncaught.

If we avoid throwing exceptions, and just let JavaScript return undefined, we'd be walking into TypeScript territory and just letting our type system become unsound because it would be "practical." While you might convert undefined to int as zero, there aren't always sensible default values for all JS++ types (e.g. a callback type or a non-nullable class Foo with no default constructor). Speaking of default values...

Default Initialization

Facebook discovered a problem with C++ maps and default initialization in their code that can lead to bugs:

std::unordered_map<std::string, int> settings{};

// ...

std::cout << "Timeout: " << settings["timeout"] << std::endl;

In the above code, simply printing the value of "timeout" can cause the missing key to be inserted and zero-initialized (that is how operator[] behaves on C++ maps). This led us to conclude that default initialization of missing keys would not be a solution. Default-initializing a map of word counts to zero for missing words is innocuous, but accidentally initializing a timeout or price value to zero can lead to substantially more severe bugs.
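A minimal C++ sketch of the usual workaround: look the key up with find(), which never inserts, instead of using operator[]:

#include <iostream>
#include <string>
#include <unordered_map>

int main() {
    std::unordered_map<std::string, int> settings{};

    // find() does not insert a zero-initialized "timeout" entry the way
    // operator[] would on a missing key.
    auto it = settings.find("timeout");
    if (it != settings.end()) {
        std::cout << "Timeout: " << it->second << std::endl;
    } else {
        std::cout << "Timeout is not configured" << std::endl;
    }
    return 0;
}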

The Breakthrough: null vs undefined

We wanted to have nullable types in the language. We wanted programmers to be able to express the following:

class Person
{
    string firstName = "";
    string? middleName = null;
    string lastName = "";
}

As of the latest release (0.9.0), the above code will work because we've introduced nullable types.

However, I want to revisit the earlier example on nullable types. When we decided to move forward with nullable types, a suggestion was brought up to return T? from array accesses. This example was given as a counter-argument:

int[] intArr = [ 1, 2 ];
int? intEl2 = intArr[2];
if (intEl2 == null) {
    Console.log("Definitely out of bounds");
}

int?[] nullIntArr = [ 1, null ];
int? nullIntEl2 = nullIntArr[2];
if (nullIntEl2 == null) {
    Console.log("Might be out of bounds, might just be an access of a null element");
}

We keep a record of all our meetings. While we didn't explicitly discuss undefined at all, I was in a hurry and summarized our meeting as:

>>>> * There's a problem differentiating between 'null' and
>>>> 'undefined':
>>>>
>>>> ```
>>>> Foo?[] arr = [new Foo(), null];
>>>> auto el1 = arr[1];
>>>> auto el2 = arr[2];
>>>> ```
>>>>
>>>> el1 has type Foo?
>>>> el2 has type Foo?
>>>>
>>>> el1 has value null // el1 exists but is null
>>>> el2 has value null // el2 does NOT exist but is also null
>>>>
>>>> To deal with this, we can add a `hasIndex(int i)` method to
>>>> containers.

Subconsciously, this led to the realization that all we had to do was differentiate between null and undefined in our type system.

Introducing Existent Types

In JavaScript, null means that a value exists but is an "empty value," and undefined means no value exists at all. A basic example is here:

var x = null;
var y;

console.log(x); // null
console.log(y); // undefined

This illustrates the basic concept; unfortunately, JavaScript is inconsistent:

var x = null;
var y;
var z = undefined;

console.log(x); // null
console.log(y); // undefined
console.log(z); // undefined

JS++ has different semantics. First of all, in JS++, all variables must be initialized; therefore, you can't have a variable reference return undefined... ever. Secondly, null means "empty value," but undefined in JS++ means "out-of-bounds error."

JS++ introduces existent types, which use the + syntax, to describe container accesses:

int[] arr = [ 7, 8, 9 ];

int+ x = arr[0];
int+ y = arr[1000];

We can think of existent types as the "bounds-checked type." I'm a big believer in simplicity. Rather than trying to calculate whether the container access is within-bounds or out-of-bounds at compile time, we delay this check to runtime via the code generator. Existent types are not purely a type checking innovation. The type provides guidance to the code generator to generate code such as the following:

int[] arr = [ 7, 8, 9 ];

int+ x = 0 < arr.length ? arr[0] : undefined;
int+ y = 1000 < arr.length ? arr[1000] : undefined;

We don't actually generate code this way, but it helps illustrate the concept for developers coming from backgrounds in C, C++, C#, Java, etc.

By default, int+ and int are not compatible types. I'll start by introducing the "safe default operator":

int[] arr = [ 7, 8, 9 ];

int+ x = arr[0];
int+ y = arr[1000];

int a = x ?? 0;
int b = y ?? 1;

The "safe default operator" will check if the left-hand side is undefined. If the value is undefined, the evaluated value of the right-hand side of the ?? operator is returned. Otherwise, the left-hand side is returned. In the case of the example above, 'a' will have the value of 7 because 'x' was within-bounds. 'b' will have the value of '1' because 'y' was out-of-bounds, and, thus, the alternative value provided to the ?? operator was used.

T+ cannot be the element type

The problem with JavaScript is that you can have an array of undefined values:

var arr = [ undefined, undefined, undefined ];

In the above case, JavaScript would not be able to differentiate between a within-bounds undefined and an out-of-bounds undefined. In JS++, an existent type cannot be the element type of an array or other container:

int+[] arr = []; // ERROR

[ ERROR ] JSPPE5204: Existent type `int+' cannot be used as the element type for arrays

Therefore, the invention of existent types cannot be retroactively applied to JavaScript.

If you want to represent an array element as having an "empty" value, you have to use nullable types...

Nullable Types + Existent Types

The following describes the basic syntax for the nullable and existent types being introduced in version 0.9.0:

int a = 1;  // 'int' only
int? b = 1; // 'int' or 'null'
int+ c = 1; // 'int' or 'undefined'

However, sometimes we want an array element to contain the "empty" value. In this case, we can combine nullable types with existent types using the following syntax:

int?+ d = 1; // 'int' or 'null' or 'undefined'

In this way, JS++ doesn't have the ambiguity of an undefined value that can be a within-bounds access and also an out-of-bounds access.
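For readers coming from C++, a rough analogue (a sketch only, not how JS++ implements any of this) is nesting std::optional: the outer optional plays the role of "existent" (was the access in bounds?) and the inner optional plays the role of "nullable" (is the element null?):

#include <cstddef>
#include <iostream>
#include <optional>
#include <vector>

using NullableInt = std::optional<int>;        // roughly 'int?'

// Roughly 'int?+': outer optional = in bounds or not, inner = null or not
std::optional<NullableInt> at(const std::vector<NullableInt>& v, std::size_t i) {
    if (i >= v.size()) return std::nullopt;    // "undefined": out of bounds
    return v[i];                               // in bounds, possibly "null"
}

int main() {
    std::vector<NullableInt> v = { 1, std::nullopt };

    std::cout << at(v, 1).has_value() << '\n'; // 1: in bounds (element is "null")
    std::cout << at(v, 5).has_value() << '\n'; // 0: out of bounds ("undefined")
    return 0;
}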

Usage with Dictionaries

Existent types can also be used with System.Dictionary<T>. We just introduced how nullable and existent types can be combined so let's use the combination:

import System;

Dictionary<bool?> inviteeDecisions = {
    "Roger": true,
    "Anton": true,
    "James": null, // James is undecided
    "Qin": false
};

bool?+ isJamesAttending = inviteeDecisions["James"]; // 'null'
bool?+ isBryceAttending = inviteeDecisions["Bryce"]; // 'undefined'

In the above code, we use the ?+ syntax to combine nullable and existent types. We're throwing a party, and we want to keep track of the decisions of our invitees. If the invitee's decision is true, he's coming to the party. If the invitee's decision is false, he won't be attending. If the invitee's decision is null, he is undecided. Finally, if the invitee's decision evaluates to undefined, he was not actually invited.

Naturally, the operators that apply to nullable types and existent types (such as the ?? safe default operator) also apply to the combined ?+ type. Code will simply be generated to check for both null and undefined when the combined ?+ type is used.

Beyond arrays and dictionaries, existent types can be applied to the other Standard Library containers (such as Stack<T> and Queue<T>) and even user-defined containers.

Safe Navigation Operator

Besides not being able to differentiate a within-bounds undefined from an out-of-bounds undefined, JavaScript suffers from another problem:

var arr = [ 1 ];
console.log( arr[1000].toString() );
console.log( "This will never get logged." );

The above code will never reach the second console.log. The reason is that arr[1000] evaluates to undefined, and you can't call the toString() method on undefined, so you'll get a runtime 'TypeError'. In JS++, this isn't a problem because the compiler will detect your attempt to use the . operator and suggest that you use the ?. safe navigation operator instead:

import System;

int[] arr = [ 1 ];
Console.log( arr[1000].toString() );
Console.log( "This will eventually get logged." );

[ ERROR ] JSPPE5200: The '.' operator cannot be used for nullable and existent types (`int+'). Please use the '?.' safe navigation operator instead at line 4 char 13

If we refactor, we'll discover that, unlike the ?? safe default operator, ?. can return undefined and evaluates to an existent type T+:

import System;

int[] arr = [ 1 ];
Console.log( arr[1000]?.toString() );
Console.log( "This will eventually get logged." );

[ ERROR ] JSPPE5024: No overload for `System.Console.log' matching signature `System.Console.log(string+)' at line 4 char 0

So one possible fix is to provide a default value:

import System;

int[] arr = [ 1 ];
Console.log( arr[1000]?.toString() ?? "out of bounds" );
Console.log( "This will eventually get logged." );

It finally compiles, and we get the following output:

out of bounds
This will eventually get logged.

No crashes and no exceptions can occur.

Inspecting 'undefined'

Oftentimes, when you encounter an out-of-bounds error, you might want to skip to the next iteration over the container or return from a function. Essentially, you want to "skip" code that was written for within-bounds accesses. In JS++, it's as simple as comparing against the undefined value:

import System;

int[] arr = [ 1 ];

for (int i = 0; i < 10; ++i) {
    int+ element = arr[i];
    if (element == undefined) {
        continue;
    }

    int x = (int) element;

    Console.log(x + 1);
    Console.log(x + 2);
    Console.log(x + 3);
}

The C-style cast to int is safe because we already checked for and skipped out-of-bounds accesses. We can also use the safe default operator instead in the code above.

Finally, our output:

2
3
4

This allows us to elegantly write large chunks of code for within-bounds accesses while skipping, returning, or just ignoring out-of-bounds accesses. We can even log the out-of-bounds error to stderr by using System.Console.error.

Downloads

We're providing download links for the latest release (0.9.0) and the previous version (0.8.10). We want you to be able to verify our claims and benchmarks.

JS++ 0.9.0 (latest) – includes out-of-bounds checking

Platform     Downloads
Windows      32- and 64-bit
Mac OS X     32- and 64-bit
Linux        32-bit, 64-bit

JS++ 0.8.10 – before out-of-bounds checking

Platform     Downloads
Windows      32- and 64-bit
Mac OS X     64-bit
Linux        32-bit, 64-bit

What's Next?

Our first priority is to manage engineering complexity. We have to refactor our tests, and none of this will show up for you, the user. As I write this, I don't know what to expect. Existent types can bring demand for JS++, but we don't have the resources to manage this demand. Instead, we have to stay disciplined in sticking to our own internal schedules to ensure the long-term success of JS++. We listen to user input, but we don't (and can't) follow hype and trends. JS++ over the next 25 years will be more important than JS++ over the next 25 days. I point to Haskell as an example: it's a programming language that is well thought-out and has persisted for 29 years.

We have users that have followed us for years, and we thank all of them for giving us the motivation to persist. If you're willing to be patient and watch JS++ evolve, I urge you to join our email list. The sign-up form for our email list can be found by scrolling to the bottom of this page.

Final Words

Existent types were co-invented by me and Anton Rapetov (lead compiler engineer for JS++).

We solved compile-time analysis of out-of-bounds errors via traditional nominal typing. Thus, there is no performance difference for JS++ checking whether int can be assigned to string or whether int+ can be assigned to string. This explains the ± 1ms compile time difference for compile time out-of-bounds analysis.

We place heavy emphasis on compile times because we know long compile times hurt developer productivity.

When existent types are used correctly, you should never get premature or unexpected program termination.

There is a full tutorial on nullable and existent types available here.

JS++ 0.8.10: Faster Compile Times, Stacks/Queues, Unicode, Base64, and More

The next version of JS++ (not this one) will be a breakthrough. It is on par with ‘external’ in its importance to JS++. Stay with us, and stay tuned.

In this latest release of JS++, we’ve done a lot: we’ve improved on our compile times which already lead the competition by an order of magnitude (with room for more improvement), we’ve substantially expanded the Standard Library, made a UX improvement to generics (without breaking any existing code), and we’ve fixed a lot of bugs (which are mostly minor at this point after years of engineering).

Faster Compile Times

import System;

Console.log("Hello World");

On the Core i7-4790k:

Version        Total Time
JS++ 0.8.5     96.2ms
JS++ 0.8.10    72.6ms
(Lower is better)

System Specifications:

  • Intel Core i7-4790k
  • 32gb DDR3 RAM
  • Samsung 960 EVO M.2 SSD

As evidenced in the table above, the latest version is now compiling “Hello World” 32.51% faster. There is still room for more improvement to compile times, but it is not our current priority. We improved compile times in this release by pre-parsing and caching the JS++ Standard Library. There is more to Standard Library compilation than this single step, but we wanted to address this problem as it was the largest performance regression in our profiling. Practically, this allows us to substantially expand the JS++ Standard Library with an O(log N) cost to compile times versus the previous O(n) cost of adding new libraries.

And because we’re able to add substantially more libraries… we’ve done exactly that.

System.Stack<T> and System.Queue<T>

JS++ now provides stack (LIFO) and queue (FIFO) data structures. Stacks are an abstraction over JavaScript arrays and are very fast.

import System;

auto stack = new Stack<int>();
stack.push(1);
stack.push(2);
Console.log(stack.pop()); // 2

In the generated code, the ‘push’ call compiles to a direct ‘push’ on the internal array backing the stack, so there is no performance loss in using the stack abstraction instead of a raw array.

Next, we also have queues via System.Queue. A queue is not as simple an abstraction over JavaScript arrays as a stack, because removing an element from the front of an array would force an O(n) re-indexing of the remaining elements. Instead, to guarantee O(1) pop operations, we use a ring buffer. (Credit goes to Anton, our lead engineer.)

import System;

auto queue = new Queue<int>();
queue.push(1);
queue.push(2);
Console.log(queue.pop()); // 1
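
To make the ring buffer idea concrete, here is a minimal, non-generic sketch. The names (IntRingQueue, head, tail) are purely illustrative, and the real System.Queue<T> implementation differs (it is generic and grows its buffer, for instance), but the sketch shows why a pop never needs to shift or re-index elements:

import System;

class IntRingQueue
{
    int[] buffer = [];
    int capacity = 0;
    int head = 0;  // index of the next element to pop
    int tail = 0;  // index of the next free slot to push into
    int count = 0;

    IntRingQueue(int capacity) {
        this.capacity = capacity;
        // Pre-fill the backing array so every slot exists up front.
        for (int i = 0; i < capacity; ++i) {
            this.buffer.push(0);
        }
    }

    void push(int value) {
        // Sketch only: assumes count < capacity; a real queue would grow the buffer here.
        buffer[tail] = value;
        tail = (tail + 1) % capacity;
        ++count;
    }

    int pop() {
        int value = buffer[head];
        head = (head + 1) % capacity; // wrap around instead of re-indexing: O(1)
        --count;
        return value;
    }
}

auto q = new IntRingQueue(4);
q.push(1);
q.push(2);
Console.log(q.pop()); // 1

Because only the head and tail indices move, a pop costs the same regardless of how many elements are queued.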

At a micro-optimization level, you might question our decision to use a ring buffer as the backing store. However, the underlying computer science matters. If you take a look at my JS++ StringSet library, you'll see this benchmark over a dictionary of ~49,000 terms:

StringSet : 70ms     (0.07 seconds)
string[]  : 82299ms  (82.29 seconds)

Clearly, the performance difference here is substantial, but it should be no surprise to anyone who understands data structures: an array has O(n) lookups, and a set has O(1) lookups.

Documentation: System.Stack<T> and System.Queue<T>

System.Encoding

This module introduces a lot of useful new features, so I'm going to break them down one by one.

Base64 Encoding and Decoding

A common operation in web development is Base64 encoding and decoding. For example, the HBase REST API requires Base64 encoding/decoding. You can also use Base64 to encode binary data and files as ASCII, such as converting canvas image data into data URIs.

Wouldn’t it be nice to have all of this functionality readily available in the language you use?

import System;
import System.Encoding;

string quote = "Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.";

string encoded = Base64.encode(quote);
Console.log(encoded);

string decoded = Base64.decode(encoded);
Console.log(decoded);

The more you master JS++ and know how to use the Standard Library, the more efficiently you can get work done compared to JavaScript.

Documentation: System.Encoding.Base64

UTF-8, UTF-16, and UTF-32

Dealing with Unicode is a key aspect of writing world-ready software. UTF-8, UTF-16, and UTF-32 are encoding schemes defined in the Unicode Standard, and JS++ now supports encoding and decoding of all of these in the System.Encoding module.

Here’s an example of UTF-8 encoding:

import System;
import System.Encoding;

byte[] encoded = UTF8.encode("€");

string toHex = "";
foreach(byte b in encoded) {
    toHex += "\\x" + b.toHex().toUpperCase();
}

Console.log(toHex); // "\xE2\x82\xAC"

We just want to take a moment to remind you of the importance of the JS++ type system in scenarios like this and how naturally it interoperates with so many areas of computing. In this case, the ‘byte’ data type is a natural fit for dealing with standard Unicode encoding schemes.
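
Decoding works in the other direction. As a quick sketch (this assumes UTF8.decode accepts the byte[] produced by UTF8.encode and returns a string; check the System.Encoding documentation for the exact signatures):

import System;
import System.Encoding;

byte[] encoded = UTF8.encode("€");

// Assumed call form: decode the UTF-8 bytes back into a JS++ string.
string decoded = UTF8.decode(encoded);
Console.log(decoded); // "€"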

Documentation: System.Encoding.UTF8, System.Encoding.UTF16, and System.Encoding.UTF32

URI Encoding and Decoding

This module provides the ECMAScript 3 encodeURI, encodeURIComponent, decodeURI, and decodeURIComponent functions.

Documentation: System.Encoding.URI

Superior Documentation

Documentation is one of the strengths of JS++. We have more than 600 pages of handwritten documentation.

With the release of the System.Encoding.URI module, we wanted to expand on this. Have you ever wondered about the difference between encodeURI and encodeURIComponent in JavaScript? This unfortunate naming scheme, dating back to ECMAScript 3, is a source of confusion. Developers often cite the Mozilla Developer Network (MDN) for documentation, but MDN's explanation is just as confusing as the naming scheme and lacks useful information.

Stack Overflow wasn’t tremendously more helpful either, and most answers once again just go over which characters get converted and which do not. This doesn’t help the practicing developer learn or memorize which function to use.

Fortunately, JS++ has you covered. We explain the difference between encodeURI and encodeURIComponent clearly, and we also provide an explanation of best practices to help you navigate the confusion:

https://docs.onux.com/en-US/Developers/JavaScript-PP/Language-Guide/encodeuri-vs-encodeuricomponent
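
As a quick illustration of the difference (the URI.encodeURI / URI.encodeURIComponent call form below is only illustrative; refer to the System.Encoding.URI documentation for the exact usage): encodeURI is meant for complete URLs and preserves reserved characters such as '/', '?', and '&', while encodeURIComponent is meant for individual values and escapes those characters as well.

import System;
import System.Encoding;

// Encoding a complete URL: the space is escaped, but '/', '?', and '=' are preserved.
Console.log(URI.encodeURI("https://example.com/a b?q=1"));
// "https://example.com/a%20b?q=1"

// Encoding a single query-string value: reserved characters are escaped too.
Console.log(URI.encodeURIComponent("a=b&c=d"));
// "a%3Db%26c%3Dd"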

Improving Generic Programming UX: Default Constraint

We’ve improved the user experience (UX) for generic programming in JS++.

Previously, the default constraint for generic type parameters was System.Object. However, for performance reasons, the default constraint did not allow primitive types. For example:

class Foo<T> // same as 'Foo<T: System.Object>'
{
}

auto foo = new Foo<string>(); // ERROR (previously)

Instead, you needed to specify the “wildcard constraint” (see the docs) if you wanted to allow primitive types as arguments, as System.Array and System.Dictionary do.

However, this was not the most useful default. Starting with version 0.8.10, the wildcard constraint is the default for all generic classes, so you can declare a plain generic class and instantiate it with primitive types as type arguments. All your old code will still work; we just made the default more useful.
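
For example, a plain generic class can now be instantiated directly with primitive type arguments:

class Foo<T> // in 0.8.10, the default is now the wildcard constraint
{
}

auto a = new Foo<string>(); // OK (an error before 0.8.10)
auto b = new Foo<int>();    // OK: primitive type arguments are allowed by default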

‘this’ Semantics

First and foremost, before we announce this change, we have to say that all semantics relating to ‘this’ are subject to change.

We've been well aware that JavaScript's ‘this’ rules differ from the semantics that users of other programming languages are familiar with. However, previous versions of JS++ raised a ‘0000’ (unimplemented) error here, and the fix was non-obvious:

class Foo
{
    string message = "Test";
    void bar() {
        $(document).click(void() {
            $("#button").text(this.message); // JSPPE0000 error
        });
    }
}

You had to manually capture the ‘this’ value for closures inside classes:

class Foo
{
    string message = "Test";
    void bar() {
        Foo _this = this;
        $(document).click(void() {
            $("#button").text(_this.message); // OK
        });
    }
}

First of all, I apologize for the cryptic error message. We work very hard to make sure our error messages are easy to understand, and we always try to suggest the fix in the error message itself where possible. In this case, we always thought the ‘this’ semantics would be settled within a reasonable time and thought the “unimplemented” error would only be temporary. We were wrong. Thus, it’s important to know we may change the rules and semantics again prior to JS++ 1.0.

In the current release, the ‘this’ keyword – when used inside classes – refers to the class instance by default. There are cases where you might want to retain JavaScript's ‘this’ semantics, and, at least for closures inside classes declared with the ‘function’ return type, you can cast it to external:

class Foo
{
    string message = "Test";
    void attachEvents() {
        $("#button").click(function() {
            var $this = (external) this;
            $this.text(this.message); // OK
        });
    }
}

We are open to input on this functionality.

You can review our plans on the ‘this’ documentation page under the header “‘this’ Casting inside Classes”.

Bug Fixes

  • Disable auto creation of the arguments object
  • File extensions are case-insensitive
  • Fix crash during access to a property with undeclared type (reported by @lorveg in JS++ chat)
  • Fix contravariance check in foreach loop
  • Incorrect scoping and code generation for ‘external’
  • Segfault for interface with generic variants
  • Fix missing class name for inheritance in error message

Tips & Tricks: Overriding ‘toString’

JS++ has a default ‘toString’ method implementation, but sometimes it is necessary to override it. For example, when using Console.log, it may be desirable to be able to fully log and inspect a complex JS++ object.

In addition to the Unified External Type, there is also a “Unified Internal Type”: System.Object. All JS++ classes, including user-defined classes, inherit from System.Object. Due to auto-boxing, even primitive types such as int (wrapped by System.Integer32) inherit from System.Object.
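
For instance, calling a System.Object method directly on a primitive works through auto-boxing (a trivial sketch; the value 42 is arbitrary):

import System;

int x = 42;
// The primitive int is auto-boxed (to System.Integer32), and its toString is called.
Console.log(x.toString()); // "42"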

Aside: Don’t worry about the performance implications of auto-boxing. JS++ is able to optimize auto-boxing to the point that toString is actually 7.2% faster in JS++ than JavaScript in the worst case (assuming the JavaScript variable is monomorphically-typed) and more than 50% faster for polymorphically-typed (and potentially type-unsafe) JavaScript variables as shown in benchmarks here.

System.Object has a toString method which is marked as virtual. In other words, this method can be overridden by derived classes – which are effectively all classes in JS++. Here’s an example of how to do it:

import System;

class Point
{
    int x;
    int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    override string toString() {
        return "(" + x.toString() + ", " + y.toString() + ")";
    }
}

Point p = new Point(1,2);
Console.log(p); // "(1, 2)"

You'll notice the Console.log statement doesn't even make an explicit toString call. That's because passing any JS++ object to Console.log calls the object's toString method for you.

Join the JS++ Chat Room

If you need instant help or just want to talk about JS++, feel free to join us in the JS++ Chat Room:

https://chat.onux.io/signup_user_complete/?id=6eoxd4erotgwdgin8p6x9uupyc

The special thing about this chat room is that our internal team uses this chat software to collaborate. While we are mostly in private channels, we are almost always available in the public chat rooms. If you need help, you can get help instantly from the creators of JS++.