JS++ 0.8.4: Advanced Generics and System.String Expansion

This release significantly expands the Standard Library, most notably System.String.

System.String Highlights

between

string quotedWords = '"duck" "swan" "crab"';
// 'between' is smart enough to allow the same string to be used as a start and end delimiter
string[] words = quotedWords.between('"', '"');
Console.log(words); // [ "duck", "swan", "crab" ]
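The following is a minimal JavaScript sketch of how a regex-free `between` can use the same character as both delimiters. It is not the JS++ implementation. The function name and its assumptions are illustrative: matches don't overlap, scanning resumes after each closing delimiter, and empty delimiters are rejected.

```javascript
// Hypothetical sketch of a regex-free "between": returns every substring
// found between `start` and `end`, scanning left to right with indexOf.
function between(str, start, end) {
  if (start === "" || end === "") {
    throw new Error("Delimiters must be non-empty");
  }
  const results = [];
  let pos = 0;
  while (true) {
    const open = str.indexOf(start, pos);
    if (open === -1) break;
    const contentStart = open + start.length;
    // Search for `end` only after the opening delimiter, so the same
    // character can serve as both the start and the end delimiter.
    const close = str.indexOf(end, contentStart);
    if (close === -1) break;
    results.push(str.slice(contentStart, close));
    // Resume after the closing delimiter; otherwise the space between
    // '"duck"' and '"swan"' would be picked up as a match.
    pos = close + end.length;
  }
  return results;
}

console.log(between('"duck" "swan" "crab"', '"', '"')); // [ 'duck', 'swan', 'crab' ]
```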

format: C-like printf

"%s is %d years old".format("Joe", 10) // Joe is 10 years old
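The following minimal JavaScript sketch makes the printf analogy concrete. It is an assumption-laden illustration, not System.String.format. It handles only %s, %d, and %%. It also makes its own choices, which may differ from the real method: %d arguments are truncated toward zero, and unknown specifiers pass through unchanged. It scans characters instead of using a regular expression.

```javascript
// Hypothetical printf-style formatter supporting %s, %d, and %%.
function format(template, ...args) {
  let out = "";
  let next = 0; // index of the next argument to consume
  for (let i = 0; i < template.length; i++) {
    const ch = template[i];
    // Ordinary character, or a trailing "%" with nothing after it.
    if (ch !== "%" || i + 1 >= template.length) {
      out += ch;
      continue;
    }
    const spec = template[++i];
    if (spec === "s") {
      out += String(args[next++]);
    } else if (spec === "d") {
      out += String(Math.trunc(Number(args[next++])));
    } else if (spec === "%") {
      out += "%";
    } else {
      out += "%" + spec; // unknown specifier: emit it unchanged
    }
  }
  return out;
}

console.log(format("%s is %d years old", "Joe", 10)); // Joe is 10 years old
```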

escape

"a\r\nb".escape() // a\\r\\nb
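The following is a minimal JavaScript sketch of the idea behind escape. The name `escapeString` is hypothetical, and the exact set of escaped characters is an assumption. In particular, this sketch also escapes the backslash itself. Without that, a later unescape step could not tell an original backslash-n sequence apart from an escaped newline.

```javascript
// Hypothetical escape sketch: replaces control characters (and, by
// assumption, the backslash itself) with two-character escape sequences.
const ESCAPES = new Map([
  ["\\", "\\\\"],
  ["\n", "\\n"],
  ["\r", "\\r"],
  ["\t", "\\t"],
  ["\b", "\\b"],
  ["\f", "\\f"],
  ["\v", "\\v"],
]);

function escapeString(str) {
  let out = "";
  for (const ch of str) {
    out += ESCAPES.has(ch) ? ESCAPES.get(ch) : ch;
  }
  return out;
}

// Prints a\r\nb, where each backslash is a literal character (6 characters total).
console.log(escapeString("a\r\nb"));
```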

truncate

string text = "The quick brown fox jumped over the lazy dog.";
 
Console.log(text.truncate(9)); // "The quick..."
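The following is a minimal JavaScript sketch of truncate with a customizable ellipsis. The function is hypothetical. It assumes, consistent with the example above, that the ellipsis is appended after the first `length` characters and does not count toward `length`.

```javascript
// Hypothetical truncate sketch: keep the first `length` characters and
// append `ellipsis`; strings that already fit are returned unchanged.
function truncate(text, length, ellipsis = "...") {
  if (text.length <= length) {
    return text;
  }
  return text.slice(0, length) + ellipsis;
}

const text = "The quick brown fox jumped over the lazy dog.";
console.log(truncate(text, 9));      // The quick...
console.log(truncate(text, 9, "…")); // The quick…
console.log(truncate("short", 9));   // short
```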

repeat

"*".repeat(5) // *****

count

"foobar".count("foo") // 1
"FOOBAR".icount("foo") // 1
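The following is a minimal JavaScript sketch of count and icount using indexOf instead of a regular expression. Both functions are hypothetical. This sketch assumes that matches are non-overlapping and that an empty needle counts as zero matches. It does not establish how the JS++ methods handle those cases.

```javascript
// Hypothetical sketch: count non-overlapping occurrences of `needle`.
function count(haystack, needle) {
  if (needle === "") return 0; // assumption: an empty needle never matches
  let matches = 0;
  let pos = haystack.indexOf(needle);
  while (pos !== -1) {
    matches++;
    // Skip past the whole match, so "aaaa" contains "aa" twice, not three times.
    pos = haystack.indexOf(needle, pos + needle.length);
  }
  return matches;
}

// Case-insensitive variant using toLowerCase (locale-independent lowercasing).
function icount(haystack, needle) {
  return count(haystack.toLowerCase(), needle.toLowerCase());
}

console.log(count("foobar", "foo"));  // 1
console.log(icount("FOOBAR", "foo")); // 1
```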

contains

"abc".contains("b") // true
"ABC".icontains("b") // true

New System.String Methods

Here are all the new methods available for strings in JS++:

  • between – Gets substrings between two delimiters (does not use regex)
  • compact – Removes whitespace globally
  • contains/icontains
  • count/icount
  • countLines
  • countNonEmptyLines
  • startsWith/endsWith
  • escape/unescape – Escape the escape sequence characters (e.g. \n -> \\n)
  • escapeQuotes/unescapeQuotes
  • format – Similar to C’s printf
  • insert/append/prepend
  • isEmpty – uses .length === 0 rather than str === "" for performance (not everyone has time to benchmark every detail)
  • isLowerCase
  • isUpperCase
  • isWhitespace
  • joinLines – collapses a string composed of multiple lines into a single line
  • joinNonEmptyLines
  • padLeft/padRight – remember the NPM left-pad debacle?
  • quote/quoteSingle – wraps the string in quotes
  • unquote – removes quote pairs
  • repeat – "*".repeat(3) == "***"
  • reverse
  • splitLines – splits a string into a string[] (array) based on newlines
  • trim, trimLeft, trimRight, trimMulti, trimMultiLeft, trimMultiRight
  • truncate – Cuts off the string at the specified length (with support for custom ellipsis)

There are close to 50 new string methods (48 including overloads, 39 otherwise), which should cover most application-level use cases. Including documentation, this added 1,400 new lines of code to System.String. I’m happy to announce that we have still more methods (for System.String and other classes) on the way.

Every single method is documented. All documentation is online and available at the System.String index page.

We avoided regular expressions as much as possible to avoid constructing a finite-state machine (FSM) at runtime, which costs both time and space. Therefore, prefer JS++ methods such as "abc".endsWith("c") over the traditional JavaScript regex approach, /c$/.test("abc").

The best thing about JS++ is that it’s a compiled language. This gives you performance benefits that a JavaScript library with string utilities can never give you. For example:

if ("abc".isEmpty());

becomes:

if ("abc".length===0);

and

"abc".quote()

becomes:

'"'+"abc"+'"'

The astute observer will notice that both of the above examples could be optimized further to reach “perfect” optimization by evaluating them at compile time: "abc".length===0 is always false, and '"'+"abc"+'"' is simply '"abc"'. However, JS++ does not have an optimizing compiler yet, and adding special-case branching logic to the code generator would result in technical debt.

Our goal with the Standard Library is to make writing applications easier than it is in JavaScript. Useful side effects of our work on the JS++ Standard Library are better performance, smaller code size, and correctness. Thanks to JS++ dead code elimination, we can add hundreds of methods to System.String, but you only pay for the methods you actually use. As for performance, not every team can afford to hire a JavaScript performance expert, and even a team that has one can’t expect that expert to micro-optimize and benchmark every method.
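At its core, this kind of dead code elimination is a reachability problem. The following JavaScript sketch illustrates the idea under simplified assumptions: a precomputed, hypothetical call graph keyed by function name. It is not a description of how the JS++ compiler implements the optimization.

```javascript
// Reachability-based dead code elimination: starting from the entry points,
// keep every function that can be reached through calls; drop the rest.
function liveFunctions(callGraph, entryPoints) {
  const live = new Set();
  const pending = [...entryPoints];
  while (pending.length > 0) {
    const name = pending.pop();
    if (live.has(name)) continue; // already visited (also handles cycles)
    live.add(name);
    const callees = Object.prototype.hasOwnProperty.call(callGraph, name)
      ? callGraph[name]
      : [];
    for (const callee of callees) {
      pending.push(callee);
    }
  }
  return live;
}

// Hypothetical program: main() only uses padLeft, which uses repeat.
const callGraph = {
  main: ["padLeft"],
  padLeft: ["repeat"],
  repeat: [],
  between: [],
  format: [],
  truncate: [],
};

console.log([...liveFunctions(callGraph, ["main"])].sort()); // [ 'main', 'padLeft', 'repeat' ]
```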

Finally, with the JS++ Standard Library, we can fully avoid the NPM left-pad debacle.

import System;

Console.log("1".padLeft(4, "0")); // "0001"
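Left-padding itself is only a few lines of code. The following hypothetical JavaScript sketch assumes a single-character pad and leaves strings that are already long enough unchanged. Modern JavaScript also provides the built-in String.prototype.padStart (ES2017), and "1".padStart(4, "0") returns "0001" as well.

```javascript
// Hypothetical padLeft sketch: prepend `padChar` until `str` is `width` long.
function padLeft(str, width, padChar = " ") {
  if (padChar.length !== 1) {
    throw new Error("padChar must be exactly one character");
  }
  const missing = Math.max(0, width - str.length);
  return padChar.repeat(missing) + str;
}

console.log(padLeft("1", 4, "0")); // 0001
```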

fromString

Previously, converting a string to a number in JS++ was a little unintuitive. For example:

int x = +"1000"; // use the unary + operator

For all numeric types, we’ve introduced the fromString, fromStringOr, and fromStringOrThrow static methods. The above example can be re-written to use Integer32.fromString:

int x = Integer32.fromString("1000");
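The method names suggest their semantics, but the following details are assumptions, not documented behavior. This sketch assumes that fromStringOr returns a caller-supplied fallback and that fromStringOrThrow throws when the input is not a base-10 integer in the signed 32-bit range. The JavaScript below uses hypothetical function names. It also shows why unary + is a poor substitute: +"" is 0, and +"abc" is NaN.

```javascript
const INT32_MIN = -2147483648;
const INT32_MAX = 2147483647;

// Parse an optional sign followed by base-10 digits, with no whitespace.
// Returns null when the text is not a valid 32-bit signed integer.
function parseInt32Strict(str) {
  let i = 0;
  let sign = 1;
  if (str[0] === "+" || str[0] === "-") {
    if (str[0] === "-") sign = -1;
    i = 1;
  }
  if (i >= str.length) return null; // empty string or a lone sign
  let magnitude = 0;
  for (; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code < 48 || code > 57) return null; // not a digit '0'..'9'
    magnitude = magnitude * 10 + (code - 48);
    if (magnitude > 2147483648) return null; // out of range; stop early
  }
  const value = sign * magnitude;
  if (value < INT32_MIN || value > INT32_MAX) return null;
  return value | 0; // normalizes -0 to 0
}

// Hypothetical counterpart of fromStringOr: fall back to a default value.
function int32FromStringOr(str, fallback) {
  const parsed = parseInt32Strict(str);
  return parsed === null ? fallback : parsed;
}

// Hypothetical counterpart of fromStringOrThrow: reject invalid input loudly.
function int32FromStringOrThrow(str) {
  const parsed = parseInt32Strict(str);
  if (parsed === null) {
    throw new RangeError(`Cannot convert "${str}" to a 32-bit signed integer`);
  }
  return parsed;
}

console.log(int32FromStringOrThrow("1000")); // 1000
console.log(int32FromStringOr("abc", -1));   // -1
console.log(+"", +"abc");                    // 0 NaN
```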

Advanced Generics

JS++ 0.8.4 introduces covariant and contravariant generic types (including upcasting and downcasting for types with variants). Covariance and contravariance are based on use-site variance. At this time, we are not introducing declaration-site variance at all; we have higher priorities. In addition, we’ve introduced generic constraints (subtype constraints, multiple constraints, wildcard constraints, and more).

Finally, we have support for generic functions and generic static methods.

Everything from basic to advanced generic programming in JS++ is covered in our generic programming documentation.

When we released version 0.8.0, we introduced only basic generics. In today’s 0.8.4 release, you can consider generics fully implemented.

I highly encourage reading the generic programming documentation. To put it all together, here’s an example that combines generic covariance and contravariance with use-site variance:

import System;
 
abstract class Animal {}
class Tiger : Animal {}
 
abstract class Pet : Animal {}
class Dog : Pet {}
class Cat : Pet {}
 
class PetCollection
{
    Pet[] data = [];
 
    void insert(descend Pet[] pets) {
        foreach(Pet pet in pets) {
            this.data.push(pet);
        }
    }
 
    ascend Pet[] get() {
        return this.data;
    }
}
 
auto myPets = new PetCollection();
 
// Write operations (descend, covariance)
myPets.insert([ new Dog, new Cat ]);
// myPets.insert([ new Tiger ]); // not allowed
 
// Read operations (ascend, contravariance)
Pet[] getPets = [];
Animal[] getAnimals = [];
ascend Pet[] tmp = myPets.get(); // read here
foreach(Pet pet in tmp) { // but we still need to put them back into our "result" arrays
    getPets.push(pet);
    getAnimals.push(pet);
}
 
// Now we can modify the arrays we read into above
getPets.push(new Dog);
getAnimals.push(new Dog);
getAnimals.push(new Tiger);
// getPets.push(new Tiger); // ERROR

Other Changes

  • Fix return types for System.String.charAt and System.String.charCodeAt
  • Fix type promotion to ‘double’. We now handle this better than languages like Java and C#. Thanks to our lead engineer, Anton, for the idea.
  • isEven() and isOdd(). You might think this is FizzBuzz, but if you’re using the modulus operator, it’ll be slower. We use bitwise operations instead, and you might be interested in reading this article on how we took advantage of overflow behavior to improve performance while preserving correctness.
  • Fix System.Array.map and System.Array.reduce to support wildcard generic types.
  • Type inference of generic parameters for function calls. This is needed for System.Array.map and System.Array.reduce, but it’s also available for user-side code.
  • Fix System.Console.error when no console is available
  • Fix an error message that reported an incorrect type for setters defined without an accompanying getter.
  • Fix the private access modifier for modules in a multi-file setting.
  • Fix callback types as generic arguments
  • Fix enum bitwise operations to reduce explicit casting

Bitwise Operators and Specification-compliant Integer Overflow Optimizations

If you use the upcoming UInteger32.isEven or isOdd methods, you’ll notice that they use a bitwise AND operation. As described in a previous post, the reason is that it improves performance.

However, while this is straightforward for all other integer wrapper classes, UInteger32 is an exception. According to ECMAScript 3 11.10:

The production A : A @ B, where @ is one of the bitwise operators in the productions above, is evaluated as follows:

  1. Evaluate A.
  2. Call GetValue(Result(1)).
  3. Evaluate B.
  4. Call GetValue(Result(3)).
  5. Call ToInt32(Result(2)).
  6. Call ToInt32(Result(4)).
  7. Apply the bitwise operator @ to Result(5) and Result(6). The result is a signed 32 bit integer.
  8. Return Result(7).

In ECMAScript, the operands of the bitwise AND operation are converted to signed 32-bit integers, so the result is also a signed 32-bit integer. System.UInteger32, however, represents an unsigned 32-bit integer.

This is inconvenient because we’d obviously have to fall back to the slower modulus operation for isEven/isOdd on UInteger32. Unless…

((Math.pow(2, 32) + 1) >>> 0 | 0) & 1 // 1 (odd)
((Math.pow(2, 32) + 2) >>> 0 | 0) & 1 // 0 (even)

We can take advantage of overflow behavior. (Note: Since we’re able to get the correct result by leveraging overflow behavior, we actually don’t perform the extraneous zero-fill right shift as illustrated in the example.)

This is completely safe because:

A) Mathematically, in base 2, the last bit will always be 1 for odd numbers and 0 for even numbers… no matter how big the number is.

B) Bitwise AND operates bit by bit on operands of equal length. When you AND against 1, the mask is 0000001 (zero-padded to whatever length is needed), so all the preceding bits are ANDed against a zero bit and don’t matter. The only bit that matters is the trailing bit, and the ToInt32 conversion never changes it: ToInt32 reduces the value modulo 2^32 (an even number) and then treats bit 31 as the sign bit, so the lowest bit survives the conversion. See A for why the trailing bit alone determines whether a number is even or odd.
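The argument above can be checked directly in JavaScript. In this sketch, the function names are illustrative and not the Standard Library’s. The `&` operator applies ToInt32 to a value in the UInteger32 range, which may make that value negative. The lowest bit, and therefore the parity, stays the same.

```javascript
// Parity checks for values in the UInteger32 range [0, 2**32 - 1].
// `&` converts its operands with ToInt32, so values >= 2**31 become negative,
// but that conversion never changes the lowest bit.
function isOddUint32(n) {
  return (n & 1) === 1;
}

function isEvenUint32(n) {
  return (n & 1) === 0;
}

console.log((2 ** 32 - 1) | 0);        // -1 (ToInt32 made it negative)
console.log(isOddUint32(2 ** 32 - 1)); // true
console.log(isEvenUint32(2 ** 31));    // true
```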

Standard Library Performance: isEven() and isOdd()

Remember what I always advise: use the JS++ Standard Library whenever you can. Its methods aren’t just tested for correctness; we also benchmark them for performance.

Checking whether a number is even or odd is the classic FizzBuzz-style test. Most professional developers would reach for the modulus operator, but that’s not always the fastest implementation.

> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i & 1) == 0; console.log(x); new Date - t;
false
87
> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i & 1) == 0; console.log(x); new Date - t;
false
90
> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i & 1) == 0; console.log(x); new Date - t;
false
86
> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i & 1) == 0; console.log(x); new Date - t;
false
86
> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i & 1) == 0; console.log(x); new Date - t;
false
86

Average: 87 ms

> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i % 2) == 0; console.log(x); new Date - t;
false
105
> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i % 2) == 0; console.log(x); new Date - t;
false
100
> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i % 2) == 0; console.log(x); new Date - t;
false
100
> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i % 2) == 0; console.log(x); new Date - t;
false
104
> var t = new Date(); var x; for (var i = 0; i < 50000000; ++i) x = (i % 2) == 0; console.log(x); new Date - t;
false
101

Average: 102 ms

Node.js v8.11.1, Linux x64
Core i7-4790K, 32 GB RAM
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i & 1) == 0; console.log(x); console.log(new Date - t);
false
1948
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i & 1) == 0; console.log(x); console.log(new Date - t);
false
2072
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i & 1) == 0; console.log(x); console.log(new Date - t);
false
2086
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i & 1) == 0; console.log(x); console.log(new Date - t);
false
2092
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i & 1) == 0; console.log(x); console.log(new Date - t);
false
2102

Average: 2060 ms

var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i % 2) == 0; console.log(x); console.log(new Date - t);
false
2058
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i % 2) == 0; console.log(x); console.log(new Date - t);
false
2082
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i % 2) == 0; console.log(x); console.log(new Date - t);
false
2114
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i % 2) == 0; console.log(x); console.log(new Date - t);
false
2102
var t = new Date(); var x; for (var i = 0; i < 5000000; ++i) x = (i % 2) == 0; console.log(x); console.log(new Date - t);
false
2104

Average: 2092 ms

Firefox 59.0.2, Linux x64
Core i7-4790K, 32 GB RAM

While the Firefox results differ by only about 1.5% and are not statistically significant (possibly because SpiderMonkey recognizes this pattern and applies a dedicated optimization), bitwise AND gives a 17% performance gain in Node.js (an average of 87 ms versus 102 ms).

Because of all the layers of abstraction in JavaScript, it’s not obvious how much faster a bitwise AND can really be for isEven/isOdd. Our measured 17% improvement in Node.js is far smaller than the raw instruction costs suggest. As our lead engineer pointed out via email, according to an instruction latency table, "for Intel Skylake-X `div` has a latency of 26 (for 32-bit integers), whereas `and` has latency 1 ('reciprocal throughput' has similar difference) so it is an order of magnitude slower, not 20% as in your tests."
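To rerun this comparison on your own hardware, you can use the following small JavaScript harness modeled on the loops above. It is a sketch, and the function names are hypothetical. It reports the median of five runs and counts even numbers so that the engine cannot discard each loop’s result. Absolute timings will vary by engine and machine.

```javascript
// Each loop counts the even numbers below `iterations`, so its result is used.
function countEvensBitwise(iterations) {
  let evens = 0;
  for (let i = 0; i < iterations; ++i) {
    if ((i & 1) === 0) evens++;
  }
  return evens;
}

function countEvensModulus(iterations) {
  let evens = 0;
  for (let i = 0; i < iterations; ++i) {
    if (i % 2 === 0) evens++;
  }
  return evens;
}

// Run `fn` several times and return the median wall-clock time in milliseconds.
function medianMs(fn, iterations, runs = 5) {
  const times = [];
  for (let r = 0; r < runs; ++r) {
    const start = Date.now();
    fn(iterations);
    times.push(Date.now() - start);
  }
  times.sort((a, b) => a - b);
  return times[Math.floor(runs / 2)];
}

const N = 50000000; // same iteration count as the Node.js runs above
console.log("bitwise AND:", medianMs(countEvensBitwise, N), "ms");
console.log("modulus:    ", medianMs(countEvensModulus, N), "ms");
```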

Look for isEven() and isOdd() to appear in a future version of the JS++ Standard Library.

You may also be interested in reading Part II of this post, which describes how we leveraged overflow behavior to improve performance while preserving correctness for UInteger32.