Asynchronous JavaScript

JavaScript Chats with ACM Hack Session 2

October 14, 2019

Definitions
JavaScript and asynchrony
The event loop
Introducing JavaScript Promises
async functions

Definitions

What does it mean for an operation to be synchronous, or asynchronous?

Let’s first get on the same page with regard to the terminology. For the purpose of this workshop, we define an operation to be synchronous, or blocking, if it waits until the desired result is available and then returns it. A classic example of a synchronous operation is POSIX’s read() system call in its default (blocking) mode, and Java’s InputStream.read() acts in a similar way. Consider, for instance, this piece of Java code:

int ch = System.in.read();

The InputStream.read() function would return the next byte of data on standard input directly. On the other hand, if the standard input is connected to a terminal and the user does not type anything, then this function is essentially going to wait – and block your application – forever. Assuming your application is single-threaded, there is no way for it to do any work until the user types something.

On the other end of the spectrum, we have asynchronous, or non-blocking, operations. The defining characteristic of this class of operations is that they do not wait for data to become available, but variations exist in how the data is actually delivered. Some, like POSIX’s read() system call in non-blocking mode, return the data if it is available and an error if it is not. Others, like many APIs found in JavaScript, notify the application when the data becomes available at a future time.

The main benefit of making operations asynchronous is enabling concurrency. Because the function returns quickly, we can continue doing computation while the operation is carried out in the background. We can also launch multiple asynchronous operations at the same time.

// Web browsers
fetch('https://example.com/').then(res => {
  // At some future time, when a preliminary response has been received, this
  // function will be called.
  console.log('Response status:', res.status);
});

// This will be printed first, as fetch() immediately returns after kicking off
// the HTTP request.
console.log('After call to fetch()');

// Prints:
// After call to fetch()
// Response status: 200

// Node.js
const fs = require('fs');

fs.open('file.txt', 'r', (err, fd) => {
  // At some future time, when the file has been opened, this function will be
  // called with either an error or a file descriptor.
  console.log('File descriptor opened:', fd);
});

console.log('After call to fs.open()');

// Prints:
// After call to fs.open()
// File descriptor opened: 3
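
To see the last benefit in action, launching multiple asynchronous operations at the same time, here is a minimal sketch that uses setTimeout() as a stand-in for real I/O. The helper fakeRequest() and the delays are made up for illustration and are not part of any real API.

```javascript
// Hypothetical helper: pretends to be an I/O operation that takes `ms`
// milliseconds, then hands `name` to the callback.
function fakeRequest(name, ms, callback) {
  setTimeout(() => callback(name), ms);
}

const finished = [];
const start = Date.now();
let pending = 2;

// Records a completed operation, and reports once both are done.
function done(name) {
  finished.push(name);
  pending -= 1;
  if (pending === 0) {
    console.log('Completion order:', finished.join(', '));
    console.log('Total wait (ms):', Date.now() - start);
  }
}

// Both operations are started back to back; neither call waits for the other.
fakeRequest('slow', 100, done);
fakeRequest('fast', 20, done);

console.log('Both requests started');
```

The "fast" operation completes first even though it was started second, and the total wait is governed by the slowest operation rather than by the sum of the two.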

JavaScript and asynchrony

Why is JavaScript asynchronous?

First, a point of clarification. People often say that JavaScript is inherently asynchronous. But that’s not strictly true. Until the introduction of Promises in ES2015 (also known as ES6), the JavaScript language was completely synchronous. But embedders of JavaScript, like web browsers and Node.js, have provided useful APIs that are available only in asynchronous form.

That doesn’t quite answer our question though. Why do web browsers decide to provide so many asynchronous APIs?

As it turns out, there are two factors that went into this design decision:

JavaScript is generally used in a single-threaded setting, and
many of the Web APIs are I/O-bound rather than CPU-bound.

Let’s consider the first point. JavaScript in browsers is single-threaded, so all of the UI updates and JavaScript execution happen on the same thread. If the fetch() operation had been synchronous, then the main JavaScript thread would not be able to execute any code until fetch() returns.

If we give this a moment of thought, it could seem quite disastrous. Imagine trying to watch a 2-hour-long movie on YouTube, where the page tries to fetch() the video. Because fetch() blocks JavaScript execution, the entire page would freeze until the movie download completes! In other words, such a fetch() function wouldn’t be too different from an infinite loop: it would block even visual updates like scrolling from working.

A synchronous fetch().

Now, onto the second point. If fetch() is asynchronous, the network card can do all the work in the background until the entire response is stored in memory. In other words, we delay CPU processing until a future point in time, thus freeing the CPU for other tasks while the network transmission is being completed.

On the other hand, with CPU-bound tasks like computing cryptographic hashes, simply delaying their execution doesn’t quite help, as we still need to do the work on the main thread – only the timing would be different.

An asynchronous fetch().
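
The effect of CPU-bound work on a single thread can be observed directly. The following is a small sketch, with a busy loop standing in for something like hashing (the 50 ms duration is arbitrary): a timer that is due almost immediately still has to wait for the synchronous code to finish.

```javascript
const order = [];

// Ask the runtime to run this callback as soon as possible.
setTimeout(() => order.push('timer callback'), 0);

// A CPU-bound busy loop. It keeps the one JavaScript thread occupied for
// roughly 50 ms, during which nothing else can run.
const start = Date.now();
while (Date.now() - start < 50) {
  // spin
}
order.push('busy loop finished');

// The timer above was due long ago, but its callback could only run once the
// synchronous code had finished.
setTimeout(() => console.log(order), 10);

// Prints:
// [ 'busy loop finished', 'timer callback' ]
```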

Here’s a question for you. We’ve considered why browsers have so many asynchronous APIs. Can you think of why Node.js, a JavaScript runtime used to write servers, would be similar in that way?

Answer
Node.js servers, like most web servers, are commonly expected to handle a large number of simultaneous requests. Synchronous operations would not allow for any concurrency between requests, as they require the server to finish one request before starting another almost by definition. This makes asynchrony essential.
Additionally, Node.js servers are similar to the browser in that they are quite I/O-heavy as well. Web servers do a lot of I/O, from communicating with the client using HTTP, to reading or writing to a file system, or talking to a remote database server. The ability to do I/O concurrently would be a great boon.

The event loop

How are asynchronous operations implemented in JavaScript runtimes?

As we have seen earlier, asynchronous operations in JavaScript generally take the following form:

Upon the function call, an operation is started in the background, and the function immediately returns.
The JavaScript code right after the function call runs, without the results of the operation being available.
At some future time, when the operation completes, a callback function which receives the results is called.

Let’s take a deeper look at how this process works, by implementing our very own asynchronous fetch() function. 🍏🍏🍏

Framing the problem

The actual fetch() function in browsers returns a JavaScript Promise object, which we will get to soon. For now, let’s try to make a very simple basicFetch() function that takes a URL and a callback function, and doesn’t return anything.

basicFetch(url, data => {
  console.log(data);
});

Starting the operation

We often say that JavaScript is a single-threaded programming language. This is true, but only for JavaScript execution, so that we don’t have to worry about synchronization (race conditions) and other lower-level issues. Browsers and other JavaScript runtimes often use multi-threading and other techniques to parallelize I/O.

With this in mind, let’s start writing this function!

// This function runs in a worker thread, where we can use synchronous
// operations without blocking the main thread.
function fetchWorker(url, callback) { /* … */ }

function basicFetch(url, callback) {
  // Get an available worker thread.
  const thread = getAvailableThread();

  // Send the task type as well as the parameters to the worker thread. The
  // worker thread would then call fetchWorker(url, callback) immediately.
  thread.run(fetchWorker, url, callback);

  // The fetch has been started, so our job here is done.
}

Getting the results back: naïve approach

Well, that hopefully wasn’t too difficult to understand. Let’s now write the fetchWorker() function.

After the I/O thread finishes the task, it would need to call the JavaScript callback function to pass the data back to JavaScript code. Let’s do that.

// This function runs in a worker thread.
function fetchWorker(url, callback) {
  // Fetch the URL synchronously, and put its result in `buffer`.
  const buffer = fetchURLSync(url);
  callback(buffer);
}

But wait… This function runs on a separate thread. By calling the JavaScript callback directly from a worker thread, we would be violating the JavaScript single-thread constraint, and cause all sorts of race conditions in a language not prepared to deal with multithreading. No!!!

Introducing the event loop

This section is also available in video form on https://youtu.be/cCOL7MC4Pl0?t=430. Jake Archibald can explain it several fold better than I can.

To solve this problem, people invented the “event loop” approach. Instead of calling the JavaScript callback directly, the worker thread would enqueue the callback in a task queue that belongs to the JavaScript main thread.

// This function runs in a worker thread.
function fetchWorker(loop, url, callback) {
  // Fetch the URL synchronously, and put its result in `buffer`.
  const buffer = fetchURLSync(url);

  // Assume enqueueTask() does all the synchronization (locking) necessary to
  // prevent race conditions.
  loop.enqueueTask(callback, buffer);
}

On the main thread, a mechanism known as the “event loop” repeatedly checks the task queue for outstanding tasks. If there are any, it runs them in FIFO order. However, it only does so after all currently executing functions have finished, so as not to interrupt the current JavaScript function – in other words, when the JavaScript stack becomes empty.

Network responses are only one type of task. Web browsers have all sorts of APIs that hook onto the event loop, like setTimeout() and DOM events.
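
To make the queueing behavior concrete, here is a toy, single-threaded model of a task queue. Real runtimes implement the event loop natively and with far more detail. The class name ToyEventLoop and its run() method are invented for this sketch, and enqueueTask() merely mirrors the hypothetical function used above.

```javascript
// A toy model of the main thread's task queue.
class ToyEventLoop {
  constructor() {
    this.queue = [];
  }

  // Called by "background" work to hand a callback back to the main thread.
  enqueueTask(callback, ...args) {
    this.queue.push({ callback, args });
  }

  // Runs queued tasks one at a time, oldest first, until none are left.
  // Each task runs to completion before the next one starts.
  run() {
    while (this.queue.length > 0) {
      const { callback, args } = this.queue.shift();
      callback(...args);
    }
  }
}

const loop = new ToyEventLoop();
const log = [];

loop.enqueueTask(data => {
  log.push('network response: ' + data);
  // Tasks may enqueue further tasks; those go to the back of the queue.
  loop.enqueueTask(() => log.push('follow-up task'));
}, '<html>');
loop.enqueueTask(() => log.push('timer fired'));

// The main script finishes first; only then does the loop start on the queue.
log.push('main script finished');
loop.run();

console.log(log);
// Prints:
// [ 'main script finished', 'network response: <html>', 'timer fired',
//   'follow-up task' ]
```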

JavaScript, Web APIs, and the event loop.

JavaScript asynchrony in academic terms

In systems design, this approach to asynchronous programming using event loops is often given two epithets:

Event-driven. This essentially means that the program doesn’t do anything unless a task gets queued upon an external event (like when a network download finishes, a timeout is up, or the user clicks a button).
Cooperative multitasking. JavaScript allows multitasking, but tasks wouldn’t get executed unless the current script finishes executing. This is in contrast with modern operating system schedulers, which would interrupt and preempt long-running threads.
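
One practical consequence of cooperative multitasking is that long-running work has to give up the thread voluntarily. The sketch below shows one common way to do that, by splitting a computation into chunks and scheduling each chunk as its own task with setTimeout(). The function sumInChunks() and its parameters are illustrative names, not a standard API.

```javascript
const events = [];

// Sums the numbers 1..n in slices, yielding to the event loop between slices
// so that other queued tasks get a turn.
function sumInChunks(n, chunkSize, callback) {
  let total = 0;
  let next = 1;

  function runChunk() {
    const end = Math.min(next + chunkSize - 1, n);
    for (; next <= end; next++) {
      total += next;
    }
    events.push('chunk done up to ' + end);
    if (next <= n) {
      setTimeout(runChunk, 0); // voluntarily give up the thread
    } else {
      callback(total);
    }
  }

  runChunk();
}

sumInChunks(300, 100, total => events.push('sum = ' + total));

// This task is queued after the first chunk has run, and gets its turn
// before the computation as a whole has finished.
setTimeout(() => events.push('other task ran'), 0);

// Print the sequence of events once everything has had time to run.
setTimeout(() => console.log(events), 50);
```

Had the sum been computed in a single loop instead, the other task would have had to wait until the whole computation was over.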

Introducing JavaScript Promises

What are Promises in JavaScript? How do they ease asynchronous programming?

By now, we have a pretty good idea of how asynchronous programming works in JavaScript, with callbacks. But despite all those concurrency improvements, the programming flow really isn’t great. A phenomenon known as “callback hell” is often present in callback-based code that does a series of function calls sequentially, with callbacks inside callbacks inside callbacks:

// callback hell
fs.readFile(filename, (err, file) => {
  if (err) {
    console.log('Error reading file: ' + err);
  } else {
    fs.writeFile(filename + '.new', file, err => {
      if (err) {
        console.log('Error writing file: ' + err);
      } else {
        console.log('Done!');
      }
    });
  }
});

Using Promises is a way to simplify that.

At its most basic level, a JavaScript Promise object is simply a container for a JavaScript value, with a state attached to it that indicates whether the operation is still pending, has succeeded (fulfilled), or has failed (rejected). Initially, the Promise container is empty, indicating that the data is not yet available.

Creating Promises

If we want to use Promises with an existing callback-based API, we can use the Promise constructor.

// This Promise will become fulfilled when the fetch finishes.
new Promise(resolve => {
  basicFetch(url, resolve);
});

// This Promise becomes fulfilled when the file is fully read. If an error
// occurred, the Promise is rejected with the error.
new Promise((resolve, reject) => {
  fs.readFile(filename, (err, data) => {
    if (err) reject(err);
    else resolve(data);
  });
});

// This Promise will remain empty forever.
new Promise(() => {});
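
The wrapping pattern above can be generalized. Here is a minimal sketch of a helper that converts any function taking a Node-style (err, value) callback as its last argument into one that returns a Promise. Node.js ships a production version of this idea as util.promisify(). The promisify() below is a simplified stand-in, and divideLater() is a made-up API used only to exercise it.

```javascript
// Wraps a function that takes a Node-style callback (err, value) as its last
// argument, returning a function that returns a Promise instead.
function promisify(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (err, value) => {
        if (err) reject(err);
        else resolve(value);
      });
    });
}

// A made-up callback-based API: divides a by b after a short delay.
function divideLater(a, b, callback) {
  setTimeout(() => {
    if (b === 0) callback(new Error('division by zero'));
    else callback(null, a / b);
  }, 10);
}

const divide = promisify(divideLater);

// The first Promise is fulfilled with 2; the second is rejected.
divide(6, 3).then(result => console.log('6 / 3 =', result));
divide(1, 0).catch(err => console.log('Failed:', err.message));
```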

🍏 Application: Implementing `Promise.resolve()`

In addition to the new Promise API as above, JavaScript provides a shorthand for creating a Promise that is assigned a value at creation:

// This Promise is assigned the value 1 upon creation.
Promise.resolve(1);

How would you go about implementing this method using new Promise?

Answer

function resolve(value) {
  return new Promise(resolve => {
    resolve(value);
  });
}

Consuming promises

But unlike a usual container object, Promise objects don’t allow direct access to the contained value. Instead, to access it or to wait for the value to become available, one attaches a promise reaction callback function using the .then() function.

// When the promise is fulfilled with a value, print it out.
promise.then(data => { console.log(data); });

If we were to rewrite our callback-hellish code using Promises instead, it would look much more linear and intuitive.

// Clean code with Promises
// Assumes the Promise-based API: const fs = require('fs').promises;
fs.readFile(filename)
  .then(file => fs.writeFile(filename + '.new', file))
  .then(() => { console.log('Done!'); })
  .catch(err => {
    console.log('An error occurred: ' + err);
  });
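
What makes such chains work is that .then() itself returns a new Promise, which takes on whatever the callback returns. The following sketch walks through the three cases: returning a plain value, returning another Promise, and throwing an error. The delay() helper is hypothetical and only simulates an asynchronous operation.

```javascript
// Hypothetical helper: a Promise that is fulfilled with `value` after `ms`
// milliseconds.
function delay(ms, value) {
  return new Promise(resolve => setTimeout(() => resolve(value), ms));
}

const steps = [];

const chain = delay(10, 2)
  .then(n => {
    steps.push('got ' + n);
    return n * 10; // a plain value is passed straight to the next .then()
  })
  .then(n => {
    steps.push('got ' + n);
    return delay(10, n + 1); // a returned Promise is waited on first
  })
  .then(n => {
    steps.push('got ' + n);
    throw new Error('something failed'); // skips ahead to .catch()
  })
  .then(() => {
    steps.push('never reached');
  })
  .catch(err => {
    steps.push('caught: ' + err.message);
    return steps;
  });

chain.then(result => console.log(result));
// Prints:
// [ 'got 2', 'got 20', 'got 21', 'caught: something failed' ]
```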

`async` functions

How do async functions further simplify asynchronous code, and how do they work?

Promises themselves can be pretty useful already, but their real power lies in unlocking a new JavaScript feature called async functions. Remember all those .then()’s? async functions give us a way of getting rid of them.

Here’s the code that does the same thing as above… except .then doesn’t appear:

// Clean code with async
// Assumes the Promise-based API: const fs = require('fs').promises;
async function main() {
  try {
    const file = await fs.readFile(filename);
    await fs.writeFile(filename + '.new', file);
    console.log('Done!');
  } catch (err) {
    console.log('An error occurred: ' + err);
  }
}
main();

This looks really clean. The await operator takes a Promise object and, well, waits until the Promise is fulfilled and gets its value. In fact, the code looks pretty much like a synchronous function, in that we can directly get the result of an asynchronous action. But how is it possible for this not to have the same problems as synchronous code, and not block the entire main thread?

It turns out that there are three special things about an async function:

An async function always returns a Promise immediately, so calling the function does not directly yield the returned value, fitting our definition of an asynchronous operation.
The execution of this specific function can be suspended and resumed. When the await operator is encountered, only the function is paused, without blocking the rest of the main thread. This is why async functions do not suffer from the same problems as synchronous code.
The await operator registers special Promise reaction callbacks on the given Promise, which resume that particular instance of the async function with the value in the Promise (or throw the error into it, if the Promise is rejected).
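
These three ingredients can be modeled in plain JavaScript using generator functions, which can also be suspended and resumed. The sketch below is not how engines actually implement async functions. It is a rough approximation in which yield plays the role of await, and runAsync() is a name invented for this example.

```javascript
// Drives a generator function so that each `yield`ed Promise plays the role
// of `await`. Returns a Promise for the generator's return value.
function runAsync(generatorFn) {
  return new Promise((resolve, reject) => {
    const gen = generatorFn();

    // Resume the generator, either by sending a value in ('next') or by
    // throwing an error into it ('throw').
    function step(method, arg) {
      let result;
      try {
        result = gen[method](arg);
      } catch (err) {
        reject(err); // the generator threw: reject the outer Promise
        return;
      }
      if (result.done) {
        resolve(result.value); // the generator returned
        return;
      }
      // Suspended at a yield: register reactions that resume it later.
      Promise.resolve(result.value).then(
        value => step('next', value),
        err => step('throw', err)
      );
    }

    step('next', undefined);
  });
}

const trace = [];

const resultPromise = runAsync(function* () {
  trace.push('started');
  const a = yield Promise.resolve(20);
  try {
    yield Promise.reject(new Error('oops'));
  } catch (err) {
    trace.push('caught ' + err.message);
  }
  const b = yield 22; // non-Promise values work too
  return a + b;
});

// runAsync() has already returned a Promise, while the generator is paused
// at its first yield.
trace.push('runAsync returned');

resultPromise.then(value => console.log(trace, value));
// Prints:
// [ 'started', 'runAsync returned', 'caught oops' ] 42
```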

With these three things set up, async functions are able to do the amazing things they do.