Handling Binary Data in JavaScript

With the rise in popularity of artificial intelligence, concerns have rightly arisen about the intelligence of machines and even the possibility of their sentience.

Whatever machines may seem capable of, we know one thing for sure: computers need electricity to function and therefore understand only two things, 0 and 1. These represent the state of a transistor in the CPU, 0 meaning the absence of electrical current and 1 meaning its presence.

The state of a single transistor is a bit, and 8 bits make up one byte. This is why, using only binary numbers (machine code), we are able to communicate with a computer. All data, whether input, text, or anything else, has to be converted to binary for our computers to understand it.

Usually, the device drivers, operating system and system applications handle all of this for us, giving us a seamless experience as though our JavaScript engine were a human capable of understanding our commands. However, when performing some operations, such as reading a file from storage, we may come across raw binary data.

BUFFERS

A buffer is a space in memory used to temporarily store binary data. Buffers are found everywhere in the computer. JavaScript provides a buffer object that wraps the binary data returned when reading a file (using the FileReader API in the browser or fs.readFile in Node.js). We can also create our own buffer using JavaScript's built-in ArrayBuffer class.

ArrayBuffer

This represents a raw binary data buffer: a fixed-length block of memory. Each element in it is a byte, hence it is also called a byte array.

We create one using the ArrayBuffer constructor, passing the size we want (in bytes) as a number.

const buff = new ArrayBuffer(10)

We cannot do much with an ArrayBuffer on its own, as we cannot directly manipulate its contents.

To do that, we need a TypedArray object.

TypedArray

These are objects that present the bytes of an ArrayBuffer in a specific numeric format and can read and write the contents of the buffer.

Examples include:

  • Uint8Array

  • Uint16Array

  • Uint32Array

They are created by passing an ArrayBuffer as an argument to their respective constructors. We can also construct a TypedArray without an ArrayBuffer, but under the hood it still creates an ArrayBuffer instance.

For the rest of this article, we are going to focus on a special type of TypedArray: the Buffer class, which is available only in Node.js.
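As a minimal sketch of the relationship between the two, the snippet below places two different typed-array views over the same ArrayBuffer. The variable names are illustrative only.

```javascript
// A 4-byte buffer; its bytes start out as zeros.
const rawBuffer = new ArrayBuffer(4);

// Two views over the same memory: one reads single bytes, the other 16-bit values.
const bytes = new Uint8Array(rawBuffer);
const words = new Uint16Array(rawBuffer);

// Writing through one view is visible through the other.
bytes[0] = 255;
bytes[1] = 255;

console.log(bytes.length); // 4 (four 1-byte elements)
console.log(words.length); // 2 (two 2-byte elements)
console.log(words[0]);     // 65535, because both underlying bytes are 0xFF

// Constructing a typed array from a length allocates an ArrayBuffer behind the scenes.
const standalone = new Uint8Array(8);
console.log(standalone.buffer.byteLength); // 8
```

The same bytes are interpreted differently depending on the view, which is why the typed array, not the ArrayBuffer, decides how the data is read.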

Buffer Class

Buffer is a global class in Node.js. Calling it with the new keyword returns a buffer object, but this usage raised security concerns and is now deprecated.

Instead, we use the static methods Buffer.alloc and Buffer.from to create buffer objects.

Buffer.alloc()

This creates a buffer of the given size in bytes. It takes three arguments: size, a required number; fill, an optional value to fill the buffer with (0 is used if not specified); and encoding, which applies when fill is a string (we will come back to encodings).

// Allocate 10 zero-filled bytes.
const buf = Buffer.alloc(10);
console.log(buf);
// <Buffer 00 00 00 00 00 00 00 00 00 00>
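The optional fill and encoding arguments could be exercised along these lines. This is a small sketch using the global Buffer in Node.js; the variable names are made up for illustration.

```javascript
// Fill with a number: every byte is set to that value.
const filledWithNumber = Buffer.alloc(5, 1);
console.log(filledWithNumber); // <Buffer 01 01 01 01 01>

// Fill with a string: its bytes are repeated until the buffer is full.
// The third argument says how to turn the fill string into bytes.
const filledWithString = Buffer.alloc(6, 'ab', 'utf8');
console.log(filledWithString.toString()); // ababab
```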

Buffer.from()

This creates a new buffer from a specified value, which can be a string, an array of bytes, an ArrayBuffer, or an existing buffer. If the first argument is a string, we can optionally pass the encoding as a second argument.

// Each character of 'abc' becomes one byte.
const buf = Buffer.from('abc');

console.log(buf);
// <Buffer 61 62 63>
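For comparison, here is a short sketch of the other input types Buffer.from accepts: an array of byte values, a string with an explicit encoding, and an existing buffer. The variable names are illustrative.

```javascript
// From an array of byte values (65, 66, 67 are the codes for A, B, C).
const fromArray = Buffer.from([65, 66, 67]);
console.log(fromArray.toString()); // ABC

// From a string with an explicit encoding: here the string is read as hex digits.
const fromHex = Buffer.from('616263', 'hex');
console.log(fromHex.toString()); // abc

// From an existing buffer: the bytes are copied, so the two are independent.
const original = Buffer.from('abc');
const copy = Buffer.from(original);
copy[0] = 0x7a; // the code for 'z'
console.log(original.toString()); // abc
console.log(copy.toString());     // zbc
```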

The Buffer Object

This is an array of bytes, each exposed as an integer from 0 to 255.

A buffer has array-like properties such as length, and its contents can be accessed by index. It also has methods such as map, filter, reduce and toString().

It is, however, of fixed length and cannot grow. We write to a buffer using the buffer.write() method.
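A brief sketch of these properties in action, assuming the global Buffer in Node.js. Note how a string longer than the buffer is cut off rather than enlarging it.

```javascript
// A 5-byte buffer can never hold more than 5 bytes.
const greeting = Buffer.alloc(5);

// write() returns the number of bytes actually written.
const written = greeting.write('hello world');

console.log(written);             // 5
console.log(greeting.toString()); // hello
console.log(greeting.length);     // 5
console.log(greeting[0]);         // 104, the code for 'h'
```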

Recall that a byte is 8 bits, which represent 8 transistors or, better put, 8 values that are each either zero or one. That means a single byte can hold only 256 distinct values, from 0 to 255 (11111111 in binary is 255 in base 10 and FF in hexadecimal). To store a value beyond 255, we need more than one byte.

Note that the console displays a buffer's bytes in hexadecimal for readability, not in base 10 or binary. The lowercase letter a is 97 in UTF-8, which is 61 in hexadecimal.
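To make the one-byte limit concrete, the sketch below stores 255 in a single byte, shows what happens when 256 is assigned to that byte, and then stores 300 across two bytes with writeUInt16BE. Big-endian order is an arbitrary choice here.

```javascript
const single = Buffer.alloc(1);
single[0] = 255;
console.log(single); // <Buffer ff>

// Index assignment keeps only the lowest 8 bits, so 256 wraps around to 0.
single[0] = 256;
console.log(single[0]); // 0

// Two bytes can hold values from 0 to 65535.
const pair = Buffer.alloc(2);
pair.writeUInt16BE(300); // 300 is 0x012C
console.log(pair);                 // <Buffer 01 2c>
console.log(pair.readUInt16BE(0)); // 300
```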

Buffers and Strings

A string is a sequence of text characters. Each character corresponds to a number, and the Unicode Transformation Format (UTF-8) specifies how that number is stored as bytes. Basic characters such as English letters, digits and white space each take a single byte, while other characters take two to four bytes. To reiterate: to store a string in binary, each character is first mapped to a number as specified by the standard, and that number is then converted to binary.

For example, the capital letter A is 65 and the lowercase letter a is 97.

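const buf = Buffer.from('ABC');

console.log(buf);
// <Buffer 41 42 43> (41 is hexadecimal for 65)

To find out what number a character corresponds to, we call the charCodeAt() method on the string, passing the index of the character. (It returns a UTF-16 code unit, which matches the UTF-8 value for basic characters such as these.)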

let a = 'a'
console.log(a.charCodeAt(0)); 
//97
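One caveat worth illustrating: characters outside the basic ASCII range occupy more than one byte in UTF-8, so a string's length and its byte length can differ. A minimal sketch, using Unicode escapes so the example does not depend on how this file itself is encoded:

```javascript
// 'a' fits in a single byte.
const ascii = Buffer.from('a');
console.log(ascii.length); // 1

// '\u00e9' is the letter e with an acute accent; UTF-8 stores it in two bytes.
const accented = Buffer.from('\u00e9');
console.log(accented);                    // <Buffer c3 a9>
console.log('\u00e9'.length);             // 1 (one character in the string)
console.log(Buffer.byteLength('\u00e9')); // 2 (two bytes in the buffer)

// '\u20ac' is the euro sign; it needs three bytes.
console.log(Buffer.from('\u20ac')); // <Buffer e2 82 ac>
```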

The process of converting a string to binary is known as encoding and the opposite operation is known as decoding.

// encoding: string to bytes
const buf = Buffer.from('ABC');

// decoding: bytes back to a string
console.log(buf.toString());
// ABC

Buffers and Files

Computers generally contain two types of files: text files and binary files. Text files contain human-readable characters and can be edited using a text editor. Examples are .txt, .html, .js, .cpp and .py files.

Reading from a text file is like reading a long string.
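As a rough sketch of that idea, the snippet below writes a small text file to the system's temporary directory and reads it back, first as a raw buffer and then as a string. The file name is hypothetical and the synchronous fs calls are used only to keep the example short.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical file name, created in the temp directory for this demo.
const filePath = path.join(os.tmpdir(), 'buffer-demo.txt');
fs.writeFileSync(filePath, 'Hello, file!');

// Without an encoding, we get the raw bytes as a Buffer.
const rawContents = fs.readFileSync(filePath);
console.log(Buffer.isBuffer(rawContents)); // true
console.log(rawContents.toString('utf8')); // Hello, file!

// With an encoding, Node.js decodes the bytes into a string for us.
const textContents = fs.readFileSync(filePath, 'utf8');
console.log(textContents); // Hello, file!

// Clean up the demo file.
fs.unlinkSync(filePath);
```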

Binary files are all other files. How their contents are laid out varies massively between formats and is beyond our scope. As one example, an uncompressed image can store the colour of each pixel in four bytes (the RGBA values), one pixel after another in an array.

There is, however, one more thing to note. While our JavaScript engine gives us the ability to access binary data, some systems cannot handle it (e.g. email protocols). Transferring files to these systems involves converting them to text characters before sending. This is also how we can embed images in HTML files.

There is a special encoding for this called Base64. It represents every three bytes as four characters drawn from a 64-character alphabet of letters, digits and two symbols, so any binary data can pass through a text-only channel. We cannot simply decode the bytes as UTF-8, because arbitrary binary data is often not valid UTF-8 and would be corrupted. In Node.js, Base64 is used in much the same way as our normal string encoding.
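To illustrate the four-bytes-per-pixel idea, here is a sketch of raw RGBA data for a tiny 2x1 image held in a buffer. It is not a real image file format, just the pixel array; the dimensions and colours are made up.

```javascript
const width = 2;
const height = 1;

// Four bytes per pixel: red, green, blue, alpha.
const pixels = Buffer.alloc(width * height * 4);

// First pixel: fully opaque red.
pixels.set([255, 0, 0, 255], 0);

// Second pixel: half-transparent blue, starting at byte offset 4.
pixels.set([0, 0, 255, 128], 4);

console.log(pixels); // <Buffer ff 00 00 ff 00 00 ff 80>
```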

To convert from binary to Base64, we pass 'base64' as an argument to the toString() method.

const buf = Buffer.from('ABC');

console.log(buf.toString('base64'));
// QUJD
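Going the other way is a matter of passing 'base64' to Buffer.from. The sketch below does a round trip and also builds a data URI of the kind used to embed content in HTML; the text/plain MIME type is chosen only for this example.

```javascript
const source = Buffer.from('ABC');

// Binary to Base64 text.
const encoded = source.toString('base64');
console.log(encoded); // QUJD

// Base64 text back to the original bytes.
const decoded = Buffer.from(encoded, 'base64');
console.log(decoded.toString()); // ABC

// A data URI carries the Base64 text inline, e.g. in an HTML attribute.
const dataUri = `data:text/plain;base64,${encoded}`;
console.log(dataUri); // data:text/plain;base64,QUJD
```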