Since I started reading “The C programming language“, I’ve been solving some coding exercises on Codewars. One of them asked to code a Brainfuck compiler, which was very interesting. In this post, I will explain the basics of the Brainfuck programming language and then show how to implement a compiler for it in C.

## Brainfuck

If you aren’t familiar with it, Brainfuck is a minimalistic programming language which consists of only a few symbols: `<,` `>`, `+`, `-`, `.`, `,`, `[` and `]`. Each one of those characters follows a simple rule, allowing to create programs as you would do with other programming languages (albeit with some limitations).

For example, the famous Hello, world program written in Brainfuck isn’t nearly as simple as on other languages:

`++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.`

In order to understand this program (at least partially), we should understand what each command represents.

## Data manipulation

First of all, Brainfuck works by storing data on a stream of cells which start all initialized to 0.

Each cell can contains 1 byte (8 bits) of information, so the values vary from 0 to 255. We have a data pointer that points to one of these cells at a time (the black arrow below).

At this point, we can already understand what most of the commands do:

• `<` moves the data pointer to the previous cell (one cell to the left).
• `>` moves the data pointer to the next cell (one cell to the right).
• `+` increases the value of the cell marked by the data pointer by 1.
• `-` decreases it by one.

For example, the following code `+++>++` would change our data cells as follows: (suppose the data pointer starts at the central cell).

We start by incrementing the central cell by 3 units (`+++`), then we move to the next cell (`>`) and finally, we increment this data cell by 2 (`++`).

## Input and output: ASCII encoding

The next two commands, `.` and `,` , are for output and input. In order to understand how they work, we need to know a bit about ASCII. Simply put, ASCII is a convention that assigns characters to every one of the digits between 0 to 255.

The reason why it assigns only 256 values, is because computers manipulate data in bundles called bytes, which are composed of 8 bits. A bit is just a 0 or a 1, so it transfers enough data for 2 values. With 8 bits, we have 2^8=256 possible combinations.

This means ASCII assigns characters to each one of the possible combinations in one byte of data.

For example, the value 65 corresponds to character ‘A’. The next value, 66, corresponds to ‘B’. Lowercase letters come after them. They start at value 97 for ‘a’ and so on. You can see a complete list of the character encodings in the Wikipedia link shared above.

Regarding our Brainfuck compiler, when we pass the character ‘A’ to the compiler, what it actually gets is its encoding in ASCII, which is 65. This is what’s done with the input command `,`: it accepts one character of input from the user and saves it to the current data cell.

The command for output `. `works in the opposite sense. Let’s say the data pointer is at a cell with value 65. If we ask Brainfuck for output with `.`, it will pass the value 65 which will be encoded to the user’s output (say the monitor screen) as ‘A’.

## Loops: `[` and `]`

Brainfuck supports iteration with its last two commands `[ `and `]`. We can understand the opening `[ `as the opening of a `while` loop:

```while (pointer_value != 0) {
...
}```

As the code above suggests, the opening `[` will ask if the value at the current pointer is non-zero. If it’s any value other than `0`, it will proceed to execute all of the commands until its matching `]`.

On the other hand, when the condition at `[` fails (that is when the value at the current pointer is `0`), the program’s control flow will advance to the command just after the matching `]`.

When the program enters the loop, the closing `]` will move the program’s control flow back to where the loop started, which is to its matching `[`. Here, the condition will be asked again “is the value at the current pointer equal to zero?”.

## Brainfuck compiler implementation in C

Now that all of Brainfuck’s commands have been explained, we’re in good measure to show an implementation of a compiler in C.

```#include <stdio.h>

#define DATASIZE 1001

void brainfuck(char *command_pointer, char *input) {
int bracket_flag;
char command;
char data[DATASIZE] = {0};
char *dp;   /* data pointer */

/* Move dp to middle of the data array */
dp = &data[DATASIZE/2];

while (command = *command_pointer++)
switch (command) {
case '>':   /* Move data pointer to next address */
dp++;
break;
case '<':   /* Move data pointer to previous address */
dp--;
break;
case '+':   /* Increase value at current data cell by one */
(*dp)++;
break;
case '-':   /* Decrease value at current data cell by one */
(*dp)--;
break;
case '.':   /* Output character at current data cell */
printf("%c", *dp);
break;
case ',':   /* Accept one character from input pointer ip and
*dp = *input++;
break;
case '[':   /* When the value at current data cell is 0,
advance to next matching ] */
if (!*dp) {
for (bracket_flag=1; bracket_flag; command_pointer++)
if (*command_pointer == '[')
bracket_flag++;
else if (*command_pointer == ']')
bracket_flag--;
}
break;
case ']':    /* Moves the command pointer back to matching
opening [ if the value of current data cell is
non-zero */
if (*dp) {
/* Move command pointer just before ] */
command_pointer -= 2;
for (bracket_flag=1; bracket_flag; command_pointer--)
if (*command_pointer == ']')
bracket_flag++;
else if (*command_pointer == '[')
bracket_flag--;
/* Advance pointer after loop to match with opening [ */
command_pointer++;
}
break;
}

/* Adding new line after end of the program */
printf("\n");
}```

Starting with the compiler’s parameters:

`void brainfuck(char *command_pointer, char *input)`

The program’s code will be passed to the `brainfuck` compiler as a string of `char`s. The `command_pointer `will start pointing to the first char of the program’s code. The user’s input will be passed as a string of chars. `input` should point to the first character of the input when `brainfuck` is called. The output is passed to the user’s screen using `printf`.

See that to handle our data, we’re using the `char` type. This is because the size of `char` type in C is of one byte, which corresponds to Brainfuck’s specifications.

Next, we move the data pointer to the middle of the available cells so that we have enough cells both before and after it.

`dp = &data[DATASIZE/2];`

Following, we have a loop that catches the current command and advances the command pointer to the next available character. It will continue until we reach the end of the program’s code which corresponds to the character `'\0'`.

`while (command = *command_pointer++)`

In the loop, we have a switch statement that performs the actions necessary for each command. The code snippet shared above contains comments to understand each case.

## Hello, world!

Finally, we can test our compiler with the “Hello, world!” code we shared above:

```int main() {
char *hello_world_code = "++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.";
char *input = "";
brainfuck(hello_world_code, input);

return 0;
}```

For the input, we just passed an empty string: `""`.

After compiling and executing the Brainfuck compiler, we get the output:

`Hello World!`

Success!

(Maybe now you will be able to run the code snippet at the beginning of this post…)

Published inProgramming
Subscribe
Notify of Inline Feedbacks  