Severyn Kozak

emulating exceptions in C

Sat, 22 Aug 2015 17:00:00 -0400

case study: a recursive-descent parser

I recently stumbled across a practical use-case for simulated exceptions in C while writing a recursive-descent JSON parser for fun and profit. In this quick write-up, I’ll give a high-level overview of the problems that I ran into, why exceptions were ideal for error handling, and how I emulated them in C.

recursive-descent parsing

I won’t dwell on the details of the parser itself because this post is about the error-handling mechanism, but a minimal understanding of recursive-descent parsing is necessary to appreciate it. As with any kind of parsing, we start out with the formal grammar of our language/data format/whatever. A simple grammar for common programming language literals might look like:

value: string | boolean | number
string: '"' char* '"'
boolean: 'true' | 'false'
number: '-'? digit+ ('.' digit+)?
array: '[' (value ((',' value)*)?)? ']'

In fact, the JSON grammar that I used is fairly similar. Writing a recursive-descent parser for a grammar like the above is straightforward, because you simply map each rule onto a corresponding parse function. In pseudocode, we might have:

parse()
    # perform setup
    return parseValue()

parseValue()
    if nextIsString()
        return parseString()
    else if nextIsNumber()
        return parseNumber
    else if nextIsBoolean()
        return parseBoolean()
    else if nextIsArray()
        return parseArray()
    else
        throw ParseError()

parseString()
    matchChars('"')
    string = readCharsUntil('"')
    matchChars('"')
    return string

parseBoolean()
    if peekChar() == 't'
        matchChars('true')
        return true
    else
        matchChars('false')
        return false

# and so on

The gist is that we have a bunch of mutually recursive parsing routines that ultimately rely on very primitive, low-level functions (like nextChar(), readCharsUntil(), matchChars(), etc. in the above example) that operate directly on the string being parsed.

error-handling

Most of the errors that we need to worry about will occur in those primitives: nextChar() might fail to read a character because it hit the end of the input stream and matchChars() might find an unexpected character, for example. We may also want to manually signal an error in one of our high-level parsing routines, like we do in parseValue() when we can’t detect any valid values ahead. The key observations to make are that in a recursive-descent parser, the call stack will grow quite deep, and that errors are fatal; in other words, when one occurs, we need to return through many layers of function calls until we hit the parse() that started it all:

parse()         # The top-level parse routine that we need to jump back to.
parseValue()
parseArray()
parseValue()
parseBoolean()
matchChars()
getNextChar()   # Error, hit EOF!

How should we handle errors in C, then?

error codes

The idiomatic solution is to simply use error codes. If nextChar() fails, return -1 (which is suitable because character values can’t be negative), and make sure to actually check that return value every time you call it.

char chr = nextChar(parserState);
if(chr == -1){
    return -1;
}

Note that the parserState argument passed to nextChar() is a (pointer to a) struct containing the parser’s state: a pointer to the string being parsed, its length, the current index in that string, etc.

In practice, we’d probably settle for a more sophisticated solution that involves storing error information inside parserState, like a boolean indicating whether a failure occurred and an error message to accompany it, since it’s more flexible:

char chr = nextChar(parserState);
if(parserState->failed){
    puts(parserState->errMsg); // just an example
    return NULL;
}

Either way, the result is that we have to remember to manually check some error value after every call to a parse routine that carried the possibility of failure. It bloats your code with repetitive conditionals and prevents you from using the return value of a parse routine directly in an expression because, again, you need an explicit conditional. Can we do better?

exceptions, exceptions

An exception mechanism would be ideal here, since we want to jump back to an arbitrary point in the call stack (in our case, parse()) from any one function. While C doesn’t provide us with real exceptions, we can simulate them…

`longjmp()`, `setjmp()`

Enter longjmp() and setjmp(); like goto, but nuclear! From the manpage, these functions facilitate “nonlocal jumps to a saved stack context,” or, in other words, allow you to perform jumps across functions. Use with extreme caution. The gist is that setjmp() is used to initialize a jmp_buf, storing critical information about the current calling environment – it’s highly system-specific, but generally includes things like the stack pointer and current register values – and returns 0 (the first time it returns – this will be explained shortly). You can then pass that jmp_buf to longjmp() at any other point, and the program will rewind execution back to the setjmp() call. You’ll also need to pass a non-zero int to longjmp(), which will be the value that setjmp() returns this time around; this allows us to discriminate between the times that setjmp() returns a.) initially and b.) after a jump was performed. An example should set things straight:

#include 
#include 

void bar(jmp_buf jmpBuf){
    puts("inside bar()");
    longjmp(jmpBuf, 1);
    puts("this should never run!");
}

void foo(void){
    jmp_buf jmpBuf;
    if(!setjmp(jmpBuf)){
        // This runs after `setjmp()` returns normally.
        puts("calling bar()");
        bar(jmpBuf);
        puts("this should never run!");
    }
    else {
        // This runs after `setjmp()` returns from a `longjmp()`.
        puts("returned from bar()");
    }
}

int main(){
	foo();
	return 0;
}

When compiled and run, you should see:

calling bar()
inside bar()
returned from bar()

Notice how we wrap the call to setjmp() in a conditional, which allows us to selectively run different code after it returned regularly (returning 0) and then after a jump occurred (returning whatever argument was passed to longjmp(), or, in our case, 1). Continuing the exceptions analogy, this is similar to a try {} catch {}.

Also, note that jmp_buf is typedef‘d as an array of the actual jmp_buf structs with only one element – in other words, when you declare jmp_buf jmpBuf;, the struct inside jmpBuf lives entirely on the stack but jmpBuf will decay to a pointer if you pass it to a function. In my opinion that’s rather misleading and I would’ve preferred to manually, explicitly use pointer notation when necessary, but it is what it is.

integrating them into the parser

The idea is to initialize a jmp_buf in the parse() function with setjmp(), store it inside the parserState struct in a prevErrorTrap member (couldn’t think of a better name), and then longjmp() to it whenever an error occurs. If that were all, using this solution would be a no-brainer, but alas, there’s a complication: some of our parsing routines might need to perform cleanup before exiting, like free()ing temporarily allocated memory. For instance, the parseArray() function in my parser allocates a stretchy array to house all of the values that it successfully parses; if an error occurs in one of the parseValue() calls that it makes, it needs to deallocate all of the values parsed thus far and then the array itself. If we jump from the point where the error occurred to the very beginning of the parse, though, we don’t have any means of doing so.

intermediate cleanup

Two solutions come to mind:

storing pointers to all of the blocks of memory allocated by the parse routines inside an array in parserState, and then free()ing them inside the top-level parse() if an error occurred
setting intermediate jump points in functions that need to perform cleanup; in effect, catching exceptions, cleaning up, and reraising them.

I ultimately settled for the latter, and the idea’s the same as before: in functions like parseArray() and any others that allocate intermediate memory, create a copy of the current jump buffer (parserState->prevErrorTrap), and then set parserState->prevErrorTrap to a new jump buffer created with setjmp() – this one will get used by all of the parse routines called by the current one. If the parse succeeds, just restore parserState->prevErrorTrap to the original jump buffer before returning. If it fails, perform cleanup and jump directly to the original buffer. Here’s an example taken straight from the parser’s source, with irrelevant bits omitted:

static JsonArray_t JsonParser_parseArray(JsonParser_t *state){
    /**
     * Omitted: perform setup here.
     */

    jmp_buf prevErrorTrap;
    copyJmpBuf(prevErrorTrap, state->errorTrap);

    // The stretchy array used to store parsed values. Read on
    // for why `volatile` is necessary.
    JsonVal_t *volatile values = NULL;

    if(!setjmp(state->errorTrap)){

        /**
         * Omitted: parse values into `values` with repeated calls
         * to `parseValue()`.
         */

        // If we get this far, then no error occurred, so restore the
        // original `prevErrorTrap`.
        copyJmpBuf(state->errorTrap, prevErrorTrap);

        return (JsonArray_t){
            .length = sb_count(values),
            .values = values
        };
    }
    else {
        // An error occurred! Deallocate all intermediate memory,
        // and then jump to the previous `prevErrorTrap`.
        for(int ind = 0; ind < sb_count(values); ind++){
            JsonVal_free(&values[ind]);
        }
        sb_free(values);
        longjmp(prevErrorTrap, 1);
    }
}

copyJmpBuf() is just a convenience wrapper for memcpy():

static void *copyJmpBuf(jmp_buf dest, const jmp_buf src){
    return memcpy(dest, src, sizeof(jmp_buf));
}

One other thing to note is that we declared the values pointer as volatile to prevent the compiler from placing it into a register. Why? The problem is that we modify values after the call to setjmp(), namely when we perform the initial allocation of a stretchy array and then whenever it gets resized and a realloc() changes the location of the items that it contains. When a long jump occurs, register values are restored from whatever they were at the time of the setjmp() call, since those are what it copied into the target jmp_buf; if the compiler decided to put values into a register, then after the jump, it would be set to NULL. To prevent that from happening, we use the volatile specifier. See this SO post for more; this is an example of the potentially very dangerous subtleties of long jumping. In fact, while writing my parser I forgot to add in the volatile specifier to values, and noticed that it was leaking memory (thank you valgrind!) whenever an error occurred even though the cleanup clause was getting run. It turns out that values would get put into a register and then consequently take on a value of NULL after the jump – since that’s what it was at the time of the original setjmp() – meaning that the only reference to the allocated memory was lost and it couldn’t possibly be deallocated. Moreover, when passed to free(), it wouldn’t blow up, because free() ignores NULL pointers!

To wrap up the above example, all of the other parsing functions that set intermediate breakpoints have virtually the same layout, so you could even theoretically encapsulate the different statements in macros like try and catch for a full blown imitation of exceptions in other languages – that’s too much magic for me, though.

in conclusion

longjmp() and setjmp() are tricky. They’re obscure, can give rise to subtle bugs, are highly platform-specific, and, if abused, will probably lead to awfully confusing code. That being said, like goto, they do have valid uses and can be very powerful when used appropriately. In this case, I think they were superior to error codes and resulted in a slimmer, more readable implementation than what it otherwise would’ve been. If you’re interested in more reading, I recommend this comprehensive article. Also, here’s the thoroughly documented parser source code; check out src/json_parser.c.

RSA: implementation and proofs

Sun, 14 Jun 2015 17:00:00 -0400

Warning: this post contains math blocks rendered with the MathJax JavaScript library. If you're using an RSS reader or otherwise have Javascript disabled, none of them will display, so you should instead read this article at its source.

what is RSA?

RSA is a public-key, or asymmetric, encryption algorithm. In contrast to symmetric algorithms, like DES and AES, which use the same key for both encryption and decryption, RSA employs two distinct keys: a public key used to encrypt data, and a private key used to decrypt whatever was encrypted with the public one. The beauty of public-key encryption is that the parties involved never need to exchange a master key, meaning that communications can be securely encrypted without any prior contact.

Public-key encryption was proposed by Whitfield Diffie and Martin Hellman in ‘76, while RSA itself was patented in ‘77 by Ron Rivest, Adi Shamir, and Leonard Adleman, who then went on to found a cybersecurity company of the same name – confusing, but great PR!

Clifford Cocks, an English cryptographer, arrived at a similar algorithm in ‘73 while working for British intelligence at GHCQ, but his work wasn’t declassified until 1998 due to its sensitivity. Forty years later, RSA underpins SSL certification, SSH handshakes, and lots more.

In this post, we’ll implement RSA, but we’ll very much take the long way around while doing so. The algorithm introduces a number of interesting problems, like finding greatest common divisors, performing modular exponentiation, computing modular inverses, and generating random prime numbers, each of which we’ll thoroughly explore and derive solutions to (many of these won’t be immediately clear, so we’ll formally prove them as we go). Note that we won’t prove RSA itself – I might add that as an extension to the article at some point in the future.

math precursor

$\def \imod {\text{ mod }} \def \divs {\text{ } \vert \text{ }}$ The only thing we need to know before diving into RSA is some modular arithmetic, which is simply arithmetic with the property that numbers have a maximum value (called the modulus) and wrap around to 0 when they exceed it. When we take a number $a \imod b$, we’re basically taking the remainder of $\frac{a}{b}$; most programming languages provide this in the form of a mod function or % operator. We’ll see lots of expressions in the form of:

\[a \equiv b \pmod c\]

Here, the $\equiv$ symbol implies congruence, or that $a \text{ mod } c$ equals $b \text{ mod } c$. An important gotcha is that $\pmod c$ applies to both sides of the expression, which isn’t immediately obvious to anyone used to the modulo operator in the programming sense. Many sources choose to omit the parentheses, simply writing $a \equiv b \imod c$, which just compounds the confusion; the clearest notation would probably be something like $(a \equiv b) \pmod c$. This is extremely important to remember because otherwise, expressions like $a \equiv 1 \imod b$ won’t make any sense at all (“but if $1 \imod b$ is equal to 1 for all $b$ not equal to 1, why not just write $a = 1$?!”).

Some notes about miscellaneous notation:

$a \divs b$ means that $a$ divides, or is a factor of, $b$
range notation is used here and there: $[a, b]$ represents all of the numbers between $a$ and $b$ inclusive, $[a, b)$ includes $a$ but excludes $b$, $(a, b)$ excludes both $a$ and $b$, etc.

how RSA works

RSA revolves around a numeric key-pair, or a mathematically related public and private key. The public key is made known to the world, which can then use it to encrypt a message, while the private key can be used to decrypt anything encrypted with the public key. Encrypting and decrypting a message is fairly straightforward, while generating a key-pair is a more substantial process.

generate a key-pair

To generate a public/private key-pair:

generate two (large) random primes, $p$ and $q$
let $n = pq$
find $\phi(n)$ (Euler’s totient), or the number of integers in the range $[1, n]$ that are coprime with $n$ – that is, have a Greatest Common Divisor (GCD) of 1 with it.
find a value $e$ such that $1 \lt e \lt \phi(n)$ and $e$ is coprime with $\phi(n)$; this is your public key.
find a value $d$ such that $de \equiv 1 \pmod{\phi(n)}$ – in other words, find the multiplicative modular inverse of $e$ modulo $\phi(n)$; this is your private key.

Though short and concise, the above steps present several complex problems:

generate a large, random prime number; this is probably the most involved, so we’ll save it for last (step 1)
find $\phi(n)$, where $n$ is the product of two primes (step 3)
find the GCD of two numbers, which will allow us to find $e$ (step 4)
find the multiplicative modular inverse of a value, to find $d$ (step 4)

example

Before we dive into solving those, let’s walk through the process of generating a key-pair using some small sample numbers.

let $p = 3$ and $q = 5$
$n = 3 \cdot 5 = 15$
$\phi(15) = 8$ (coprime values are 1, 2, 4, 7, 8, 11, 13, and 14)
$e = 3$, because 3 is both less than and coprime with 8
$d = 3$, because $3 \cdot 3 = 9$ and $9 \equiv 1 \pmod 8$

Easy! Except, of course, we weren’t dealing with numbers with hundreds of digits – that’s the hard part. :)

finding $\phi(n)$

To compute $\phi(n)$, we can take advantage of the fact that it’s composed of two prime factors: $p$ and $q$. Thus, the only values with which it shares GCDs that aren’t 1 must be multiples of either $p$ or $q$ (for instance, $\gcd(n, 2q) = q$ and $\gcd(n, 3p) = p$). There are only $q$ multiples of $p$ ($p, 2p, 3p, \ldots, qp$) and $p$ multiples of $q$ ($q, 2q, 3q, \ldots, qp$) that are less than or equal to $n$. Thus, there are $q + p$ values in the range $[1, n]$ that have a GCD with $n$ not equal to 1. Note, however, that we double counted $pq$ in our list of multiples of $p$ and $q$, so in reality it’s $p + q - 1$. Thus, $\phi(n) = \text{total} - (p + q -1)$, where $\text{total}$ is the total numbers of values in the range $[1, n]$ – that is, $n$.

\[\phi(n) = n - (p + q - 1) = n - p - q + 1\]

computing GCDs

To find the GCD of two numbers, we’ll employ the Euclidean algorithm:

the GCD of any number and 0 is the absolute value of that number
the GCD of numbers $a$ and $b$ is the GCD of $b$ and $(a \text{ mod } b)$

or:

def gcd(a, b):
    return abs(a) if b == 0 else gcd(b, a % b)

Let’s prove it. Case 1 should be self-explanatory: 0 is technically divisible by any number, even if the quotient equals 0, so the GCD of 0 and any other number should be that number. We need to be careful and take its absolute value, however, to account for negative values; the greatest divisor of -5 is 5, after all, not -5, so the GCD of 0 and -5 must also be 5. Thus, we have to take the absolute value of -5 to arrive at the greatest divisor.

Case 2 is less intuitive (at least for me), and requires proving that $\gcd(a, b) = \gcd(b, a \imod b)$. Let’s begin by creating another variable $c$:

\[c = a - b\]

prove $\gcd(a, b) \divs c$

We first want to prove that the GCD of $a$ and $b$ divides $c$ (or $\gcd(a, b) \divs c$). Begin by rewriting $a$ and $b$ as products of their GCD.

\[a = x \cdot \gcd(a, b)\\ b = y \cdot \gcd(a, b)\\\]

$x$ and $y$ are just placeholders: we don’t want to know or care what they equal. Now, plug those into the definition of $c$:

\[c = a - b\\ c = x \cdot \gcd(a, b) - y \cdot \gcd(a, b) = (x - y) \gcd(a, b)\\ \therefore \gcd(a, b) \divs c\]

Since we’ve shown that $c$ is the product of $\gcd(a, b)$ and another value, it is by definition divisible by $\gcd(a, b)$.

prove $\gcd(b, c) \divs a$

Apply the same logic here:

\[b = x \cdot \gcd(b, c)\\ c = y \cdot \gcd(b, c)\\ a = c + b\\ a = x \cdot \gcd(b, c) + y \cdot \gcd(b, c) = (x + y) \gcd(b, c)\\ \therefore \gcd(b, c) \divs a\]

prove $\gcd(a, b) = \gcd(b, a - b)$

We know that, by definition, $\gcd(a, b) \divs b$, and we’ve proven that $\gcd(a, b) \divs c$. Thus, $\gcd(a, b)$ is a common divisor of both $b$ and $c$. That doesn’t imply that it’s the least common divisor, greatest, or anything else: all we know is that it divides both numbers. We do know that there exists a greatest common divisor of $b$ and $c$, $\gcd(b, c)$, so we can conclude that:

\[\gcd(a, b) \le \gcd(b, c)\]

We now re-apply that same reasoning. We know that $\gcd(b, c) \divs b$ and $\gcd(b, c) \divs a$. Thus, $\gcd(b, c)$ is a common divisor of $b$ and $a$. Since we know that the greatest common divisor of $a$ and $b$ is $\gcd(a, b)$, we can conclude that:

\[\gcd(b, c) \le \gcd(a, b)\]

But now we have two almost contradictory conclusions:

\[\gcd(a, b) \le \gcd(b, c)\\ \gcd(b, c) \le \gcd(a, b)\]

The only way these can both be true is if:

\[\gcd(a, b) = \gcd(b, c)\]

So we’ve proven that $\gcd(a, b) = \gcd(b, a - b)$ (remember, $c = a - b$).

prove $\gcd(b, a - b) = \gcd(b, a \imod b)$

First, let’s assume that $a > b$, and rewrite it as: $a = bq + r$ (or $r = a \imod b$)

Now, we already know that $\gcd(a, b) = \gcd(b, a - b)$, Since order doesn’t matter, we can rewrite $\gcd(b, a - b)$ as $\gcd(a - b, b)$. Now, we apply the rule $\gcd(a, b) = \gcd(b, a - b)$ again.

\[\gcd(a, b) = \gcd(b, a - b) = \gcd(a - b, b)\\ \gcd(a - b, b) = \gcd(b, a - b - b) = \gcd(a - 2b, b)\\ \gcd(a - 2b, b) = \gcd(b, a - 2b - b) = \gcd(a - 3b, b)\\ \gcd(a - 3b, b) = \gcd(b, a - 3b - b) = \gcd(a - 4b, b)\\ \ldots\\ \gcd(a - qb, b) = \gcd(r, b)\]

or:

\[\gcd(a, b) = \gcd(a - b, b) = \gcd(a - 2b, b) = \ldots = \gcd(a - qb, b) = \gcd(r, b)\]

Bingo. We’ve proven Case 2, and completed our proof of the Euclidean Algorithm. Before we move on, we’ll also define a convenience wrapper for gcd() that determines whether two numbers are prime:

def coprime(a, b):
    return gcd(a, b) == 1

finding modular inverses

Given a value $a$ and modulus $c$, the modular multiplicative inverse of $a$ is a value $b$ that satisfies:

\[ab \equiv 1 \pmod c\]

This implies that there exists some value $d$ for which:

\[ab = 1 + cd\\ ab - cd = 1\]

This turns out to be in the form of Bézout’s identity, which states that for values $m$ and $n$, there exist values $x$ and $y$ that satisfy:

\[mx + ny = \gcd(m, n)\]

$x$ and $y$, called Bézout coefficients, can be solved for using the Extended Euclidean algorithm (EEA). $x$ corresponds to $b$, or the modular inverse that we were looking for, while $y$ can be thrown out once computed. The EEA will also give us the GCD of $m$ and $n$ – it is, after all, an extension of the Euclidean algorithm, which we use to find the GCD of two values. We need to verify that it equals 1, since we make the assume that $\gcd(m, n) = 1$; if it doesn’t, $a$ has no modular inverse. Since modular_inverse() is just a wrapper for EEA – to be implemented in a function called bezout_coefficients() – its definition is simple:

def modular_inverse(num, modulus):
    coef1, _, gcd = bezout_coefficients(num, modulus)
    return coef1 if gcd == 1 else None

bezout_coefficients() is a bit tricker:

def bezout_coefficients(a, b):
    if b == 0:
        return -1 if a < 0 else 1, 0, abs(a)
    else:
        quotient, remainder = divmod(a, b)
        coef1, coef2, gcd = bezout_coefficients(b, remainder)
        return coef2, coef1 - quotient * coef2, gcd

Let’s see why it works.

the Extended Euclidean algorithm

How to solve for $x$ and $y$? Bezout’s Identity states:

\[\gcd(a, b) = ax + by\\\]

or, for $\gcd(b, a \imod b)$:

\[\gcd(b, a \imod b) = bx' + (a \imod b)y'\\\]

Let’s simplify:

\[a \imod b = a - \lfloor \frac{a}{b} \rfloor b\]

Here, $\lfloor \rfloor$ represents the floor function, which floors the result of $\frac{a}{b}$ to an integer.

\[\gcd(b, a \imod b) = bx' + (a - \lfloor \frac{a}{b} \rfloor b)y' =\\ bx' + ay' - \lfloor \frac{a}{b} \rfloor by' =\\ ay' + b(x' - \lfloor \frac{a}{b} \rfloor y')\]

Since we know, by the already proven Euclidean algorithm, that $\gcd(a, b) = \gcd(b, a \imod b)$, we can write:

\[ax + by = ay' + b(x' - \lfloor \frac{a}{b} \rfloor y')\]

So, $x = y'$ and $y = x' - \lfloor \frac{a}{b} \rfloor y'$. But what are $x'$ and $y'$? They’re the results of running the EEA on $(b, a \imod b)$! Classic recursion. In sum:

def bezout_coefficients(a, b):
    quotient, remainder = divmod(a, b)
    coef1, coef2 = bezout_coefficients(b, remainder)
    return coef2, coef1 - quotient * coef2

Of course, we need a base case, or we’ll end up recursing ad infinitum. Let’s take the case of $b = 0$.

\[ax + by = \gcd(a, b)\\ b = 0\\ ax + 0y = \gcd(a, 0)\\ ax = |a|\\ x = \frac{|a|}{a}\]

So, if $b = 0$, we set the $x$ coefficient to 1 if $a$ is positive and -1 is $a$ is negative, and set $y$ to… what? If $b$ is 0, then $y$ can take on any value. For simplicity’s sake we’ll choose 0. Our revised definition looks like:

def bezout_coefficients(a, b):
    if b == 0:
        return -1 if a < 0 else 1, 0
    else:
        quotient, remainder = divmod(a, b)
        coef1, coef2 = bezout_coefficients(b, remainder)
        return coef2, coef1 - quotient * coef2

Also note that, since this is simply a more involved version of the Euclidean algorithm (we’re making recursive calls to bezout_coefficients(b, remainder) and have a base case of b == 0), when we hit the base case, abs(a) is the GCD of a and b. Since modular_inverse() needs to check that the GCD of its two arguments equals 1, we should return it in addition to the coefficients themselves. Hence, we’ll let it trickle up from our base case into the final return value:

def bezout_coefficients(a, b):
    if b == 0:
        return -1 if a < 0 else 1, 0, abs(a)
    else:
        quotient, remainder = divmod(a, b)
        coef1, coef2, gcd = bezout_coefficients(b, remainder)
        return coef2, coef1 - quotient * coef2, gcd

generating large, random primes

Here’s the idea:

generate a large, random, odd number $x$
check $x$ for primality
1. if $x$ prime, return it
2. otherwise, increment $x$ by 2, and return to step 2.)

Easy enough, except for the bit about testing primality. How to do so efficiently? We’ll turn to the Rabin-Miller algorithm, a probabilistic primality test which either tells us with absolute certainty that a number is composite, or with high likelihood that it’s prime. We’re fine with a merely probabilistic solution because it’s fast, since speed is a non-negligible issue due to the size of the numbers that we’re dealing with, and also because the chances of a false positive (ie indicating that a number is prime when it’s actually composite) are astronomically low after even only a few iterations of the test.

Rabin-Miller primality test

The Rabin-Miller test relies on the below two assumptions (just accept that they’re true for now, and we’ll prove them later on). If $p$ is a prime number:

$a ^ {p - 1} \equiv 1 \pmod p$ for any $a$ not divisible by $p$
for any $x$ that satisfies $x ^ 2 \equiv 1 \pmod p$, $x$ must equal ±1

Using these, you can test a value $n$ for compositeness like so (note that we return true/false to indicate definite compositeness/probable primality respectively):

pick a random value $a$ in the range $[2, n - 1]$
use assumption 1 to assert that $a ^ {n - 1} \equiv 1 \pmod n$); if it’s not, return true
if $a$ has an integer square root, let $a' = \sqrt a$; otherwise, return false
since $a' ^ 2 \equiv 1 \pmod n$, we can use assumption 2 to assert that $a' \equiv \pm 1 \pmod n$; if not, return true
otherwise, repeat steps 3-4, taking the square root of $a'$, and the square root of that, and so on, until you hit a value that doesn’t have an integer square root.
if you haven’t already returned anything, you’ve satisfied assumptions 1 and 2 for all testable cases and can return false.

In sum, we return true if we’ve confirmed that $a$ is a witness to the compositeness of $n$, and false if $a$ does not prove that $n$ is composite – transitively, there is a high chance that $n$ is prime, but we can only be more sure by running more such tests. While the above steps serve as a good verbal description of the algorithm, we’ll have to slightly modify them to convert the algorithm into real code.

We need to implement a function is_witness(), which checks whether a random value is a witness to the compositeness of our prime candidate, $n$.

write $n - 1$ in the form $2 ^ s d$. $n=73$, for instance, would yield $s=3$ and $d=9$, since $73 - 1 = 72 = 2 ^ 3 \cdot 9$.
pick a random value $a$ in the range $[2, n - 1]$. We’ll check whether this is a witness for $n$.
let $x = a ^ d \imod n$
if $x \equiv \pm 1 \pmod n$, then return false
repeat $s - 1$ times:
1. let $x = x ^ 2 \imod n$
2. if $x = 1$, return true
3. if $x = n - 1$, return false
if we haven’t returned yet, return true

These steps seem quite a bit different from before, but in reality, they’re exactly the same and just operating in reverse. We start with a value that doesn’t have an integer square root, and square it until we hit $a ^ {n - 1}$. Why did we bother decomposing $n - 1$ into the form of $2 ^ s d$? Well, it allows us to rewrite $a ^ {n - 1}$ as $a ^ {2 ^ s d}$, and now we know exactly how many times we can take square roots before we hit a value that isn’t reducible any further – in this case, $a ^ d$.

\[a_1 = \sqrt{a ^ {2 ^ s d}} = (a ^ {2 ^ s d}) ^ \frac{1}{2} = a ^ {\frac{1}{2} \cdot 2 \cdot 2 ^ {s - 1} d} = a ^ {2 ^ {s - 1} d}\\ a_2 = \sqrt{a ^ {2 ^ {s - 1} d}} = (a ^ {2 ^ {s - 1} d}) ^ \frac{1}{2} = a ^ {\frac{1}{2} \cdot 2 \cdot 2 ^ {s - 2} d} = a ^ {2 ^ {s - 2} d}\\ \ldots\\ a_{last} = a ^ d\]

So, if we start with $a ^ d$ and square it, we’ll get $a ^ {2d}$, then $a ^ {2 ^ 2 d}$, then $a ^ {2 ^ 3 d}$, and ultimately $a ^ {2 ^ s d}$, or $a ^ {n - 1}$. What’s the advantage of starting from the non-reducible value and squaring it, rather than the reducible value and taking its square roots? It sometimes allows us to short-circuit the process. For instance, as we iterate through the squares of $a ^ d$, if we find an occurrence of -1, we know that we’ll get 1 when we square it, and 1 when we square that, and keep on getting 1s until we stop iterating. As a consequence, we know that we won’t find any failing conditions, and can exit early by returning false (step 5.3). The same goes for step 4: if $a ^ d \equiv \pm 1 \pmod n$, we know that each of the following squares will equal 1, so we immediately return false.

The failing conditions – ie those that cause the algorithm to return true – might not be immediately clear. In 5.2, we know that, if $x = 1$, we’ve violated assumption 2, because that implies that the previous value of $x$ was not equivalent to $\pm 1 \pmod n$. Wait, why? Because if it were equal to -1, we would’ve already returned via 5.3 in the previous iteration, and if it were $1$, then we would’ve returned either from 5.3 in an earlier iteration still or 4 at the very beginning. We also return true when we hit 6, because we know that by that point, if assumption 1 is:

true, and $x = a ^ {n - 1} \equiv 1 \pmod n$, then the previous value of $x$ can’t be either 1 or -1 because we would already have returned via either 4 or 5.3.
false, then by definition $n$ can’t be prime, since the assumption must hold true for prime $n$

Finally, we simply repeat the is_witness() test $k = 5$ times. Here’s the final implementation:

def is_prime(n, k=5):
    if n == 2:
        return True

    if n <= 1 or n % 2 == 0:
        return False

    s, d = decompose_to_factors_of_2(n - 1)

    def is_witness(a):
        x = modular_power(a, d, n)
        if x in [1, n - 1]:
            return False

        for _ in range(s - 1):
            x = modular_power(x, 2, n)
            if x == 1:
                return True

            if x == n - 1:
                return False

        return True

    for _ in range(k):
        if is_witness(random.randint(2, n - 1)):
            return False

    return True

def decompose_to_factors_of_2(num):
    s = 0
    d = num

    while d % 2 == 0:
        d //= 2
        s += 1

    return s, d

Note that we’ve introduced a currently undefined function, modular_power(). The problem with computing $a ^ d \imod n$ and $x ^ 2 \imod n$ is that $a$, $d$, $x$, and $n$ are HUGE. Simply running (a ** d) % n would be asking for trouble. Fortunately, there are efficient ways of performing modular exponentiation, and we’ll implement one such method in the modular_power() function later in this article. Now, we need to actually prove the two assumptions that we base Rabin-Miller on.

Euclid’s lemma

…but before we do so, we need to prove Euclid’s Lemma, since both of the following proofs depend on it. It states that if $p$ is relatively prime to $a$ and $p \divs ab$, then $p \divs b$. We’ll prove it using Bezout’s Identity. The GCD of $a$ and $p$ is 1, so there must exist $x$ and $y$ that satisfy:

\[ax + py = 1\]

Multiply both sides by $b$:

\[abx + pby = b\]

$abx$ is divisible by $p$ (because it’s divisible by $ab$, which is divisible by $p$ according to the lemma’s requisite), and $pby$ is by definition divisible by $p$, so $b$ must be divisible by $p$ too.

proof of assumption 1

Our first assumption was that for a prime $p$, $a ^ {p - 1} \equiv 1 \pmod p$ for any $a$ not divisible by $p$. This is better known as Fermat’s Little Theorem. To prove it, begin by multiplying all of the numbers in the range $[1, p)$ by $a$:

\[a, 2a, 3a, \ldots, (p - 1) a\]

We make two observations:

given two values $x$ and $y$, $ax \equiv ay \pmod p$ is equivalent to $x \equiv y \pmod p$ (we effectively divide out $a$). We can prove this by rewriting $ax \equiv ay \pmod p$ as $ax - ay \equiv 0 \pmod p$, which implies that $p \divs ax - ay$, or $p \divs a(x - y)$. By Euclid’s Lemma, since $p$ and $a$ are coprime (reminder: this is a criterion of Fermat’s Little Theorem), $p \divs x - y$, which means we can write $x - y \equiv 0 \pmod p$, or $x \equiv y \pmod p$.
when each of its elements is simplified in $\imod p$, the above sequence is simply a rearrangement of $1, 2, \ldots, p - 1$. This is true because, firstly, its values all lie in the range $[1, p)$ – none can equal 0 since $p$ shares no factors other than 1 with either $a$ or any value in $1, 2, \ldots, p - 1$ due to its primeness. The trick now is to realize that, if we have two distinct values $x$ and $y$, and know that $ax \equiv ay \pmod p$, then by the previous observation we can “divide out $a$” and have $x \equiv y \pmod p$. If $x$ and $y$ were two values chosen from the $1, \ldots, p - 1$ sequence, we’d know that they’re all less than $p$, and can thus remove the $\imod p$ from the expression, leaving us with: $x = y$. In conclusion, the only way to satisfy $ax \equiv ay \imod p$ is to have $x$ be the same item as $y$, and that means that the distinct values in $a, \ldots, (p - 1) a$ map to distinct values in $1, \ldots, p - 1$.

By observation 1:

\[a \cdot 2a \cdot \ldots \cdot (p - 1) a \equiv 1 \cdot 2 \cdot \ldots \cdot (p - 1) \pmod p\\ a ^ {p - 1} (p - 1)! \equiv (p - 1)! \pmod p\]

By observation 2, we can cancel out each of the factors of $(p - 1)!$ from both sides of the expressions (after all, $p$ is prime and all of the factors of $(p - 1)!$ are less than it, so it’s coprime with all of them), which leaves us with:

\[a ^ {p - 1} \equiv 1 \pmod p\]

QED.

proof of assumption 2

We now prove assumption 2: if $p$ is prime and $x ^ 2 \equiv 1 \pmod p$, $x$ must equal $\pm 1 \imod p$. First, for greater clarity later on, we can rewrite our conclusion as: $p$ must divide either $x - 1$ or $x + 1$. Now, if $x ^ 2 \equiv 1 \pmod p$, then:

\[x ^ 2 - 1 \equiv 0 \pmod p\\ p \divs x ^ 2 - 1\\ p \divs (x - 1)(x + 1)\]

If $p$ divides $x - 1$, then:

\[x - 1 \equiv 0 \pmod p\\ x \equiv 1 \pmod p\]

and we’ve proven our conclusion. What if $p$ doesn’t divide $x - 1$? We can then leverage Euclid’s Lemma: if $p$ is relatively prime to $a$ and $p \divs ab$, then $p \divs b$. We know that $p$ is prime and doesn’t divide $x - 1$, so it’s relatively prime to $x - 1$, and we know that it divides $(x - 1)(x + 1)$. As a result, it has to divide $x + 1$, which implies that: $x \equiv -1 \pmod p$. Again, we’ve proven our conclusion, and thus proven assumption 2.

applying Rabin-Miller

Now that we’ve implemented Rabin-Miller, creating a large, random prime is almost trivial:

def get_random_prime(num_bits):
    lower_bound = 2 ** (num_bits - 2)
    upper_bound = 2 ** (num_bits - 1) - 1
    guess = random.randint(lower_bound, upper_bound)

    if guess % 2 == 0:
        guess += 1

    while not is_prime(guess):
        guess += 2

    return guess

The num_bits parameter is a bit of a weird way of specifying the desired size of the prime, but it’ll make sense since we usually want to create RSA keys of a specific bit-length (more on this later on).

wrapping it all up

At long last, we can define our create_key_pair() function.

def create_key_pair(bit_length):
    prime_bit_length = bit_length // 2
    p = get_random_prime(prime_bit_length)
    q = get_random_prime(prime_bit_length)
    n = p * q
    totient = (p - 1) * (q - 1)

    while True:
        e_candidate = random.randint(3, totient - 1)
        if e_candidate % 2 == 0:
            e_candidate += 1

        if coprime(e_candidate, totient):
            e = e_candidate
            break

    d = modular_inverse(e, totient)
    return e, d, n

The only thing that requires explanation is this bit_length business. The idea here is that we generally want to create RSA keys of a certain bit-length (1024 and 2048 are common values), so we pass in a parameter specifying the length. To make sure that $n$ has a bit-length approximately equal to bit_length, we need to make sure that the primes $p$ and $q$ that we use to create it have a bit length of bit_length / 2, since multiplying two $n$-bit numbers yields an approximately $2n$-bit value. How come? The number of bits in a positive integer $n$ is $\lfloor \log_2 n \rfloor + 1$, so the number of bits in $n ^ 2$ is $\lfloor \log_2 n ^ 2\rfloor + 1$. According to the logarithm power rule, we can rewrite $\log{a ^ b}$ as $b \cdot \log a$, so the bit length equals $\lfloor 2\log_2 n \rfloor + 1$. In other words, $n ^ 2$ has roughly twice as many bits as $n$.

encrypt/decrypt messages

In comparison to generating keys, encrypting and decrypting data with them is mercifully simple.

encrypt a message $m$ with public key $e$ and modulus $n$: $m ^ e \imod n$
decrypt a message $c$ with private key $d$ and modulus $n$: $c ^ d \imod n$

def encrypt(e, n, m):
    return modular_power(m, e, n)

def decrypt(d, n, c):
    return modular_power(c, d, n)

So, what’s modular_power()? The problem with the encryption and decryption operations, which look deceptively trivial, is that all of the values involved are big. Really, really big. As a result, naively solving $a ^ b \imod c$ by simply resolving $a ^ b$ and then simplifying that modulo $c$ is a no-go. Fortunately, there are more efficient ways of performing modular exponentiation, like exponentiation by squaring.

exponentiation by squaring

When trying to solve $a ^ b \imod c$, begin by representing $b$ in binary form:

\[b = 2 ^ {n - 1} bit_{n - 1} + 2 ^ {n - 2} bit_{n - 2} + \ldots + 2 bit_1 + bit_0\]

where $n$ is the total number of bits in $b$, and $bit$ represents the value of each bit – either 0 or 1. Now, rewrite the original expression:

\[a ^ b \imod c =\\ a ^ {2 ^ {n - 1} bit_{n - 1} + 2 ^ {n - 2} bit_{n - 2} + \ldots + 2 bit_1 + bit_0} \imod c =\\ a ^ {2 ^ {n - 1} bit_{n - 1}} \cdot a ^ { 2 ^ {n - 2} bit_{n - 2}} \cdot \ldots \cdot a ^ {2 bit_1} \cdot a ^ {bit_0} \imod c\\\]

For illustrative purposes, let’s temporarily remove the $bit$ factor from each exponent, which leaves us with:

\[a ^ {2 ^ {n - 1}} \cdot a ^ { 2 ^ {n - 2}} \cdot \ldots \cdot a ^ {2} \cdot a \imod c\]

It’s now obvious that each factor is a square of the one that precedes it: $a ^ {2}$ is the square of $a$, $a ^ {2 ^ {n - 1}}$ is the square of $a ^ { 2 ^ {n - 2}}$, etc. If we were to programmatically solve the expression, we could maintain a variable, say accumulator, that we’d initialize to $a$, and square from factor to factor to avoid recomputing $a ^ {\text{big exponent}}$ every time. Now, let’s reintroduce $bit$:

\[a ^ {2 ^ {n - 1} bit_{n - 1}} \cdot a ^ { 2 ^ {n - 2} bit_{n - 2}} \cdot \ldots \cdot a ^ {2 bit_1} \cdot a ^ {bit_0} \imod c\\\]

The good thing is that $bit$ has a limited set of possible values: just 0 and 1! Any value in the form $a ^ {2 ^ p bit}$ – that is, all of the above factors – evaluates to $a ^ {2 ^ p}$ when $bit = 1$, and $a ^ 0$, or 1, when $bit = 0$. In other words, the value of $bit$ only controls whether or not we multiply one of the factors into the accumulator that’ll become our ultimate result (since if $bit = 0$, we’ll just end up multiplying in 1, which means we shouldn’t even bother). Thus, modular_power() might look something like this:

def modular_power(base, exp, modulus):
    result = 1

    while exp:
        if exp % 2 == 1:
            result = result * base
        exp >>= 1
        base = base ** 2

    return result % modulus

But we still haven’t addressed the issue of multiplying huge numbers by huge numbers, and this version of modular_power() doesn’t perform much better than (base ** exp) % modulus (in fact, after some spot checking, it appears to be much slower!). We can address that by taking advantage of the following property of modular multiplication:

\[xy \imod z = (x \imod z)(y \imod z) \imod z\]

We can prove it by rewriting $x$ and $y$ in terms of $z$:

\[x = q_x z + r_x\\ y = q_y z + r_y\]

and substituting that into the original expression:

\[xy \imod z =\\ (q_x z + r_x) (q_y z + r_y) \imod z =\\ q_x q_y z ^ 2 + q_x z r_y + r_x q_y z + r_x r_y \imod z =\\ z(q_x q_y z + q_x r_y + r_x q_y) + r_x r_y \imod z =\\ r_x r_y \imod z\]

We’re able to remove the entire chunk of the expression that gets multiplied by $z$ because it’s by definition divisible by $z$, meaning that, taken $\imod z$, it would equal 0, and wouldn’t contribute anything to the sum. Thus, $xy \imod z$ equals $r_x r_y \imod z$, or $(x \imod z)(y \imod z) \imod z$.

Using that, we can make the following adjustment to our initial implementation:

def modular_power(base, exp, modulus):
    result = 1
    base %= modulus

    while exp:
        if exp % 2 == 1:
            result = (result * base) % modulus
        exp >>= 1
        base = (base ** 2) % modulus

    return result

We’re now taking % modulus in a bunch of places, which is valid due to the above property and prevents the value of both result and base from growing out of control.

That tops off our implementation of RSA. Here’s the entire source file.

acknowledgements

I wouldn’t have been able to present most of the proofs in this article without help from the following sources. One of the key motivations for gathering them all in one post is that, as I tried to understand all of the moving parts of RSA, I needed to sift through a lot of material to find accessible and satisfactory explanations:

nand2tetris: a book review and recap

Thu, 01 Jan 2015 16:00:00 -0500

what is Nand2Tetris?

Nand2Tetris, or The Elements of Computing Systems, is a twelve-part course in fundamental computer engineering that steps you through the creation of a computer from the ground up, starting with NAND logic gates and ending with an operating system capable of running a complicated program like Tetris.

The course, architected by Noam Nisan and Shimon Schocken, is available as a book that you can download for free (though it appears that some chapters are only available in terse PowerPoint form), and emphasizes a hands-on approach that leads up to some pretty epic struggles and Aha! moments. I just recently finished the course after about two months of hacking on it in my free time – if you reliably spend a couple hours a day on it, though, I can easily see you finishing in two weeks – and wanted to share an overview of the content and some thoughts.

content overview

Once upon a time, every computer specialist had a gestalt understanding of how computers worked. The overall interactions among hardware, software, compilers, and the operating system were simple and transparent enough to produce a coherent picture of the computer’s operations. As modern computer technologies have become increasingly more complex, this clarity is all but lost: the most fundamental ideas and techniques in computer science—the very essence of the field—are now hidden under many layers of obscure interfaces and proprietary implementations. An inevitable consequence of this complexity has been specialization, leading to computer science curricula of many courses, each covering a single aspect of the field.

Elements of Computing Systems: Preface

Nand2Tetris consists of twelve lectures/chapters, each of which tackles a next logical step in building a computer called “Hack,” and iterates on all of your work up to that point. Note that the book ships with various supplementary materials (which you can download here), including emulators for various components of the computer, like the hardware, stack, and virtual machine. Here’s an overview the ground you’ll cover:

I’ll briefly summarize the contents of each chapter (partly as a review for myself).

1: boolean logic

We learn about boolean logic, or logic with boolean values – conveniently, 0s and 1s – that facilitate logical/mathematical operations in hardware. We then construct primitive logic gates, like AND, OR, and MUX, which operate on single-bit inputs, and chain those together to implement their multi-bit (in this case, the Hack word, or two bytes) counterparts, like AND16.

2: boolean arithmetic

We cover binary addition and two’s complement, a means of representing signed numbers (in other words, negative and positive values instead of positive values only), and implement adder chips to perform addition at the hardware level. Finally, we devise an ALU (Arithmetic Logic Unit), which implements addition and value comparisons (ie, logic operations), but, unlike industrial-grade hardware, not either of multiplication and division. We’ll implement those operations at the software level – specifically, in the operating system’s math standard library – in the interest of simplicity, but at the expense of speed.

3: sequential logic

Throughout chapters 1 and 2 we implemented combinational chips using NAND gates, and got arithmetic/logic out of the way. This section introduces a new fundamental building block: the DFF, or Data Flip Flop, which will allow us to construct the second crucial component of our Hack computer – memory. Unlike combinational chips, which simply intake arguments via input pins and “immediately” spit out a result to output pins and are thus stateless, the sequential circuits that we’ll implement with flip-flops are capable of maintaining values across time. Note that, even though we treat the DFF as a fundamental chip, it can be implemented using NAND gates and more – Nand2Tetris just thoughtfully spares us that gory implementation. We implement a Bit, Register, and multiple RAM chips with iteratively larger capacities (64-word RAM consists of 8-word RAM, 512 of 64, etc.), and also a program counter, which we’ll use to keep track of the next CPU instruction to execute. This sequential business is a little mind-bending (and quite cool) because it effectively makes use of delayed recursion in a hardware context.

4: machine language

We’re introduced to the Hack machine language, or the format of the binary strings that our CPU (to be implemented in the next chapter) will interpret as instructions, and its correspondent assembly language: this is the interface between hardware and software. Assembly is a human-readable representation of machine code which allows instructions to be written with mnemonics like ADD or SUB; those are then compiled down to the appropriate binary by an assembler (to be implemented in chapter 6) – essentially a glorified preprocessor. Here’s an example of Hack assembly:

(LOOP)
	@END
	D;JEQ

	@sum
	M=M+D
	D=D-1

	@LOOP
	0;JMP

The above code adds all consecutive integers between 0 and some number, storing the sum in a variable sum.

5: computer architecture

We implement the Hack CPU, which abstracts away all hardware operations and exposes an API for executing them – that is, the machine language. The CPU integrates chapters 2 (the ALU) and 3 (RAM) in a classic mold of the von Neumann architecture:

6: assembler

Assembly! Everyone loves assembly! This section extends chapter 4, which documented the Hack assembly language spec., and has you implement the assembler that translates such programs to binary machine instructions.

7, 8: virtual machine

We learn about virtual machines, or platform-independent runtime environments that allow high-level languages to compile down to a portable intermediate representation, or IR, (in this case, the virtual machine language) that will run on any chip-set with an implementation of that virtual machine. Basically, since different CPUs potentially have different machine languages, writing native compilers for high-level languages would be a nightmare because the output binaries would have to be tweaked on a per-system basis. A virtual machine handles that concern by itself exposing an interface – in the form of a virtual machine language, or IR – for performing memory, logic, and math operations that target systems can reliably be expected to support. Platform-specific compilers that convert the IR to assembly do have to be written, but that problem is now centralized in one place; high-level language developers don’t have to worry about re-inventing the same compilation wheel if they build their language around the same virtual machine, instead leaving that problem to the virtual machine maintainers.

Anyway, the Hack virtual machine wraps its assembly language in a simple, stack-based interface. We implement the IR-to-assembly compiler, which becomes tricky once we involve things like stack frames. Sample code looks like:

function Point.new 0
	push constant 2
	call Memory.alloc 1
	pop pointer 0
	push argument 0
	pop this 0
	push argument 1
	pop this 1
	push pointer 0
	return

9: high-level language

We’re introduced to the spec for a high-level, object-oriented language (without garbage collection) not unlike Java, called Jack. The following Jack code defines a class Point, which represents a 2D geometric point:

class Point {
	field int _x, _y;

	constructor Point new(int x, int y){
		let _x = x;
		let _y = y;
		return this;
	}
}

10, 11: compiler

We implement a Jack compiler, which converts Jack programs to Hack virtual machine code. We learn about basic compilation techniques – tokenization, recursive-descent parsers – and features – symbol tables, parse trees.

12: operating system

Finally, we implement the Hack operating system (using Jack), which only consists of a number of standard system libraries that govern things like math, memory management, and graphics. The chapter centers heavily on algorithms, introducing some fascinating optimized approaches to problems including multiplication and heap allocation.

review and advice

That was a pretty wild ride. I heard about The Elements of Computing Systems nearly two years ago and kept it on the back-burner ever since, and am very glad I finally got around to reading it. Nisan and Schocken succeeded tremendously in what they set out to accomplish – creating a course that gives you a universal, if shallow, understanding of the entire hardware and software stack that computers operate on.

The individual sections are clear and concise, with just enough technical and academic background, examples, and project walkthroughs, and benefit from a uniform structure. Each project assignment involves a good deal of steering, as the authors underscore the suggested (though probably always the way you’d want to go anyway) approach to implementing the next stage of the computer, but with nothing in the way of concrete implementations – this encourages the reader to wet their feet and, in true hacker fashion, build the thing on their own. The software package that ships with the course is entirely bug-free, and the emulators are both user-friendly and robust (these things are easy to take for granted…).

An enormous amount of thought was clearly invested in the structure of the course. The various components of the Hack system have perfectly coupled interrelationships, and your work up to any single point almost magically helps you bootstrap the next project with incredible ease – this is mostly true for the hardware sections of the course, where chip creation is a highly iterative process, and lets you create substantially complicated circuits out of nothing in no time.

Another nice bit about Nand2Tetris is that it has much to offer to people at various skill levels. I entered the course having never written a line of assembly, nor did I have much knowledge about compilers and virtual machines, but I did have a reasonable amount of software engineering experience and at least a vague understanding of the aforementioned components: the course ended up perfect, though I suspect that it’s mostly aimed at people in my situation. Still, I can see it being useful even to greybeards with a nuanced knowledge of architectures, compilers, and operating systems, simply because it does such a good job of tying them all together in a single coherent project. I can imagine myself giving it another pass a couple of years from now, taking each of the projects further and refreshing myself on the overview it provides.

Finally, the course is lightweight: the book comes in at just under 300 pages, and that’s with twelve sections that collectively cover all of the vital components of a rudimentary computer. As a result, it doesn’t delve terribly far into any one of them; you won’t implement many elementary chips, the authors intentionally skip over involved problems like hardware multiplication, the computer won’t have a filesystem, you won’t come anywhere near hardware acceleration, networking isn’t covered, and the high-level language you develop is highly limited (both in syntax and functionality). That’s the point. The Elements of Computing Systems tries to provide a general introduction to each component and a coherent project that ties them all together – it’s not the place to go for an immersive foray into any of them. On the upside, it underscore a wealth of questions which you’re then encouraged to explore on your own.

Taking some notes (I did) for future reference might be a good idea while you read.

N2T is, in my opinion, a high quality must-read for software engineers. Can’t recommend it enough.

a note on requisites

This course is not for the amateur programmer. While the hardware chapters, the projects for which primarily consist of implementing chips using an HDL, or hardware description language, don’t require any prior experience with anything, the software sections involve the creation of reasonably complicated software in your programming language of choice. A solid grasp of recursion is necessary for parsing, tokenization would probably be hell without a knowledge of regex, and the compilers require some engineering acumen to implement cleanly – plus, it might be nice to have a vague understanding of all the various components of a computer’s hardware and software going into the course, so that it clarifies and refines your understanding of the various moving parts instead of simply introducing a bunch of theretofore unheard-of concepts that, as a result, might be difficult to appreciate. I hope someone proves me wrong, though!

vim syntax files

As a complete aside, you’ll work with a number of ad-hoc languages throughout the course: HDL, Hack assembly, Hack virtual machine language, and Jack. I’m a Vim user and got a little tired of the lack of syntax highlighting, so wrote up a minimalist plugin to provide it.

prime number spirals

Wed, 01 Oct 2014 17:00:00 -0400

prime number spirals

Prime number spirals are visualizations of the distribution of prime numbers that underscore their frequent occurrences along certain polynomials. They’re conceptually simple, yet create order out of the apparent chaos of primes and are fairly beautiful. We’ll explore the Ulam and Sacks spirals, some of their underlying theory, and algorithms to render each.

Ulam spiral

The story has it that Stanislaw Ulam, a Polish-American mathematician of thermonuclear fame¹, sat in a presentation of a “long and very boring paper” at a 1963 scientific conference. After some time, he began doodling (the hallmark of great genius), first writing out the first few positive integers in a counter-clockwise spiral, and then circling all of the prime numbers. And he noticed something that he’d later formulate as “a strongly nonrandom appearance.” Even on a small scale – say, the first 121 integers, which form a 11x11 grid – it’s visible that many primes align along certain diagonal lines.

Ulam later used MANIAC II, a first-generation computer built for Los Alamos National Laboratory in 1957, to generate images of the first 65,000² integers. The following spiral contains the first 360,000 (600x600):

Look closely, and we see much more than just white noise.

Sacks spiral

A software engineer named Robert Sacks devised a variant of the Ulam spiral in 1994. Unlike Ulam’s, Sacks’s spiral distributes integers along an Archimedean spiral, or a function of the polar form $r = a + b\theta$. Sacks discarded $a$ (which just controls the offset of the starting point of the curve from the pole) and used $b=\frac{1}{2\pi}$, leaving $r = \frac{\theta}{2\pi}$; he then plotted the squares of all the natural numbers – ${1, 4, 9, 16, 25, ...}$ – on the intersections of the spiral and the polar axis, and filled in the points between squares along the spiral, drawing them equidistant from one another.

prime-generating polynomials

The reason why we see ghostly diagonals is that some polynomials, informally called prime-generating polynomials, have aberrantly high occurrences of prime numbers. $n^2 + n + 41$, for instance, patented by Leonhard Euler in 1772, is prime for all $n$ in the range $[0, 39]$, yielding $43, 47, 53, 61, ..., 1523, 1601$. A variant is $n^2 - n + 41$, proposed by Adrien-Marie Legendre in 1798, which is prime in $[0, 40]$. Here are several others, as taken at random from Wolfram Mathworld:

\[\frac{1}{4}(n^5 - 133n^4 + 6729n^3 - 158379n^2 + 1720294n - 6823316)\\ \frac{1}{36}(n^6 - 126n^5 + 6217n^4 - 153066n^3 + 1987786n^2 - 13055316n + 34747236)\\ n^4 - 97n^3 + 3294n^2 - 45458n + 213589\\ n^5 - 99n^4 + 3588n^3 - 56822n^2 + 348272n - 286397\]

In the case of the rectangular Ulam spiral, these polynomials appear as diagonal lines. They were known about since 1772, if not earlier, and a prime-number spiral was hinted at twice before Ulam published his. In 1932 (31 years earlier before Ulam!), Laurence M. Klauber, a herpetologist primarily focused on the study of rattlesnakes, presented a method of using a spiral grid to identify prime-generating polynomials to the Mathematical Association of America. The second frequently-cited mention of prime spirals came from Arthur C. Clarke, a British science-fiction writer, whose The City and the Stars (1956) describes a protagonist, Jeserac, as “[setting] up the matrix of all possible integers, and [starting] his computer stringing the primes across its surface as beads might be arranged at the intersections of a mesh.” In my opinion, the second mention is fairly ambiguous, but the fact stands that, by the time Ulam published his famous spiral, a general understanding of prime-generating polynomials existed and people were considering ways of visualizing them. Thus, it’s perhaps a little disingenuous to suggest that he stumbled across it when “doodling” (something intricate) at random – there may have been some method to it.

rendering the spirals

I was introduced to prime number spirals about a year ago, by a video on the excellent Numberphile. I immediately jumped into hacking together a Python script to render the spirals on my own, because it’s both tremendously easy and very visually rewarding. I’ll revisit the implementation, this time in Javascript. I’m not going to show all of the necessary code (like HTML markup/CSS styles) in the interest of brevity, but the zipped files are linked to at the end of the post.

canvas setup

Let’s outline our interface. We’ll define functions ulamSpiral(numLayers) and sacksSpiral(numLayers), where the argument numLayers is the number of revolutions in the spiral, or effectively the number of rings that it contains. Both functions need to set the height and width of the canvas according to numLayers, and require a function drawPixel(x, y) to plot pixels. Note that we’ll want drawPixel() to treat the centroid of the canvas as its origin, so that drawPixel(0, 0) plots a point at its center and not the top-left corner. Because both the canvas dimensions and the offset used by drawPixel() are dependent on numLayers, we’ll bundle them them into a function called setupCanvas().

function setupCanvas(numLayers){
	"use strict";

	var sideLen = numLayers * 2 + 1;
	var canvas = document.getElementsByTagName("canvas")[0];
	canvas.setAttribute("width", sideLen);
	canvas.setAttribute("height", sideLen);

	var context = canvas.getContext("2d");
	return function drawPixel(x, y){
		context.fillRect(x + numLayers, y + numLayers, 1, 1);
	};
}

Note that we set sideLen equal to numLayers * 2 + 1, rather than only numLayers * 2, because we need to account for the row/column containing the origin of the spiral, which is not technically a ring. Now, we can use setupCanvas() to both set the canvas dimensions, and return a drawPixel() that takes advantage of closure to access all of the variables (numLayers, context) that it needs. Also, to draw a single pixel, we’re calling fillRect() with a width and height of 1 – the canvas unfortunately doesn’t have (or perhaps just doesn’t expose) a single pixel-plotting function. Finally, to test the primality of our values, we’ll use Kenan Yildirim’s primality library, which provides primality(val).

Ulam algorithm

The dull stuff aside, we can begin implementing ulamSpiral(). The general algorithm will run as follows:

Use variables x, y, and currValue to track the position and value of the current point – the “head” of the spiral.
Trace out the square spirals by incrementing/decrementing x and y, while incrementing currValue.
After the head of the spiral moves, if currValue is prime, plot a pixel at (x, y).

function ulamSpiral(numLayers){
	"use strict";

	var drawPixel = setupCanvas(numLayers);

	var currValue = 1;
	var x = 0;
	var y = 0;

	function drawLine(dx, dy, len){
		for(var pixel = 0; pixel < len; pixel++){
			if(primality(currValue++)){
				drawPixel(x, y);
			}
			x += dx;
			y += dy;
		}
	}

	for(var layer = 0, len = 0; layer <= numLayers; layer++, len += 2){
		drawLine(0, -1, len - 1);
		drawLine(-1, 0, len);
		drawLine(0, 1, len);
		drawLine(1, 0, len + 1);
	}
}

We simply iterate numLayers + 1 times, drawing rectangular layers – the spiral – as we go. I couldn’t think of a better solution than using a function drawLine(), which accepts a direction (dx and dy, one of which should be 0), and a length to draw four different straight lines (perhaps it can somehow be done in one elegant loop?).

Sacks algorithm

The Sacks spiral is a little more mathematically interesting because it relies (somewhat) on polar equations. Our algorithm:

Iterate numLayers times.
For each iteration, draw the values between the current square, $n ^ 2$, and the next, $n + 1 ^ 2$. Since $(n + 1)^2 - n^2 = n^2 + 2n + 1 - n^2 = 2n + 1$, there are $2n + 1$ points per iteration of $n$.
Render each prime point by calculating its angle off the polar axis (the aligned squares), then its radius, or distance from the pole, and then using trigonometry to solve for its cartesian coordinates.

function sacksSpiral(numLayers){
	"use strict";

	var drawPixel = setupCanvas(numLayers);

	var currValue = 1;
	for(var layer = 1; layer <= numLayers; layer++){
		var numPoints = 2 * layer + 1;
		var angle = 2 * Math.PI / numPoints;
		for(var point = 1; point <= numPoints; point++){
			if(primality(currValue++)){
				var theta = point * angle;
				var radius = layer + point / numPoints;
				var x = Math.cos(theta) * radius;
				var y = Math.sin(theta) * radius;
				drawPixel(Math.floor(x), Math.floor(y));
			}
		}
	}
}

To calculate the polar angle of any point, we first solve for the angle between subsequent points (var angle = 2 * Math.PI / numPoints;), and then multiply it by the fraction of the current rotation of the spiral that the point lies at (var theta = point * angle;). We’ll also Math.floor() the coordinates sent to drawPixel(), because, after the various trigonometic operations they’re likely decimals rather than integers and cause blurred canvas reading.

That’s all! For more reading on prime-number spirals, I recommend this in-depth article by Robert Sacks himself, and another write-up of algorithms used to render them.

Download all of the source code here, or view it on Github.

Ulam is also well-known for contributing to the Manhattan Project, proponing the Monte Carlo method of computation, and exploring spaceships propelled by nuclear explosions, amongst a large number of other things. ↩
Assuming that Ulam began rendering his spiral with the integer 1 (instead of something like 41, which is also common), I suspect that the generated images had exactly 65,025 integers. 65,000 integers implies as many pixels, the square root – the Ulam spiral is inherently square – of which is 254.95, which obviously isn’t a valid image height/width. Thus, we round to 255, and square for 65,025. ↩

power set algorithms

Sun, 21 Sep 2014 17:00:00 -0400

power sets

The power set of a set is the set of all its subsets, or a collection of all the different combinations of items contained in that given set: in this write-up, we’ll briefly explore the math behind power sets, and derive and compare three different algorithms used to generate them.

sets: primer

To refresh our memories: a set, the building block of set theory¹, is a collection of any number of unique objects whose order does not matter. A set is expressed using bracket notation, like $\{1, 2, 3\}$, and an empty, or null, set is represented using either of $\emptyset$ and $\{\}$. Because sets are order-agnostic, we can say that the $\{1, 2, 3\}$ and $\{3, 1, 2\}$ are equal, and, because they contain only distinct members, something like $\{1, 1, 2\}$ is invalid.

subsets and the power set

The subset of a set is any combination (the null set included) of its members, such that it is contained inside the superset; $\{a, b\}$, then, is a subset of $\{a, b, c\}$, while $\{a, d\}$ is not. If a subset contains all of the members of the parent set (ie, it’s a copy), we call it an improper subset – otherwise, it’s proper. Finally, the power set of a set is the collection of all of its subsets, so the power set of $\{a, b, c\}$ is:

\[\{ \{\}, \{a\}, \{b\}, \{c\}, \{a, b\}, \{a, c\}, \{b, c\}, \{a, b, c\} \}\]

the cardinality of a power set

The length, or cardinality, of a power set is $2^n$, where $n$ is the cardinality of the original set, so the number of subsets of something like $\{a, b, c\} (n=3)$ is 8 $(2^{n=3})$. Two ways of informally proving that property:

when creating a subset of a given set, we iterate over the members of the given set and choose whether each one will or will not be in the subset. Since there are 2 possible outcomes of each choice (the member either is or isn’t chosen) and there are $n$ elements, there must be $2^n$ subsets.
when adding an element to a set, to update its power set, you must create a copy of each of its existing subsets with the new element included. We’ll use this to implement our succinct second algorithm.

Note: the following algorithms are accompanied by Python implementations. To keep things simple, and because the algorithms are language-independent, I avoided using Python-specific built-ins (like yield) and functions (like list.extend()) that don’t have clear equivalents in most other languages, even though they would’ve made some code much cleaner. Also, even though we’re dealing with sets, we’ll use lists (arrays) under the assumption that they contain distinct elements.

algorithm 1: recursive k-subsets

This was my first stab at an algorithm that, given a set, returns its power set, and surprise! It’s the least intuitive and most inelegant of the three. We begin by writing a recursive function k_subsets() to find all of a set’s subsets of cardinality $k$ (a.k.a. its $k$-subsets):

generating k-subsets

Given a set of length $n$ and a desired subset of length $k$, iterate over the first $n - k + 1$ elements.
For each element, make a recursive call to retrieve the $(k-1)$-subsets for the remainder of the array (all elements after the current one).
Append the element to each $(k-1)$-subset, and return these subsets.

def k_subsets(k, set_):
    if k == 0:
        return [[]]
    else:
        subsets = []
        for ind in xrange(len(set_) - k + 1):
            for subset in k_subsets(k - 1, set_[ind + 1:]):
                subsets.append(subset + [set_[ind]])
        return subsets

from k-subsets to power set

With the ability to generate any $k$-subset, the key to creating a power set is finding the $k$-subsets for all valid $k$, which lie in the range $[0, n]$ ($n$, again, is the cardinality of the superset)!

For any $k$ in $[0, n]$:
find the set’s $k$-subsets

We’ll introduce a wrapper function, power_set(), in which we’ll nest a slightly modified k_subsets() that takes advantage of closures.

def power_set_1(set_):
    def k_subsets(k, start_ind):
        if k == 0:
            return [[]]
        else:
            subsets = []
            for ind in xrange(start_ind, len(set_) - k + 1):
                for subset in k_subsets(k - 1, ind + 1):
                    subsets.append(subset + [set_[ind]])
            return subsets

    subsets = []
    for k in xrange(len(set_) + 1):
        for subset in k_subsets(k, 0):
            subsets.append(subset)
    return subsets

algorithm 2: iterative appending

The second algorithm relies on our second informal proof of sets’ cardinality: whenever an element is added to a set, it must be added to copies of all the subsets in its current power set to form the new one. Thus:

Start with an empty set, $\{\}$, and its power-set, $\{\{\}\}$.
For every element inside the superset:
Create a copy of every set in the current power-set
Add the element to each one.
Add the copies to the current power-set.

Like so:

def power_set_2(set_):
    subsets = [[]]
    for element in set_:
        for ind in xrange(len(subsets)):
            subsets.append(subsets[ind] + [element])
    return subsets

algorithm 3: binary representation

The third algorithm is a clever hack, and relies on the binary representation of an incremented number to construct subsets. In our first proof of the cardinality of a power set, we iterated over each element of an argument set and made a choice with two possible outcomes (the element either was or wasn’t a member of the subset): $\underbrace{2 \times 2 \times ... \times 2}_{n} = 2^n$. Let’s consider an integer of $n$-bits: it has $2^n$ possible values in the range $[0, 2^n - 1]$, meaning that we can use it to represent $2^n$ distinct arrangements of $n$ bits. Hmm…

Iterate over the range $[0, 2^n - 1]$.
For every value, examine each of its $n$ bits.
If the $k$th bit has a value of 1, add the $k$th value of the superset to the current subset.

def is_bit_flipped(num, bit):
    return (num >> bit) & 1

def power_set_3(set_):
    subsets = []
    for subset in xrange(2 ** len(set_)):
        new_subset = []
        for bit in xrange(len(set_)):
            if is_bit_flipped(subset, bit):
                new_subset.append(set_[bit])
        subsets.append(new_subset)
    return subsets

1: (completely tangentially) whenever I mention set theory I can’t help but think of the infamous Principia Mathematica: a staggering, three-volume attempt to axiomatize all of mathematics, published by Bertrand Russell and Alfred North Whitehead in 1910-‘13, that relied heavily on sets. It’s notorious, amongst other things, for proving $1 + 1 = 2$ in no less than 379 pages. Check it out.

Severyn Kozak

emulating exceptions in C

case study: a recursive-descent parser

recursive-descent parsing

error-handling

error codes

exceptions, exceptions

longjmp(), setjmp()

integrating them into the parser

intermediate cleanup

in conclusion

RSA: implementation and proofs

what is RSA?

math precursor

how RSA works

generate a key-pair

example

finding \(\phi(n)\)

computing GCDs

prove \(\gcd(a, b) \divs c\)

prove \(\gcd(b, c) \divs a\)

prove \(\gcd(a, b) = \gcd(b, a - b)\)

prove \(\gcd(b, a - b) = \gcd(b, a \imod b)\)

finding modular inverses

the Extended Euclidean algorithm

generating large, random primes

Rabin-Miller primality test

Euclid’s lemma

proof of assumption 1

proof of assumption 2

applying Rabin-Miller

wrapping it all up

encrypt/decrypt messages

exponentiation by squaring

acknowledgements

nand2tetris: a book review and recap

what is Nand2Tetris?

content overview

1: boolean logic

2: boolean arithmetic

3: sequential logic

4: machine language

5: computer architecture

6: assembler

7, 8: virtual machine

9: high-level language

10, 11: compiler

12: operating system

review and advice

a note on requisites

vim syntax files

prime number spirals

prime number spirals

Ulam spiral

Sacks spiral

prime-generating polynomials

rendering the spirals

canvas setup

Ulam algorithm

Sacks algorithm

power set algorithms

power sets

sets: primer

subsets and the power set

the cardinality of a power set

algorithm 1: recursive k-subsets

generating k-subsets

from k-subsets to power set

algorithm 2: iterative appending

algorithm 3: binary representation

`longjmp()`, `setjmp()`