I recently stumbled across a practical use-case for simulated exceptions in C while writing a recursive-descent JSON parser for fun and profit. In this quick write-up, I’ll give a high-level overview of the problems that I ran into, why exceptions were ideal for error handling, and how I emulated them in C.
I won’t dwell on the details of the parser itself because this post is about the error-handling mechanism, but a minimal understanding of recursive-descent parsing is necessary to appreciate it. As with any kind of parsing, we start out with the formal grammar of our language/data format/whatever. A simple grammar for common programming language literals might look like:
value: string | boolean | number | array
string: '"' char* '"'
boolean: 'true' | 'false'
number: '-'? digit+ ('.' digit+)?
array: '[' (value ((',' value)*)?)? ']'
In fact, the JSON grammar that I used is fairly similar. Writing a recursive-descent parser for a grammar like the above is straightforward, because you simply map each rule onto a corresponding parse function. In pseudocode, we might have:
parse()
    # perform setup
    return parseValue()

parseValue()
    if nextIsString()
        return parseString()
    else if nextIsNumber()
        return parseNumber()
    else if nextIsBoolean()
        return parseBoolean()
    else if nextIsArray()
        return parseArray()
    else
        throw ParseError()

parseString()
    matchChars('"')
    string = readCharsUntil('"')
    matchChars('"')
    return string

parseBoolean()
    if peekChar() == 't'
        matchChars('true')
        return true
    else
        matchChars('false')
        return false

# and so on
The gist is that we have a bunch of mutually recursive parsing routines that ultimately rely on very primitive, low-level functions (like nextChar(), readCharsUntil(), matchChars(), etc. in the above example) that operate directly on the string being parsed.
Most of the errors that we need to worry about will occur in those primitives: nextChar() might fail to read a character because it hit the end of the input stream and matchChars() might find an unexpected character, for example. We may also want to manually signal an error in one of our high-level parsing routines, like we do in parseValue() when we can’t detect any valid values ahead. The key observations to make are that in a recursive-descent parser, the call stack will grow quite deep, and that errors are fatal; in other words, when one occurs, we need to return through many layers of function calls until we hit the parse() that started it all:
parse() # The top-level parse routine that we need to jump back to.
    parseValue()
        parseArray()
            parseValue()
                parseBoolean()
                    matchChars()
                        nextChar() # Error, hit EOF!
How should we handle errors in C, then?
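The idiomatic solution is to simply use error codes. If nextChar() fails, return -1 (which is suitable as long as the result is stored in an int and valid characters are returned as non-negative values, the same convention that getc() uses; a plain char may be unsigned, in which case it could never compare equal to -1), and make sure to actually check that return value every time you call it.
int chr = nextChar(parserState);
if(chr == -1){
    return -1;
}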
Note that the parserState argument passed to nextChar() is a (pointer to a) struct containing the parser’s state: a pointer to the string being parsed, its length, the current index in that string, etc.
In practice, we’d probably settle for a more sophisticated solution that involves storing error information inside parserState, like a boolean indicating whether a failure occurred and an error message to accompany it, since it’s more flexible:
char chr = nextChar(parserState);
if(parserState->failed){
    puts(parserState->errMsg); // just an example
    return NULL;
}
Either way, the result is that we have to remember to manually check some error value after every call to a parse routine that carried the possibility of failure. It bloats your code with repetitive conditionals and prevents you from using the return value of a parse routine directly in an expression because, again, you need an explicit conditional. Can we do better?
An exception mechanism would be ideal here, since we want to jump back to an arbitrary point in the call stack (in our case, parse()) from any one function. While C doesn’t provide us with real exceptions, we can simulate them…
longjmp(), setjmp()
Enter longjmp() and setjmp(); like goto, but nuclear! From the manpage, these functions facilitate “nonlocal jumps to a saved stack context,” or, in other words, allow you to perform jumps across functions. Use with extreme caution. The gist is that setjmp() is used to initialize a jmp_buf, storing critical information about the current calling environment – it’s highly system-specific, but generally includes things like the stack pointer and current register values – and returns 0 (the first time it returns – this will be explained shortly). You can then pass that jmp_buf to longjmp() at any later point, as long as the function that called setjmp() hasn’t returned yet, and the program will rewind execution back to the setjmp() call. You’ll also need to pass a non-zero int to longjmp(), which will be the value that setjmp() returns this time around; this allows us to discriminate between the times that setjmp() returns a.) initially and b.) after a jump was performed. An example should set things straight:
#include <stdio.h>
#include <setjmp.h>

void bar(jmp_buf jmpBuf){
    puts("inside bar()");
    longjmp(jmpBuf, 1);
    puts("this should never run!");
}

void foo(void){
    jmp_buf jmpBuf;
    if(!setjmp(jmpBuf)){
        // This runs after `setjmp()` returns normally.
        puts("calling bar()");
        bar(jmpBuf);
        puts("this should never run!");
    }
    else {
        // This runs after `setjmp()` returns from a `longjmp()`.
        puts("returned from bar()");
    }
}

int main(void){
    foo();
    return 0;
}
When compiled and run, you should see:
calling bar()
inside bar()
returned from bar()
Notice how we wrap the call to setjmp() in a conditional, which allows us to selectively run different code after it returned regularly (returning 0) and then after a jump occurred (returning whatever argument was passed to longjmp(), or, in our case, 1). Continuing the exceptions analogy, this is similar to a try {} catch {}.
Also, note that jmp_buf is typedef’d as an array of the actual jmp_buf structs with only one element – in other words, when you declare jmp_buf jmpBuf;, the struct inside jmpBuf lives entirely on the stack but jmpBuf will decay to a pointer if you pass it to a function. In my opinion that’s rather misleading and I would’ve preferred to manually, explicitly use pointer notation when necessary, but it is what it is.
The idea is to initialize a jmp_buf in the parse() function with setjmp(), store it inside the parserState struct in a prevErrorTrap member (couldn’t think of a better name), and then longjmp() to it whenever an error occurs. If that were all, using this solution would be a no-brainer, but alas, there’s a complication: some of our parsing routines might need to perform cleanup before exiting, like free()ing temporarily allocated memory. For instance, the parseArray() function in my parser allocates a stretchy array to house all of the values that it successfully parses; if an error occurs in one of the parseValue() calls that it makes, it needs to deallocate all of the values parsed thus far and then the array itself. If we jump from the point where the error occurred to the very beginning of the parse, though, we don’t have any means of doing so.
Two solutions come to mind:
- keeping track of all intermediate allocations inside parserState, and then free()ing them inside the top-level parse() if an error occurred
- setting intermediate jump points in the routines that need to perform cleanup
I ultimately settled for the latter, and the idea’s the same as before: in functions like parseArray() and any others that allocate intermediate memory, create a copy of the current jump buffer (parserState->prevErrorTrap), and then set parserState->prevErrorTrap to a new jump buffer created with setjmp() – this one will get used by all of the parse routines called by the current one. If the parse succeeds, just restore parserState->prevErrorTrap to the original jump buffer before returning. If it fails, perform cleanup and jump directly to the original buffer. Here’s an example taken straight from the parser’s source, with irrelevant bits omitted:
static JsonArray_t JsonParser_parseArray(JsonParser_t *state){
    /**
     * Omitted: perform setup here.
     */
    jmp_buf prevErrorTrap;
    copyJmpBuf(prevErrorTrap, state->errorTrap);

    // The stretchy array used to store parsed values. Read on
    // for why `volatile` is necessary.
    JsonVal_t *volatile values = NULL;

    if(!setjmp(state->errorTrap)){
        /**
         * Omitted: parse values into `values` with repeated calls
         * to `parseValue()`.
         */

        // If we get this far, then no error occurred, so restore the
        // original `prevErrorTrap`.
        copyJmpBuf(state->errorTrap, prevErrorTrap);
        return (JsonArray_t){
            .length = sb_count(values),
            .values = values
        };
    }
    else {
        // An error occurred! Deallocate all intermediate memory,
        // and then jump to the previous `prevErrorTrap`.
        for(int ind = 0; ind < sb_count(values); ind++){
            JsonVal_free(&values[ind]);
        }
        sb_free(values);
        longjmp(prevErrorTrap, 1);
    }
}
copyJmpBuf() is just a convenience wrapper for memcpy():
static void *copyJmpBuf(jmp_buf dest, const jmp_buf src){
    return memcpy(dest, src, sizeof(jmp_buf));
}
One other thing to note is that we declared the values pointer as volatile to prevent the compiler from placing it into a register. Why? The problem is that we modify values after the call to setjmp(), namely when we perform the initial allocation of a stretchy array and then whenever it gets resized and a realloc() changes the location of the items that it contains. When a long jump occurs, register values are restored from whatever they were at the time of the setjmp() call, since those are what it copied into the target jmp_buf; if the compiler decided to put values into a register, then after the jump, it would be set to NULL.
To prevent that from happening, we use the volatile specifier. See this SO post for more; this is an example of the potentially very dangerous subtleties of long jumping. In fact, while writing my parser I forgot to add in the volatile specifier to values, and noticed that it was leaking memory (thank you valgrind!) whenever an error occurred even though the cleanup clause was getting run. It turns out that values would get put into a register and then consequently take on a value of NULL after the jump – since that’s what it was at the time of the original setjmp() – meaning that the only reference to the allocated memory was lost and it couldn’t possibly be deallocated. Moreover, when passed to free(), it wouldn’t blow up, because free() ignores NULL pointers!
To wrap up the above example, all of the other parsing functions that set intermediate breakpoints have virtually the same layout, so you could even theoretically encapsulate the different statements in macros like try and catch for a full blown imitation of exceptions in other languages – that’s too much magic for me, though.
longjmp() and setjmp() are tricky. They’re obscure, can give rise to subtle bugs, are highly platform-specific, and, if abused, will probably lead to awfully confusing code. That being said, like goto, they do have valid uses and can be very powerful when used appropriately. In this case, I think they were superior to error codes and resulted in a slimmer, more readable implementation than what it otherwise would’ve been.
If you’re interested in more reading, I recommend this comprehensive article. Also, here’s the thoroughly documented parser source code; check out src/json_parser.c.
RSA is a public-key, or asymmetric, encryption algorithm. In contrast to symmetric algorithms, like DES and AES, which use the same key for both encryption and decryption, RSA employs two distinct keys: a public key used to encrypt data, and a private key used to decrypt whatever was encrypted with the public one. The beauty of public-key encryption is that the parties involved never need to exchange a master key, meaning that communications can be securely encrypted without any prior contact.
Public-key encryption was proposed by Whitfield Diffie and Martin Hellman in ‘76, while RSA itself was patented in ‘77 by Ron Rivest, Adi Shamir, and Leonard Adleman, who then went on to found a cybersecurity company of the same name – confusing, but great PR!
Clifford Cocks, an English cryptographer, arrived at a similar algorithm in ‘73 while working for British intelligence at GCHQ, but his work wasn’t declassified until 1998 due to its sensitivity. Forty years later, RSA underpins SSL certification, SSH handshakes, and lots more.
In this post, we’ll implement RSA, but we’ll very much take the long way around while doing so. The algorithm introduces a number of interesting problems, like finding greatest common divisors, performing modular exponentiation, computing modular inverses, and generating random prime numbers, each of which we’ll thoroughly explore and derive solutions to (many of these won’t be immediately clear, so we’ll formally prove them as we go). Note that we won’t prove RSA itself – I might add that as an extension to the article at some point in the future.
The only thing we need to know before diving into RSA is some modular arithmetic, which is simply arithmetic with the property that numbers have a maximum value (called the modulus) and wrap around to 0 when they reach it. When we take a number $a \pmod{n}$, we’re basically taking the remainder of $a / n$; most programming languages provide this in the form of a mod function or % operator. We’ll see lots of expressions in the form of:
$$a \equiv b \pmod{n}$$
Here, the $\equiv$ symbol implies congruence, or that $a \bmod n$ equals $b \bmod n$. An important gotcha is that $\pmod{n}$ applies to both sides of the expression, which isn’t immediately obvious to anyone used to the modulo operator in the programming sense. Many sources choose to omit the parentheses, simply writing $a \equiv b \bmod n$, which just compounds the confusion; the clearest notation would probably be something like $(a \equiv b) \pmod{n}$. This is extremely important to remember because otherwise, expressions like $a \equiv 1 \pmod{n}$ won’t make any sense at all (“but if $1 \bmod n$ is equal to 1 for all $n$ not equal to 1, why not just write $a = 1$?!”).
Some notes about miscellaneous notation:
RSA revolves around a numeric key-pair, or a mathematically related public and private key. The public key is made known to the world, which can then use it to encrypt a message, while the private key can be used to decrypt anything encrypted with the public key. Encrypting and decrypting a message is fairly straightforward, while generating a key-pair is a more substantial process.
To generate a public/private key-pair:
Though short and concise, the above steps present several complex problems:
Before we dive into solving those, let’s walk through the process of generating a key-pair using some small sample numbers.
Easy! Except, of course, we weren’t dealing with numbers with hundreds of digits – that’s the hard part. :)
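For readers who want to poke at the process themselves, here is a minimal sketch of a toy key-pair in Python. The primes 61 and 53, the exponent 17, and the message 65 are values assumed purely for illustration, and the built-in three-argument `pow()` (Python 3.8+ for the modular inverse) stands in for the helper functions developed later in this post.

```python
from math import gcd

# Toy primes, far too small for real use (assumed for illustration only).
p, q = 61, 53
n = p * q                      # modulus shared by both keys: 3233
totient = (p - 1) * (q - 1)    # phi(n): 3120

e = 17                         # public exponent; must be coprime with phi(n)
assert gcd(e, totient) == 1

d = pow(e, -1, totient)        # private exponent: the inverse of e mod phi(n)
assert (e * d) % totient == 1

message = 65
ciphertext = pow(message, e, n)    # encrypt with the public key (e, n)
recovered = pow(ciphertext, d, n)  # decrypt with the private key (d, n)
assert recovered == message
```

The round trip only works because `d` undoes `e` modulo the totient; everything else in the article is about doing these same steps with numbers hundreds of digits long.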
To compute $\phi(n)$, we can take advantage of the fact that $n$ is composed of two prime factors: $p$ and $q$. Thus, the only values with which it shares GCDs that aren’t 1 must be multiples of either $p$ or $q$ (for instance, $\gcd(n, q) = q$ and $\gcd(n, 2p) = p$). There are only $q$ multiples of $p$ ($p, 2p, \dots, qp$) and $p$ multiples of $q$ ($q, 2q, \dots, pq$) that are less than or equal to $n$. Thus, there are $p + q$ values in the range $[1, n]$ that have a GCD with $n$ not equal to 1. Note, however, that we double counted $pq$ in our list of multiples of $p$ and $q$, so in reality it’s $p + q - 1$. Thus, $\phi(n) = n - (p + q - 1)$, where $n$ is the total number of values in the range $[1, n]$ – that is, $\phi(n) = pq - p - q + 1 = (p - 1)(q - 1)$.
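The counting argument can be sanity-checked by brute force for small inputs. The sketch below is only a check, not part of the RSA implementation; `totient_brute_force` is a hypothetical helper name and the primes are arbitrary small values.

```python
from math import gcd

def totient_brute_force(n):
    """Count the values in [1, n] that share no factor with n other than 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

# Compare the direct count against the (p - 1)(q - 1) shortcut.
for p, q in [(3, 5), (11, 13), (17, 23)]:
    assert totient_brute_force(p * q) == (p - 1) * (q - 1)
```

The brute-force count is linear in $n$, which is exactly why the closed form matters once $n$ has hundreds of digits.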
To find the GCD of two numbers, we’ll employ the Euclidean algorithm:
1. $\gcd(a, 0) = |a|$
2. $\gcd(a, b) = \gcd(b, a \bmod b)$
or:
def gcd(a, b):
    return abs(a) if b == 0 else gcd(b, a % b)
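To see the recursion unfold, here is a small iterative sketch that records each `(a, b)` pair the algorithm visits. `gcd_trace` is a hypothetical helper introduced only for illustration; it computes the same result as the recursive `gcd()` above.

```python
def gcd_trace(a, b):
    """Return (gcd, steps), where steps lists every (a, b) pair visited
    before the base case b == 0 is reached."""
    steps = []
    while b != 0:
        steps.append((a, b))
        a, b = b, a % b   # Case 2: gcd(a, b) = gcd(b, a mod b)
    return abs(a), steps  # Case 1: gcd(a, 0) = |a|

result, steps = gcd_trace(1071, 462)
# steps is [(1071, 462), (462, 147), (147, 21)] and result is 21
```

Each step replaces the larger argument with a remainder that is strictly smaller than `b`, so the loop is guaranteed to terminate.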
Let’s prove it. Case 1 should be self-explanatory: 0 is technically divisible by any number, even if the quotient equals 0, so the GCD of 0 and any other number should be that number. We need to be careful and take its absolute value, however, to account for negative values; the greatest divisor of -5 is 5, after all, not -5, so the GCD of 0 and -5 must also be 5. Thus, we have to take the absolute value of -5 to arrive at the greatest divisor.
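Case 2 is less intuitive (at least for me), and requires proving that $\gcd(a, b) = \gcd(b, a \bmod b)$. Let’s begin by creating another variable $c$:
$$c = a - b$$
We first want to prove that the GCD of $a$ and $b$ divides $c$ (or $\gcd(a, b) \mid c$). Begin by rewriting $a$ and $b$ as products of their GCD.
$$a = x \cdot \gcd(a, b), \quad b = y \cdot \gcd(a, b)$$
$x$ and $y$ are just placeholders: we don’t want to know or care what they equal. Now, plug those into the definition of $c$:
$$c = a - b = (x - y) \cdot \gcd(a, b)$$
Since we’ve shown that $c$ is the product of $\gcd(a, b)$ and another value, it is by definition divisible by $\gcd(a, b)$.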
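Apply the same logic here, this time to show that $\gcd(b, c) \mid a$: writing $b = x' \cdot \gcd(b, c)$ and $c = y' \cdot \gcd(b, c)$ gives
$$a = b + c = (x' + y') \cdot \gcd(b, c)$$
We know that, by definition, $\gcd(a, b) \mid b$, and we’ve proven that $\gcd(a, b) \mid c$. Thus, $\gcd(a, b)$ is a common divisor of both $b$ and $c$. That doesn’t imply that it’s the least common divisor, greatest, or anything else: all we know is that it divides both numbers. We do know that there exists a greatest common divisor of $b$ and $c$, $\gcd(b, c)$, so we can conclude that:
$$\gcd(a, b) \le \gcd(b, c)$$
We now re-apply that same reasoning. We know that $\gcd(b, c) \mid a$ and $\gcd(b, c) \mid b$. Thus, $\gcd(b, c)$ is a common divisor of $a$ and $b$. Since we know that the greatest common divisor of $a$ and $b$ is $\gcd(a, b)$, we can conclude that:
$$\gcd(b, c) \le \gcd(a, b)$$
But now we have two almost contradictory conclusions:
$$\gcd(a, b) \le \gcd(b, c) \quad \text{and} \quad \gcd(b, c) \le \gcd(a, b)$$
The only way these can both be true is if:
$$\gcd(a, b) = \gcd(b, c)$$
So we’ve proven that $\gcd(a, b) = \gcd(b, a - b)$ (remember, $c = a - b$).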
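First, let’s assume that $a \ge b$, and rewrite it as: $a = bq + r$ (or $r = a \bmod b$)
Now, we already know that $\gcd(a, b) = \gcd(b, a - b)$. Since order doesn’t matter, we can rewrite $\gcd(b, a - b)$ as $\gcd(a - b, b)$. Now, we apply the rule again, $q$ times in total.
$$\gcd(a, b) = \gcd(a - b, b) = \gcd(a - 2b, b) = \dots = \gcd(a - qb, b)$$
or:
$$\gcd(a, b) = \gcd(r, b) = \gcd(b, a \bmod b)$$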
Bingo. We’ve proven Case 2, and completed our proof of the Euclidean Algorithm. Before we move on, we’ll also define a convenience wrapper for gcd() that determines whether two numbers are coprime:
def coprime(a, b):
    return gcd(a, b) == 1
Given a value $a$ and modulus $n$, the modular multiplicative inverse of $a$ is a value $a^{-1}$ that satisfies:
$$a a^{-1} \equiv 1 \pmod{n}$$
This implies that there exists some value $k$ for which:
$$a a^{-1} + kn = 1$$
This turns out to be in the form of Bézout’s identity, which states that for values $a$ and $b$, there exist values $x$ and $y$ that satisfy:
$$ax + by = \gcd(a, b)$$
$x$ and $y$, called Bézout coefficients, can be solved for using the Extended Euclidean algorithm (EEA). $x$ corresponds to $a^{-1}$, or the modular inverse that we were looking for, while $y$ can be thrown out once computed. The EEA will also give us the GCD of $a$ and $n$ – it is, after all, an extension of the Euclidean algorithm, which we use to find the GCD of two values. We need to verify that it equals 1, since we make the assumption that $\gcd(a, n) = 1$; if it doesn’t, $a$ has no modular inverse. Since modular_inverse() is just a wrapper for EEA – to be implemented in a function called bezout_coefficients() – its definition is simple:
def modular_inverse(num, modulus):
    coef1, _, gcd = bezout_coefficients(num, modulus)
    # The raw coefficient may be negative, so reduce it into [0, modulus).
    return coef1 % modulus if gcd == 1 else None
bezout_coefficients() is a bit trickier:
def bezout_coefficients(a, b):
    if b == 0:
        return -1 if a < 0 else 1, 0, abs(a)
    else:
        quotient, remainder = divmod(a, b)
        coef1, coef2, gcd = bezout_coefficients(b, remainder)
        return coef2, coef1 - quotient * coef2, gcd
Let’s see why it works.
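How to solve for $x$ and $y$? Bezout’s Identity states:
$$ax + by = \gcd(a, b)$$
or, for $\gcd(b, a \bmod b)$:
$$bx' + (a \bmod b)y' = \gcd(b, a \bmod b)$$
Let’s simplify:
$$bx' + \left(a - \left\lfloor \frac{a}{b} \right\rfloor b\right)y' = ay' + b\left(x' - \left\lfloor \frac{a}{b} \right\rfloor y'\right)$$
Here, $\lfloor \cdot \rfloor$ represents the floor function, which floors the result of $a / b$ to an integer.
Since we know, by the already proven Euclidean algorithm, that $\gcd(a, b) = \gcd(b, a \bmod b)$, we can write:
$$ax + by = ay' + b\left(x' - \left\lfloor \frac{a}{b} \right\rfloor y'\right)$$
So, $x = y'$ and $y = x' - \lfloor a / b \rfloor y'$. But what are $x'$ and $y'$? They’re the results of running the EEA on $(b, a \bmod b)$! Classic recursion. In sum: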
def bezout_coefficients(a, b):
    quotient, remainder = divmod(a, b)
    coef1, coef2 = bezout_coefficients(b, remainder)
    return coef2, coef1 - quotient * coef2
Of course, we need a base case, or we’ll end up recursing ad infinitum. Let’s take the case of $b = 0$.
$$ax + 0 \cdot y = \gcd(a, 0) = |a|$$
So, if $b = 0$, we set the coefficient $x$ to 1 if $a$ is positive and -1 if $a$ is negative, and set $y$ to… what? If $b$ is 0, then $y$ can take on any value. For simplicity’s sake we’ll choose 0. Our revised definition looks like:
def bezout_coefficients(a, b):
    if b == 0:
        return -1 if a < 0 else 1, 0
    else:
        quotient, remainder = divmod(a, b)
        coef1, coef2 = bezout_coefficients(b, remainder)
        return coef2, coef1 - quotient * coef2
Also note that, since this is simply a more involved version of the Euclidean algorithm (we’re making recursive calls to bezout_coefficients(b, remainder) and have a base case of b == 0), when we hit the base case, abs(a) is the GCD of a and b. Since modular_inverse() needs to check that the GCD of its two arguments equals 1, we should return it in addition to the coefficients themselves. Hence, we’ll let it trickle up from our base case into the final return value:
def bezout_coefficients(a, b):
    if b == 0:
        return -1 if a < 0 else 1, 0, abs(a)
    else:
        quotient, remainder = divmod(a, b)
        coef1, coef2, gcd = bezout_coefficients(b, remainder)
        return coef2, coef1 - quotient * coef2, gcd
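As a cross-check on the recursive derivation, here is a sketch of an iterative variant of the extended Euclidean algorithm. It is not the implementation used in this post: `extended_gcd` is a hypothetical name, and it is only meant for non-negative inputs. It does make it easy to verify Bézout’s identity numerically.

```python
def extended_gcd(a, b):
    """Iterative extended Euclid for non-negative a and b.
    Returns (x, y, g) such that a*x + b*y == g == gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        quotient = old_r // r
        old_r, r = r, old_r - quotient * r
        old_x, x = x, old_x - quotient * x
        old_y, y = y, old_y - quotient * y
    return old_x, old_y, old_r

x, y, g = extended_gcd(240, 46)
assert 240 * x + 46 * y == g       # Bézout's identity holds

# The first coefficient, reduced modulo n, is the modular inverse.
inv, _, g2 = extended_gcd(17, 3120)
assert g2 == 1 and (17 * (inv % 3120)) % 3120 == 1
```

Note the reduction `inv % 3120`: the coefficient that falls out of the algorithm can be negative, and a negative exponent is not something the later exponentiation code is written to handle.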
Here’s the idea:
Easy enough, except for the bit about testing primality. How to do so efficiently? We’ll turn to the Rabin-Miller algorithm, a probabilistic primality test which either tells us with absolute certainty that a number is composite, or with high likelihood that it’s prime. We’re fine with a merely probabilistic solution because it’s fast, since speed is a non-negligible issue due to the size of the numbers that we’re dealing with, and also because the chances of a false positive (ie indicating that a number is prime when it’s actually composite) are astronomically low after even only a few iterations of the test.
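The Rabin-Miller test relies on the below two assumptions (just accept that they’re true for now, and we’ll prove them later on). If $p$ is a prime number:
1. $a^{p-1} \equiv 1 \pmod{p}$ for any $a$ not divisible by $p$
2. if $x^2 \equiv 1 \pmod{p}$, then $x \equiv \pm 1 \pmod{p}$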
Using these, you can test a value $n$ for compositeness like so (note that we return true/false to indicate definite compositeness/probable primality respectively):
true
false
true
false
In sum, we return true if we’ve confirmed that $a$ is a witness to the compositeness of $n$, and false if $a$ does not prove that $n$ is composite – transitively, there is a high chance that $n$ is prime, but we can only be more sure by running more such tests. While the above steps serve as a good verbal description of the algorithm, we’ll have to slightly modify them to convert the algorithm into real code.
We need to implement a function is_witness(), which checks whether a random value $a$ is a witness to the compositeness of our prime candidate, $n$.
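1. Decompose $n - 1$ into the form $2^s d$, where $d$ is odd.
2. Pick a random value $a$ in the range $[2, n - 1]$.
3. Set $x = a^d \bmod n$.
4. If $x \equiv \pm 1 \pmod{n}$, return false.
5. Repeat $s - 1$ times:
   1. Set $x = x^2 \bmod n$.
   2. If $x \equiv 1 \pmod{n}$, return true.
   3. If $x \equiv -1 \pmod{n}$, return false.
6. Return true.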
These steps seem quite a bit different from before, but in reality, they’re exactly the same and just operating in reverse. We start with a value that doesn’t have an integer square root, and square it until we hit $a^{n-1}$. Why did we bother decomposing $n - 1$ into the form of $2^s d$? Well, it allows us to rewrite $a^{n-1}$ as $a^{2^s d}$, and now we know exactly how many times we can take square roots before we hit a value that isn’t reducible any further – in this case, $a^d$.
So, if we start with $a^d$ and square it, we’ll get $a^{2d}$, then $a^{4d}$, then $a^{8d}$, and ultimately $a^{2^s d}$, or $a^{n-1}$. What’s the advantage of starting from the non-reducible value and squaring it, rather than the reducible value and taking its square roots? It sometimes allows us to short-circuit the process. For instance, as we iterate through the squares of $a^d$, if we find an occurrence of -1, we know that we’ll get 1 when we square it, and 1 when we square that, and keep on getting 1s until we stop iterating. As a consequence, we know that we won’t find any failing conditions, and can exit early by returning false (step 5.3). The same goes for step 4: if $a^d \equiv \pm 1 \pmod{n}$, we know that each of the following squares will equal 1, so we immediately return false.
The failing conditions – i.e. those that cause the algorithm to return true – might not be immediately clear. In 5.2, we know that, if $x \equiv 1 \pmod{n}$, we’ve violated assumption 2, because that implies that the previous value of $x$ was not equivalent to $\pm 1 \pmod{n}$. Wait, why? Because if it were equal to -1, we would’ve already returned via 5.3 in the previous iteration, and if it were 1, then we would’ve returned either from 5.3 in an earlier iteration still or 4 at the very beginning. We also return true when we hit 6, because by that point the last value of $x$, $a^{(n-1)/2}$, is not equivalent to $\pm 1 \pmod{n}$: either its square $a^{n-1}$ is not 1, which violates assumption 1, or it is 1, which violates assumption 2.
Finally, we simply repeat the is_witness() test $k$ times. Here’s the final implementation:
import random

def is_prime(n, k=5):
    if n == 2:
        return True
    if n <= 1 or n % 2 == 0:
        return False
    s, d = decompose_to_factors_of_2(n - 1)

    def is_witness(a):
        x = modular_power(a, d, n)
        if x in [1, n - 1]:
            return False
        for _ in range(s - 1):
            x = modular_power(x, 2, n)
            if x == 1:
                return True
            if x == n - 1:
                return False
        return True

    for _ in range(k):
        if is_witness(random.randint(2, n - 1)):
            return False
    return True

def decompose_to_factors_of_2(num):
    s = 0
    d = num
    while d % 2 == 0:
        d //= 2
        s += 1
    return s, d
Note that we’ve introduced a currently undefined function, modular_power(). The problem with computing $a^d \bmod n$ and $x^2 \bmod n$ is that $a$, $d$, $x$, and $n$ are HUGE. Simply running (a ** d) % n would be asking for trouble. Fortunately, there are efficient ways of performing modular exponentiation, and we’ll implement one such method in the modular_power() function later in this article. Now, we need to actually prove the two assumptions that we base Rabin-Miller on.
…but before we do so, we need to prove Euclid’s Lemma, since both of the following proofs depend on it. It states that if $p$ is relatively prime to $a$ and $p \mid ab$, then $p \mid b$. We’ll prove it using Bezout’s Identity. The GCD of $p$ and $a$ is 1, so there must exist $x$ and $y$ that satisfy:
$$px + ay = 1$$
Multiply both sides by $b$:
$$pbx + aby = b$$
$aby$ is divisible by $p$ (because it’s divisible by $ab$, which is divisible by $p$ according to the lemma’s requisite), and $pbx$ is by definition divisible by $p$, so $b$ must be divisible by $p$ too.
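Our first assumption was that for a prime $p$, $a^{p-1} \equiv 1 \pmod{p}$ for any $a$ not divisible by $p$. This is better known as Fermat’s Little Theorem. To prove it, begin by multiplying all of the numbers in the range $[1, p - 1]$ by $a$:
$$a, 2a, 3a, \dots, (p - 1)a$$
We make two observations:
1. given two values $x$ and $y$, $ax \equiv ay \pmod{p}$ is equivalent to $x \equiv y \pmod{p}$ (we effectively divide out $a$). We can prove this by rewriting $ax \equiv ay \pmod{p}$ as $ax - ay \equiv 0 \pmod{p}$, which implies that $p \mid (ax - ay)$, or $p \mid a(x - y)$. By Euclid’s Lemma, since $p$ and $a$ are coprime (reminder: this is a criterion of Fermat’s Little Theorem), $p \mid (x - y)$, which means we can write $x - y \equiv 0 \pmod{p}$, or $x \equiv y \pmod{p}$.
2. when each of its elements is simplified in $\pmod{p}$, the above sequence is simply a rearrangement of $1, 2, \dots, p - 1$. This is true because, firstly, its values all lie in the range $[1, p - 1]$ – none can equal 0 since $p$ shares no factors other than 1 with either $a$ or any value in $[1, p - 1]$ due to its primeness. The trick now is to realize that, if we have two values $xa$ and $ya$, and know that $xa \equiv ya \pmod{p}$, then by the previous observation we can “divide out $a$” and have $x \equiv y \pmod{p}$. If $x$ and $y$ were two values chosen from the sequence $1, 2, \dots, p - 1$, we’d know that they’re both less than $p$, and can thus remove the $\pmod{p}$ from the expression, leaving us with: $x = y$. In conclusion, the only way to satisfy $xa \equiv ya \pmod{p}$ is to have $x$ be the same item as $y$, and that means that the distinct values in $1, 2, \dots, p - 1$ map to distinct values in $a, 2a, \dots, (p - 1)a$.
By observation 2:
$$a \cdot 2a \cdot 3a \cdots (p - 1)a \equiv 1 \cdot 2 \cdot 3 \cdots (p - 1) \pmod{p}$$
$$a^{p-1} (p - 1)! \equiv (p - 1)! \pmod{p}$$
By observation 1, we can cancel out each of the factors of $(p - 1)!$ from both sides of the expressions (after all, $p$ is prime and all of the factors of $(p - 1)!$ are less than it, so it’s coprime with all of them), which leaves us with:
$$a^{p-1} \equiv 1 \pmod{p}$$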
QED.
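We now prove assumption 2: if $p$ is prime and $x^2 \equiv 1 \pmod{p}$, $x$ must equal $\pm 1 \pmod{p}$. First, for greater clarity later on, we can rewrite our conclusion as: $p$ must divide either $x - 1$ or $x + 1$. Now, if $x^2 \equiv 1 \pmod{p}$, then:
$$x^2 - 1 \equiv 0 \pmod{p}$$
$$(x - 1)(x + 1) \equiv 0 \pmod{p}$$
$$p \mid (x - 1)(x + 1)$$
If $p$ divides $x - 1$, then:
$$x \equiv 1 \pmod{p}$$
and we’ve proven our conclusion. What if $p$ doesn’t divide $x - 1$? We can then leverage Euclid’s Lemma: if $p$ is relatively prime to $a$ and $p \mid ab$, then $p \mid b$. We know that $p$ is prime and doesn’t divide $x - 1$, so it’s relatively prime to $x - 1$, and we know that it divides $(x - 1)(x + 1)$. As a result, it has to divide $x + 1$, which implies that: $x \equiv -1 \pmod{p}$. Again, we’ve proven our conclusion, and thus proven assumption 2.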
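Both assumptions are easy to probe numerically for small values. The sketch below uses two hypothetical helpers, `satisfies_fermat` and `nontrivial_roots_of_one`, and arbitrary sample values (13 as the prime, 15 as the composite); it illustrates the statements rather than proving anything.

```python
def satisfies_fermat(n):
    """True if a^(n-1) == 1 (mod n) for every a in [1, n - 1]."""
    return all(pow(a, n - 1, n) == 1 for a in range(1, n))

def nontrivial_roots_of_one(n):
    """Values x in [2, n - 2] whose square is congruent to 1 modulo n,
    i.e. square roots of 1 other than +1 and -1."""
    return [x for x in range(2, n - 1) if (x * x) % n == 1]

# A prime satisfies both assumptions.
assert satisfies_fermat(13)
assert nontrivial_roots_of_one(13) == []

# A composite can break them: 4 * 4 = 16, which is 1 modulo 15.
assert not satisfies_fermat(15)
assert nontrivial_roots_of_one(15) == [4, 11]
```

Finding a value like 4 modulo 15 is exactly what step 5.2 of the witness test detects.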
Now that we’ve implemented Rabin-Miller, creating a large, random prime is almost trivial:
def get_random_prime(num_bits):
    # A num_bits-bit integer lies in [2^(num_bits - 1), 2^num_bits - 1].
    lower_bound = 2 ** (num_bits - 1)
    upper_bound = 2 ** num_bits - 1
    guess = random.randint(lower_bound, upper_bound)
    if guess % 2 == 0:
        guess += 1
    while not is_prime(guess):
        guess += 2
    return guess
The num_bits parameter is a bit of a weird way of specifying the desired size of the prime, but it’ll make sense since we usually want to create RSA keys of a specific bit-length (more on this later on).
At long last, we can define our create_key_pair() function.
def create_key_pair(bit_length):
    prime_bit_length = bit_length // 2
    p = get_random_prime(prime_bit_length)
    q = get_random_prime(prime_bit_length)
    n = p * q
    totient = (p - 1) * (q - 1)
    while True:
        e_candidate = random.randint(3, totient - 1)
        if e_candidate % 2 == 0:
            e_candidate += 1
        if coprime(e_candidate, totient):
            e = e_candidate
            break
    d = modular_inverse(e, totient)
    return e, d, n
The only thing that requires explanation is this bit_length business. The idea here is that we generally want to create RSA keys of a certain bit-length (1024 and 2048 are common values), so we pass in a parameter specifying the length. To make sure that $n$ has a bit-length approximately equal to bit_length, we need to make sure that the primes $p$ and $q$ that we use to create it have a bit length of bit_length / 2, since multiplying two $k$-bit numbers yields an approximately $2k$-bit value. How come? The number of bits in a positive integer $x$ is $\lfloor \log_2 x \rfloor + 1$, so the number of bits in $x^2$ is $\lfloor \log_2 x^2 \rfloor + 1$. According to the logarithm power rule, we can rewrite $\log_2 x^2$ as $2 \log_2 x$, so the bit length equals $\lfloor 2 \log_2 x \rfloor + 1$. In other words, $x^2$ has roughly twice as many bits as $x$.
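The “approximately” matters: the product of two $k$-bit numbers has either $2k - 1$ or $2k$ bits. A quick sketch with Python’s built-in `int.bit_length()`, using the two extreme 512-bit values as assumed sample inputs:

```python
# The two extremes of the 512-bit range (sample values for illustration).
smallest = 2 ** 511        # smallest 512-bit integer
largest = 2 ** 512 - 1     # largest 512-bit integer

assert smallest.bit_length() == 512
assert largest.bit_length() == 512

# Squaring roughly doubles the bit length, give or take one bit.
low = (smallest * smallest).bit_length()
high = (largest * largest).bit_length()
```

Here `low` is one bit short of 1024 while `high` hits it exactly, so a key built from two 512-bit primes may come out at 1023 bits rather than 1024.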
In comparison to generating keys, encrypting and decrypting data with them is mercifully simple.
def encrypt(e, n, m):
    return modular_power(m, e, n)

def decrypt(d, n, c):
    return modular_power(c, d, n)
So, what’s modular_power()? The problem with the encryption and decryption operations, which look deceptively trivial, is that all of the values involved are big. Really, really big. As a result, naively solving $m^e \bmod n$ by simply resolving $m^e$ and then simplifying that modulo $n$ is a no-go. Fortunately, there are more efficient ways of performing modular exponentiation, like exponentiation by squaring.
When trying to solve $b^e \bmod n$, begin by representing $e$ in binary form:
$$e = \sum_{i=0}^{k-1} e_i 2^i$$
where $k$ is the total number of bits in $e$, and $e_i$ represents the value of each bit – either 0 or 1. Now, rewrite the original expression:
$$b^e = b^{\sum_{i=0}^{k-1} e_i 2^i} = b^{e_0 2^0} \cdot b^{e_1 2^1} \cdots b^{e_{k-1} 2^{k-1}}$$
For illustrative purposes, let’s temporarily remove the $e_i$ factor from each exponent, which leaves us with:
$$b^{2^0} \cdot b^{2^1} \cdot b^{2^2} \cdots b^{2^{k-1}}$$
It’s now obvious that each factor is a square of the one that precedes it: $b^{2^1}$ is the square of $b^{2^0}$, $b^{2^2}$ is the square of $b^{2^1}$, etc. If we were to programmatically solve the expression, we could maintain a variable, say accumulator, that we’d initialize to $b$, and square from factor to factor to avoid recomputing $b^{2^i}$ every time. Now, let’s reintroduce $e_i$:
$$b^{e_0 2^0} \cdot b^{e_1 2^1} \cdot b^{e_2 2^2} \cdots b^{e_{k-1} 2^{k-1}}$$
The good thing is that $e_i$ has a limited set of possible values: just 0 and 1! Any value in the form $b^{e_i 2^i}$ – that is, all of the above factors – evaluates to $b^{2^i}$ when $e_i = 1$, and $b^0$, or 1, when $e_i = 0$. In other words, the value of $e_i$ only controls whether or not we multiply one of the factors into the accumulator that’ll become our ultimate result (since if $e_i = 0$, we’ll just end up multiplying in 1, which means we shouldn’t even bother). Thus, modular_power() might look something like this:
def modular_power(base, exp, modulus):
    result = 1
    while exp:
        if exp % 2 == 1:
            result = result * base
        exp >>= 1
        base = base ** 2
    return result % modulus
But we still haven’t addressed the issue of multiplying huge numbers by huge numbers, and this version of modular_power() doesn’t perform much better than (base ** exp) % modulus (in fact, after some spot checking, it appears to be much slower!). We can address that by taking advantage of the following property of modular multiplication:
$$(a \cdot b) \bmod n = ((a \bmod n) \cdot (b \bmod n)) \bmod n$$
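We can prove it by rewriting $a$ and $b$ in terms of $n$:
$$a = q_a n + r_a, \quad b = q_b n + r_b$$
and substituting that into the original expression:
$$ab = (q_a n + r_a)(q_b n + r_b) = n(q_a q_b n + q_a r_b + q_b r_a) + r_a r_b$$
We’re able to remove the entire chunk of the expression that gets multiplied by $n$ because it’s by definition divisible by $n$, meaning that, taken $\pmod{n}$, it would equal 0, and wouldn’t contribute anything to the sum. Thus, $ab \bmod n$ equals $r_a r_b \bmod n$, or $((a \bmod n) \cdot (b \bmod n)) \bmod n$.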
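The property is also easy to spot-check empirically. This sketch draws random operands (the value ranges and the fixed seed are arbitrary choices, and `check_product_rule` is a hypothetical helper) and compares both sides of the identity.

```python
import random

def check_product_rule(trials, seed=0):
    """Compare (a * b) mod n against ((a mod n) * (b mod n)) mod n
    for randomly chosen operands; returns True if every trial agrees."""
    rng = random.Random(seed)  # seeded so the check is repeatable
    for _ in range(trials):
        a = rng.randrange(10 ** 30)
        b = rng.randrange(10 ** 30)
        n = rng.randrange(2, 10 ** 12)
        if (a * b) % n != ((a % n) * (b % n)) % n:
            return False
    return True

assert check_product_rule(1000)
```

The right-hand side never multiplies anything larger than $n - 1$, which is what keeps the intermediate values in the revised `modular_power()` small.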
Using that, we can make the following adjustment to our initial implementation:
def modular_power(base, exp, modulus):
    result = 1
    base %= modulus
    while exp:
        if exp % 2 == 1:
            result = (result * base) % modulus
        exp >>= 1
        base = (base ** 2) % modulus
    return result
We’re now taking % modulus in a bunch of places, which is valid due to the above property and prevents the value of both result and base from growing out of control.
That tops off our implementation of RSA. Here’s the entire source file.
I wouldn’t have been able to present most of the proofs in this article without help from the following sources. One of the key motivations for gathering them all in one post is that, as I tried to understand all of the moving parts of RSA, I needed to sift through a lot of material to find accessible and satisfactory explanations:
Nand2Tetris, or The Elements of Computing Systems, is a twelve-part course in fundamental computer engineering that steps you through the creation of a computer from the ground up, starting with NAND logic gates and ending with an operating system capable of running a complicated program like Tetris.
The course, architected by Noam Nisan and Shimon Schocken, is available as a book that you can download for free (though it appears that some chapters are only available in terse PowerPoint form), and emphasizes a hands-on approach that leads up to some pretty epic struggles and Aha! moments. I just recently finished the course after about two months of hacking on it in my free time – if you reliably spend a couple hours a day on it, though, I can easily see you finishing in two weeks – and wanted to share an overview of the content and some thoughts.
Nand2Tetris consists of twelve lectures/chapters, each of which tackles a next logical step in building a computer called “Hack,” and iterates on all of your work up to that point. Note that the book ships with various supplementary materials (which you can download here), including emulators for various components of the computer, like the hardware, stack, and virtual machine. Here’s an overview of the ground you’ll cover:
I’ll briefly summarize the contents of each chapter (partly as a review for myself).
We learn about boolean logic, or logic with boolean values – conveniently, 0s and 1s – that facilitate logical/mathematical operations in hardware. We then construct primitive logic gates, like AND, OR, and MUX, which operate on single-bit inputs, and chain those together to implement their multi-bit (in this case, the Hack word, or two bytes) counterparts, like AND16.
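The course has you write these chips in its HDL, but the idea of building everything out of NAND translates to a few lines of Python. This is only a rough software model with function names of my own choosing; I’m assuming the convention that MUX outputs a when sel is 0 and b when sel is 1.

```python
def nand(a, b):
    """The one primitive gate: 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


def not_(a):
    return nand(a, a)


def and_(a, b):
    return not_(nand(a, b))


def or_(a, b):
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))


def mux(a, b, sel):
    """Select a when sel == 0, b when sel == 1 (assumed convention)."""
    return or_(and_(a, not_(sel)), and_(b, sel))


def and16(a, b):
    """Multi-bit AND: apply the single-bit gate to each pair of bits."""
    return [and_(x, y) for x, y in zip(a, b)]


print([or_(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 1]
print(mux(0, 1, 1))                                 # 1
```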
We cover binary addition and two’s complement, a means of representing signed numbers (in other words, negative and positive values instead of positive values only), and implement adder chips to perform addition at the hardware level. Finally, we devise an ALU (Arithmetic Logic Unit), which implements addition and value comparisons (i.e., logic operations), but, unlike industrial-grade hardware, neither multiplication nor division. We’ll implement those operations at the software level – specifically, in the operating system’s math standard library – in the interest of simplicity, but at the expense of speed.
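Here’s a sketch of how two’s complement and a chain of full adders fit together, again in Python rather than HDL. The helper names and the least-significant-bit-first list layout are my own choices; the 16-bit width matches the Hack word.

```python
WORD = 16


def to_bits(value):
    """Encode a signed integer as 16 two's-complement bits, least significant first."""
    return [(value >> i) & 1 for i in range(WORD)]


def from_bits(bits):
    """Decode 16 two's-complement bits; the top bit carries a negative weight."""
    unsigned = sum(bit << i for i, bit in enumerate(bits))
    return unsigned - (1 << WORD) if bits[-1] else unsigned


def full_adder(a, b, carry_in):
    """Add three bits, returning (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def add16(x, y):
    """Ripple-carry addition; the final carry is dropped, so overflow wraps around."""
    carry = 0
    out = []
    for a, b in zip(x, y):
        sum_bit, carry = full_adder(a, b, carry)
        out.append(sum_bit)
    return out


print(from_bits(add16(to_bits(5), to_bits(-3))))     # 2
print(from_bits(add16(to_bits(32767), to_bits(1))))  # -32768
```

The same adder handles negative operands without any special cases, which is the main appeal of two’s complement.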
Throughout chapters 1 and 2 we implemented combinational chips using NAND gates, and got arithmetic/logic out of the way. This section introduces a new fundamental building block: the DFF, or Data Flip Flop, which will allow us to construct the second crucial component of our Hack computer – memory. Unlike combinational chips, which simply intake arguments via input pins and “immediately” spit out a result to output pins and are thus stateless, the sequential circuits that we’ll implement with flip-flops are capable of maintaining values across time. Note that, even though we treat the DFF as a fundamental chip, it can be implemented using NAND gates and more – Nand2Tetris just thoughtfully spares us that gory implementation. We implement a Bit, Register, and multiple RAM chips with iteratively larger capacities (64-word RAM consists of 8-word RAM, 512 of 64, etc.), and also a program counter, which we’ll use to keep track of the next CPU instruction to execute. This sequential business is a little mind-bending (and quite cool) because it effectively makes use of delayed recursion in a hardware context.
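To get a feel for the “delayed recursion,” here’s a toy software model of a one-bit register. The class and method names are hypothetical, and a real chip is driven by a clock signal rather than a method call; the point is only that the output at one time step depends on the inputs from the previous one.

```python
class Bit:
    """One-bit register: emits its stored value, and stores a new one when load is set."""

    def __init__(self):
        self.state = 0

    def tick(self, in_, load):
        """Advance one clock cycle and return the value visible during this cycle."""
        out = self.state
        if load:
            # The new value only becomes visible on the *next* cycle.
            self.state = in_
        return out


bit = Bit()
print(bit.tick(1, 1))  # 0: the 1 has been stored but isn't visible yet
print(bit.tick(0, 0))  # 1: the stored value shows up one cycle later
print(bit.tick(0, 0))  # 1: with load off, the value persists
```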
We’re introduced to the Hack machine language, or the format of the binary strings that our CPU (to be implemented in the next chapter) will interpret as instructions, and its corresponding assembly language: this is the interface between hardware and software. Assembly is a human-readable representation of machine code which allows instructions to be written with mnemonics like ADD or SUB; those are then compiled down to the appropriate binary by an assembler (to be implemented in chapter 6) – essentially a glorified preprocessor. Here’s an example of Hack assembly:
(LOOP)
    @END
    D;JEQ
    @sum
    M=M+D
    D=D-1
    @LOOP
    0;JMP
The above code adds all consecutive integers between 0 and some number, storing the sum in a variable sum.
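If assembly isn’t your thing, the loop reads roughly like the Python below. I’m assuming that the D register was preloaded with the upper bound, that sum starts at 0, and that an END label sits somewhere past the loop; none of that setup appears in the snippet itself.

```python
def sum_down_from(n):
    d = n          # the D register, assumed to be preloaded with the upper bound
    total = 0      # the RAM cell behind @sum, assumed to start at 0
    while d != 0:  # @END / D;JEQ  -> leave the loop once D hits 0
        total += d  # @sum / M=M+D
        d -= 1      # D=D-1
        # @LOOP / 0;JMP -> unconditional jump back to (LOOP)
    return total


print(sum_down_from(4))  # 10
```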
We implement the Hack CPU, which abstracts away all hardware operations and exposes an API for executing them – that is, the machine language. The CPU integrates chapters 2 (the ALU) and 3 (RAM) in a classic mold of the von Neumann architecture.
Assembly! Everyone loves assembly! This section extends chapter 4, which documented the Hack assembly language spec, and has you implement the assembler that translates such programs to binary machine instructions.
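As a taste of what the assembler does, here’s a minimal sketch that handles only address instructions (the @value kind), which become a 0 bit followed by a 15-bit address. The function name and the tiny symbol table are hypothetical, and a full assembler would also need to encode compute instructions like D=D-1 and resolve labels in a first pass.

```python
def assemble_a_instruction(line, symbols):
    """Translate '@value' or '@symbol' into a 16-character binary string."""
    operand = line.strip()[1:]  # drop the leading '@'
    address = int(operand) if operand.isdigit() else symbols[operand]
    if not 0 <= address < 2 ** 15:
        raise ValueError("address does not fit in 15 bits: %d" % address)
    # The leading bit is 0, so zero-padding to 16 bits yields the full instruction.
    return format(address, "016b")


# Hypothetical symbol table for illustration only.
symbols = {"LOOP": 0, "sum": 16}
print(assemble_a_instruction("@21", symbols))   # 0000000000010101
print(assemble_a_instruction("@sum", symbols))  # 0000000000010000
```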
We learn about virtual machines, or platform-independent runtime environments that allow high-level languages to compile down to a portable intermediate representation, or IR, (in this case, the virtual machine language) that will run on any chip-set with an implementation of that virtual machine. Basically, since different CPUs potentially have different machine languages, writing native compilers for high-level languages would be a nightmare because the output binaries would have to be tweaked on a per-system basis. A virtual machine handles that concern by itself exposing an interface – in the form of a virtual machine language, or IR – for performing memory, logic, and math operations that target systems can reliably be expected to support. Platform-specific compilers that convert the IR to assembly do have to be written, but that problem is now centralized in one place; high-level language developers don’t have to worry about re-inventing the same compilation wheel if they build their language around the same virtual machine, instead leaving that problem to the virtual machine maintainers.
Anyway, the Hack virtual machine wraps its assembly language in a simple, stack-based interface. We implement the IR-to-assembly compiler, which becomes tricky once we involve things like stack frames. Sample code looks like:
function Point.new 0
    push constant 2
    call Memory.alloc 1
    pop pointer 0
    push argument 0
    pop this 0
    push argument 1
    pop this 1
    push pointer 0
    return
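To show what “stack-based” means in practice, here’s a deliberately tiny interpreter that understands just three commands: push constant, add, and sub. It’s a sketch of the execution model only. It runs the commands directly in Python instead of translating them to assembly as the course project does, and it ignores memory segments, function calls, and everything else.

```python
def run(program):
    """Execute a list of VM command strings and return the final stack."""
    stack = []
    for line in program:
        parts = line.split()
        if parts[0] == "push" and parts[1] == "constant":
            stack.append(int(parts[2]))
        elif parts[0] == "add":
            y = stack.pop()
            x = stack.pop()
            stack.append(x + y)
        elif parts[0] == "sub":
            # The first operand pushed is the left-hand side.
            y = stack.pop()
            x = stack.pop()
            stack.append(x - y)
        else:
            raise ValueError("unsupported command: " + line)
    return stack


print(run(["push constant 7", "push constant 8", "add"]))   # [15]
print(run(["push constant 10", "push constant 3", "sub"]))  # [7]
```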
We’re introduced to the spec for a high-level, object-oriented language (without garbage collection) not unlike Java, called Jack. The following Jack code defines a class Point, which represents a 2D geometric point:
class Point {
    field int _x, _y;
    constructor Point new(int x, int y){
        let _x = x;
        let _y = y;
        return this;
    }
}
We implement a Jack compiler, which converts Jack programs to Hack virtual machine code. We learn about basic compilation techniques – tokenization, recursive-descent parsers – and features – symbol tables, parse trees.
Finally, we implement the Hack operating system (using Jack), which only consists of a number of standard system libraries that govern things like math, memory management, and graphics. The chapter centers heavily on algorithms, introducing some fascinating optimized approaches to problems including multiplication and heap allocation.
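Since the ALU can’t multiply, multiplication has to be assembled from additions. One classic way to do that, which I’m offering as an illustration rather than as the book’s exact algorithm, is shift-and-add: walk over the bits of one operand and add up correspondingly doubled copies of the other. The sketch assumes non-negative operands that fit in the given width.

```python
def multiply(x, y, width=16):
    """Multiply two non-negative integers using only addition and bit tests."""
    total = 0
    shifted_x = x
    for bit in range(width):
        if (y >> bit) & 1:
            total += shifted_x
        # Doubling by self-addition stands in for a left shift.
        shifted_x += shifted_x
    return total


print(multiply(6, 7))     # 42
print(multiply(123, 45))  # 5535
```

The loop runs once per bit of the operand rather than once per unit of its value, which is where the speed-up over repeated addition comes from.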
That was a pretty wild ride. I heard about The Elements of Computing Systems nearly two years ago and kept it on the back-burner ever since, and am very glad I finally got around to reading it. Nisan and Schocken succeeded tremendously in what they set out to accomplish – creating a course that gives you a universal, if shallow, understanding of the entire hardware and software stack that computers operate on.
The individual sections are clear and concise, with just enough technical and academic background, examples, and project walkthroughs, and benefit from a uniform structure. Each project assignment involves a good deal of steering, as the authors underscore the suggested (though probably always the way you’d want to go anyway) approach to implementing the next stage of the computer, but with nothing in the way of concrete implementations – this encourages the reader to get their feet wet and, in true hacker fashion, build the thing on their own. The software package that ships with the course is entirely bug-free, and the emulators are both user-friendly and robust (these things are easy to take for granted…).
An enormous amount of thought was clearly invested in the structure of the course. The various components of the Hack system have perfectly coupled interrelationships, and your work up to any single point almost magically helps you bootstrap the next project with incredible ease – this is mostly true for the hardware sections of the course, where chip creation is a highly iterative process, and lets you create substantially complicated circuits out of nothing in no time.
Another nice bit about Nand2Tetris is that it has much to offer to people at various skill levels. I entered the course having never written a line of assembly, nor did I have much knowledge about compilers and virtual machines, but I did have a reasonable amount of software engineering experience and at least a vague understanding of the aforementioned components: the course ended up perfect, though I suspect that it’s mostly aimed at people in my situation. Still, I can see it being useful even to greybeards with a nuanced knowledge of architectures, compilers, and operating systems, simply because it does such a good job of tying them all together in a single coherent project. I can imagine myself giving it another pass a couple of years from now, taking each of the projects further and refreshing myself on the overview it provides.
Finally, the course is lightweight: the book comes in at just under 300 pages, and that’s with twelve sections that collectively cover all of the vital components of a rudimentary computer. As a result, it doesn’t delve terribly far into any one of them; you won’t implement many elementary chips, the authors intentionally skip over involved problems like hardware multiplication, the computer won’t have a filesystem, you won’t come anywhere near hardware acceleration, networking isn’t covered, and the high-level language you develop is highly limited (both in syntax and functionality). That’s the point. The Elements of Computing Systems tries to provide a general introduction to each component and a coherent project that ties them all together – it’s not the place to go for an immersive foray into any of them. On the upside, it underscores a wealth of questions which you’re then encouraged to explore on your own.
Taking some notes (I did) for future reference might be a good idea while you read.
N2T is, in my opinion, a high quality must-read for software engineers. Can’t recommend it enough.
This course is not for the amateur programmer. While the hardware chapters, the projects for which primarily consist of implementing chips using an HDL, or hardware description language, don’t require any prior experience with anything, the software sections involve the creation of reasonably complicated software in your programming language of choice. A solid grasp of recursion is necessary for parsing, tokenization would probably be hell without a knowledge of regex, and the compilers require some engineering acumen to implement cleanly – plus, it might be nice to have a vague understanding of all the various components of a computer’s hardware and software going into the course, so that it clarifies and refines your understanding of the various moving parts instead of simply introducing a bunch of theretofore unheard-of concepts that, as a result, might be difficult to appreciate. I hope someone proves me wrong, though!
As a complete aside, you’ll work with a number of ad-hoc languages throughout the course: HDL, Hack assembly, Hack virtual machine language, and Jack. I’m a Vim user and got a little tired of the lack of syntax highlighting, so I wrote up a minimalist plugin to provide it.
>Prime number spirals are visualizations of the distribution of prime numbers that underscore their frequent occurrences along certain polynomials. They’re conceptually simple, yet create order out of the apparent chaos of primes and are fairly beautiful. We’ll explore the Ulam and Sacks spirals, some of their underlying theory, and algorithms to render each.
The story has it that Stanislaw Ulam, a Polish-American mathematician of thermonuclear fame^{1}, sat in a presentation of a “long and very boring paper” at a 1963 scientific conference. After some time, he began doodling (the hallmark of great genius), first writing out the first few positive integers in a counter-clockwise spiral, and then circling all of the prime numbers. And he noticed something that he’d later formulate as “a strongly nonrandom appearance.” Even on a small scale – say, the first 121 integers, which form an 11x11 grid – it’s visible that many primes align along certain diagonal lines.
Ulam later used MANIAC II, a first-generation computer built for Los Alamos National Laboratory in 1957, to generate images of the first 65,000^{2} integers. The following spiral contains the first 360,000 (600x600):
Look closely, and we see much more than just white noise.
A software engineer named Robert Sacks devised a variant of the Ulam spiral in 1994. Unlike Ulam’s, Sacks’s spiral distributes integers along an Archimedean spiral, or a function of the polar form $r = a + b\theta$. Sacks discarded $a$ (which just controls the offset of the starting point of the curve from the pole) and used $b = \frac{1}{2\pi}$, leaving $r = \frac{\theta}{2\pi}$; he then plotted the squares of all the natural numbers – $1, 4, 9, 16, \ldots$ – on the intersections of the spiral and the polar axis, and filled in the points between squares along the spiral, drawing them equidistant from one another.
The reason why we see ghostly diagonals is that some polynomials, informally called prime-generating polynomials, have aberrantly high occurrences of prime numbers. , for instance, patented by Leonhard Euler in 1772, is prime for all in the range , yielding . A variant is , proposed by Adrien-Marie Legendre in 1798, which is prime in . Here are several others, as taken at random from Wolfram Mathworld:
In the case of the rectangular Ulam spiral, these polynomials appear as diagonal lines. They had been known about since 1772, if not earlier, and a prime-number spiral was hinted at twice before Ulam published his. In 1932 (31 years before Ulam!), Laurence M. Klauber, a herpetologist primarily focused on the study of rattlesnakes, presented a method of using a spiral grid to identify prime-generating polynomials to the Mathematical Association of America. The second frequently-cited mention of prime spirals came from Arthur C. Clarke, a British science-fiction writer, whose The City and the Stars (1956) describes a protagonist, Jeserac, as “[setting] up the matrix of all possible integers, and [starting] his computer stringing the primes across its surface as beads might be arranged at the intersections of a mesh.” In my opinion, the second mention is fairly ambiguous, but the fact stands that, by the time Ulam published his famous spiral, a general understanding of prime-generating polynomials existed and people were considering ways of visualizing them. Thus, it’s perhaps a little disingenuous to suggest that he stumbled across it when “doodling” (something intricate) at random – there may have been some method to it.
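To make the idea of a prime-generating polynomial concrete, here’s a small check of the widely cited quadratic $n^2 + n + 41$, which I’m assuming is representative of the polynomials discussed above. The trial-division is_prime() helper is my own and is only meant for small inputs.

```python
def is_prime(n):
    """Trial division; fine for the small values used here."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def quadratic(n):
    return n * n + n + 41


streak = [quadratic(n) for n in range(40)]
print(streak[:5])                        # [41, 43, 47, 53, 61]
print(all(is_prime(v) for v in streak))  # True
print(is_prime(quadratic(40)))           # False: 1681 == 41 * 41
```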
I was introduced to prime number spirals about a year ago, by a video on the excellent Numberphile. I immediately jumped into hacking together a Python script to render the spirals on my own, because it’s both tremendously easy and very visually rewarding. I’ll revisit the implementation, this time in JavaScript. I’m not going to show all of the necessary code (like HTML markup/CSS styles) in the interest of brevity, but the zipped files are linked to at the end of the post.
Let’s outline our interface. We’ll define functions ulamSpiral(numLayers) and sacksSpiral(numLayers), where the argument numLayers is the number of revolutions in the spiral, or effectively the number of rings that it contains. Both functions need to set the height and width of the canvas according to numLayers, and require a function drawPixel(x, y) to plot pixels. Note that we’ll want drawPixel() to treat the centroid of the canvas as its origin, so that drawPixel(0, 0) plots a point at its center and not the top-left corner. Because both the canvas dimensions and the offset used by drawPixel() are dependent on numLayers, we’ll bundle them into a function called setupCanvas().
function setupCanvas(numLayers){
    "use strict";
    var sideLen = numLayers * 2 + 1;
    var canvas = document.getElementsByTagName("canvas")[0];
    canvas.setAttribute("width", sideLen);
    canvas.setAttribute("height", sideLen);
    var context = canvas.getContext("2d");
    return function drawPixel(x, y){
        context.fillRect(x + numLayers, y + numLayers, 1, 1);
    };
}
Note that we set sideLen equal to numLayers * 2 + 1, rather than only numLayers * 2, because we need to account for the row/column containing the origin of the spiral, which is not technically a ring. Now, we can use setupCanvas() to both set the canvas dimensions and return a drawPixel() that takes advantage of closure to access all of the variables (numLayers, context) that it needs. Also, to draw a single pixel, we’re calling fillRect() with a width and height of 1 – the canvas unfortunately doesn’t have (or perhaps just doesn’t expose) a single pixel-plotting function. Finally, to test the primality of our values, we’ll use Kenan Yildirim’s primality library, which provides primality(val).
The dull stuff aside, we can begin implementing ulamSpiral(). The general algorithm will run as follows:
1. Initialize x, y, and currValue to track the position and value of the current point – the “head” of the spiral.
2. Walk outward along the spiral, updating x and y, while incrementing currValue.
3. Whenever currValue is prime, plot a pixel at (x, y).
function ulamSpiral(numLayers){
    "use strict";
    var drawPixel = setupCanvas(numLayers);
    var currValue = 1;
    var x = 0;
    var y = 0;
    // Walk len steps in direction (dx, dy), plotting each prime along the way.
    function drawLine(dx, dy, len){
        for(var pixel = 0; pixel < len; pixel++){
            if(primality(currValue++)){
                drawPixel(x, y);
            }
            x += dx;
            y += dy;
        }
    }
    // Each layer is one ring of the spiral: up, left, down, then right.
    for(var layer = 0, len = 0; layer <= numLayers; layer++, len += 2){
        drawLine(0, -1, len - 1);
        drawLine(-1, 0, len);
        drawLine(0, 1, len);
        drawLine(1, 0, len + 1);
    }
}
We simply iterate numLayers + 1 times, drawing rectangular layers – the spiral – as we go. I couldn’t think of a better solution than using a function drawLine(), which accepts a direction (dx and dy, one of which should be 0) and a length, to draw four different straight lines (perhaps it can somehow be done in one elegant loop?).
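On the question of a single loop: one possibility, sketched here in Python rather than JavaScript, is to notice that the spiral is a sequence of straight runs of lengths 1, 1, 2, 2, 3, 3, and so on, with a quarter turn after each run. The function name and the returned tuple format are my own; it visits points in the same order as the drawLine() approach (right, up, left, down, with y growing downward as on the canvas).

```python
def ulam_coordinates(count):
    """Return [(value, x, y)] for the values 1..count along the spiral."""
    x = y = 0
    dx, dy = 1, 0    # start by heading right
    run_length = 1   # how many steps the current straight run has
    steps_left = 1
    turns = 0
    points = []
    for value in range(1, count + 1):
        points.append((value, x, y))
        x += dx
        y += dy
        steps_left -= 1
        if steps_left == 0:
            # Quarter turn: right -> up -> left -> down -> right.
            dx, dy = dy, -dx
            turns += 1
            # Runs get one step longer after every second turn.
            if turns % 2 == 0:
                run_length += 1
            steps_left = run_length
    return points


print(ulam_coordinates(5))
# [(1, 0, 0), (2, 1, 0), (3, 1, -1), (4, 0, -1), (5, -1, -1)]
```

Whether that counts as more elegant than four drawLine() calls is debatable, but it does trade the nested structure for a little bookkeeping.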
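The Sacks spiral is a little more mathematically interesting because it relies (somewhat) on polar equations. Our algorithm:
1. For each layer, divide one full rotation evenly among the 2 * layer + 1 integers that it holds.
2. For each of those integers, compute the polar coordinates of its point and, if the integer is prime, convert them to Cartesian coordinates and plot a pixel.
3. Repeat numLayers times.
function sacksSpiral(numLayers){
    "use strict";
    var drawPixel = setupCanvas(numLayers);
    // Start at 2 so that each perfect square lands on the polar axis
    // (1 isn't prime, so nothing is lost by skipping it).
    var currValue = 2;
    for(var layer = 1; layer <= numLayers; layer++){
        var numPoints = 2 * layer + 1;
        var angle = 2 * Math.PI / numPoints;
        for(var point = 1; point <= numPoints; point++){
            if(primality(currValue++)){
                var theta = point * angle;
                var radius = layer + point / numPoints;
                var x = Math.cos(theta) * radius;
                var y = Math.sin(theta) * radius;
                drawPixel(Math.floor(x), Math.floor(y));
            }
        }
    }
}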
To calculate the polar angle of any point, we first solve for the angle between subsequent points (var angle = 2 * Math.PI / numPoints;), and then multiply it by the point’s index within the current rotation of the spiral (var theta = point * angle;). We’ll also Math.floor() the coordinates sent to drawPixel(), because, after the various trigonometric operations, they’re likely decimals rather than integers and would cause blurry canvas rendering.
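For comparison, a common closed-form description of the Sacks spiral places each integer n at radius sqrt(n) and angle 2π·sqrt(n), which puts every perfect square on the positive polar axis. The sketch below uses that formula instead of the layer-by-layer interpolation above, so it’s an alternative rather than a port of sacksSpiral(); the function name is hypothetical.

```python
import math


def sacks_point(n):
    """Return the Cartesian (x, y) position of integer n on the spiral."""
    root = math.sqrt(n)
    theta = 2 * math.pi * root  # one full rotation per unit of radius
    return root * math.cos(theta), root * math.sin(theta)


for n in (1, 4, 9):
    x, y = sacks_point(n)
    # Perfect squares sit on the positive x-axis, up to floating-point error.
    print(n, round(x, 6), round(abs(y), 6))
```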
That’s all! For more reading on prime-number spirals, I recommend this in-depth article by Robert Sacks himself, and another write-up of algorithms used to render them.
Download all of the source code here, or view it on Github.
1: Ulam is also well known for contributing to the Manhattan Project, proposing the Monte Carlo method of computation, and exploring spaceships propelled by nuclear explosions, amongst a large number of other things.
2: Assuming that Ulam began rendering his spiral with the integer 1 (instead of something like 41, which is also common), I suspect that the generated images had exactly 65,025 integers. 65,000 integers implies as many pixels, the square root – the Ulam spiral is inherently square – of which is 254.95, which obviously isn’t a valid image height/width. Thus, we round to 255, and square for 65,025.
The power set of a set is the set of all its subsets, or a collection of all the different combinations of items contained in that given set: in this write-up, we’ll briefly explore the math behind power sets, and derive and compare three different algorithms used to generate them.
To refresh our memories: a set, the building block of set theory^{1}, is a collection of any number of unique objects whose order does not matter. A set is expressed using bracket notation, like $\{1, 2, 3\}$, and an empty, or null, set is represented using either of $\{\}$ and $\emptyset$. Because sets are order-agnostic, we can say that $\{1, 2, 3\}$ and $\{3, 2, 1\}$ are equal, and, because they contain only distinct members, something like $\{1, 1, 2\}$ is invalid.
The subset of a set is any combination (the null set included) of its members, such that it is contained inside the superset; $\{1, 2\}$, then, is a subset of $\{1, 2, 3\}$, while $\{1, 4\}$ is not. If a subset contains all of the members of the parent set (i.e., it’s a copy), we call it an improper subset – otherwise, it’s proper. Finally, the power set of a set is the collection of all of its subsets, so the power set of $\{1, 2, 3\}$ is:
$$\{\emptyset, \{1\}, \{2\}, \{3\}, \{1, 2\}, \{1, 3\}, \{2, 3\}, \{1, 2, 3\}\}$$
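The length, or cardinality, of a power set is $2^n$, where $n$ is the cardinality of the original set, so the number of subsets of something like $\{1, 2, 3\}$ is 8 ($2^3$). Two ways of informally proving that property:
1. To build a subset, we iterate over each of the $n$ elements and make a choice with two possible outcomes (the element either is or isn’t a member of the subset), giving $2 \cdot 2 \cdots 2 = 2^n$ distinct subsets.
2. Whenever an element is added to a set, it must be added to copies of all the subsets in the current power set to form the new one, which doubles its size; starting from the single subset of the empty set, $n$ doublings give $2^n$.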
Note: the following algorithms are accompanied by Python implementations. To keep things simple, and because the algorithms are language-independent, I avoided using Python-specific built-ins (like yield) and functions (like list.extend()) that don’t have clear equivalents in most other languages, even though they would’ve made some code much cleaner. Also, even though we’re dealing with sets, we’ll use lists (arrays) under the assumption that they contain distinct elements.
This was my first stab at an algorithm that, given a set, returns its power set, and surprise! It’s the least intuitive and most inelegant of the three. We begin by writing a recursive function k_subsets() to find all of a set’s subsets of cardinality $k$ (a.k.a. its $k$-subsets):
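def k_subsets(k, set_):
    if k == 0:
        # The only subset of cardinality 0 is the empty set.
        return [[]]
    else:
        subsets = []
        # Pick each viable element, then recurse on the elements after it.
        for ind in range(len(set_) - k + 1):
            for subset in k_subsets(k - 1, set_[ind + 1:]):
                subsets.append(subset + [set_[ind]])
        return subsets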
With the ability to generate any $k$-subset, the key to creating a power set is finding the $k$-subsets for all valid $k$, which lie in the range $[0, n]$ ($n$, again, is the cardinality of the superset)!
We’ll introduce a wrapper function, power_set_1(), in which we’ll nest a slightly modified k_subsets() that takes advantage of closures.
def power_set_1(set_):
    # Instead of slicing, track a start index and read set_ via the closure.
    def k_subsets(k, start_ind):
        if k == 0:
            return [[]]
        else:
            subsets = []
            for ind in range(start_ind, len(set_) - k + 1):
                for subset in k_subsets(k - 1, ind + 1):
                    subsets.append(subset + [set_[ind]])
            return subsets
    subsets = []
    for k in range(len(set_) + 1):
        for subset in k_subsets(k, 0):
            subsets.append(subset)
    return subsets
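For what it’s worth, if you do allow yourself Python’s standard library, the same “all k-subsets for every k” idea collapses into a couple of lines with itertools.combinations. This is a cross-check rather than a replacement, and the function name is my own.

```python
from itertools import chain, combinations


def power_set_itertools(items):
    """Collect the k-subsets for every k from 0 through len(items)."""
    items = list(items)
    all_combos = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [list(combo) for combo in all_combos]


print(power_set_itertools([1, 2, 3]))
# [[], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]]
```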
The second algorithm relies on our second informal proof of sets’ cardinality: whenever an element is added to a set, it must be added to copies of all the subsets in its current power set to form the new one. Thus:
Like so:
def power_set_2(set_):
    subsets = [[]]
    for element in set_:
        # Only iterate over the subsets that existed before this element.
        for ind in range(len(subsets)):
            subsets.append(subsets[ind] + [element])
    return subsets
The third algorithm is a clever hack, and relies on the binary representation of an incremented number to construct subsets. In our first proof of the cardinality of a power set, we iterated over each element of an argument set and made a choice with two possible outcomes (the element either was or wasn’t a member of the subset): $2 \cdot 2 \cdots 2 = 2^n$. Let’s consider an integer of $n$ bits: it has $2^n$ possible values in the range $[0, 2^n - 1]$, meaning that we can use it to represent $2^n$ distinct arrangements of bits. Hmm…
def is_bit_flipped(num, bit):
    return (num >> bit) & 1
def power_set_3(set_):
    subsets = []
    # Every integer in [0, 2^n - 1] encodes one subset.
    for subset in range(2 ** len(set_)):
        new_subset = []
        for bit in range(len(set_)):
            if is_bit_flipped(subset, bit):
                new_subset.append(set_[bit])
        subsets.append(new_subset)
    return subsets
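To see the correspondence between integers and subsets, it helps to print each number’s binary form next to the subset it selects. The helper below is a condensed, hypothetical restatement of the inner loop; bit 0 (the rightmost binary digit) maps to the first element.

```python
def subset_from_mask(items, mask):
    """Keep the items whose bit position is set in mask."""
    return [item for bit, item in enumerate(items) if (mask >> bit) & 1]


items = ["a", "b", "c"]
for mask in range(2 ** len(items)):
    print(format(mask, "03b"), subset_from_mask(items, mask))
# 000 []
# 001 ['a']
# 010 ['b']
# 011 ['a', 'b']
# 100 ['c']
# 101 ['a', 'c']
# 110 ['b', 'c']
# 111 ['a', 'b', 'c']
```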
1: (completely tangentially) whenever I mention set theory I can’t help but think of the infamous Principia Mathematica: a staggering, three-volume attempt to axiomatize all of mathematics, published by Bertrand Russell and Alfred North Whitehead in 1910–13, that relied heavily on sets. It’s notorious, amongst other things, for proving $1 + 1 = 2$ in no less than 379 pages. Check it out.
>