How to write a compiler generator

February 10th, 2010

http://www.cubiclemuses.com/cm/blog/archives/000419.html/

I read this article recently.  It talks about how to use a partial evaluator to write a compiler generator (a program that transforms an interpreter for a language, into a compiler for that language).

Now, while I don’t have a partial evaluator, I do have a compiler generator, PyPy.  It takes an interpreter written in RPython, a subset of Python, and makes a compiler out of it.  This is fine, except what if I prefer not to use RPython, but my language of choice?  Say I’m working in Scheme.  If I get:

  1. A Scheme -> RPython compiler written in RPython
  2. A Scheme interpreter written in Scheme

Then by running (1a) with the code to (2a), I get a Scheme interpreter written in RPython.  PyPy converts this to a Scheme compiler to any bytecode PyPy supports.

Let’s assume PyPy can compile to RPython, in particular (I do not know that this is true, but if it’s not, it’s useful enough to be worth writing).  Then if we write a Scheme interpreter in RPython, we can use PyPy to get (1) directly.  Start with any version of Scheme, Scheme1, then all I really need is to write a Scheme1 to RPython compiler.   Then, write (2b), a Scheme2 interpreter in Scheme1.  Running (1a) with (2b) gives us a Scheme2 -> anything compiler.  In particular, it gives us (1b), a Scheme2 -> RPython compiler.  And we can use this to take a Scheme3 interpreter written in Scheme2 and get a Scheme3 -> RPython compiler, and so on.  Eventually, we opt to compile to bytecode rather than RPython, and get a Scheme3 -> Bytecode compiler in addition.  So basically, we start with

  1. A Scheme1 interpreter, written in RPython
  2. A Scheme2 interpreter, written in Scheme1
  3. A Scheme3 interpreter, written in Scheme2 (and so on)

And we can get a bytecode compiler for Scheme3, with only PyPy, an RPython -> Bytecode and RPython -> RPython compiler generator.

This is to say the least, fairly neat.  Now, remember that we don’t have to write interpreters the hard way.  So here’s an interesting option–use eval.  Then, we can add just one tiny feature (say, writing a function or macro to be a new keyword), and we can take our two old programs, the two compilers PyPy generated at the last stage, and get a new pair of compilers (one RPython, one bytecode).

Of course, we don’t really care directly about the RPython compiler.  One possibility is to wrap this somehow.  We could easily write a
“compiler generator N” wrapper function that takes as arguments the new SchemeN+1 eval function, as written in SchemeN, and a SchemeN->RPython function, runs things through PyPy, and outputs a SchemeN+1 RPython compiler and a SchemeN+1 bytecode compiler.  Now, just hide the RPython compiler input and output in global variables somewhere in the compiler itself (okay, it’s actually a bit harder than I make it sound), and you have new new built-in, compiler-generator.  It takes as arguments a replacement eval function, and outputs a compiler (with the compiler-generator function updated to use the new global variable, and thus accept eval written in the new language).  Using compiler-generator it’s a few lines to redefine lambda or macro so that they output a new compiler with that function or macro as a keyword!  Happy hacking,

-Za3k

Objects in Scheme

January 29th, 2010

Last night, I hacked up an object system in scheme.  I’ve been meaning to learn one, and but somehow I ended up writing one instead.  It is currently rather Pythonesque, but can be subtly modified to give a number of object system I find interesting.  It is most suitable for flexible duck typing, in my opinion.

The starting object system is very basic - there is no inheritance.  You can dynamically assign attributes and methods.  If you assign a method, it binds the first argument of the method to the object–denoted “self” in Python, or “this” in C++ based languages.  You can freely reassign both, but there’s currently no support for deleting.

So, it’s basically a dynamic version of ’struct’.  We could have a version that doesn’t allow modifying methods (only adding them).  This would mean any object that “inherited” from an earlier version would be guaranteed to have not only the same methods as is true right now, but also the same versions of those methods.  This is a somewhat interesting type system–it’s monotonic in a sense.  If we disallow even modifying methods, we ‘freeze’ the class and get a static interface.

If we disallow adding new attributes or adding or modifying methods, we get something like a static struct.  If we disallow any changes, we freeze the struct into a single, nonmodifiable object.

By adding a “derive” operation (perhaps used by calling the object itself, with no arguments), we can add prototyping.  An object looks for an attribute in its current namespace.  If it’s not there, it looks in the parent object.  If we start allowing methods to be deleted, we can decide that a child object no longer implements the same interface.  This is not inheritence any more, but could allow more efficient implementation for the programmer (a la “private” inheritance in C++ or the “not defined” I’ve heard the Gang of Four reference in Smalltalk).

The biggest thing I’d like to add personally is this derivation I mention, and deletion of at least attributes.

Objects with optional mutation in Scheme
  1. ;An object has one special method, "call-method", which is called when the object is invoked as a function.
  2. ;All methods including call-method may or may not mutate the object.
  3. ;Attributes are accessed like (dot obj attribute-name).
  4. ;Methods are accessed like (dot obj method-name) — this returns the (bound) method
  5. ;Attributes and methods are modified or set with dota and dotm, which return a modified object without changing the original object.
  6. ;Attributes and methods are dota! and dotm! do the same thing, except that they modify the original object and return it.
  7.  
  8. (define (remassq obj list) (remq (assq obj list) list))
  9.  
  10. (define (make-object)
  11.   (letrec ([make-object-raw-helper (lambda (self attributes methods)
  12.                                      (lambda function-and-args
  13.                                        (if (null? function-and-args)
  14.                                            (begin (error "Object cannot be called as a function") #f)
  15.                                            (let ([function (car function-and-args)]
  16.                                                  [args (cdr function-and-args)])
  17.                                              (cond [(and (eq? function 'dot) (= (length args) 1))
  18.                                                     (let ([attr (assq (car args) attributes)])
  19.                                                       (if (list? attr)
  20.                                                           (second attr)
  21.                                                           (let ([meth (assq (car args) methods)])
  22.                                                             (if (list? meth)
  23.                                                                 (let ([unbound-method (second meth)])
  24.                                                                   (lambda (parameters) (unbound-method self parameters)))
  25.                                                                 (begin (error "Invalid attribute.") #f)))))]
  26.                                                    [(and (eq? function 'dota) (= (length args) 2))
  27.                                                     (make-object-raw (cons args (remassq (car args) attributes)) (remassq (car args) methods))]
  28.                                                    [(and (eq? function 'dotm) (= (length args) 2))
  29.                                                     (make-object-raw (remassq (car args) attributes) (cons args (remassq (car args) methods)))]
  30.                                                    [(and (eq? function 'dota!) (= (length args) 2))
  31.                                                     (set! attributes (cons args (remassq (car args) attributes)))
  32.                                                     self]
  33.                                                    [(and (eq? function 'dotm!) (= (length args) 2))
  34.                                                     (set! methods (cons args (remassq (car args) methods)))
  35.                                                     self]
  36.                                                    [(eq? function 'initialize-self!) (set! self (first args)) self]
  37.                                                    [else (error "Object cannot be called as a function") #f])))))]
  38.            [make-object-raw (lambda (attributes methods) (let ((obj (make-object-raw-helper 'dummy attributes methods)))
  39.                                                            (obj 'initialize-self! obj)))])
  40.     (make-object-raw '() '())))
  41.  
  42. (define (dot obj attribute-or-bound-method) (obj 'dot attribute-or-bound-method))
  43. (define (dota! obj attribute-name attribute-value) (obj 'dota! attribute-name attribute-value))
  44. (define (dotm! obj unbound-method-name unbound-method-value) (obj 'dotm! unbound-method-name unbound-method-value))
  45. (define (dota obj attribute-name attribute-value) (obj 'dota attribute-name attribute-value))
  46. (define (dotm obj unbound-method-name unbound-method-value) (obj 'dotm unbound-method-name unbound-method-value))

There are two downsides of the system that I don’t see a way around.  First, you can’t use the object like a function.  To allow this, we’d have to disallow passing certain symbols as the first argument, which is extremely clunky (see code below to understand why).  Second, I’d much prefer the operator to be . rather than “dot”, but that’s a reserved word in Scheme.  It’s not used for much, so that’s rather annoying.  .a and .m would are allowable, but for consistency they’re dota and dotm here.  If anyone knows a way to allow calling the object like a function, please let me know.

Portable, Virtual Me (Part 1)

September 12th, 2009

I recently purchased a flash drive.  Surprisingly, I still have it; probably won’t last long.  I wanted to put all my data on it and be portable.  Unfortunately, this wouldn’t let me keep my installed programs or settings, and so forth.  Ideally, I’d like to keep everything down to the applications I have running.  Virtual computing lets me do that.  Specifically, I’d like to

  • Have one copy of my data, for space and synchronization.  This should be accessible quickly in one format by all operating systems; no networking, and I picked FAT32.
  • Have several operating systems I can copy like files, and which are machine-independent.  Running an OS in a virtual machine means that even when I switch computers, it seems the same hardware.  This is nice for Linux and vital for Windows, which doesn’t like switching computers.
  • Be able to version my data and optionally my operating system.
  • Ideally, be able to run more than one OS at a time.  I’m lazy, and prefer to do things the easiest way possible.  Also, running stuff on two computers could again cause sync issues.
  • Be able to work offline.  This means I need to carry it, not have it on a server somewhere.  Of course, I can still use a remote server if I need computing power (I’ve been itching for a netbook); it’s just an option.

For those of you who may not know, a virtual computer is a program that acts like a generic computer emulator.  It runs virtual hardware, on which you can install an operating system, insert CDs, etc.  Setups can be a little more flexible this way, of course.  To add a hard drive, you can just tell the program to use a specific file.  A friend recommended I try VirtualBox, which can do almost everything I need.  It’s put out by Sun and will run on most major operating systems.  (Unfortunately, it’s doesn’t have full support for FreeBSD and some other linux Distros). There’s a closed- and open-source version; the closed-source version adds a “remote desktop” type feature and USB support.  Since I want to run this on a USB drive, I need to use the closed-source edition until open-source catches up.  This is not big problem for me since I don’t compile from source and don’t have any hangups about whether something is open-source or not.

This means I need installs for virtualbox on the USB drive (sadly, it does need to install on the OS, since to emulate hardware at a reasonable speed (near real-time) requires kernel-level methods.  Also, I’d like to be able to run something off *only* the USB drive, so I want to add a bootable OS that can run VirtualBox and a windowing system for display.  I don’t need much else, and since space is at a premium, I installed a minimal version of Ubuntu (FreeBSD doesn’t support some advanced features).

I have a Ubuntu virtual machine as well now.  After I pick up probably Windows XP and some dialect of DOS (maybe FreeBSD or Win 98 too), that’s about everything I need.  Right now, XP and Ubuntu would match my current setup, so that’s my goal.  Unforunately, the virtual machine files, installs, etc. need to go on a FAT32 partition of the drive I can’t get XP to see right now.  I wonder if it’s formatted?

Last, I need to move a data directory onto the drive.  It’ll be shared between the operating systems to keep everything in sync, and so I can automatically version the machines seperately from my files.  Also, on one machine I’ll need to setup some kind of backup mechanism.  I’m thinkin rsync or more likely rdiff-backup.

So my checklist is:

  1. Select and install a virtual computing program.  Done: VirtualBox is installed on Vista (my laptop) and Ubuntu (the flash drive).
  2. Create a bootable flash drive with a data area.  Bootable done.  Data might or might not have worked, though, so we’ll see.
  3. Copy data and virtualbox installs to drive.
  4. Create virtual machines (Ubuntu, Windows XP, DOS 6 / FreeDOS, FreeBSD, Win98 in order of priority).  Make sure supports add-ons (cool drivers which interact with the “guest” operating system, allowing things like shared mouse, shared windowing system, shared clipboard, etc.).  I’m using these right now before I make the other virtual machines, but I’ll have to add them to everything and get stuff working more smoothly.
  5. Allow virtual machines to access data folder.  Since I didn’t seperate the partitions, this means virtual machines could modify their own files and virtualbox.  However, since the other steps involve partitioning a flash drive and so forth, this seemsless scary in comparision.
  6. Add backups.

(Not Currently) At Work

August 31st, 2009

For the past three months or so I’ve been programming 99% of the time in C++ (I had a recent excursion to perl to autogenerate some code).

I now know much, much more than I ever expected or wanted to know about C++.  To give you a sample, here are the books I bought for work:

Also, I plan to learn Ruby, but I’m having trouble finding a good tutorial (that is, a tutorial for non-beginners).  Please comment with any recommendations.

Learning Java and a Scheme program

April 16th, 2009

In answer to my last post I encountered this interesting snippit:

Can you figure out what it does? Why it does it?
  1. (let* ((yin ((lambda (foo) (newline) foo)
  2.              (call/cc (lambda (bar) bar))))
  3.        (yang ((lambda (foo) (write-char #\*) foo)
  4.               (call/cc (lambda (bar) bar)))))
  5.  (yin yang))

In other news, due to this that and the other, I have to learn Java.  So far, I’ve not learnt much.  I’ve found out two interesting things.  First, threads are remarkably easy to make.  Now, I’ve never done threading before, so I’m wondering how easy it is to do in other languages I’ve used.

Second, Java code is far too long.  I’ve always been annoyed with “code generation” features for just this reason.  If you need to generate code, you’re doing something wrong, IMO.  Code completion, (as in function parameters and variable names) is fine since that’s for human legibility and saves typing.  But generating whole skeletons?  Puh-lease.

Kinda turned me off C# before I took a good look at it (one of these days, I tell you).

Reading code

April 8th, 2009

I’ve recently resolved to start reading other peoples’ source code.  However, I’ve noticed that everything I have sitting around falls into four categories.

  1. Stuff I wrote (not much)
  2. Stuff too large to read (i.e. Mozilla, Pidgin, OpenOffice  and most other things I use)
  3. Something closed-source (i.e. mu torrent, K media player, Abyss webserver)
  4. Something I hate (not much I haven’t deleted.  I’m tidy like that.)

So, if anyone has some not-too-big code they recommend I’d love suggestions.  Stuff you wrote is cool.  Stuff you didn’t write is cool.  Open source things you contributed a bit to is cool.  Stuff in C is cool.  Stuff in LISP is cool.  Stuff that’s elegant is cool.  Stuff that’s buggy and kludgy is cool.  Whatever.  As long as I can read it in a few days.

Boy’s first library

January 10th, 2009

I’m currently working on a discrete, finite probability library.  It’s progressing fairly well.  The only downside to the library, as I’m beginning to realize, is that most probabilities, even discrete ones, are not in a finite number of categories.

However, here’s some examples of what the library can do.

  • All calculations are exact.
  • You can model dice.  I wrote a dice function that rolls 2d6, 3d6, 1d4, 27d27, etc.  I also modeled a few systems I like (for instance Dungeons and Dragons’ 4-die ability roll).
  • You can transform a function.  For instance, you can take a hero’s ability score in D&D, subtract 10, divide by 2, and round down to get the distribution of modifiers.
  • You can combine distributions using a function.  For instance, given 2d6 and 3d4, you can find the distribution of 2d6+3d6.  More interestingly, you can find the distribution of those two combined using any function whatsoever (for instance, multiply and take the square root, to get the geometric mean).  If the function returns a distribution itself, this can let you do some tricky things like have distributions of distributions (which the library of course has a ‘compose’ function for if you didn’t want that).
  • You can make a uniform distribution out of an arbitrary set of things, which is often a helpful starting point.  You can give the library a list of things and their probabilities, but since that’s how the library stores distributions, that’s a little boring for me.
  • If the distribution is all numbers, you can get the mean, median, quartiles, a cumulative version of the distribution.
  • You can make a random generator out of a distribution (for instance, given the d6 distribution, make a function that returns a random number 1-6 whenever called).
  • You can plot numerical distributions (approximated by a polynomial, too.) and their cumulative versions along with boxplot stats.  This doesn’t return anything, it just dumps a bunch of pretty graphs and statistics.

This is my to-do list for the library:

  • Standard distributions like the binomial distribution
  • Standard deviation and variance
  • Fix - the ‘window’ of the graph is determined by the points in the distribution.  It should also take into account the approximating polynomial, which can go off screen at present.
  • Fix - the random number generators made my polynomials are almost certainly subject to rounding errors at present.  Change that.
  • Add permutations, and the standard distributions of permutations
  • The ultimate goal - a scheme macro, dist-random, which would do something like the following:

(dist (+ (dist-random 1 2 3) (dist-random 1 2 3)))

Which would return the distribution 2d3.  This would allow you to take a normal program that uses random generation, replace the random generators with dist-random, and wrap the whole thing in dist, and get back the distribution of outputs that program *would* have given is dist-random were actually a random number generator.  For instance, you could take a poker program that draws a hand of five cards and tells you what you got, and use this library to wrap it such that you just got a distribution of the hands in poker.  Remember too, that these are all exact probabilities.

If you have comments or suggestions, please contact me.  I’d especially like to hear anything you think I should add to the library.

Blog maintenance, Untitled python roguelike nears completion

January 4th, 2009

After having deleted about 3000 comments over a few weeks, none from humans, I decided to add a CAPTCHA system to the blog.  For those unfamiliar with the name, these are systems wherein the system presents a word or two to the user as a distorted graphic, and the user types the word correctly.  They are designed to be solvable by users and not computers, but have the downside of being irritating to the user by slowing everything down, and sometimes so distorted that humans can’t easily solve them either.  For this blog, I decided on a system called reCAPTCHA, which uses actual scanned text from a central repository.  It has the neat benefit of simultaneously recognizing scanned text that scanners can’t read, i.e. for digitizing books, while providing insurance against spam.

On a happier note, the mysterious python project I mentioned earlier is close enough to finished that I’ll elaborate more now.  I have been working on a roguelike game.  Since it’s an unpopular genre, many readers may not have heard of it before.  In a roguelike, the player moves around an ascii character inside a tile-based map, fighting monsters and picking up treasure.  They’re essentially like old-style block-based RPG games set in a dungeon, except the whole thing’s overhead and in ASCII instead of fancy graphics.  For instance, a ‘$’ might represent a pile of gold, or an ‘r’ the ever-feared rat.

In my game, the player must delve into a dungeon, collect as much loot as possible, and leave–a typical goal.  It includes line-of-sight, twenty-something identical monsters of increasing power, a random dungeon generator, about eight unique items, a message system, color (provided with python’s curses), and three original game devices.  The only aspects of the game uncompleted are the highscore table, and game balancing issues.  (Right now it works as a program, but it’s not much fun to play.)  Documentation might be added at some point.  If you intend to play the game, I advise you skip the next paragraph where I spoil the hell out of it.  Or read it, depending on your play style.

The first element is a branching dungeon.  As the player descends, she travels through an inverted “tree” about six levels deep.  This isn’t completely random, to make other aspects of the game consistent between plays.  Each level of the dungeon has a random setup of nine rooms connected by corridors, all in a grid, as per the classic ‘Rogue’ which started the genre.  There are a guaranteed nine monsters and nine items per level, for consistency between sessions.  The second element is lava.  As the player explores the dungeon, she may discover that it is in fact a volcano about to erupt–and she had better get out of the way before it does!  This provides a built-in time limit, an interesting forced aspect to exploration, and the opportunity for me to have fun making the lava flow up and down branches in a fluidlike manner.  Lava flow rate, relative to the speed of the player, is one of the balance issues I am working on now.  Finally, the goals of the game are rather like a checklist.  First, getting out of the dungeon alive is a plus.  Second, getting more points by taking items and killing monsters than the lava consumes provides an interesting ultimate goal.  Third, there are special items in the dungeon the player can try to get before she leaves.  There is gem on each of the six levels of the dungeon.  Getting all six gives the player a reward.  Also, there is a magic scroll in a special level somewhere in the volcano.  This is especially difficult to bring to the surface.

In all, I tried to make a simple, but fun roguelike to play in half an hour to an hour.  I wanted it to be original enough to be interesting, but simple enough that I could write it and fairly casual players could discover everything about it.  Finally, I wanted to write a game I myself would find fun to play, which I believe I have.

Euler problem 25

November 21st, 2008
  1. ;What is the first term in the
  2. ;Fibonacci sequence to contain 1000 digits?
  3. (define (square x) (* x x))
  4.  
  5. (define (pow man exp)
  6.   (cond
  7.     ((= exp 0) 1)
  8.     ((< exp 0) (/ 1 (pow exp (- man))))
  9.     ((even? exp) (square (pow man (/ exp 2))))
  10.     ((odd? exp) (* man (square (pow man (/ (- exp 1) 2)))))
  11.     (else #f)))
  12.  
  13. (define (prob25 digits)
  14.   (let ((big-const (pow 10 (- digits 1))))
  15.     (let fib ((first 1) (second 1) (count 1))
  16.       (if (>= first big-const)
  17.           count
  18.           (begin
  19.             ;(display first) (display " ") (display count) (newline)
  20.             (fib second (+ first second) (+ count 1))
  21.             )))
  22.     )
  23.   )
  24.  
  25. (prob25 1000)

Coding C

November 13th, 2008

I’ve been taking a break from my other projects to learn C. Since I already know C++ fairly well, this is most fun when I write hideous C involving pointer arithmetic. Right now I’m writing an interpreter (or more accurately I’m starting by writing a tokenizer, which isn’t working yet and which I may not choose to go past.

More importantly, I’m doing the whole thing over ssh on a BSD box, thus learning *nix at the same time. Since I used about three commands, I’m not learning *much* but there you go.