Implementing Metadata in Cloje

This week I’ve been doing planning for Cloje, and trying to figure out how I might approach implementing certain Clojure features. Even if I’m not going to write the code for several weeks, I want to avoid painting myself into a corner.

One such feature is support for metadata. I see three potential approaches one might take when implementing metadata:

  • Wrap the object with a metadata container
  • Inject metadata into the object
  • Store the metadata separately from the object

Each of these solutions is pretty hairy, and there are significant trade-offs involved. So, I wrote this blog post to explore these approaches and hopefully help me decide which to use.

Wrap the object with a metadata container

My first inclination was to wrap objects in a metadata container. The container would be a structure with two slots: one slot for the object being wrapped, and another slot for a hash map containing the metadata.

The advantages of this approach are:

  • The container structure is simple and portable to implement.
  • Any type of object can have metadata. You don’t have to design new types just to add a slot for metadata.
  • The metadata “travels with” the object, so you don’t have to worry much about the metadata lingering after the object has been garbage collected.

The disadvantage of this approach is that the object must be extracted from the container to use it. That has some nasty implications:

  • Cloje’s code, and perhaps some Cloje users’ code, would be littered with calls to strip-meta (or whatever the extraction function is called).
  • You could not simply “call” a function that has metadata, you would have to extract and then call it. So instead of (foo bar) you would have to write ((strip-meta foo) bar) or (invoke foo bar). Every time.

Until recently, I thought that perhaps a clever reader macro could solve two problems at once: calling functions with metadata, and IFn (Clojure’s ability to “call” certain non-function types as if they were functions). The reader macro would change how s-expressions are read, so that a function call like (foo bar) would be read as (invoke foo bar), and invoke would be a function that would (if necessarry) extract foo's object from the metadata container, and “call” it.

Besides the fact that “change how s-expressions are read” is full of nasty pitfalls, this plan would break macros. All macros. And special forms. You could no longer write (if x y z) because it would be read as (invoke if x y z), and the implementation would complain that the variable if is unbound.

To work around that, Cloje would need to know, when it sees code like (foo bar), whether foo is a macro or not. Either the reader macro would need to handle macro calls specially (by not putting invoke in front of them), or invoke would need to be a macro so that (invoke foo bar) could expand to (foo bar) if foo is a macro.

Either way, that would entail some sort of macro registry, to record which symbols are macros/special forms, and which symbols are not. Scheme implementations must have their own internal macro registries, but the question is whether Cloje would be able to query that registry. As far as I know, there is no portable way to do that in standard Scheme, so either Cloje would have to use implementation-specific ways (thus making Cloje less portable), or Cloje would need its own macro registry, which would probably have to be implemented in implementation-specific ways (also making Cloje less portable).

And, that still leaves the problem that Cloje’s code, and some Cloje users’ code, would be littered with calls to strip-meta, anywhere you allow a value that could possibly have metadata.

Inject metadata into the object

Another approach would be to inject the metadata right into the object. This would require that the object has a slot to support such metadata.

I’m guessing that this is the approach that Clojure uses, based on the fact that only certain types of objects support metadata: symbols, collections, and functions. The error message if you try to add metadata to an unsupported type is “Metadata can only be applied to IMetas”, so apparently there is an IMeta interface in Clojure’s implementation, although it doesn’t seem to be publicly documented.

Frankly, building metadata support into types is the best choice, if you’re implementing a whole new language like Clojure. Cloje’s situation is different. I’m trying to build an API on top of existing languages, and those existing languages already have many of the types I need, and I’d like to retain compatibility with those native types if possible.

I happen to know that CHICKEN has some support for something like metadata, on certain types of objects. You can use extend-procedure to attach an extra data slot to a function, although it only supports one slot, so Cloje metadata would clobber any other uses of that slot. Also on CHICKEN, symbols can have property lists, so you could have a “cloje-meta” property on symbols. But because symbols are interned in Scheme but not interned in Clojure, this would result in a semantic discrepancy. And it wouldn’t be any help with other kinds of objects. Plus, these are not necessarily portable to other host languages.

I could take the Clojure route, and create new data types with a slot for metadata built in. I’ll be creating new data types anyway, to add hash-sets, and possibly persistent immutable data structures in the future.

But in the cases of symbols, vectors, hash maps, and functions, which already exist in the host language, there doesn’t seem to be any advantage to creating multiple new types, rather than creating a single general-purpose metadata container structure described in the first approach. You still have to extract the base value before using it, so the code is still littered with extraction functions calls. Except now you have N different extraction functions instead of 1!

Store the metadata separately from the object

The final approach is to store the metadata separately from the object, for example in a central registry.

The advantages of this are:

  • Any type of object can have metadata. You don’t have to build in a metadata slot.
  • You don’t have to perform any extraction or conversion when using the object normally.
  • If you never use metadata, you pay no cost.

But the disadvantages are:

  • You can’t have separate metadata on two symbols with the same name. (I call this a “disadvantage” because it differs from Clojure.)
  • You have to be very careful in how you implement the central registry, or you’ll get memory leaks or other badness.

The second issue is the more significant one.

For example, you might think of using a hash table as the registry, with the object being the key and the metadata being the value. But if it’s a strongly-referenced hash table (like Clojure’s maps), the object would never be garbage collected — even if your code isn’t using the object anymore, it’s still being referenced by the registry.

There is such a thing as weakly-referenced hash table, which don’t prevent the object from being garbage collected. Racket supports hash tables with weak keys. CHICKEN doesn’t, but it does support weak “locatives”, so I think it would be feasible to roll your own weak data structure.

But there’s another possible memory leak: if the original object gets garbage collected, but the metadata lingers in the registry. So you would need some way to clear out old metadata. Ideally you would want a hook into the garbage collector, so you can perform some cleanup code when necessary. But if you were really desperate, you could loop through your weak hash table (or whatever structure) and remove any metadata for weak references that are broken.

I am not a data structures expert, so there may be some structure that would be a better solution to this problem than a weak hash table. If you know about such things, please add a comment and let me know.

Final Thoughts

So, there are the three potential approaches I thought of to implement metadata in Cloje. All of them have pretty significant trade-offs, so it’s not clear which approach to use.

Of course, there’s no rule that says I have to choose just one approach and use it for everything. For example, I could use a metadata registry for defn'd functions, but use metadata containers for everything else. That way, you don’t have to use invoke for every function call. Plus, defn'd functions are very unlikely to be garbage collected, so it’s less vital that the registry promptly and efficiently dispose of lingering metadata.

I do need to do some more analysis to see how often / in what situations Cloje users would need to use strip-meta to access raw values. Ideally, Cloje would do that all behind the scenes.

Of course, if all else fails, there’s always a fourth approach to supporting metadata in Cloje: don’t!

Announcing Cloje

Cloje icon Over the past couple months, I’ve been tinkering with a new side project: Cloje, a clone of Clojure built atop Scheme/Lisp.

Cloje is partly a fun learning exercise, but it may also be of practical interest to any Schemers/Lispers out there who have wished some of the cool features from Clojure were available in their language of choice.

Cloje currently targets CHICKEN and Racket, two popular implementations of Scheme. One of the chief goals of Cloje is portability. Most of the code is written using standard R5RS and SRFI features, so it would not be very difficult to add support for more Scheme implementations. I will be happy to add Common Lisp as a target, if a Lisper volunteers to help write and maintain it.

Today I tagged version 0.1 of Cloje. This is more of a “milestone” than it is a “release”. If you want to play around with it, you can clone the repository. Installation and usage instructions are available in the README. Future releases of Cloje will be available on each host language’s package manager for easy installation.

Version 0.1 of Cloje includes about 160 functions and macros. Most are reimplementations of basic Clojure features, plus some Java-look-alike functions, and a few additions to fill in some gaps. [Update (May 4): I should mention that Cloje was written from scratch by myself, guided only by Clojure’s public documentation and observable run-time behavior. None of Clojure’s source code was copied, studied, or consulted at any point in the creation of Cloje.]

Tentative plans for Cloje 0.2 include:

  • Syntax for maps, vectors, sets, regexps, and more (optional, pick-and-choose which to enable)
  • Hash-set data structures
  • Lazy sequences
  • Implementing the last few clojure.string functions
  • Various higher-order functions, map-related functions, and memoization
  • Packaging scripts, so Cloje can be made available on each host language’s package manager

Also in Cloje 0.2, I will be changing the primary goal of Cloje. I initially intended to faithfully emulate Clojure’s behavior, with the goal that existing pure Clojure code would run on Cloje. The experience of developing Cloje 0.1 has convinced me that this is not a practical or desirable goal. I’m writing a post with more about this, but for now you can read about the change of goal in Cloje’s goals document. [Update (May 9): The aforementioned post has been published.]

That’s all for now. If you play around with Cloje, let me know what you think!

Atomic Spin Roundup

In mid 2012, I moved to downtown Detroit and joined Atomic Object as a full-time software developer. Ultimately, it wasn’t the right fit for me, and I recently decided to go back to freelancing. Nevertheless, it was a positive experience, and I’ve gained new friends, knowledge, skills, and experience.

Over the course of the two years I was at Atomic, I wrote a wealth of articles for the company blog, Atomic Spin, on a variety of topics related to software development, ranging from specific technologies, to best practices, to community groups and events. Each article was crafted with care and attention. Enjoy!

On Programming Languages and Technologies

On Development and Management Practices

On the Community, Conferences, and Groups

A Personal Lisp Crisis

There is a half-jocular saying among atheists, that the surest way for a Christian to lose their faith is to read the Bible, cover to cover. The idea is that if you really scrutinize it, and don’t gloss over the strange, antiquated, or inconsistent parts, it stops seeming so profound and special. I don’t know if that’s really true of Christianity, but I have recently had a similar experience that shook my belief that Common Lisp was a profound and special Lisp dialect.

Continue reading A Personal Lisp Crisis

Ambienome Brain Dump

Lately, I’ve been focusing so much on Common Lisp itself, that I have barely spent any time on Ambienome, the whole reason I’ve been learning it. It has been so long that I can barely remember what I was working on. (Physics engine integration, I think?)

That’s no good, so I wanted to take some time to set my thoughts and ideas for Ambienome onto paper (so to speak). This will help me slide back into that mode of thinking, hone my ideas by putting them into words, leave a journal of how the game evolves, and maybe spark some interest in anyone reading this.

Caveat: you should not interpret any part of this as a promise or firm description of how the actual game will be. I’m just unloading my thoughts and ideas as they exist now. Everything is still subject to change, etc. Continue reading Ambienome Brain Dump

A Rubyist’s Impressions of Common Lisp

It has been nearly 6 months since I dove into Common Lisp. I have been studying and/or using Common Lisp almost every day, working on Ambienome and related code. So, I wanted to take some time to document my observations, feelings, and impressions of Common Lisp at this time. Be advised that this is a fairly long post, even though I have trimmed out a lot of details and examples to keep it from growing even longer. (Apparently I have a lot of impressions!)

I have approached CL from nearly 8 years of programming in Ruby. I mention this because my experience with Ruby has certainly colored my impressions of CL, and many of my observations are about how CL compares to Ruby. Continue reading A Rubyist’s Impressions of Common Lisp

Physics Engine and Object Selection

This is the second part of my post about what I’ve been working on lately. In the first part, I talked about data structures and serialization. In this post, I talk about the physics engine and object selection via mouse picking.

For Ambienome’s physics engine, I’m using SquirL, a Common Lisp port of the popular Chipmunk 2D game physics engine. I’m not far enough along yet to know how well it performs in my own game, but from running the demos, it certainly seems performant enough for my purposes. If it turns out to be somewhat sluggish, many Common Lisp implementations can perform some pretty aggressive optimization, given the right type declarations. As a last resort, I could create an FFI binding to the actual Chipmunk library, although that might not be any faster than a well-optimized Common Lisp port. One benefit of a binding to Chipmunk would be using its new features that have not been ported to SquirL (which hasn’t been updated in 2 years).

Continue reading Physics Engine and Object Selection

Data Structures and Serialization

Lately, I’ve been tackling several separate, but interconnected, systems in Ambienome: data structures, serialization, object selection, and the physics engine. These are complex topics, so I’ve written this in two parts. In this first part, I talk about data structures and serialization, and in the second part I talk about the physics engine and object selection.

Data structure simply means the way some data is structured or organized, either in memory while the program is running, or when the data is saved to a file or sent over a network. Is it stored in a hash table? An array? An instance of a class? What slots/members does the class have? How are different pieces of data related to each other? And so on.

Serialization is a topic very closely related to data structure. Serialization involves converting from one data structure to a simpler data structure, usually some human-redable text or a raw binary sequence with some specific order or pattern. The reverse process, deserialization, involves converting the simpler data structure back into the original structure (or a similar one). Serialization is most often used when saving data to a file or sending it over a network.

Personally, I consider binary serialization to be a measure of last resort, to be used only when performance is absolutely critical. Binary formats are notoriously fragile and difficult to extend, especially if they are poorly designed.

Continue reading Data Structures and Serialization

Complicated Code and Creative Blocks

The past several weeks have been a struggle, productivity-wise.

First, I spent quite a lot of time working on proto-slots. That may seem productive, but the amount of effort I put into polishing and documenting it was way out of proportion to the benefit I would get from it. I think I did a pretty good job on that project, but I also recognize now that I was using it as a way to avoid actually working on Ambienome.

It was about three weeks ago that I realized what was happening, so I tried to force myself to focus on Ambienome. That didn’t really work, and I just ended up with a serious creative block. I spent about two weeks making almost no progress. My code had been accruing unnecessary complexity, mostly due to exploring a lot of unfamiliar territory (Lisp, and OpenGL), which made it difficult to work on when my motivation was not very high. Each time I would try to focus on Ambienome, I’d run into some obstacle, grimace, and reflexively go find some way to distract myself. (In hindsight, it was probably not wise to start a challenging new project at the beginning of winter, since my energy levels and motivation always dip during the winter.)

(Those two weeks weren’t a complete loss, though; I did beat a lot of games! :P If you’re curious: Dungeons of Dredmor, The Binding of Isaac, Limbo, Glowfish, Blocks That Matter, and a couple others that I can’t recall at the moment.)

Continue reading Complicated Code and Creative Blocks