eSteve's Blog

A blog about code, computers, and stuff.

Compilers Should Be Slower

Sometimes I think compilers (and interpreters, I suppose) should be slower. This would force us to slow down and reason about the code we write.

In Thinking, Fast and Slow, Daniel Kahneman breaks thinking down into two modes. The slow mode is a deliberate and conscious mode; it’s the kind of thinking you do when you’re in control and are acutely aware of the choices you’re making. The fast mode is a “reactive” mode driven by our nature and our instincts; it’s the kind of thinking you do when you need to quickly assess a situation and react to it immediately.

Slow thinking, then, is the kind of thinking you do when deciding what vehicle to purchase. Fast thinking is the kind of thinking you do when you wake up at night and need to go to the bathroom.

And here, again, is the reason why I half-wish compilers were slower. Writing code is not the kind of task you can do in fast thinking mode. Even with years of prior experience writing software, you can’t code thinking fast.

Writing software is a tough beast: it takes a lot of research, discovery, and prototyping. Clearly, writing code is slow thinking.

It’s easy, however, to feel like systems you’ve developed before are similar to the software you’re currently writing. It’s tempting to think you can just go with prior knowledge, make some assumptions, do things just like you’ve done them before. It’s easy to think you can use some heuristics and think fast.

But you can’t. There are too many details and too many dependencies that you need to track. There are too many unknowns and too many side effects that you need to consider. In other words, you’ll never achieve quality software by thinking fast.

So, just take a minute and slow down. Think. Design. Prototype. Test. Then throw away all the code you’ve written and start all over again. It’s surprising how much even just a little design helps.

Even if all you do is grab some paper and draw a couple of diagrams; even if all you do is just talk to a few people about some of your ideas, you’ll be better off.

We need to reason about the code we write; we need to think slowly and carefully about the changes we make.

And if compilers were slower, we might just be forced to think slow. :)

The Right Way to Do REST Updates

There’s lots of good advice on the internet about how to design good REST APIs. The folks at apigee have tons of great articles and really know their stuff, so if you are looking for some general advice on REST, you should start there.

Most of the guidance you’ll find on the internet regarding REST, however, focuses on GETs (reads) and POSTs (inserts). Today, therefore, I want to offer advice on how to do updates.

Gimme The Skinny

Let the consumers of your API update resources by applying deltas instead of forcing them to replace the entire resource.

I know this isn’t strictly related to REST APIs; it’s more of a data API recommendation, but partial updates really do work better than full resource updates.

Why? Well, there are several reasons of course, but most of them revolve around:

  • Partial updates ease update concurrency problems.

  • Partial updates let you more accurately express the changes you want to make and this simplifies your code.

One final recommendation regarding REST: use the HTTP PATCH method instead of PUT to do partial updates. PATCH was introduced specifically to allow partial resource modification (RFC 5789, March 2010).
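To make that concrete, here’s a hypothetical request against a person resource (the URL and fields are just illustrative; they mirror the Jersey example later in this post). The body carries only the fields being changed:

# A delta update: only the fields being changed travel on the wire.
$ curl -X PATCH http://api.example.com/person/123 \
       -H "Content-Type: application/json" \
       -d '{"name": "Jane Doe", "age": 42}'

Everything else about the person stays untouched; the server only has to deal with the two fields the client explicitly sent.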

Gimme More Detail

OK, you’ve decided to keep reading (thanks!). So here’s the deal: full resource updates (replacements) have a fundamental problem. When two clients read the same resource and then each PUTs back a complete replacement, the second write either silently clobbers the first client’s changes or has to be rejected with a conflict error.

Pretty ugly, huh?

There’s a good chance the user wasn’t even trying to update the “conflicting” fields. And if he was updating the conflicting fields, does the knowledge that the data changed matter to him? Why are we showing him an error? What is he supposed to do about it? Really, there aren’t a lot of good solutions here.

And then there are even more problems. As soon as your users can make changes, you’re going to get requirements like the following:

  1. When the customer address is updated, notify accounting.
  2. Make sure only fields x, y and z are updatable on Foo.
  3. Keep a history of changes on fields a, b, and c on Bar.

How are you going to do that if all you get from your client is an entirely “new” resource? Are you going to diff it and derive what changed? That’s hard and usually ends up being really messy.

Again, just have your API consumers be explicit about the fields they’re changing and all your problems will go away, your life will be better, and your co-workers will be super impressed by your simple and intelligent code. :) Just kidding, you’ll never impress another programmer, so don’t even bother. :P

Gimme Some Code

You should check out the samples and the tests in Voodoo. Scott Brown and I recently added some cool code to support partial entity updates with automatic validation in Jersey.

We decided to take a rather informal representation of the data by accepting a dictionary where only the fields to be updated are present. Our Jersey endpoint therefore looks something like:

@Path("/person")
public class PersonResource {
    // Other endpoints omitted.

    @PATCH
    @Path("/{id}")
    public Response updatePerson(@PathParam("id") String id, @Editable(type = Person.class,
        fields = {"name", "age"}) Map<String, Object> personUpdates) {
        // Note: In Voodoo, @Editable makes sure only the editable
        // fields defined here are being updated.
        // Voodoo also validates the values passed according to the
        // `javax.validation` annotations on the Person class.

        // Build your query to update the person here.
        // You can also raise events based on the properties being updated.

        // It's a good idea to return the latest representation of the entity.
        Person updatedPerson = personStore.getById(id);
        return Response.ok(updatedPerson).build();
    }
}

As you can see, the approach is rather straightforward. I think the @Editable annotation actually turned out pretty well: it abstracts away all the validation code from your endpoint.

A few parting thoughts

Of course not everything is perfect with this approach; there are a few minor drawbacks. Chief among these is the fact that there’s a good chunk of tooling that doesn’t quite support PATCH yet.

Backbone, for example, has recently added support for PATCH (in IE) but hasn’t put it into any of their releases yet. We’ve had similar problems with Swagger; the method isn’t really supported there either.

One other minor consideration is that if you don’t really need any intelligence around what’s changing, the code to do a full resource update is usually simpler.

So there you go: if you want to be awesome, start using PATCH and supporting partial resource updates. Otherwise, you can keep doing the same old thing you’ve been doing. :)

Inside the .git Directory

Have you ever wondered how git works? Have you tried to figure out how git stores stuff?

Well, I have. This is what I discovered about the git object store.

In the beginning there was nothing.

Not really. In fact, there’s quite a bit of stuff in an empty repo. On your command line, create a new repo and list its contents.

# Create the repository
$ mkdir mygitrepo && cd mygitrepo
$ git init
Initialized empty Git repository in /Users/earaya/Projects/mygitrepo/.git/

# Show all files
$ find .
.
./.git
./.git/config
./.git/description
./.git/HEAD
./.git/hooks
./.git/hooks/applypatch-msg.sample
./.git/hooks/commit-msg.sample
./.git/hooks/post-update.sample
./.git/hooks/pre-applypatch.sample
./.git/hooks/pre-commit.sample
./.git/hooks/pre-rebase.sample
./.git/hooks/prepare-commit-msg.sample
./.git/hooks/update.sample
./.git/info
./.git/info/exclude
./.git/objects
./.git/objects/info
./.git/objects/pack
./.git/refs
./.git/refs/heads
./.git/refs/tags

As you can see, there’s quite a bit of stuff there to begin with.

Something we learn right off the bat is that Git supports hooks. Take a look at some of the samples. At work, we use a pre-commit hook to run our linting before letting you commit. But I digress; I want to talk about the object store.

You’ll see that initially, the objects directory is empty.

So, let’s create a git object. In order for this to work, you have to type the commands as shown:

$ echo "git rocks" > test.txt
$ git add test.txt

As a result of adding that file, you should now see:

$ find .git/objects
.git/objects
.git/objects/f6
.git/objects/f6/8ce0a31a54e37649ee417d60e90911258f1043
.git/objects/info
.git/objects/pack

And then there was a (SHA1) hash

You might now be wondering how I was so confident you’d get the same output I got on my machine.

The answer is simple: Git is, at its core, just a key-value store. Git doesn’t care what you call your objects; Git only cares about the content of those objects.

To store an object, Git first performs a few operations on the data. One of these operations is to calculate the SHA1 hash of the data and then store the data in the object store with a filename representing the hash.

At this point we have to make a brief but important aside. SHA1 has two properties that make it really useful for Git:

  1. It’s extremely unlikely that 2 different objects will have the same hash value.
  2. Identical objects will always have the same hash representation.

And so, f68ce0a31a54e37649ee417d60e90911258f1043 is the SHA1 hash Git computed for the “git rocks” blob; the first two characters (f6) become the directory name and the remaining thirty-eight become the filename inside the object store. That’s how I knew you’d get exactly the same output I got.
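If you want to check this yourself, Git’s plumbing makes it easy: hash-object computes the hash Git would use for a given blob, and cat-file -p prints an object back out. You should see exactly the same hash as above:

$ echo "git rocks" | git hash-object --stdin
f68ce0a31a54e37649ee417d60e90911258f1043
$ git cat-file -p f68ce0
git rocks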

And finally, there was a tree.

So now that our file is tucked away in the object store, we have to wonder: What happened to its filename? After all, Git wouldn’t be that useful if it didn’t preserve folder structure and if it didn’t let us find our files by name.

Git tracks pathnames through a tree.

Go back to your command line and type:

$ git write-tree
81cbaf28bc31ce9218d51b685e35a08bfea99599
$ find .git/objects
.git/objects
.git/objects/81
.git/objects/81/cbaf28bc31ce9218d51b685e35a08bfea99599
.git/objects/f6
.git/objects/f6/8ce0a31a54e37649ee417d60e90911258f1043
.git/objects/info
.git/objects/pack

git write-tree (a low-level command) saves the state of the index (your staged files) to the object store. Thus, we now see a new object in the store: .git/objects/81/cbaf28bc31ce9218d51b685e35a08bfea99599. This new object is our tree.

Once you peek into the tree object, it’ll immediately make sense. You can see its contents by typing the following git command:

$ git cat-file -p 81cbaf
100644 blob f68ce0a31a54e37649ee417d60e90911258f1043    test.txt

You might already have guessed it, but the first column, 100644, is the file mode (its permissions, in octal); blob is the object type; f68ce0… is the hash of the blob in the object store; and test.txt is the filename.

Directory hierarchies are represented in a similar manner, of course.
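If you want to see that for yourself, here’s a rough sketch; the new tree hashes depend on the exact content, so they’re shown as placeholders:

# Stage a file inside a subdirectory and write the tree again.
$ mkdir docs && echo "trees all the way down" > docs/notes.txt
$ git add docs/notes.txt
$ git write-tree
<root-tree-hash>
$ git cat-file -p <root-tree-hash>
040000 tree <docs-tree-hash>    docs
100644 blob f68ce0a31a54e37649ee417d60e90911258f1043    test.txt

The top-level tree points at a second tree object (mode 040000) for the docs directory, and that subtree in turn points at the notes.txt blob.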

Conclusion

And there you have it folks. That’s pretty much all there is to how Git stores objects. Pretty simple, huh?

I think it’s amazing how Linus built such a powerful and useful system by elegantly using a simple hashmap. I wish my software was more like Git.

I hope you too gain an appreciation for using simple constructs in your own code.

Javascript AMD: Asynchronous Module Definition

Before you read any further, I should warn you: I have limited experience with JS. However, please hear me out; I’ve found that the concepts I’m going to talk about, although important, are still not widely adopted by Javascript developers.

In the last few years we’ve seen a shift to the “cloud”. With that, there has been a resurgence of the web. We’re now doing things with HTML and HTTP that no one would have thought possible just 5 years ago. We now render data on the client; we push data from the server to the browser to get real-time interactions like a native application would. Heck, we can even do 3D rendering in the browser… and if you have a modern browser, it works well!

In a few words, we’re doing what Java Applets promised to do but never accomplished.

This re-birth of the web, however, has been bumpy. Developing for the browser is plagued with problems. You have to target multiple (old) browsers, on multiple OSes, with multiple displays; you have to deal with disparate hardware, internet connections, etc. I think you get the point: you really don’t know where and how your app is going to run. And to make all of this worse, the tooling for writing browser applications is still maturing; there’s not a lot of help out there.

Imagine, for example, if you had to write a Java server application and you didn’t have a good compiler; or if you had to add a bunch of conditionals to detect what OS you’re running on – it’d be crazy, right?

But that’s not even the worst of it; imagine if Java the language provided no mechanism for modularizing your code: no JARs, no classes, nothing. And on top of that, everything was globally scoped. Scary, huh?

Well, to some extent that’s the situation we find ourselves in when we write Javascript applications. Javascript has no support for modules. And on top of that, everything in the browser is globally scoped. Now, I’m aware that we developers have been playing games for years to diminish the global scoping problem, but we really haven’t had a good solution. Until now, at least. Now we have AMD… and it changes everything.
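(Incidentally, one common example of those “games” is the module pattern: wrap your code in an immediately-invoked function expression so variables don’t leak into the global scope. A quick sketch:)

// The classic workaround: an immediately-invoked function expression (IIFE).
// Everything inside the function stays private; only the returned object is exposed.
var counterModule = (function () {
  var count = 0; // invisible outside the closure

  return {
    increment: function () {
      count += 1;
      return count;
    }
  };
}());

It works, but every module author does it slightly differently, you still end up with one global per module, and there’s no story for dependencies.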

AMD stands for Asynchronous Module Definition. The goal of the AMD format is to provide a solution for modular Javascript that we can use right now (while we wait for Harmony).

The genius in AMD is that it proposes a format where both the module and its dependencies are asynchronously loaded. If you’re working on the browser, the async nature of AMD provides advantages over other module systems (such as CommonJS) that really make it enjoyable to code in Javascript.

But enough talk, let’s get to some code. I don’t want to write a full tutorial on AMD, so the code samples will be brief. I’m just hoping that when you’re done reading this, you won’t write a single line of code without using RequireJS.

Here’s how you define a module:

define(['jquery', 'backbone'], function ($, Backbone) {

  // At this point your dependencies are loaded (jquery and backbone).

  // Let's do some setup. Anything here is private to the module.
  var privateVariable = 5;

  // This is what consumers of your module will get when they require your module.
  return Backbone.View.extend({
    id: privateVariable,
    initialize: function () {
      // Some view init code.
    },

    // The rest of your public methods.
  });


});
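And here’s roughly how a consumer would load that module with RequireJS (the module path 'views/personView' is made up; it depends on how your paths are configured):

// RequireJS loads the module and its dependencies asynchronously,
// then hands you whatever the module's factory function returned.
require(['views/personView'], function (PersonView) {
  var view = new PersonView();
  view.render();
});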

I don’t know about you, but the first time I saw an AMD module I immediately fell in love. Finally I saw an easy way to have private members. Finally there was a cleaner way to scope modules and their dependencies. I could go on and on, but I think the benefits I’ve listed should be enough to convince you.

Or at least, I hope this has made you aware of AMD and piqued your interest. Writing Javascript apps for the browser doesn’t have to be a pain anymore. In fact, I rather enjoy it.

I know, however, I haven’t done this topic justice. Please go look at the RequireJS site and look at the samples and the documentation – especially the optimizer section. Go look at this post by James Burke where he talks about the many reasons developers are now using AMD. And then when you’re done reading that, come join the fun.

Scala Really Scales

About 3 months ago @iammerrick started talking to me about Scala. After talking a little bit about it, he said: “It’s called Scala because the language scales with you”.

My initial reaction was to dismiss Merrick’s statement as “just marketing fluff”. However, after playing with the language a bit, I’m convinced Scala is really the way forward on the JVM.

As a side note – if you can take the Functional Programming Principles in Scala course by Martin Odersky on Coursera, I’d highly recommend it. I just finished the class and really enjoyed it. The course is a little on the tough side (especially the last three assignments), but it’s a great introduction to Scala and to functional programming.

Why Scala Matters

The JVM is a superb platform. It runs everywhere and it’s just solid.

Furthermore, the Java ecosystem is fantastic. Everything from the IDEs, to the build tools, to the Servlet containers, to the OSS projects that run on Java are first class, well documented and well supported by a fantastic community of excellent developers.

Java the language, however, has failed to keep up with the times. The lack of anonymous functions and closures, the lack of type inference, the lack of object and array literals, and the lack of many other constructs make the language feel archaic. Unfortunately, it’s not just that the language is arcane; it’s verbose and it gets in the way. And it’s not just that all of this is annoying: it actively hurts productivity. I often wish C# ran on the JVM.

In summary, the JVM is a great platform, but it just needs a better statically typed language.

How Scala Truly Scales

The thing about Scala is that it has a low barrier to entry. If you’re not used to functional languages, you can write imperative code and get started anyhow. Classes are also first class citizens, so if you’re coming from Java, you’ll feel right at home.

If you’ve written C# and used some of the newer features such as LINQ, anonymous functions, etc., the transition will be even easier.

And then, when you’re starting to get comfortable, you can write truly idiomatic Scala. It’s simple, uncluttered, and succinct. Here’s a short sample:

  /**
   * This function decodes the bit sequence `bits` using the code tree `tree` and returns
   * the resulting list of characters.
   */

  def decode(tree: CodeTree, bits: List[Bit]): List[Char] = {
    // Inner helper: walks the tree bit by bit, accumulating decoded characters.
    def decode0(tree0: CodeTree, bits: List[Bit], acc: List[Char]): List[Char] = {
      tree0 match {
        // Reached a leaf: emit its character and start decoding again from the root.
        case Leaf(c, w) => decode0(tree, bits, acc :+ c)
        case Fork(l, r, cs, w) => {
          if (bits.isEmpty) acc
          else if (bits.head == 0) decode0(l, bits.tail, acc) // 0 bit: go left
          else decode0(r, bits.tail, acc)                     // 1 bit: go right
        }
      }
    }
    decode0(tree, bits, List())
  }

(Note, I’m not claiming to write idiomatic Scala. In fact, as mentioned before, I’ve just barely started using the language. This snippet, however, demonstrates some good features).

If you’re like me, it may take you a bit to get used to the syntax. In fact, I thought I’d never get used to the implicit returns and the optional semicolons – but now I wish I’d never have to type those extra characters ever again.
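A trivial sketch of what that means (just an illustration, not from the course): the last expression of a definition is its value, so there’s no return keyword and no semicolons.

// The last expression is the value of the definition; no `return`, no semicolons.
def square(x: Int): Int = x * x

def describe(x: Int): String =
  if (x % 2 == 0) "even" else "odd"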

Pattern matching and case classes are fantastic features. I wish more languages (I’m looking at you, C#) would offer such capabilities. In the snippet above you can also see that functions are first class citizens, and how variables are typed.

Things you can’t see in the sample, but that are just as important:

  1. The List class you see above is immutable. This means most operations on it actually return a new list instead of mutating it (there’s a short sketch after this list).

  2. Types are optional; if the compiler can figure out the type, you can omit it.
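Here’s a tiny sketch of both points (illustrative only, not from the course):

val xs = List(1, 2, 3)       // the compiler infers List[Int]; no annotation needed
val ys = 0 :: xs             // prepending builds a new list; xs is untouched
val doubled = xs.map(_ * 2)  // List(2, 4, 6), again a brand-new list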

It’s hard to explain, but I felt like Scala was really forgiving and let me grow into the language. For example, as I learned, I realized more and more that I really didn’t need mutable types. This, of course, has the nice side effect of making parallelization of your code much easier.

In summary, Scala scales with you. As you learn the language, you can express more interesting thoughts in simpler ways. Furthermore, the language pushes you toward clean, elegant solutions.

Final Thoughts

If Scala required its own runtime, I’d have to say it’d be just another functional language. But because Scala runs on the JVM, and because it can use all the Java code you currently depend on, Scala really does have a bright future.

I hope in 5 or so years we’re all writing Scala instead of Java.