Sunday, March 22, 2015

Build Tooling Discussions, Part I: Code Formatting

In an earlier post I released base-api, a template project for developing Scala API servers. In this post I'd like to discuss why I included a code formatter in that project and the philosophy of code formatting in general.

My favorite way to talk about technical topics is by centering them on real world examples. To that end, I'd like to take a look at three different ways of formatting the same snippet of code.
  /** This is some scaladoc
    *  @param p1
    *  @param p2
    *  @param p3
    */
  class A(p1: String,
    p2: String,
    p3: String) extends
  Intf with Intf2
  {
    val p4 : Int = p1 match {
      case "A" => 1
      case "BBBBBBBBBBBB" => 2
      case "CCCCC" => 3
    }
    val list = List("a", "b")
    val zip = list zip(list)
    def foo( param: String ) = "foo"
  }
  /**
   * This is some scaladoc
   * @param p1
   * @param p2
   * @param p3
   */
  class A(p1: String,
          p2: String,
          p3: String)
  extends
    Intf with
    Intf2 {
    val p4: Int = p1 match {
      case "A"            => 1
      case "BBBBBBBBBBBB" => 2
      case "CCCCC"        => 3
    }
    val list = List(
      "a",
      "b"
    )
    val zip = list zip list
    def foo(param: String) = "foo"
  }
  /**
   * This is some scaladoc
   * @param p1
   * @param p2
   * @param p3
   */
  class A(p1: String, p2: String, p3: String) extends Intf with Intf2 {
    val p4: Int = p1 match {
      case "A" => 1
      case "BBBBBBBBBBBB" => 2
      case "CCCCC" => 3
    }
    val list = List("a", "b")
    val zip = list zip list

    def foo(param: String) = {
      "foo"
    }
  }
The crazy thing to me is that none of those are wrong, or even particularly different. Let's throw one extra in that really is "wrong":
  class A(p1:String,p2:String,p3:String) extends Intf with Intf2 {
    val p4: Int = p1 match {
      case "A" => 1; case "BBBBBBBBBBBB" => 2; case "CCCCC" => 3;
    }
    val list = List("a","b"); val zip = list.zip(list)
    def foo(param: String) = "foo"
  }
We could use a code formatter to make any of these snippets look like any of the others (except #4, which a formatter can fix but can't emulate). At first blush, this might seem a bit draconian: why should someone get to dictate how my code looks, not only with some style guide but with actual automated rule enforcement?



Cognitive load


The answer is that there is a cost to having different ways of representing the exact same thing. That cost is always masked at creation time and only borne later, when the code must be read and understood. That is, you're making future you (and every other dev on the codebase) suffer so that present you can do whatever feels right in the moment. I'm talking about code format here, though the same applies to most other rules about code. The basics of cognitive load for user experience are covered in a great article by Kevin Matz, but the gist is that we want to cut the brain-effort required to parse and understand a piece of code to the barest minimum.



Enter: automated code formatting


The fantastic thing for us code slingers is that almost every language has a plethora of tools that will do a non-trivial portion of the work for us. Reducing the cognitive load required to comprehend a piece of code has a number of aspects. In a language like Scala it might mean refraining from using implicits, always using case classes for data typing (as opposed to type declarations), or other specifics. It often also includes style checking that goes beyond the scope of formatting. Fortunately, formatting is a subclass of this problem that can be fixed automatically. Other subclasses, like preventing the use of null in Scala where we have native optionals, cannot be fixed automatically (though thankfully they can be detected!).
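To make the null point concrete, here is a minimal sketch of the kind of rewrite a human still has to do by hand once a checker has flagged a null. The names `legacyLookup` and `safeLookup` are hypothetical and exist only for illustration; `legacyLookup` stands in for any API that signals "no result" with null.

```scala
object NullVersusOption {
  // Hypothetical stand-in for a legacy or Java-style API that returns null on a miss.
  def legacyLookup(key: String): String =
    if (key == "known") "value" else null

  // Option(...) converts a null result into None and anything else into Some(...).
  def safeLookup(key: String): Option[String] = Option(legacyLookup(key))

  def main(args: Array[String]): Unit = {
    println(safeLookup("known").map(_.length))          // prints Some(5)
    println(safeLookup("missing").getOrElse("default")) // prints default
  }
}
```

A tool can spot the `null` literal easily enough, but deciding that the return type should become `Option[String]`, and updating every caller, is a design change rather than a formatting change.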

Scalariform is my favorite Scala code formatter, but which one you use isn't really important. You might even choose not to automate format standardization. The key is that you think through what rules you want applied to your codebase, how they can and should be enforced, and whether or not there's value in providing fixes to everyone automatically. There are downsides to this, of course: sometimes Scalariform will wantonly destroy a carefully crafted special format used to make (e.g.) a DSL easier to read, but it's trivial to add comments that disable it around areas with special whitespace.
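As a rough sketch of that escape hatch, Scalariform skips any region between a `// format: OFF` comment and the next `// format: ON` comment. The `routes` table below is a made-up example of hand-aligned code worth protecting, not something taken from base-api.

```scala
object FormatOffExample {
  // format: OFF
  // Hand-aligned columns: the formatter leaves this region alone until "format: ON".
  val routes: List[(String, String, Int)] = List(
    ("GET",    "/users",    200),
    ("POST",   "/users",    201),
    ("DELETE", "/users/42", 204)
  )
  // format: ON

  def main(args: Array[String]): Unit =
    routes.foreach { case (method, path, status) =>
      println(s"$method $path -> $status")
    }
}
```

Keeping the disabled region as small as possible means the rest of the file still gets the standard treatment.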