Saturday, June 13, 2015

A Scala Developer's Perspective on Swift Protocols

Alternate title: error: protocol 'Foo' can only be used as a generic constraint because it has Self or associated type requirements

Swift is cool. It's rocking that functional paradigm without giving up the OO strengths that make it possible to comprehensibly compose massively complex applications. It reminds me of Scala, which straddles that same gap comfortably and to great effect. However... what the heck is going on with typealiases in protocols?

I'd like to give a motivating example. Here is a pretty canonical example of the Observer Pattern written in Scala:
    trait Observer[T] {
      def notifyMe(param: T)
    }

    var notified = 0

    val intObservers: List[Observer[Int]] = List(
      new Observer[Int] {
        def notifyMe(param: Int) {
          notified = 1
        }
      }
    )

    assert(notified == 0)
    intObservers.foreach(_.notifyMe(1))
    assert(notified == 1)
Pretty straightforward. We have a generic observer trait that we specialize into a registry of Int observers, then we notify every observer in the registry with an Int param. Now let's have a look at how we might (naively) implement the same pattern in Swift:
    protocol Observer {
        typealias T
        func notifyMe(param: T)
    }

    var notified = 0

    // super annoying, no anonymous classes in Swift!
    class ObserverImpl: Observer {
        typealias T = Int
        func notifyMe(param: T) {
            notified = param
        }
    }

    var intObservers: [Observer<Int>] = [ObserverImpl()]

    assert(notified == 0)
    intObservers.forEach { $0.notifyMe(1) } 
    assert(notified == 1)  
Again pretty straightforward but this time on compile we get the following errors:
Cannot specialize non-generic type 'Observer'
Member 'notifyMe' cannot be used on a value of protocol type 'Observer'; use a generic constraint instead
What? I'm really not allowed to specialize my generic protocol into a concrete type that I can operate on? Yep. Them's the breaks -- protocols really are not interfaces, and once you add typealiases to them you lose the ability to refer to them as concrete types in your program. So that sucks, but fortunately there's a semi-reasonable way around this! Apple themselves apply this pattern all over the place in the standard Swift libraries (AnySequence and friends) because, it turns out, generic programming is really useful - and a huge part of that is the need to refer to type-constrained generic interfaces. The solution is to wrap our generic protocol in a concrete, type-erasing struct. The new struct AnyObserver provides the compiler with the type information it can't guarantee from the protocol alone at compile time.
    protocol Observer {
        typealias T
        func notifyMe(param: T)
    }

    struct AnyObserver<T>: Observer {
        private let _notifyMe: (T) -> Void
        
        init<O: Observer where T == O.T>(_ observer: O) {
            _notifyMe = observer.notifyMe
        }
        
        func notifyMe(param: T) {
            _notifyMe(param)
        }
    }

    var notified = 0

    class ObserverImpl: Observer {
        typealias T = Int
        func notifyMe(param: T) {
            notified = param
        }
    }

    var intObservers: [AnyObserver<Int>] = [AnyObserver(ObserverImpl())]

    assert(notified == 0)
    intObservers.forEach { $0.notifyMe(1) }
    assert(notified == 1)
So for a bit of extra code we get to have our cake and eat it too. In this contrived example that seems like a pretty good deal, but if you imagine the complexity of generic interfaces you can come up with in, e.g., a large enterprise app (or heck Facebook or something), you might think twice about having to repeat your interfaces everywhere you have an Any* wrapper. C'est la Swift.

Saturday, April 4, 2015

Build Tooling Discussions, Part II: Style Checking & Linting

In an earlier post I released base-api, a template project for developing Scala API servers.  In this post I'd like to discuss why I included a style checker / linter in that project and the philosophy of automated code repository rule enforcement in general. The last post on this topic covered automated code formatting, which is really just a subset of style rule enforcement that happens to be possible to fix in an automated manner.

Style checking and linting are closely related concepts that differ primarily in that:
  • Style rules are aimed at reducing the cognitive load imposed on engineers attempting to understand or work with your codebase.
  • Linting is aimed at preventing bugs by enforcing a ban on specific patterns and practices that are both considered harmful and detectable in an automated manner.
We consider these together because style rules and linting often bleed together. For example, is it a stylistic concern that all if statements have braces, or is omitting them a harmful practice because it frequently leads to bugs where logic accidentally escapes control flow? Hard to say, so we just roll them into one topic.
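Here's the kind of bug the brace-less if invites - the indentation suggests both lines are guarded, but only the first one is (a contrived illustration, not code from base-api):
    def maybeCharge(user: String, active: Boolean): Unit = {
      if (active)
        println(s"charging $user")
        println(s"emailing receipt to $user") // indentation lies: this line always runs
    }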

A common rule among style checkers is the prevention of magic number use. Let's take a look at an example where preventing use of magic numbers might save our bacon. Say we implement this block of code somewhere in our application:
    def hasFlag(bits: Int, flag: Int) = {
      ((bits >> flag) & 1) != 0
    }
    
    def doSomeLogic(userPreferences: Int) {
      // bit 0 is make profile public
      if (hasFlag(userPreferences, 0)) {
        // show public profile
      }
      // bit 1 is receive site emails
      if (hasFlag(userPreferences, 1)) {
        // send daily digest email
      }
    }
Then elsewhere another developer implements the logic to store user preferences in the database:
    // sets the bit at $flag position to $enabled
    def setFlag(bits: Int, flag: Int, enabled: Boolean) = {
      val value = if (enabled) 1 else 0
      bits | (value << flag)
    }

    def storeUserPreferences(receiveEmails: Boolean, publicProfile: Boolean) {
      var userPreferences = 0
      userPreferences = setFlag(userPreferences, 0, receiveEmails)
      userPreferences = setFlag(userPreferences, 1, publicProfile)
      // store userPreferences...
    }
We suddenly have a very hard-to-find bug that may not even show up in cursory testing. We release, and maybe it takes a while before anyone even notices some users complaining about getting emails they weren't supposed to get. If you haven't spotted it - the positions of the receiveEmails and publicProfile bits are reversed between these two snippets. With a 'no magic numbers' rule enforced, this code would not be allowed to build / commit / merge (depending on where in the pipeline you implement the rule). Hopefully the developer's instinct would be to rectify the situation with an enum, thusly:
    object UserPreferenceBits extends Enumeration {
      type UserPreferenceBit = Value
      val ReceiveEmails = Value(0)
      val PublicProfile = Value(1) 
    }

    // checks whether an integer has the bit at $flag position enabled
    def hasFlag(bits: Int, flag: UserPreferenceBit) = {
      ((bits >> flag.id) & 1) != 0
    }

    // sets the bit at $flag position to $enabled
    def setFlag(bits: Int, flag: UserPreferenceBit, enabled: Boolean) = {
      val value = if (enabled) 1 else 0
      bits | (value << flag.id)
    }
However if they do not understand the reasoning behind the 'no magic numbers' rule, they might try to do something unpleasant like:
    def doSomeLogic(userPreferences: Int) {
      val publicProfileFlag = 0 // BAD!
      val receiveEmailFlag = 1 // BAD!
      // bit 0 is make profile public
      if (hasFlag(userPreferences, publicProfileFlag)) {
        // show public profile
      }
      // bit 1 is receive site emails
      if (hasFlag(userPreferences, receiveEmailFlag)) {
        // send daily digest email
      }
    }

    def storeUserPreferences(receiveEmails: Boolean, publicProfile: Boolean) = {
      var userPreferences = 0
      val receiveEmailFlag = 0 // BAD!
      val publicProfileFlag = 1 // BAD!
      userPreferences = setFlag(userPreferences, receiveEmailFlag, receiveEmails)
      userPreferences = setFlag(userPreferences, publicProfileFlag, publicProfile)
      // store userPreferences....
      userPreferences
    }
Unfortunately some bad practices just can't be defended against, and in the face of a truly determined developer all rules will wither and die. That's why collaboration on the rule-creation process, and a strong organizational commitment to education about it, are crucial to its efficacy.



Where and how to deploy style checking & linting


There are three primary places one can implement these checks: at compile time, at commit time, or as part of the PR merge process. As with most automation, running it earlier is generally better - at least at compile time the developer is likely still in the context of the offending code. Sometimes this isn't possible - maybe your code doesn't even have a compile step, or maybe your linter is really heavy and including it in your compile step would cause an unacceptable delay for the developer. If you have a good reason, later is OK too; it just has to be balanced against the increased cost of getting back into context to fix whatever problems are uncovered.

The other part of the 'how' is your internal education process about these kinds of rules, and your process for updating / adding / removing them. All developers have to be invested in the process for it to matter (and for them not to try to work around it), so it's worth your time to demonstrate real benefits in your project and make sure everyone is on board. Another consideration is that you should do this as early as possible; the most common failure of attempts to implement rules like this is making them on large extant codebases that already have thousands of violations. Ain't nobody going back through and fixing all that.

For Scala my favorite is ScalaStyle - it's mostly style rules but also has some good functional rules. 

Sunday, March 22, 2015

Build Tooling Discussions, Part I: Code Formatting

In an earlier post I released base-api, a template project for developing Scala API servers. In this post I'd like to discuss why I included a code formatter in that project and the philosophy of code formatting in general.

My favorite way to talk about technical topics is by centering them on real world examples. To that point, I'd like to take a look at 3 different ways of formatting the same snippet of code.
  /** This is some scaladoc
    *  @param p1
    *  @param p2
    *  @param p3
    */
  class A(p1: String,
    p2: String,
    p3: String) extends
  Intf with Intf2
  {
    val p4 : Int = p1 match {
      case "A" => 1
      case "BBBBBBBBBBBB" => 2
      case "CCCCC" => 3
    }
    val list = List("a", "b")
    val zip = list zip(list)
    def foo( param: String ) = "foo"
  }
  /**
   * This is some scaladoc
   * @param p1
   * @param p2
   * @param p3
   */
  class A(p1: String,
          p2: String,
          p3: String)
  extends
    Intf with
    Intf2 {
    val p4: Int = p1 match {
      case "A"            => 1
      case "BBBBBBBBBBBB" => 2
      case "CCCCC"        => 3
    }
    val list = List(
      "a",
      "b"
    )
    val zip = list zip list
    def foo(param: String) = "foo"
  }
  /**
   * This is some scaladoc
   * @param p1
   * @param p2
   * @param p3
   */
  class A(p1: String, p2: String, p3: String) extends Intf with Intf2 {
    val p4: Int = p1 match {
      case "A" => 1
      case "BBBBBBBBBBBB" => 2
      case "CCCCC" => 3
    }
    val list = List("a", "b")
    val zip = list zip list

    def foo(param: String) = {
      "foo"
    }
  }
The crazy thing to me is that none of those are wrong, or even particularly different. Let's throw one extra in that really is "wrong":
  class A(p1:String,p2:String,p3:String) extends Intf with Intf2 {
    val p4: Int = p1 match {
      case "A" => 1; case "BBBBBBBBBBBB" => 2; case "CCCCC" => 3;
    }
    val list = List("a","b"); val zip = list.zip(list)
    def foo(param: String) = "foo"
  }
We could use a code formatter to make any of these snippets look like any of the others (except #4 - that one can be fixed, but not emulated). At first blush this might seem a bit draconian - why should someone get to dictate how my code looks, not only with some style guide but with actual automated rule enforcement?



Cognitive load


The answer is that there is a cost to having different ways of representing the exact same thing, and that cost is always masked at creation time and only borne later when the code must be read and understood. That is, you're making future you (and every other dev on the codebase) suffer so that present you can do whatever feels right in the moment (re: code format, though the same applies to most other rules about code). The basics of cognitive load for user experience are covered in a great article by Kevin Matz, but the gist of it is that we want to cut to the barest minimum the brain-effort required to parse and understand a piece of code.



Enter: automated code formatting


The fantastic thing for us code slingers is that there is a plethora of tools out there for nearly every language that will do a non-trivial portion of the work for us. Reducing the cognitive load required to comprehend a piece of code has a number of aspects - in a language like Scala it might mean refraining from using implicits, always using case classes for data typing (as opposed to type declarations), or other specifics. It often also includes style checking that goes beyond the scope of formatting. Fortunately, formatting is a subclass of this problem that can be fixed in an automated way; other subclasses, like preventing the use of null in Scala where we have native optionals, cannot be fixed automatically (though they can thankfully be detected!).
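To make that concrete, here's a hypothetical example of something a linter can flag but no formatter can repair - moving from null to Option changes the method's signature and every caller, which is a human decision:
  case class User(id: Long, name: String)

  // hypothetical Java-style store: returns null on a miss
  val usersById = new java.util.HashMap[Long, User]()

  // a lint rule can flag the null-returning lookup...
  def findUser(id: Long): User = usersById.get(id) // null when absent

  // ...but rewriting it to the idiomatic Option form (and fixing callers) is up to a human
  def findUserOpt(id: Long): Option[User] = Option(usersById.get(id))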

Scalariform is my favorite Scala code formatter, but which one you use isn't really important. You might even choose not to automate format standardization. The key is that you think through what rules you want applied to your codebase, how they can and should be enforced, and whether or not there's value in providing fixes to everyone automatically. There are downsides, of course: sometimes Scalariform will wantonly destroy a carefully crafted special code format used to make (e.g.) a DSL easier to read, but it's trivial to add comment blocks disabling it around those special whitespace areas.
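As an illustration (not necessarily the exact base-api settings), here's roughly what tuning Scalariform through its sbt plugin looked like in this era - the aligned case arrows in snippet #2 above come from one of these switches, and the escape hatch mentioned above is typically a pair of // format: OFF and // format: ON comments around the hand-crafted block:
  // build.sbt sketch - assumes the sbt-scalariform plugin is already on the build classpath
  import scalariform.formatter.preferences._
  import com.typesafe.sbt.SbtScalariform.ScalariformKeys

  ScalariformKeys.preferences := ScalariformKeys.preferences.value
    .setPreference(AlignSingleLineCaseStatements, true) // the aligned "case ... =>" arrows
    .setPreference(DoubleIndentClassDeclaration, true)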

Sunday, March 1, 2015

A Handy-Dandy Typesafe Config Implicit Wrapper in base-api

In an earlier post I released base-api, a template project for developing Scala API servers. In this post I will describe the convenient configuration system included in that project.

The Typesafe Config project is easily the best, most convenient, easiest to use, easiest to understand extensible configuration management system I've ever had the pleasure of working with. It has no dependencies, it reads JSON and HOCON (a JSON superset with substitutions, etc.), has excellent merge strategies, accepts command line runtime overrides automatically, and is of course very well tested (coming from Typesafe after all). Ok great, so what does that look like? Here's an example REST API reference.conf section:
rest {
  protocol = http
  host = "0.0.0.0"
  port = 8080
  timeout = 2 seconds
}
Which might be paired with a production.conf that overlays these values:
rest {
  protocol = https
  port = 443
  timeout = 5 seconds
}
We have 4 values (addressable as rest.protocol, etc.) that have 3 distinct data types - string, string, int, FiniteDuration. The code that requires these values is probably a case class or other typed input:
  case class RestService(
    protocol: String, 
    host: String, 
    port: Int, 
    timeout: FiniteDuration
  )
And to obtain them at runtime we might have some code that looks like this:
  import com.typesafe.config.ConfigFactory
  import org.joda.time.format.PeriodFormatterBuilder
  import scala.concurrent.duration._ // provides implicit millis->FiniteDuration cnvsn
  
  val defaultConf = ConfigFactory.defaultReference()
  val conf = ConfigFactory.load().withFallback(defaultConf)
  
  val secondsFormatter = new PeriodFormatterBuilder()
    .appendSeconds().appendSuffix(" seconds").toFormatter
  val timeoutPeriod = secondsFormatter.parsePeriod(conf.getString("rest.timeout"))  
  
  val restService = RestService(
    conf.getString("rest.protocol"),
    conf.getString("rest.host"),
    conf.getInt("rest.port"),
    timeoutPeriod.toStandardDuration.getMillis.millis
  )
Doesn't look horrible, but could become a bit nasty once we have hundreds of configurable values - a typical fate for any actively developed API. Now let's see what configuring the RestService would look like in base-api where we have our fancy wrapper:
  import base.common.config.BaseConfig._ // provides HOCON implicit cnvsns 

  val REST = "rest"
  val restService = RestService(
    Keys(REST, "protocol"),
    Keys(REST, "host"),
    Keys(REST, "port"),
    Keys(REST, "timeout")
  )
Awesome! We are no longer responsible for specifying the types of data to be retrieved, and particularly in the case of the FiniteDuration there's some real magic going on - somewhere, some code is figuring out how to get from the string "5 seconds" in the conf to a strongly typed duration (and it would work equally well if we had put 5 hours or 1 day).

That something, somewhere is BaseConfig, which provides a chain of implicits that will figure out how to populate config values for just about anything HOCON supports, and which is easily extensible to custom data types. The primary built-in custom data type is Period, which cycles through a number of formatters, attempting to find one that properly parses the provided config value (e.g. "milli", "millis", "millisecond", "milliseconds").
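I won't reproduce BaseConfig here, but a stripped-down sketch of the general trick (with hypothetical names - this is not the actual base-api code) looks something like this: a small key type plus implicit conversions that let the compiler pick the right typed getter based on where the value is used.
  import com.typesafe.config.{ Config, ConfigFactory }
  import scala.concurrent.duration.{ Duration, FiniteDuration }
  import scala.language.implicitConversions

  object MiniConfig {
    private val conf: Config = ConfigFactory.load()

    // Keys just joins path fragments; the target type is decided at the use site
    case class Keys(parts: String*) {
      def path: String = parts.mkString(".")
    }

    implicit def keyToString(k: Keys): String = conf.getString(k.path)
    implicit def keyToInt(k: Keys): Int = conf.getInt(k.path)
    implicit def keyToDuration(k: Keys): FiniteDuration =
      Duration(conf.getString(k.path)) match {
        case fd: FiniteDuration => fd
        case other              => sys.error(s"${k.path} must be a finite duration, got $other")
      }
  }
With conversions like those in implicit scope, a call such as RestService(Keys(REST, "protocol"), ..., Keys(REST, "timeout")) type-checks because the compiler inserts the appropriate conversion for each constructor parameter.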



Easy stuff so far. Let's make it interesting.


Ok let's say we need something more complicated. Assume we have this conf:
rest {
  protocol = http
  // ...
  endpoints = [
    { path = foo, methods = [get]       },
    { path = bar, methods = [get, post] }
  ]
}
To configure these values:
  object Methods extends Enumeration {
    type Method = Value
    val GET = Value("get")
    val POST = Value("post")
    // ...
  }
  case class Endpoint(path: String, methods: Set[Method])
  case class RestService(
    protocol: String, 
    // ... host, port, etc ...
    endpoints: List[Endpoint]
  )
Unfortunately we don't have any implicits smart enough to figure out how to populate a Set[Method] data type let alone a List[Endpoint] data type. But with just a wee bit of extra work we can get these running as smoothly as the simpler types:
  implicit def string2Method(s: String): Method = Methods.withName(s)

  val REST = "rest"
  val restService = RestService(
    Keys(REST, "protocol"),
    getConfigList(Keys(REST, "endpoints")).map { endpointConfig =>
      implicit val config = new BaseConfig(endpointConfig)
      Endpoint(
        Keys("path"), 
        Keys("methods")
      )
    }
  )
All we had to do was sprinkle a little more implicit magic on it and bingo, we get this rather complex, strongly typed hierarchy built for us out of our config DSL. What's happening in the above snippet is that we are saying the endpoints config value is itself a list of Typesafe Configs. This gives us access to the full power of the implicit chain, but scoped down to the contents of each element of that list. Neat huh?


This is cool, but isn't implicit magic bad?


Some people feel pretty strongly that implicits should be used with extreme caution, and that in a large multi-developer codebase they can quickly run away into an indecipherable, unmaintainable mess. To that I say: I agree. You have to be really careful with them, and frankly I think using them in core business logic is a mistake that comes back to haunt people frequently - though it's pretty hard to get around their usage in core libs like JSON DSLs. For their usage here, I think it will be OK even in medium to larger sized projects, as long as they remain constrained to the configuration system and only deal with common primitives. On the other hand, I wouldn't fault a dev team for saying "no way Jose, not in my codebase" ;)

Sunday, February 8, 2015

Reducing Space Usage in Redis for UUIDs

One of the primary concerns when setting up a data store is figuring out how much space it will need to cover your use case and, relatedly, how it will perform on datasets of the sizes you expect and when you will need to consider engaging advanced features like sharding to handle your load. With this in mind, I was thinking about how to spec a system the other day for storing large amounts of data in Redis, keyed for the most part by UUID. Most of the values associated with these keys were either UUIDs as well, or relatively tiny data types like booleans and integers. Given these aspects of the intended system, combined with the standard practice of storing data in Redis as UTF-8 strings, it was immediately apparent to me that there were some easy gains to be had.


How to turn 36 bytes into 16


The data type of Redis keys is binary-safe string, so the majority of libraries simply execute toString() or whatever your language equivalent is on your key and call it a day. It occurred to me that the majority of the space this data store occupied would be taken up by these string-encoded UUIDs, which have this canonical form:
de305d54-75b4-431b-adb2-eb6b9e546014
As a UTF-8 string, this form will take up one byte per character, or 36 bytes: 32 for informational characters and 4 for dashes. What a waste! A UUID is simply a 16-octet (16-byte) integer, which can represent 2^128 distinct values:
340282366920938463463374607431768211456
As a UTF-8 string, this would be 39 bytes. Not only is that worse than 36 bytes, it's pretty close to the average case integer representation (2^128/2). But! Who says we have to represent this as a human-readable string of bytes? If we represent the UUID as a byte array, it is of course only 16 bytes:
��£¬¾ý�ȹɀɱ��ʜͶϋ͍
Boom. We just reduced our data store size by something on the order of 50%. It's not going to be the full 36-bytes-down-to-16 (~55%) reduction, because there's overhead for every KV pair and there's potentially some other data stored (booleans, ints, etc.), but that's still a huge win. Now my single instance of Redis can last me twice as long (assuming a linear growth curve) before I need to worry about sharding, etc. Normally I wouldn't recommend relying on anything less than an order-of-magnitude gain for an architectural decision, but this is really just a thought experiment :)
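For reference, producing that 16-byte form on the JVM is just a pair of putLongs (the Scredis post below wires essentially this into a Writer):
import java.nio.ByteBuffer
import java.util.UUID

def uuidBytes(uuid: UUID): Array[Byte] = {
  val bb = ByteBuffer.allocate(16)
  bb.putLong(uuid.getMostSignificantBits)
  bb.putLong(uuid.getLeastSignificantBits)
  bb.array()
}

// "de305d54-75b4-431b-adb2-eb6b9e546014".getBytes("UTF-8").length == 36
// uuidBytes(UUID.fromString("de305d54-75b4-431b-adb2-eb6b9e546014")).length == 16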

Are you crazy?


Obviously this does have some drawbacks. If I'm debugging some production issue and the logs say there's a problem with ID xxx-xx-xx, I can't just fire up redis-cli and get the value of $keyPrefix-xxx-xx-xx. I'm going to have to use some custom tooling to convert my logged ID to a byte array, add my prefix and run the command I want against that. This is a non-trivial cost, but for a use-case in which the primary data stored is UUIDs we're talking about a ~55% reduction in space (from 36 bytes down to 16). If I can support a userbase & dataset that is more than twice as large on the same hardware I consider that a win, especially since the primary limiting factor in most projects is IO bottlenecking. That said, I probably wouldn't do this in the real world because the value of being able to debug quickly and get other developers up-to-speed quickly usually exceeds the value of performance gains of less than 10x.

Let's go further


Alright so we've already established that I'm crazy. How far can we take it? There's one other obvious target for reducing wasted space in Redis KV pairs - the key prefix. Normally we namespace keys to keep them from colliding, so we might have some keys that look like (assuming non-binary UUIDs):
     standardUser-de305d54-75b4-431b-adb2-eb6b9e546014
groupNotification-de305d54-75b4-431b-adb2-eb6b9e546015
   messageContent-de305d54-75b4-431b-adb2-eb6b9e546016
For these keys we're using about 16 bytes each for the function of namespacing. With 16 bytes we could represent 2^128 namespaces! (Conveniently the same size as a UUID :) How many namespaces do we really need for keys? 256? 65536? Let's go with that. 2 bytes as a byte array gives us our 65k prefixes. We store these in an enum somewhere in the common lib for our project and bingo, our namespace prefixes now take 2 bytes instead of ~16 - roughly an 87.5% saving on that part of the key:
�¬-de305d54-75b4-431b-adb2-eb6b9e546015
ɱ�-de305d54-75b4-431b-adb2-eb6b9e546015
�Ͷ-de305d54-75b4-431b-adb2-eb6b9e546015
When we combine the two approaches together, we turn an average key length of 16 + 1 + 36 = 53 bytes into 2 + 1 + 16 = 19 bytes, for an average savings of 64%. Awesome:
�¬-��£¬¾ý�ȹɀɱ��ʜͶϋ͍
ɱ�-��£¬¾ý�ȹɀɱ��ʜͶϋ͍
�Ͷ-��£¬¾ý�ȹɀɱ��ʜͶϋ͍
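A minimal sketch of what that shared namespace enum and 2-byte prefix encoding might look like (hypothetical names - base-api doesn't actually ship this):
object KeyNamespace extends Enumeration {
  val StandardUser      = Value(0x0001)
  val GroupNotification = Value(0x0002)
  val MessageContent    = Value(0x0003)
}

// two bytes of prefix instead of ~16 bytes of human-readable text
def prefixBytes(ns: KeyNamespace.Value): Array[Byte] =
  Array(((ns.id >> 8) & 0xff).toByte, (ns.id & 0xff).toByte)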

Yea, this is crazy


If you go take a gander at the Redis intro to data types, it mentions the following about best practices for keys:
"Very short keys are often not a good idea. There is little point in writing "u1000flw" as a key if you can instead write "user:1000:followers". The latter is more readable and the added space is minor compared to the space used by the key object itself and the value object. While short keys will obviously consume a bit less memory, your job is to find the right balance."
Bah humbug. As much as I hate to admit it, this is right; the grist just isn't worth the grind. You win this time, antirez!

That said - in my next post I'm still going to implement an extension to Scredis to do this anyway, just for kicks.

Saturday, February 7, 2015

Extending Scredis, a Scala Redis Client, to Write Binary Keys

In my last post I discussed the possibility of reducing Redis key space usage for UUID-based keys by storing them as byte arrays, along with converting key namespacing from human-readable strings to integers as well. The conclusion of that post was that it is a bad idea(tm), but I'm going to do it anyway. For the TL;DR / show-me-the-code crowd, see the pull request.


The Current Scredis interface


Creating and modifying keys with Scredis is wonderfully simple and easy. It looks like this:
package scredis.commands

import org.scalatest._
import org.scalatest.concurrent._
import scredis._
import scredis.protocol.requests.StringRequests._
import scredis.util.TestUtils._

class BlogExampleSpec extends WordSpec
  with GivenWhenThen
  with BeforeAndAfterAll
  with Matchers
  with ScalaFutures {

  private val client = Client()
  private val SomeKey = "someKey"
  private val SomeValue = "HelloWorld!虫àéç蟲"

  Set.toString when {
    "setting a key that does not exist" should {
      "succeed" in {
        client.set(SomeKey, SomeValue)
        client.get(SomeKey).futureValue should contain(SomeValue)
      }
    }
  }

}
Ok great. Let's take that a step further and figure out how to write our UUID values as byte arrays rather than UTF-8 strings. To do that we will implement a Scredis Reader and Writer for the java.util.UUID type:
package scredis.commands

import java.nio.ByteBuffer
import java.util.UUID

import org.scalatest._
import org.scalatest.concurrent._
import scredis._
import scredis.protocol.requests.StringRequests._
import scredis.serialization.{Reader, Writer}
import scredis.util.TestUtils._

class BlogExampleSpec extends WordSpec
  with GivenWhenThen
  with BeforeAndAfterAll
  with Matchers
  with ScalaFutures {

  private val client = Client()
  private val SomeKey = UUID.randomUUID()
  private val SomeValue = UUID.randomUUID()

  implicit val uuidReader = new Reader[UUID] {
    protected def readImpl(bytes: Array[Byte]): UUID =
    bytes.length == 16 match {
      case false => null
      case true =>
        var msb = 0L
        var lsb = 0L
        for (i <- 0 until 8) {
          msb = (msb << 8) | (bytes(i) & 0xff)
        }
        for (i <- 8 until 16) {
          lsb = (lsb << 8) | (bytes(i) & 0xff)
        }
        new UUID(msb, lsb)
    }
  }

  implicit val uuidWriter = new Writer[UUID] {
    protected def writeImpl(value: UUID): Array[Byte] = {
      val bb = ByteBuffer.wrap(new Array[Byte](16))
      bb.putLong(value.getMostSignificantBits)
      bb.putLong(value.getLeastSignificantBits)
      bb.array()
    }
  }

  Set.toString when {
    "setting a key that does not exist" should {
      "succeed" in {
        client.set(SomeKey.toString, SomeValue)
        client.get[UUID](SomeKey.toString).futureValue should contain(SomeValue)
      }
    }
  }

}
Pretty cool, we just got our 36-byte UUID value down to a 16-byte array. However, our goal was to have both UUID-based keys and values. In order to do that we have to update Scredis to allow the concept of Readers and Writers for keys, the same as we have for values. That's actually a pretty big interface change so I can't paste it all here, but you can review the PR to see how it's done. As an example, here's how the interface of the set command changed:
  def set[W: Writer](
    key: String,
    value: W,
    ttlOpt: Option[FiniteDuration] = None,
    conditionOpt: Option[scredis.Condition] = None
  ): Future[Boolean]
becomes:
  def set[K: Writer, W: Writer](
    key: K,
    value: W,
    ttlOpt: Option[FiniteDuration] = None,
    conditionOpt: Option[scredis.Condition] = None
  ): Future[Boolean]
Now that we have our handy new key writer interface, we can come back to our test spec and make our UUID-key the way we want to:
  Set.toString when {
    "setting a key that does not exist" should {
      "succeed" in {
        client.set(SomeKey, SomeValue)
        client.get[UUID, UUID](SomeKey).futureValue should contain(SomeValue)
      }
    }
  }
Sweet. Now we're properly encoded on both sides of the KV pair and we've shaved 40 bytes off our original 72 bytes worth of data. The only thing left to do would be to add a binary namespace to our key, but I'll leave that to your imagination. Please, don't do this at home kids. As discussed in the previous post, the maintainability and debuggability of your data store is not worth sacrificing for a few extra bytes :)

Sunday, January 4, 2015

Releasing base-api, a Template for Scala API Server Development

After about the fourth or fifth time I sat down to write an API in Scala, it occurred to me that I should probably create a template that has most of the common ingredients of a successful service. Having done so, I am now posting it publicly on the off chance that it can help somebody else out. If not, at least I get one of my private repos back ;)

Here are the primary features of base-api:

  • A layered project hierarchy that looks like: 
    • common libs => storage layer => business logic => public APIs
  • Clean, Scala-based build definition (as opposed to sbt-file based)
  • Slick 2.1 RDBMS integration (Postgres by default for prod, H2 for test)
    • Separate slick codegen project for automated DB abstraction case class generation
  • Redis KV Scala abstraction built on Sam Pullara's ultra-fast Java Redis client
  • Spray 1.3 REST API infrastructure
  • Netty 4.0 Socket API infrastructure
  • Json4s 3.2 json abstraction
  • Akka 2.3 actors for core business logic, Slf4j logging
  • Scalastyle 0.6 linting with some actually reasonable defaults configured
  • Scoverage 0.99 for code coverage (which is very high btw)
  • Scalariform 1.3 for code formatting (impossible to live without these days)
  • Sbt-Assembly for fat-jar creation
  • A nice clean homegrown abstraction on Typesafe Configs for service configuration
  • The basics of a user authentication and authorization scheme compatible with both the REST and Socket APIs simultaneously. Also includes API key-based authentication.
  • Permissions management functionality for API endpoints (again compatible with both APIs)
  • Probably a bunch of other cool junk I'm just failing to remember right now
  • MIT License
In future posts I'd like to cover the abstraction on the Redis client, the Typesafe Config wrapper, the architecture for multiple API types built on the same business logic, and the philosophy behind all the build tools (linting, formatting, coverage, etc). In the meantime I'll just tease the config stuff since it's neat - here's an example service implementation interface:
    class CommonServiceImpl(
      akkaHost: String,
      akkaPort: Int,
      defaultDuration: FiniteDuration
    ) extends ServiceImpl with CommonService
And here's the entirety of the code needed to configure it from the reference conf:
    val AKKA = "akka"

    Services.register(new CommonServiceImpl(
      Keys(AKKA, "host"),
      Keys(AKKA, "port"),
      Keys(AKKA, "defaultTimeout")
    ))
Pretty cool huh.

Saturday, July 12, 2014

Roll-your-own Dependency Injection in Scala

This post assumes you know what DI is and why it's a good thing(tm). If you're not on board with that, here's a StackOverflow answer that might help.

Since Scala runs on the JVM there are countless dependency injection solutions available to any Scala project. There are frameworks like Spring and Guice that will provide you with fully configurable and highly extensible solutions, and there are even some native Scala solutions like Scaldi. Many Scala devs have taken a crack at super simple DI solutions, like Jason Arhart's Reader Monad and Michael Pollmeier's Service pattern. It is into this ring that I would like to throw my hat.



Services and Locators


The pattern that I came up with ended up being a small variation on the services idea, with some chrome around it to make it easier to register and consume global services (note that this solution only applies to globally addressable services; you would want to drop the registry and locator pattern for scoped services). It's important to note that the registry below is considered an anti-pattern by some, and in truth it probably is for APIs that will be consumed outside your project. I just happen to like the convenience of it in my smaller side projects.

The gist of it is that we have a registry - a hashmap of [class manifest -> service impl] - for which the companion objects of the service interfaces act as convenient locators through their apply methods. At app or test bootup we inform the registry of the available services, then the companion-object locators are used to apply them as needed at runtime. Code speaks louder than words, so here we go...



The Code


We start with the base service interface which only requires that the services we implement in our project declare explicitly what their class manifest is so that we have a convenient unique handle to each service. This could be a string or class path or something silly like that, but I like the manifest better because it will never get out of sync with the interface.
/**
 * Injectable Service interface
 */
trait Service {

  /**
   * Explicit manifest declaration of the Service Interface,
   *  allows Service registration to omit a type parameter
   *  NB: this should always be a final def in the Service Interface
   */
  def serviceManifest: Manifest[_]

}
The companion objects of the service interfaces that we create are the convenience locators for our services. By defining them with the service manifest as an implicit parameter, they can ask the registry for the appropriate implementation without the caller having to spell out which interface it wants.
import scala.reflect.Manifest

/**
 * Base class for all Service companion objects
 *  Makes them into nifty service locators
 */
abstract class ServiceCompanion[T <: Service](implicit m: Manifest[T]) {

  def apply() = Services.apply[T]

}
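The Services registry itself is referenced above but never shown; a minimal sketch of what it might look like (a hypothetical reconstruction, not necessarily the original) is a concurrent map from manifest to implementation:
import scala.collection.concurrent.TrieMap
import scala.reflect.Manifest

/**
 * Global registry mapping a service interface's manifest to the
 * implementation registered for it at bootstrap time.
 */
object Services {

  private val registry = TrieMap.empty[Manifest[_], Service]

  /** Register (or replace) the implementation for a service interface. */
  def register(service: Service): Unit =
    registry.put(service.serviceManifest, service)

  /** Look up whatever implementation is currently registered for T. */
  def apply[T <: Service](implicit m: Manifest[T]): T =
    registry.getOrElse(m, sys.error(s"No service registered for $m")).asInstanceOf[T]

}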
A centralized bootstrap isn't strictly necessary but it's a convenient place for us to define our standard services.
/**
 * Injects configuration into Services and boots them up. 
 *  If it's configurable, it belongs here.
 */
object ServicesBootstrap {

  /**
   * Trigger and status indicator for executing bootstrap startup behavior
   *  (i.e. registering services)
   */
  lazy val registered = {

    Services.register(new ExampleServiceImpl(
      "configurable value 1",
      "2",
      "3"
    ))

    true
  }

}
Finally we get to our first service interface. This example service promises to doSomeStuff and identifies itself to the registry with its own class manifest. Its companion object similarly identifies itself as belonging to this service with the manifest, and now has an apply() method that will retrieve whatever instance of it is defined in the registry.
trait ExampleService extends Service {

  final def serviceManifest = manifest[ExampleService]

  def doSomeStuff(): String

}
object ExampleService extends ServiceCompanion[ExampleService]
An implementation of our service interface takes in some configuration parameters, which presumably would come from the conf file or some other config mechanism. In this case we just concatenate the values we're provided together, comma-separated.
class ExampleServiceImpl(
  conf1: String,
  conf2: String,
  conf3: String) extends ExampleService {

  def doSomeStuff(): String = {
    s"$conf1, $conf2, $conf3"
  }
}
Now we have enough infrastructure to use our service in a real app. For simplicity's sake we just execute our bootstrap and call our service, asserting that it returns our configured values.
assert(ServicesBootstrap.registered)
assert(ExampleService.doSomeStuff() == "configurable value 1, 2, 3")
The juicy bit, and most of the point of DI, is mocking services out to exclude their logic from tests that have nothing to do with them. Here we are setting up a mock service that will return whatever simple value is passed into it at creation time, so that we can control for usages of ExampleService in other business logic. (Note that normally we wouldn't go about this by creating an actual mock class; we'd use something like Mockito or ScalaMock to generate it for us. That's just outside the scope of this post... in fact I'll do one later on that.)
class ExampleServiceMock(result: String) extends ExampleService {
  def doSomeStuff(): String = result
}
object ExampleServiceMock {
  val doSomeStuffResult = "wahoo!"
}
Speaking of other business logic, here's a snippet of some that does who-knows-what and also happens to call ExampleService.doSomeStuff, eventually returning it as the result of its operation.
object SomeBusinessLogic {
  def operation(): String = {
    // do some business logic 
    val stuff = ExampleService().doSomeStuff()
    // do some more business logic
    stuff
  }
}
To eliminate the chance of something going wrong with ExampleServiceImpl during our test (i.e. to make our test as narrow and specific as possible), we register an instance of ExampleServiceMock that will return exactly the result we expect, regardless of what configuration or other changes are made to ExampleServiceImpl.
class SomeBusinessLogicTest extends FunSuite {

  val exampleService = new ExampleServiceMock(ExampleServiceMock.doSomeStuffResult)
  Services.register(exampleService) // swap the mock in for the real impl

  test("some business logic operation") {
    assert(SomeBusinessLogic.operation() == ExampleServiceMock.doSomeStuffResult)
  }

}
So that's pretty cool, but it has some serious drawbacks, chief among which is that we have just destroyed our ability to run tests in parallel. If some other test also registers its own instance of ExampleServiceMock we will have a potential race condition if that test and this one are executed in parallel, possibly causing flappy tests and indeterminate behavior. The net is that this is a neat and cheapo toy for quick and dirty side projects, but probably a bad thing(tm) for any real effort, just as Mark Seemann says.

Sunday, December 8, 2013

item 0

System.out.println("hello, world")