Jezen Thomas

CTO & Co-Founder at Supercede.

Make Your Types Smaller

Dragons lie at the boundaries of systems.

But where those boundaries lie is too often in a software developer’s blind spot.

A clear system boundary in a web application is a form. We know not to trust user input, and so we diligently validate — both client-side and server-side — the user’s submission.

Once the submission has crossed that threshold, however, I think we lower our guard.

The database also exists beyond a system boundary. Programming language peculiarities should probably not leak into the database, and vice versa.

It is generally understood how to use techniques such as smart constructors to ensure that a value flowing through the system is always valid, but I think programmers have difficulty developing an intuition for when to reach for them.

Perhaps it is because the database sits so close, conceptually, to the data types that model the values it will eventually contain, but I have noticed across several open-source projects that primitive obsession creeps in when defining persistent models.

It’s not an unreasonable thought pattern: you know you want to model a User in your system. Your User model will be persisted in the users table. Your User has a name field, which will be persisted in a column with a VARCHAR (or TEXT) type.

So you define your persistent model this way:

User
  name Text
  dateOfBirth Day
  email Text
  UniqueEmail email
  -- etc…

The problem of course is that the domain for the name field (and also the email field) is much larger than we want. Even taking into account falsehoods programmers believe about names, a name is not just any text value. We need to enforce some rules. We need to reduce our problem space. We need to make it smaller.

For example, we wouldn’t want a user with an empty string for a name:

User
  { name = ""
  , dateOfBirth = -- …
  -- etc…

We also wouldn’t want this monstrosity[1]:

User
  { name = "V̥̝̣̤͇̮̣̦ͮͬ̇͌̕͟l̲̩̠̬͆ͪ͒͌̿ͧ̅͊͘a̷̙̾́̐͌̀ͥ̂̅͝ḓͤͣ̅͂̂ͩ̆͡ò̲̙͙̗̳̻̠̀l̥̮͈̫̻̤̞̿͛ͧ̄͒͋̅̂ͩ͘f̸̮̩̫̺̾͊̌̌ͫ̀͟ͅ ̥ͪ͋͞P̟̻̝͕̩͎̞ͭ̾ͧ͗̆̉u̶͍̱̭͎̓͋̓͂͗ͧͯ͡ͅͅt̸̯̜̟̥̋ͬͦ͂͆͘͟l̯͉͉̤ͣe̱̟̮̖̋ͦ̌͒͂͆ͪ͌͘r͚͛̒͗̔͊̚͘"
  , dateOfBirth = -- …
  -- etc…

It’s all very well telling ourselves that this wouldn’t happen to us because we are running a comprehensive validation function when processing the form submission that ingests this data, but the reality is that in a non-trivial business your database is going to have more than one entry point. Expediency and technical debt are facts of life.
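To make the multiple-entry-point problem concrete, here is a minimal sketch. The names `RawUser`, `validName`, `createFromForm` and `importFromCsv` are hypothetical and do not come from any real codebase; the only point is that a check living inside one entry point does nothing for the others.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical model whose name field is plain Text.
newtype RawUser = RawUser { rawName :: Text }
  deriving (Eq, Show)

-- Hypothetical rule: a name must contain at least one non-space character.
validName :: Text -> Bool
validName = not . T.null . T.strip

-- Entry point 1: the web form handler remembers to validate.
createFromForm :: Text -> Maybe RawUser
createFromForm t
  | validName t = Just (RawUser t)
  | otherwise   = Nothing

-- Entry point 2: a bulk import written in a hurry skips validation,
-- and the compiler cannot object because any Text is accepted.
importFromCsv :: [Text] -> [RawUser]
importFromCsv = map RawUser

main :: IO ()
main = do
  print (createFromForm "")          -- rejected by the form handler
  print (importFromCsv ["Ada", ""])  -- the empty name slips through
```

Both functions type-check equally well, which is exactly the problem: nothing in the type of `RawUser` records whether validation ever happened.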

A persistent model with a bunch of fields representing primitive values like Text is a code smell. When we see it, it’s telling us that we should make our types smaller. That is to say, more specific.

Perhaps what we want instead is something like this:

[st|
  -- Our User type with more specific types in its fields
  User
    name Username -- This type is smaller!
    dateOfBirth Day
    email Email
    UniqueEmail email
    -- etc…
|]

-- Introduce a Username type which wraps a text value
-- Keep this in a different module, and be sure not to expose the constructor!
newtype Username = Username { unUsername :: Text }

-- The "smart constructor" which enforces validation rules
-- Only expose this one!
mkUsername :: Text -> Maybe Username
mkUsername t
  | t == "" = Nothing
  | failsSomeOtherValidationRule t = Nothing
  | otherwise = Just (Username t)

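-- Teach our program to marshal values over the system boundary
-- (PersistText comes from Database.Persist; the Text literals assume OverloadedStrings)
instance PersistField Username where
  -- Unwrap the value into a plain text column
  toPersistValue = PersistText . unUsername
  -- Parse the stored value back into the narrower type
  fromPersistValue (PersistText t) =
    maybe (Left "Invalid username in database") Right (mkUsername t)
  fromPersistValue _ = Left "Expected a text value for Username"

-- Persistent also needs to know which column type backs the field
instance PersistFieldSql Username where
  sqlType _ = SqlString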

It may seem more expensive to use a more specific type because you then need to take the time to teach your program how to marshal values across that application/database boundary, but I think this one-time cost is cheaper than having to code defensively in perpetuity.
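As a rough illustration of what `failsSomeOtherValidationRule` could stand for, the sketch below fills in the smart constructor with concrete rules. The rules themselves are assumptions chosen for demonstration: the limits `maxLength` and `maxMarkRun` and the helper `longestMarkRun` are hypothetical, and a real system would pick its own thresholds, since plenty of legitimate names contain combining marks.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Char (isMark)
import Data.Text (Text)
import qualified Data.Text as T

newtype Username = Username { unUsername :: Text }
  deriving (Eq, Show)

-- Hypothetical limits, chosen only for this sketch.
maxLength, maxMarkRun :: Int
maxLength  = 100
maxMarkRun = 2

-- Length of the longest run of consecutive Unicode combining marks.
longestMarkRun :: Text -> Int
longestMarkRun =
  maximum . (0 :) . map T.length . filter (T.all isMark)
    . T.groupBy (\a b -> isMark a == isMark b)

-- The smart constructor: the only way to obtain a Username.
mkUsername :: Text -> Maybe Username
mkUsername t
  | T.null (T.strip t)            = Nothing  -- empty or whitespace only
  | T.length t > maxLength        = Nothing  -- unreasonably long
  | longestMarkRun t > maxMarkRun = Nothing  -- Zalgo-style stacked marks
  | otherwise                     = Just (Username t)

main :: IO ()
main = do
  print (mkUsername "")                       -- Nothing
  print (mkUsername "Ada")                    -- accepted
  print (mkUsername "Z\x0301\x0302\x0303")    -- Nothing: three stacked marks
```

Because every `Username` must pass through `mkUsername`, any other entry point that wants to build a `User` is forced through the same rules without having to remember them.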


  1. Fun Fact: For quite a long time, Twitter happily accepted Zalgo input in tweets, allowing anyone to turn the timelines of other users into an incomprehensible mess.