How I write Scala in 2022
Historically Scala has been infamous for being so expressive that everyone ends up writing their own dialect, the result being that interoperability between libraries can be an issue and onboarding new developers can be tricky. Things have mercifully calmed down a lot recently but I still thought it’d be interesting to share how I like to write software with Scala in 2022.
Disclaimer
This is going to be a very opinionated article so I want to make it clear that I’m not suggesting that everyone should do these things, or even that all of these ideas are good! I am looking forward to coming back in a few years and seeing how much of this article I now disagree with.
The code examples provided here are essentially pseudocode because I didn’t try compiling any of them. It also goes without saying that very few of these conventions are truly mine per se; rather, they’re things I have learned over the years from sensible people.
Libraries
I’m a big fan of the Typelevel stack, so I have a fairly prescriptive set of libraries I reach for when building any application:
- Cats and Cats Effect for functional programming primitives.
- Doobie for connecting to Postgres, soon to be replaced with Skunk.
- Log4cats backed by logback for logging.
- MUnit and Scalacheck for testing.
- FS2 for concurrency & streaming.
- FS2 Kafka and Vulcan for Kafka and Avro.
- Natchez for tracing.
- http4s for HTTP.
Build tools
I keep it simple here and use SBT. I’ve had some experiences with Bazel but I like to expend as little effort thinking about build tools as possible and I find it is easiest to follow the crowd. I try to keep the number of SBT plugins low so that they’re unlikely to prevent upgrades and to keep the scope of what SBT does fairly limited.
Module structure
I’m not very imaginative with application architecture. I tend to structure applications by stitching together the following types of module, where a “module” is a Scala object or trait.
- Client: HTTP clients calling external APIs with very little extra logic.
- Storage: Modules that define database queries to store & retrieve things.
- Service: Modules that apply application logic on top of Storage or Client.
- Routes: HTTP routes that call out to Services and format their responses.
- Consumer: Kafka consumers that also call out to other modules to process messages.
- Publisher: Background processes that relay information from the database to third parties.
I try to keep the core logic of the application as separate from any IO as possible so will often also have modules consisting of objects that contain pure functions. An example of this might be functions to determine whether a customer is eligible to sign up to a product based on the state of their account. A Service would collect together the state from Storage objects and pass it along to the pure functions that make the decisions.
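The split between a Service and its pure decision functions can be sketched roughly as follows. This is a minimal illustration, not code from a real project: `AccountState`, `Eligibility`, `AccountStorage` and `SignupService` are hypothetical names, the eligibility rule is made up, and only Cats is assumed to be on the classpath.

```scala
import cats.Functor
import cats.syntax.functor._

// Hypothetical domain type, for illustration only.
final case class AccountState(balancePence: Long, suspended: Boolean)

// Pure decision logic: no IO, so it can be tested without any mocks.
object Eligibility {
  // Made-up rule: the account must be active and not overdrawn.
  def canSignUp(account: AccountState): Boolean =
    !account.suspended && account.balancePence >= 0
}

// Storage module: the only place that would touch the database.
trait AccountStorage[F[_]] {
  def find(customerId: String): F[Option[AccountState]]
}

// Service module: gathers state, then delegates the decision.
trait SignupService[F[_]] {
  def isEligible(customerId: String): F[Boolean]
}

object SignupService {
  def apply[F[_]: Functor](accounts: AccountStorage[F]): SignupService[F] =
    new SignupService[F] {
      // A missing account is treated as not eligible.
      def isEligible(customerId: String): F[Boolean] =
        accounts.find(customerId).map(_.exists(Eligibility.canSignUp))
    }
}
```

Because `Eligibility.canSignUp` takes plain data and returns a plain `Boolean`, its tests need no effect type at all, while the service only needs a stub `AccountStorage`.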
Following a strict recipe to create an application is something I historically didn’t like doing because it felt like it took the creativity out of the job, but I like that strong conventions take (most of) the guesswork out of where to put things or what to call them. Breaking these conventions when it makes sense is, however, encouraged.
Directory structure
I organise the packages in each application by data (like customers, accounts) rather than by layer (services, storage) so it is easy to see at a glance exactly what kind of things the application does. Here’s an example for a hypothetical event booking application that allows events to be created & signed up for at venues, where the venues are consumed from Kafka:
src/main/scala/uk/tomverran/
|_ events/
   |_ data.scala
   |_ EventRoutes.scala
   |_ EventService.scala
   |_ EventStorage.scala
|_ customers/
   |_ data.scala
   |_ CustomerRoutes.scala
   |_ CustomerService.scala
   |_ CustomerStorage.scala
|_ venues/
   |_ data.scala
   |_ VenueConsumer.scala
   |_ VenueStorage.scala
A nice property of this structure is that as the application grows it will tend to do so horizontally, accumulating more directories that correspond to more features, rather than vertically through files becoming longer or directories accumulating more files.
Code structure
Each Storage or Service module consists of a trait and an anonymous implementation in the same file. Storage modules are implemented in terms of ConnectionIO, which means that multiple operations across multiple storage modules can be combined to occur within one database transaction.
import doobie.ConnectionIO

trait CustomerStorage[F[_]] {
  def store(customer: Customer): F[Unit]
  def find(customerId: CustomerId): F[Option[Customer]]
}

// An object cannot take type parameters, so the effect is fixed to ConnectionIO here.
object CustomerStorage {
  val instance: CustomerStorage[ConnectionIO] =
    new CustomerStorage[ConnectionIO] {
      def store(customer: Customer): ConnectionIO[Unit] = ???
      def find(customerId: CustomerId): ConnectionIO[Option[Customer]] = ???
    }
}
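To illustrate why ConnectionIO is useful here, the sketch below combines writes from two storage modules into a single ConnectionIO value and only then hands it to a transactor. It assumes doobie 1.x on Cats Effect 3; `Booking`, `BookingStorage` and `Signup` are hypothetical names invented for the example, and the storage traits are cut down to one method each.

```scala
import cats.effect.MonadCancelThrow
import doobie.{ConnectionIO, Transactor}
import doobie.implicits._

// Hypothetical types standing in for the modules described above.
final case class Customer(id: Long, name: String)
final case class Booking(customerId: Long, eventId: Long)

trait CustomerStorage[F[_]] { def store(customer: Customer): F[Unit] }
trait BookingStorage[F[_]]  { def store(booking: Booking): F[Unit] }

object Signup {
  // Both writes are described as one ConnectionIO value; nothing runs yet.
  def program(
      customers: CustomerStorage[ConnectionIO],
      bookings: BookingStorage[ConnectionIO]
  )(customer: Customer, eventId: Long): ConnectionIO[Unit] =
    for {
      _ <- customers.store(customer)
      _ <- bookings.store(Booking(customer.id, eventId))
    } yield ()

  // The combined program runs against the database only when transacted.
  def run[F[_]: MonadCancelThrow](xa: Transactor[F])(
      program: ConnectionIO[Unit]
  ): F[Unit] =
    program.transact(xa)
}
```

With a transactor using doobie's default strategy, `Signup.run(xa)(Signup.program(customers, bookings)(customer, eventId))` should commit both writes together or roll both back if either fails.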
Kafka consumers are typically structured as an fs2 Stream enclosed in an object, so that dependencies can be passed like so:
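import fs2.Stream
import fs2.kafka.KafkaConsumer

object VenueConsumer {
  // fs2-kafka's KafkaConsumer has key and value types; K is the key, unused here.
  def apply[F[_], K](
    kafka: KafkaConsumer[F, K, Venue],
    venues: VenueStorage[F]
  ): Stream[F, Unit] =
    kafka
      .stream
      // Store each venue first, then commit its offset.
      .evalTap(message => venues.store(message.record.value))
      .evalMap(_.offset.commit)
}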
Common data structures live in data.scala. HTTP requests and responses will usually have their own data structures, but I’m fairly relaxed about using a common case class across the layers within a package if all the fields happen to be identical, as long as it is easy to split them when they diverge.
Unit tests
Absolutely everything other than Main.scala should have meaningful unit tests. I don’t often write integration tests and when I do I still use a unit test framework. Nothing makes my heart sink more than updating a lot of unit tests only to discover that the same functionality is also tested five times more slowly in a bunch of neglected e2e tests that fail 50% of the time and consume all my RAM.
An example test directory for the above booking application would look like this:
src/test/scala/uk/tomverran/
|_ customers/
   |_ CustomerRoutesTest.scala
   |_ CustomerServiceTest.scala
   |_ TestCustomerService.scala
   |_ CustomerStorageTest.scala
   |_ TestCustomerStorage.scala
I do my very best to keep each test isolated by only ever using mock, in-memory implementations of any dependencies in tests. These mock implementations are always called TestXXX.scala and live alongside the tests for the real implementations.
Most of the core application logic should be tested without needing to use any mocks due to it consisting of objects that contain pure functions. The mocks are used when testing how the various modules are wired together. These tests usually cover things like error handling, retry policies and similar.
In the above example CustomerRoutesTest will use the mock implementation of CustomerService called TestCustomerService while CustomerServiceTest will test the real implementation of CustomerService using the TestCustomerStorage mock. Mock implementations of modules should have as little logic as possible but it often isn’t possible to eliminate all logic and I tend not to worry too much about some logic in mocks.
All this mocking of course has a cost but it keeps the tests fast & focused. The goal is to keep the friction of writing unit tests as low as possible by always having a mock implementation of any given module available with a fairly consistent feature set.
Mocks of stateful modules are typically backed by a cats-effect Ref and will also often contain extra helper functions, for example to record which functions were called:
import cats.effect.{Ref, Sync}
import cats.syntax.all._

trait TestCustomerStorage[F[_]] extends CustomerStorage[F] {
  def calls: F[List[TestCustomerStorage.Call]]
}

object TestCustomerStorage {
  sealed trait Call
  object Call {
    case class Store(customer: Customer) extends Call
    case class Find(customerId: CustomerId) extends Call
  }

  def apply[F[_]: Sync]: F[TestCustomerStorage[F]] =
    (
      Ref.of[F, List[Call]](List.empty),
      Ref.of[F, List[Customer]](List.empty)
    ).mapN { (callsRef, customersRef) =>
      // The Refs are named *Ref so they don't clash with the `calls` method.
      new TestCustomerStorage[F] {
        def find(customerId: CustomerId): F[Option[Customer]] =
          callsRef.update(_ :+ Call.Find(customerId)) >>
            customersRef.get.map(_.find(_.id == customerId))

        def store(customer: Customer): F[Unit] =
          callsRef.update(_ :+ Call.Store(customer)) >>
            customersRef.update(_ :+ customer)

        def calls: F[List[Call]] =
          callsRef.get
      }
    }
}
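As a rough illustration of how a service test can lean on a Ref-backed mock, here is a self-contained sketch assuming Cats Effect 3. It uses a cut-down storage trait without call recording, and `CustomerService.register` with its "no duplicate ids" rule is a hypothetical example rather than anything from a real codebase.

```scala
import cats.effect.{IO, Ref}

final case class Customer(id: Int, name: String)

trait CustomerStorage[F[_]] {
  def store(customer: Customer): F[Unit]
  def find(id: Int): F[Option[Customer]]
}

// In-memory mock: all state lives in a Ref, created fresh for every test.
object TestCustomerStorage {
  def apply(): IO[CustomerStorage[IO]] =
    Ref.of[IO, List[Customer]](List.empty).map { ref =>
      new CustomerStorage[IO] {
        def store(customer: Customer): IO[Unit] =
          ref.update(_ :+ customer)
        def find(id: Int): IO[Option[Customer]] =
          ref.get.map(_.find(_.id == id))
      }
    }
}

// Hypothetical service under test: refuses to register the same id twice.
final class CustomerService(storage: CustomerStorage[IO]) {
  def register(customer: Customer): IO[Boolean] =
    storage.find(customer.id).flatMap {
      case Some(_) => IO.pure(false)
      case None    => storage.store(customer).as(true)
    }
}

object CustomerServiceCheck {
  // Registers the same customer twice against a fresh mock.
  val program: IO[(Boolean, Boolean)] =
    for {
      storage <- TestCustomerStorage()
      service  = new CustomerService(storage)
      first   <- service.register(Customer(1, "Ada"))
      second  <- service.register(Customer(1, "Ada"))
    } yield (first, second)
}
```

In a real test suite the pair returned by `program` would be checked with the framework's assertions; the first registration is expected to succeed and the second to be rejected.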
Property based tests
While I like doing property based testing I don’t think I’m very good at it. I often find that rather than writing true property based tests I use generators to let me randomise fields I don’t care about in unit tests and then set one specific field to a known value, as in a normal example-based unit test.
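The "randomise everything, then pin one field" style described above might look something like the sketch below. It assumes only ScalaCheck; `Pricing.discounted` is a hypothetical rule invented for the example, and `CustomerGenerators` is a stand-in name for wherever the generators live.

```scala
import org.scalacheck.{Gen, Prop}

final case class Customer(name: String, age: Int)

object Pricing {
  // Hypothetical rule under test: under-18s get a discount.
  def discounted(customer: Customer): Boolean = customer.age < 18
}

object CustomerGenerators {
  // Fully random customer, used as the base for more specific cases.
  val customer: Gen[Customer] =
    for {
      name <- Gen.alphaNumStr
      age  <- Gen.chooseNum(0, 100)
    } yield Customer(name, age)
}

object PricingCheck {
  // Every field is random except the one this test cares about.
  val seventeenYearOld: Gen[Customer] =
    CustomerGenerators.customer.map(_.copy(age = 17))

  // Closer to an example-based test than a true property,
  // but the irrelevant fields no longer need hand-written values.
  val prop: Prop =
    Prop.forAll(seventeenYearOld)(c => Pricing.discounted(c))
}
```

The benefit shows up when a field is added to `Customer`: only the base generator changes, and tests that pin `age` keep compiling.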
Scalacheck provides both Arbitrary and Gen data structures for generators; I tend to use Gen exclusively just to keep life simple.
When writing generators for data structures I strictly follow the rule that they must have the same name as the data structure they’re generating instances of and they must go in the same package as the data structure, nested within a Generators object due to Scala 2 not allowing package-level vals.
Sometimes it is useful to have generators that aren’t completely random and these should always be named very explicitly so that it is clear what each test using them is assuming about the input data. Nothing is worse than changing a generator only to discover that 50 tests were assuming a particular field would contain a particular set of values.
For example:
src/main/scala/uk/tomverran/customers/data.scala

package uk.tomverran.customers

case class Customer(name: String, age: Int)

src/test/scala/uk/tomverran/customers/Generators.scala

package uk.tomverran.customers

import org.scalacheck.Gen

object Generators {
  val customer: Gen[Customer] =
    for {
      name <- Gen.alphaNumStr
      age <- Gen.chooseNum(0, 100)
    } yield Customer(name, age)

  // Explicitly named: tests using this assume an age of 60 or over.
  val over60Customer: Gen[Customer] =
    for {
      name <- Gen.alphaNumStr
      age <- Gen.chooseNum(60, 100)
    } yield Customer(name, age)
}
I don’t think this directory structure is a particularly good convention, but I do think it is important to have some kind of convention for where to define generators; otherwise it is easy to end up with every unit test re-implementing a bunch of generators, and that hurts a lot when adding or removing fields.
Examples
citysocials-api was a side project I briefly took on in 2020 to make a Meetup clone. I gave up on it very quickly so it doesn’t have much implemented but it was the first project where I applied some of these principles so might be a useful source of further code examples.