Typelevel ecosystem: a high-level overview

In this post, we’ll look at what the Typelevel ecosystem looks like in 2018, and how its various libraries interact with each other. In particular, we’ll focus on how we can compose some of these libraries to build a complete application, in a purely functional fashion.

This will not be a tutorial for Cats (there will actually be hardly any code here) - there’s plenty of learning material for that (linked at the end of this post) - but a high-level overview of the ecosystem and the way its pieces interact with each other.

Before we look at the libraries, however, we need to know what Typelevel and Cats are.

typelevel

typelevel.scala is a community built around independent open-source software. It focuses on pure, typeful functional programming in Scala, as well as approachability of the libraries built by its community, and the inclusivity of the environment.

Most of Typelevel’s online and offline activity involves maintaining open-source functional programming libraries, giving talks on related subjects and organizing events for the community.

The flagship “product” of Typelevel is Cats.

cats

Cats is a Scala library that provides core building blocks (abstractions) for libraries in the ecosystem - various type classes, data types and some syntactic enrichments. These abstractions allow libraries from the ecosystem to interact with each other, as long as their data types and type classes conform to the interface defined by Cats.

If you want to get a general idea of how that works in an application, I talk about using some of these building blocks in my Fantastic Monads and where to find them talk.

Now that we know more about Cats, let’s add one more core building block to the bigger picture:

cats-effect

Another Scala library from the typelevel umbrella is cats-effect, which extends Cats with some additional type classes and data types. It’s a relatively new project focusing on wrapping effectful code in a referentially transparent context (like an IO monad), which makes side effects easier to reason about. It also provides some asynchronous primitives like Fiber, and an implementation of IO.

cats-effect will be an important building block in an application on a typelevel stack, as it’s extensively used in some of the libraries we’ll look at later in this post. Using the type classes provided by the library, we can completely detach ourselves from the effect type we’ll use to handle I/O operations - we can make the decision in a single place (like object Main), and make it apply to the whole application at once. These type classes also enable us to make effectful computations (like getting the current time, or reading a file) referentially transparent, which wouldn’t be possible if they were computed in e.g. a Future.

Let’s talk about referential transparency with cats-effect by using an example of computing the current time.

A side-effecting function for getting the current time could be System.currentTimeMillis(). If we were to get the time at two different points on the timeline, we might want to do:
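A minimal sketch of such a measurement, using only the standard library. The name calculateSomething is a hypothetical stand-in for whatever work is being timed.

```scala
object DirectTiming {
  // Helper that reads the system clock; every call is a side effect.
  def time(): Long = System.currentTimeMillis()

  // Hypothetical stand-in for the computation being measured.
  def calculateSomething(): Int = {
    Thread.sleep(20)
    42
  }

  // Returns the elapsed milliseconds together with the computed value.
  def measure(): (Long, Int) = {
    val before = time()
    val x = calculateSomething()
    val after = time()
    (after - before, x)
  }

  def main(args: Array[String]): Unit = {
    val (elapsed, x) = measure()
    println(s"calculated $x in $elapsed ms")
  }
}
```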

We’re using a helper function, time(), to get the time - but it doesn’t explicitly say it does any side effects. So let’s wrap it in a Future, as most people would do.

This would work, but Future breaks referential transparency - because if we were to do:
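A minimal sketch of that situation, using only the standard library (calculateSomething is again a hypothetical stand-in). It contrasts a Future stored in a val and reused with a Future created afresh at each use.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureTiming {
  // Hypothetical stand-in for the computation being measured.
  def calculateSomething(): Future[Int] = Future { Thread.sleep(20); 42 }

  // A Future starts running when it is created and caches its result,
  // so this val holds ONE clock reading that both uses below share.
  def elapsedWithSharedFuture(): Long = {
    val time: Future[Long] = Future(System.currentTimeMillis())
    val diff = for {
      before <- time
      _      <- calculateSomething()
      after  <- time // the same, already completed Future: no second reading
    } yield after - before
    Await.result(diff, 5.seconds)
  }

  // Creating a new Future at each use gives two separate readings.
  def elapsedWithFreshFutures(): Long = {
    def time(): Future[Long] = Future(System.currentTimeMillis())
    val diff = for {
      before <- time()
      _      <- calculateSomething()
      after  <- time()
    } yield after - before
    Await.result(diff, 5.seconds)
  }
}
```

Extracting an expression into a val changed the result of the program, which is exactly what referential transparency is supposed to rule out.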

The printed value would always be 0, because both reads refer to the same, already computed Future. We shouldn’t have to worry about such cases, so let’s replace the use of Future with Task.
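A sketch of the same measurement with Task, assuming Monix 3.x is on the classpath (the post does not say which Task implementation or version it uses, so treat the imports and runSyncUnsafe as assumptions). calculateSomething is a hypothetical stand-in.

```scala
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global

object TaskTiming {
  // A Task is only a description; nothing reads the clock at this point.
  val time: Task[Long] = Task(System.currentTimeMillis())

  // Hypothetical stand-in for the computation being measured.
  val calculateSomething: Task[Int] = Task { Thread.sleep(20); 42 }

  // Each use of `time` reads the clock again when the program is run.
  val elapsed: Task[Long] = for {
    before <- time
    _      <- calculateSomething
    after  <- time
  } yield after - before

  // Running is the only impure step; it needs the implicit Scheduler.
  def run(): Long = elapsed.runSyncUnsafe()

  def main(args: Array[String]): Unit = println(run())
}
```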

Now the printed time difference will be whatever amount of milliseconds it took for x to be calculated. Imagine this snippet was placed inside a Tagless Final algebra:

That way we could use the Sync[F] instance to delay the effectful computation getting the current time (in a tuple with the actual result of calculateSomething(args)).
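A minimal sketch of what such an algebra could look like, assuming cats-effect is on the classpath. Calculator, TimedCalculator and timed are hypothetical names introduced for illustration; only Sync[F].delay comes from the library.

```scala
import cats.effect.Sync
import cats.syntax.all._

// Hypothetical algebra: F stays abstract, the caller picks IO, Task, etc.
trait Calculator[F[_]] {
  def calculateSomething(args: List[Int]): F[Int]
}

class TimedCalculator[F[_]: Sync](underlying: Calculator[F]) {
  // Suspends the clock read in F; it runs every time this value is sequenced.
  private val time: F[Long] = Sync[F].delay(System.currentTimeMillis())

  // Elapsed milliseconds in a tuple with the actual result.
  def timed(args: List[Int]): F[(Long, Int)] =
    for {
      before <- time
      result <- underlying.calculateSomething(args)
      after  <- time
    } yield (after - before, result)
}
```

Nothing in TimedCalculator mentions a concrete effect type, so the choice between IO, Task or anything else with a Sync instance can be made once, where the class is instantiated.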

A web application

A typical modern web application usually has to perform some of these tasks:

  • handle HTTP traffic
  • request/response streaming
  • serialize/deserialize JSON content
  • validate incoming data
  • connect to different applications (e.g. in a microservice architecture) via HTTP(S)
  • save/read data to disk/a data store

Let’s imagine we need to do all of these. Here’s a rough approximation of what that would look like:

Application structure

If that looks like an oversimplification, it’s because it is one - we left out the streaming part (an exercise for the reader). We also didn’t show the responses for any calls, but you can assume there would be a response for every request.

The diagram isn’t specific about any tools we’re going to use to handle the aspects of the application, so let’s talk about a few libraries that are going to help us out:

cats itself

You didn’t think we were going to skip this one, did you? As mentioned before, cats-core contains more than just basic type classes for the libraries - it also provides data structures. One of them is cats.data.Validated, which has been covered extensively in numerous talks and blogposts. The only thing you need to know for now is that Validated[E, A] is like an Either[E, A] (with Invalid[E] and Valid[A] as its two cases) that’s built with error accumulation in mind - if E has a Semigroup type class instance, you can compose multiple Validated[E, A] values into one.

There’s more to it than just error composition, but for now it’ll suffice.
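To make the error accumulation concrete, here is a small sketch assuming a recent Cats 2.x. The User type and both checks are hypothetical; ValidatedNel, validNel, invalidNel and mapN are from Cats.

```scala
import cats.data.ValidatedNel
import cats.syntax.all._

object ValidatedDemo {
  // Hypothetical domain type and checks, for illustration only.
  final case class User(name: String, age: Int)

  def validateName(name: String): ValidatedNel[String, String] =
    if (name.nonEmpty) name.validNel[String] else "name is empty".invalidNel[String]

  def validateAge(age: Int): ValidatedNel[String, Int] =
    if (age >= 0) age.validNel[String] else "age is negative".invalidNel[Int]

  // mapN evaluates both checks; if any failed, their error lists are
  // concatenated (NonEmptyList has a Semigroup) instead of stopping at the first.
  def validateUser(name: String, age: Int): ValidatedNel[String, User] =
    (validateName(name), validateAge(age)).mapN(User.apply)
}
```

With an Either, the same composition would stop at the first Left, so the caller would only ever see one error at a time.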

circe

circe is a JSON library built with Cats. It provides utilities for working with JSON values, including parsing Strings to a JSON value, automatic conversions between case classes/sealed trait hierarchies and JSON, and more. It provides type classes for encoding and decoding JSON values, as well as instances of commonly used type classes from Cats.
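A short sketch of the automatic conversions mentioned above, assuming the circe-generic and circe-parser modules are available. The PageViewed fields here are assumptions made for the example, not the shape used later in this post.

```scala
import io.circe.generic.auto._
import io.circe.parser.decode
import io.circe.syntax._

object CirceDemo {
  // Hypothetical payload shape; the field names are assumptions.
  final case class PageViewed(path: String, date: String)

  // The Encoder instance is derived automatically by circe-generic.
  def toJson(view: PageViewed): String = view.asJson.noSpaces

  // Decoding returns an Either, so malformed input is a value, not an exception.
  def fromJson(raw: String): Either[io.circe.Error, PageViewed] =
    decode[PageViewed](raw)
}
```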

fs2

Sadly, no logo here.

fs2 is a streaming IO library. In English, that means it provides you with an abstraction of a stream that, when run, will produce (or consume) values wrapped in an effect type of your choice - be it IO, Task, or any other effectful monad (that has a Sync instance, from cats-effect). Also included: utilities for writing/reading files in a streaming fashion.
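A small sketch of a pure and an effectful stream, assuming fs2 together with cats-effect IO. The values are arbitrary; the point is that the effectful stream is only a description until the resulting IO is run.

```scala
import cats.effect.IO
import fs2.Stream

object Fs2Demo {
  // A pure stream has no effects, so it converts to a List directly.
  val doubled: List[Int] = Stream(1, 2, 3).map(_ * 2).toList

  // An effectful stream: each element passes through an IO action.
  // Compiling it yields an IO; nothing runs until that IO is executed.
  val program: IO[List[Int]] =
    Stream(1, 2, 3)
      .covary[IO]
      .evalMap(n => IO(n + 1))
      .compile
      .toList
}
```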

http4s

http4s provides an HTTP server (as well as a client) built on top of fs2 and cats-effect. It provides a routing DSL, support for streaming HTTP requests/responses, and type classes to handle request/response entity encoding/decoding. There’s also a submodule for circe that instantiates these type classes for types that have appropriate circe instances (which basically means that using circe with http4s is as simple as adding an import).

doobie

The last library we’ll be using is doobie - a purely functional JDBC wrapper. It’s written with cats-effect as well (surprise), and it allows you to make the execution of SQL queries referentially transparent with ease.


Given the descriptions of these libraries, let’s see an updated approximation of our application’s structure:

Application structure

As you can see, we placed Circe near the places where we’d handle JSON. http4s will handle our HTTP requests, as well as client calls we’re going to make to external services. We’ll validate incoming data using cats.data.Validated, and the persistence of data will be handled by Doobie.

As http4s is built on fs2, we’ll use that library in a less direct way.

Case study: counting page views

Like every sensible explanation of a topic, ours needs an example. Let’s imagine we’re working on a service that will count page views.

We will look at this example and see how we could implement JSON (de)serialization, HTTP routing, validation and persistence for it. We’ll skip client HTTP calls, as they work much the same way in http4s as in other libraries - you can check for yourself in the http4s guide.

The API specification that we got says we should implement a counting endpoint:

Now, an endpoint using the http4s DSL could look like this:

For each POST request to /views, we’ll try to get a PageViewed object from its body. Then we’ll forward the object to the checkAndSave function, together with the optional ip we got from the HTTP request.

In case the checkAndSave function doesn’t inform us about any errors, we’ll assume everything went well and return the result’s Valid value in JSON with a 201 Created status code. Otherwise, the errors will be handled (let’s assume validatedToJson will just provide a 422 Unprocessable Entity response).
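The endpoint described above might be sketched as follows. This assumes http4s 0.23 with its circe module and cats-effect 3, which differ from the versions available when this post was written. PageViewed's fields, the Saved type, the String error type and the shape of checkAndSave are all assumptions, and the fold stands in for the validatedToJson helper.

```scala
import cats.data.ValidatedNel
import cats.effect.Concurrent
import cats.syntax.all._
import io.circe.generic.auto._
import org.http4s.HttpRoutes
import org.http4s.circe.CirceEntityCodec._
import org.http4s.dsl.Http4sDsl

// Hypothetical request and response bodies.
final case class PageViewed(path: String, date: String)
final case class Saved(id: String)

class ViewRoutes[F[_]: Concurrent](
  // Hypothetical signature: errors are plain strings in this sketch.
  checkAndSave: (PageViewed, Option[String]) => F[ValidatedNel[String, Saved]]
) extends Http4sDsl[F] {

  val routes: HttpRoutes[F] = HttpRoutes.of[F] {
    case req @ POST -> Root / "views" =>
      for {
        view     <- req.as[PageViewed]            // decode the JSON body via circe
        ip        = req.remoteAddr.map(_.toString) // optional client address
        result   <- checkAndSave(view, ip)
        response <- result.fold(
                      errors => UnprocessableEntity(errors.toList), // 422 with the errors
                      saved  => Created(saved)                      // 201 with the value
                    )
      } yield response
  }
}
```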

Let’s look at what checkAndSave should do.

Validation

There are a few things we need to do with the request before saving the pageview to our database.

First of all, we could make sure that the tracked page actually exists (note: assuming it’s publicly accessible) - e.g. by asking our database if we’ve already tracked views for it, and making a request to the page otherwise.

As seen in the example, the path to the document that we’re going to count the views for contains query string parameters - the spec says we should normalize the path by removing all of them - we wouldn’t want to count views for distinct tracking IDs separately.

Note: in the real world, you might actually want to preserve some of the query string parameters (depending on how a website is configured) - that’s where some websites pass the identifier for the content to be displayed.

You can also see an authorization header. It’s a JWT (JSON Web Token) - if you decode it (for example, by pasting it on the website linked), you can see that its payload contains the sub field with the value scala-lang.org - that’s the hostname of the website requesting the tracking of a pageview. We’ve written about JWT before, if you’re not familiar with that technology. For now, all you need to know is that, given a JWT like the one above, we can confirm whether it was issued by us, and use the payload it contains - even though the payload is only encoded with base64, which makes it trivial to read for anyone who sees it.
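To illustrate how readable the payload is, here is a standard-library sketch that decodes the middle segment of a token. It does not verify the signature, so it is no substitute for a real JWT library. The sample token is built inside the example and carries a dummy signature.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.util.Base64

object JwtPayload {
  // A JWT is three base64url segments joined by dots: header.payload.signature.
  // This only READS the payload; it does not check who issued the token.
  def decodePayload(jwt: String): Option[String] =
    jwt.split('.') match {
      case Array(_, payload, _) =>
        scala.util.Try(new String(Base64.getUrlDecoder.decode(payload), UTF_8)).toOption
      case _ => None
    }

  // Helper used only to build the sample token below.
  private def encode(s: String): String =
    Base64.getUrlEncoder.withoutPadding.encodeToString(s.getBytes(UTF_8))

  // Sample token with a dummy, unverifiable signature.
  val sampleToken: String =
    List("""{"alg":"HS256","typ":"JWT"}""", """{"sub":"scala-lang.org"}""", "not-a-real-signature")
      .map(encode)
      .mkString(".")
}
```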

Here, we’ll use the JWT to identify which website’s article got a view.

If you want to handle JWTs (and password hashing, plus different cryptographic things) in a pure FP fashion, check out the tsec library (it even has an http4s module).

We know what page was displayed, we know the hostname of the website it was shown on, but we’re also asked not to count views from the same IP twice. We can try to get the IP the request was made from by extracting it from the X-Forwarded-For request header, or, if not available, by getting the request’s remoteAddr.
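The lookup order described above can be sketched without any HTTP library. RequestInfo is a hypothetical, simplified stand-in for a real request type, and taking the first entry of X-Forwarded-For assumes the usual "client, proxy1, proxy2" layout of that header.

```scala
object ClientIp {
  // Hypothetical, simplified view of a request: headers plus socket address.
  final case class RequestInfo(headers: Map[String, String], remoteAddr: Option[String])

  // Prefer the first X-Forwarded-For entry; fall back to the remote address.
  def clientIp(req: RequestInfo): Option[String] =
    req.headers
      .collectFirst { case (name, value) if name.equalsIgnoreCase("X-Forwarded-For") => value }
      .flatMap(_.split(',').headOption.map(_.trim).filter(_.nonEmpty))
      .orElse(req.remoteAddr)
}
```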

Note: in reality, an attempt to ensure uniqueness by checking the IP will incur massive data loss - chances are, the IP you’re going to get is shared by a whole building, or even a whole district! Tracking views would be more accurate if you generated a cookie (or a localStorage field) once per user, and identified users by the value of that cookie. Then you’d need to worry about having a cookie policy, so we won’t be doing that in this example, to ease the pain.

To summarize, these are the verifications and transformations we need to make:

  • validate the JWT
  • ensure the date field in the body is parsed correctly
  • check if the view didn’t happen in the future ;)
  • strip query parameters from the path field
  • get the client IP - we won’t track requests without one
  • ensure the page exists on the hostname in the JWT
  • make sure we haven’t saved a view for the given (ip, path, hostname) parameters

…and if all assumptions are correct, we can save the view. Yay!

Note that we skipped the validation of the JWT in the http4s example - if you’re curious about how that could be handled in http4s, you can look at the aforementioned library tsec. Here, we’ll just assume the controller provided us with the hostname extracted from the JWT.

The actual signature of the checkAndSave function would then look more like this:

Note that we aren’t using Future or IO or Task explicitly in these examples - it’s just F. To find out more about how that works, you can get familiar with the Tagless Final pattern by reading our blogpost or watching one of Luka Jacobowitz’s talks on the topic.

Our error type could be defined as:

Given the signature and these error definitions, we can implement the checkAndSave method.

If this looks like cryptic writings of a possessed madman, don’t worry. We’ll look at each piece now:

  • ValidatedNel[PVError, String] is either a Valid(s: String) or an Invalid(e: NonEmptyList[PVError])
  • we first validate the IP by checking if it’s there (clientIp.toValidNel(PVError.noIp)).
  • in case of success, make a call to the checkIp function (which will return an F[ValidatedNel[PVError, Unit]]). Afterwards, we would have F[ValidatedNel[PVError, ValidatedNel[PVError, String]]], so the most reasonable thing would be to call .map(_.flatten) - but Validated doesn’t have flatten, so we call andThen(identity), which will essentially flatten the nested value to a single ValidatedNel[PVError, String].
  • in case of failure, we’ll end up with our errors being wrapped in the F context (that’s what traverse would do here).
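The IP step described in the bullets above could be sketched like this, assuming a recent Cats 2.x. The PVError cases and the validateIp name are hypothetical. The last line uses an explicit lambda, which does the same thing as andThen(identity).

```scala
import cats.Applicative
import cats.data.ValidatedNel
import cats.syntax.all._

object IpValidation {
  // Hypothetical error type; the actual PVError may have different cases.
  sealed trait PVError
  object PVError {
    case object NoIp extends PVError
    case object AlreadyCounted extends PVError
    val noIp: PVError = NoIp
  }

  def validateIp[F[_]: Applicative](
    clientIp: Option[String],
    checkIp: String => F[ValidatedNel[PVError, Unit]]
  ): F[ValidatedNel[PVError, String]] =
    clientIp
      .toValidNel(PVError.noIp)                        // Valid(ip) or Invalid(NoIp)
      .traverse(ip => checkIp(ip).map(_.map(_ => ip))) // effectful check, nested result
      .map(_.andThen(inner => inner))                  // collapse the nested Validated
}
```

When the IP is missing, traverse never calls checkIp: the Invalid value is simply lifted into F.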

And that’s it!

…for validating the IP. Let’s look at the way we check the path.

  • First, we normalize the path in the first statement of the method (dropping all the query parameters).
  • Then, we create a pathValidationF by making a call to pageExists (which would, under the hood, check the DB and potentially the website for the page’s existence).
  • The result of that function is F[Boolean], so we’ll call Validated.condNel and pass that boolean - which is all happening inside the function passed to map. In case of false we’ll get a NonEmptyList(PageNotFound), in case of true we’ll get the normalized path back.
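A sketch of the path check from the bullets above, assuming a recent Cats 2.x. The normalize helper, the error case and the validatePath name are hypothetical; Validated.condNel is the Cats function mentioned in the text.

```scala
import cats.Functor
import cats.data.{Validated, ValidatedNel}
import cats.syntax.all._

object PathValidation {
  // Hypothetical error type with the single case needed here.
  sealed trait PVError
  case object PageNotFound extends PVError

  // Drop the query string: everything from the first '?' onwards.
  def normalize(path: String): String = path.takeWhile(_ != '?')

  def validatePath[F[_]: Functor](
    rawPath: String,
    pageExists: String => F[Boolean]
  ): F[ValidatedNel[PVError, String]] = {
    val path = normalize(rawPath)
    pageExists(path).map { exists =>
      // true  -> Valid(path); false -> Invalid(NonEmptyList(PageNotFound))
      Validated.condNel[PVError, String](exists, path, PageNotFound)
    }
  }
}
```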

These two validations had to be made first, as they require effectful checks (denoted by the F[_] type), and we can’t easily combine effectful and “pure” checks together. Having checked the effectful ones separately, we can proceed with further validations.

Note: we skipped uuidF here. It doesn’t actually validate anything, but its implementation could be read as “at some point, generate a random UUID in the context of F”.

We pack the effectful checks in a tuple and call traverseN(f) on it - you can think of it as “run f when these four are done, and return the result in the same effect”.

The function we pass takes the values from inside our (...)ValidationFs, and combines them with other validations to build another tuple.

This time, the tuple will consist of elements that are ValidatedNel[PVError, A], each having its own A. We’ll call mapN, a function similar to traverseN - but what this one will do is ensure all the validations in the tuple are Valid, and call the provided function (in this case, PageView.apply - returning a PageView) with them. If any of the validations doesn’t pass, mapN will collect all the errors together. So the result of mapN(PageView) is of type ValidatedNel[PVError, PageView].

Note: the only non-effectful validation that we make here is checking whether the pageview happened before the current point on the timeline, as we’ve hardcoded the rest to .valid - but this will not always be the case, as you might want to check e.g. whether the length of a string matches the configured limits. One might argue that it’s actually effectful because it depends on currentTime, but the way we’re using it to validate the passed timestamp doesn’t involve any side effects per se.

Having either the errors or a PageView that we can save, we can call traverse on that ValidatedNel, passing a persisting method as a parameter. That way, the whole chain of (..., ...).mapN(PageView).traverse(...) will give us either an Invalid with all the validation errors, or a Valid - wrapped in F[_] in both cases.
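A reduced sketch of that mapN-then-traverse chain, assuming a recent Cats 2.x. PageView has only three hypothetical fields here, errors are plain strings, and persist is passed in as a function instead of coming from a repository.

```scala
import cats.Applicative
import cats.data.ValidatedNel
import cats.syntax.all._

object SaveView {
  // Hypothetical, reduced version of the PageView type.
  final case class PageView(ip: String, path: String, hostname: String)

  def validateAndPersist[F[_]: Applicative](
    ip: ValidatedNel[String, String],
    path: ValidatedNel[String, String],
    hostname: ValidatedNel[String, String]
  )(persist: PageView => F[Unit]): F[ValidatedNel[String, Unit]] =
    (ip, path, hostname)
      .mapN(PageView.apply) // all Valid -> PageView, otherwise every error collected
      .traverse(persist)    // runs persist only for a Valid value, result wrapped in F
}
```

For an Invalid input, persist is never invoked: the errors are lifted into F unchanged, which is why the result is wrapped in F in both cases.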

Let’s look at the part where we combine effects and validations again:

Because there’s no function flatMapN, we still need to flatten the result of the traverseN call we made on our effectful validations - otherwise we would have F[F[ValidatedNel[PVError, Unit]]]. So that’s what we do! - hence the .flatten at the end, and the last call in the function.

To sum up, the function will handle its arguments in a way that’ll make it return a value inside the context of F - which will either be a list of validation errors (guaranteed not to be empty), or a Unit.


How does this use the elements of the ecosystem?

First of all, we extensively used Validated - a data type from Cats - to combine the potential errors that we could get from checking the input (using mapN).

We also used traverse and traverseN from the syntax for the Traverse type class in Cats to “flip” our wrapped types. In a simplified example, List(1,2,3).traverse(x => Future(x)) would give us a Future[List[Int]]. traverseN does a similar “flipping”, but on N inputs.

Last but not least, we used the Sync type class from cats-effect - for an expression like Sync[F].delay { UUID.randomUUID() }, if we specified F = Task, that expression would be equivalent to Task { UUID.randomUUID() }. If we said F = IO, it would be IO { UUID.randomUUID() }, etc.

Now that we’re at Sync and cats-effect, let’s talk about persistence.

Persistence

In the validation code above, we only used one method related to persistence - repository.persist.

Assuming our data store is an SQL database - let’s say, PostgreSQL - the simplest (perhaps not optimal, though) definition of the table for pageviews would be as follows:

Table: page_views

text is the default string type here - but in the real world we would rather use a length-limited varchar type instead.

Given the table definition, our persist function could be implemented in the following way, using Doobie:

Quite verbose, but (given a proper Doobie transactor) this function will give us a database insert suspended in F - similarly to Sync[F].delay { actuallyInsert() }.
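One way such a persist function might be sketched, assuming a recent doobie release built on cats-effect. The column names, the PageView fields and the repository class are assumptions, since the table definition is not reproduced here. Plain String and Long columns are used so that no extra doobie type mappings are needed.

```scala
import cats.effect.Sync
import cats.syntax.all._
import doobie._
import doobie.implicits._

// Hypothetical row type; the real table may use other columns and types.
final case class PageView(id: String, ip: String, path: String, hostname: String, viewedAtMillis: Long)

class PageViewRepository[F[_]: Sync](xa: Transactor[F]) {

  // Builds a description of the insert; nothing touches the database
  // until the returned F[Unit] is run.
  def persist(view: PageView): F[Unit] =
    sql"""insert into page_views (id, ip, path, hostname, viewed_at)
          values (${view.id}, ${view.ip}, ${view.path}, ${view.hostname}, ${view.viewedAtMillis})"""
      .update    // interpolated values become prepared-statement parameters
      .run       // ConnectionIO[Int]: number of affected rows
      .transact(xa)
      .void
}
```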


For persistence, the only things we interacted with were Doobie and cats-effect - because the transactor will use a Sync[F] instance underneath.

Summary

We looked at a specification of an HTTP endpoint and implemented parts of it using building blocks from the Cats, cats-effect, circe, http4s and Doobie libraries, learning how they can be composed to build a working HTTP service.

Of course a blogpost can’t dive into any of the mentioned libraries deeply enough to explain everything we just saw in detail, but I hope it’s enough to get you to click the links, read and experiment yourself :)

However, what would we do without

Shameless self-advertising

I’m going to lead a full-day workshop focused around building a similar application with the building blocks mentioned in this post at this year’s ScalaWave conference edition. We’ll cover all the steps to build a few fully functional endpoints like the above, including streaming HTTP requests/responses and more complex database logic. We’ll also spend a good portion of time discussing commonly used patterns that’ll help us write purely functional software using the typelevel/cats ecosystem. It’ll be fun!

Follow us on Twitter to keep up with updates, or get tickets for the conference now (in EUR or PLN)!

Here are all the links from the blogpost, plus some more learning resources:

Project websites

learning resources
