A simple solution to scalability problems: Event Sourcing
In the past months I've been playing around with Kotlin and Spring's event sourcing engine. To get to know it better, I build a really simple clone of Untappd. If you don't know Untappd, a tl;dr is: Foursquare for beers. You can checkin a beer, share it, rate the beer and see what your friends are drinking.
Why Event Sourcing
One benefit of event sourcing is that you can separate your application into small, loosely coupled concerns (e.g. microservices) and make them communicate through events. On the other hand, you need to be comfortable with eventual consistency.
For this task, I decide to go with a modular monolithic.
Code organization
All code is hosted on Github
As you can see from the image below, I created a package called domains
and one sub-package for each domain I was dealing with.
Each sub-domain contains its own controller, models, repositories and services, pretty much like a total independent service.
User
, Checkin
and Beer
are connected through user_id
and beer_id
on Checkin
but
there is no bond between these two classes on the code (not even on the database).
User class:
@Entity
@Table(name = "users", indexes = [
Index(name = "email_idx", columnList = "email", unique = true)
])
data class User(
@Column(nullable = false, length = 50)
var email: String = "",
@Column(nullable = false)
@JsonIgnore
var password: String = ""
) : BaseEntity()
Checkin class:
@Entity
@Table(name = "checkins", indexes = [
Index(name = "idx_user_id_on_checkin", columnList = "userId"),
Index(name = "idx_beer_id_on_checkin", columnList = "beerId")
])
@EntityListeners(AuditingEntityListener::class)
data class Checkin(
@Id
@GeneratedValue(generator = "UUID")
@GenericGenerator(
name = "UUID",
strategy = "org.hibernate.id.UUIDGenerator")
var id: UUID? = null,
@CreatedDate
var createdAt: LocalDateTime? = null,
@Column(nullable = false)
var beerId: UUID? = null,
@Column(nullable = false)
var userId: UUID? = null,
@Column(columnDefinition = "TEXT")
var description: String = "",
@Column
var rate: Int = 5
) : AbstractAggregateRoot<Checkin>() {
@PrePersist
fun created() {
registerEvent(CheckinCreated(this))
}
}
With this approach, the application can be broken down without much hassle.
Event Sourcing in action
Notice that last method on Checkin
? registerEvent
is a method declared in AbstractAggregateRoot
class. AbstractAggregateRoot
is a convenience class that helps us dispatch domain events to our context. You can also dispatch events without using AbstractAggregateRoot
(more on that later).
After a checkin in a beer, an event called CheckinCreated
wraps the checkin and is enqueued to be dispatched. Spring will dispatch it after a save
happens on the aggregate.
Capturing events
Spring has a nice approach to capture and process these events: @EventListener
annotation.
At this point, after we save the checkin, Spring dispatches our CheckinCreated
event to whoever is listening. We need to update the beer profile (increase checkin count, refresh beer rating, etc.) and update user's profile.
The beer domain has a listener called BeerCheckedInListener
and handles it through the fun beerCheckedIn(event: CheckinCreated)
:
@EventListener
@Transactional
fun beerCheckedIn(event: CheckinCreated) {
logger.info("Updating beer rating")
var beerId = event.checkin.beerId
var averageRating = checkinRepository.findAverageRatingFromBeer(beerId!!)
var beer = beerRepository.findById(beerId).get()
beer.averageRating = BigDecimal(averageRating.toString())
beer.totalCheckin += 1
beerRepository.save(beer)
}
what we are doing here is:
- get the beer id from the checkin
- calculate and set the new average rating for the beer
- increase the total checkin count
- save
and we follow a similar process on UserCheckinListener
(you can see it here)
Dispatching events without AbstractAggregateRoot
It is possible to dispatch an event without rely on AbstractAggragateRoot
. This class will dispatch events only after a save
is perform. But how do we dispatch an event in a delete
?
In that case we can use AbstractApplicationContext
First, we include AbstractApplicationContext
in our controller
@RestController
@RequestMapping("/checkin")
class CreateCheckinController(
val checkinRepository: CheckinRepository,
val beerRepository: BeerRepository,
val applicationContext: AbstractApplicationContext) {
then, during a delete
call, we make use of it
@DeleteMapping(value = ["/{id}"])
fun deleteCheckin(@PathVariable("id") checkinId: UUID): ResponseEntity<String> {
val checkin = checkinRepository.findById(checkinId).orElseThrow { RuntimeException("checkin not found") }
checkinRepository.delete(checkin)
applicationContext.publishEvent(CheckinDeleted(checkin))
return ResponseEntity.status(HttpStatus.NO_CONTENT).body("")
}
publishEvent
will dispatch our event to appropriate listeners.
Eventual consistency
Although this happens fast, it is not happening in a single transaction. The listeners are marked with @Async
, which means that we will process then in a different thread. If the application is under load and all CPU cores are in use, it might take some time to update its profile and users might not see their checkin right away. Though its unlikely to happen, you will experience slowness if the beer has lots of checkins because the time to calculate the rating increases. Not a big deal in the beginning but something to keep in mind.
CQRS
Every time you see "event sourcing", it might be accompanied by "CQRS"
What is CQRS?
As Martin Fowler puts:
At its heart is the notion that you can use a different model to update information than the model you use to read information
In most of applications, you will have a one-to-one relationship between database tables and application models, for example, a class User
maps to a database table called users
.
For complex systems, this might not be ideal for many reasons. Lets use our application as example.
We have a class named Beer
that maps to a table beers
. This beer has thousands of checkins, has a certain rating and there are many venues selling it. So if you want to build a "beer profile" you have to:
- query the
beers
table - calculate an average rating for the beer by querying
checkins
- query the
venues
table to cross check which/how many venues are selling it.
while your application is small, this might not be a problem, because you have X beers, sold by Y venues. But if your application grow, and you have 10 times X beers and 1000 times Y venues, query all this data, run calculation over checkins, etc this can become your bottleneck real soon.
One might think that cache this information might be a good solution, and it might be for a period of time. Rely on cache for system stability is not a good idea. If you lose your cache (say your Redis instance had a network problem, because trust me, it WILL crash sooner or later) your application will go down. You now have a single point of failure: cache.
A better solution would be CQRS. You keep writing the checkin to checkins
table, user information to users
and so on, but you introduce a new model called BeerProfile
.
By using event sourcing, every time a beer is checked in, you fire an event (CheckinCreatedEvent
, as we did). Then, a listener will asynchronously process this event, find the BeerProfile
corresponding to the beer, increase its checkin count, refresh the rating, updating the numbers on many distinct users drink it, and so on.
So from now on, when a user wants the beer profile, you don't need to cross check three different tables since all beer related information is in a single place: beer profile table.
Conclusion
CQRS and event sourcing might be good for your application but adding those concepts will also bring more complexity. New tables means more data, more expensive. Asynchronous method calls are harder to debug. Event sourcing might fail as anything else and eventual consistency can hit you hard.
Study and analyze if something is right for you, don't surf other people's wave just because everybody is doing.