Datomic Conf: What Datomic does to REST
video link (10 minutes)
I'm Dustin Getz. I'm going to talk about Datomic, I promise, but first let's talk about REST.
You've all read the quote. Pete Hunt is an engineering manager at Facebook, formerly at Instagram, and he was on the React team. So this quote was after React came out but before GraphQL and Relay.
Pete Hunt: “REST does a shitty job of efficiently expressing relational data.… I mean REST has its place. For example, it has very predictable performance and well-known cache characteristics. The problem is when you want to fetch data in repeated rounds, or when you want to fetch data that isn't expressed well as a hierarchy (think a graph with cycles -- not uncommon). That's where it breaks down. I think you can get pretty far with batched REST, but I'd like to see some way to query graphs in an easier way.” (April 2014)
So, he's talking about the problem with REST. He's talking about fetching data in repeated rounds. What is he really talking about? He's talking about doing I/O, he's talking about making requests over the network.
That's the hard part about REST, you have to do a lot of requests. If you express a graph, in REST, it looks something like this:
Here's a regular old REST-y thing, and you can see, if I want to traverse an edge of the graph to resolve this node, well, I need to make another request. You know, I'll paste this in the URL bar and navigate it further. So we've got this graph; our business apps are making all these requests. So if I'm Facebook and I want to get, like, my friends' events, or whatever, it's a lot of requests. So this is what's difficult about REST.
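To make the cost concrete, here is a minimal sketch in Python (the talk's own code is ClojureScript and is not reproduced in this transcript). The URLs, the `SERVER` dictionary, and the `fetch` helper are all hypothetical stand-ins for a real REST service. The only point is that every edge you traverse costs one round trip.

```python
# Hypothetical REST service: each URL returns a node whose edges are URLs.
SERVER = {
    "/api/me": {"friends": ["/api/users/1", "/api/users/2"]},
    "/api/users/1": {"name": "alice", "events": ["/api/events/10"]},
    "/api/users/2": {"name": "bob", "events": ["/api/events/10", "/api/events/11"]},
    "/api/events/10": {"title": "Datomic Conf"},
    "/api/events/11": {"title": "Clojure meetup"},
}

request_log = []  # every simulated network round trip is recorded here


def fetch(url):
    """Stand-in for an HTTP GET; a real one would be slow and could fail."""
    request_log.append(url)
    return SERVER[url]


def friends_events(me_url):
    """Collect the titles of my friends' events, one request per edge."""
    titles = set()
    me = fetch(me_url)
    for friend_url in me["friends"]:
        friend = fetch(friend_url)
        for event_url in friend["events"]:
            titles.add(fetch(event_url)["title"])
    return sorted(titles)


titles = friends_events("/api/me")
print(titles, len(request_log))
```

Even in this tiny graph, two event titles cost six requests, and one event is fetched twice because nothing here deduplicates or batches.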
I/O makes things complicated. We need to think about it, we need to reason about the performance, do we batch the request, do we not batch the request. We have to fix bugs, there's error handling to consider, there's all these problems.
But let's consider a REST app like a CRUD app. facebook.com is a CRUD app, and we all know how CRUD apps work. You Read ... and you Create, Update, Delete. Reads are really complicated in an app like Facebook. In any UI, you're displaying a lot of data, and you need to fetch the right data. You don't do a lot of create, update, or delete. In Facebook, you make a post, you edit a post; that's the one side effect over this whole mass of UI which is displaying all this data. So there's a lot of complexity in reads that we can probably do a better job of getting rid of.
The whole point of Datomic is to get rid of read complexity. If you remember Stu's slides this morning, he said, "You can answer queries in memory from an automatic cache." So what that means is, you can make these queries, you don't really care about the performance cost. You can make a query, then get the result back, then you can have an if-statement in your code that says "if this then do this query or if that then this other query," then you can make another query, you don't care that you made two queries. You don't care about batching them. Because it came out of cache anyway. There really wasn't any I/O there. You're just programming against local data, when you use Datomic with a warm cache.
If you want an analogy, this is kind of like git. Git has an immutable cache. The first thing you do is you go to github.com and you say, "Give me the cache." and now everything you do, you can do complicated things like look at the history. That comes out of the cache locally. You can do things like bisect, to say, "find the commit that has the bug in it," that's a really complicated query. You can do it locally. This is exactly how Datomic works. You can do really complicated queries locally which means you don't care about doing a lot of queries, you really can just program as if you're programming against local data.
3:03 When we're building applications like facebook.com, which do a lot of read queries, what if we're able to push the cache into the browser somehow? That means, on an app like facebook.com where the vast majority of it is reads, all of that no longer does I/O. You don't need to do any ajax requests to initialize your application. More importantly, the code of your application doesn't have to do anything asynchronous anymore. The data is there, the data is inlined from cache. You don't need promises, you don't need callbacks, you don't even need core.async. Everything is already there, you can program against local data.
3:39 So what might this look like when we have a REST service and we want to program against local data? Here is the REST service I showed you, where traversing an edge of your graph means making a request to your REST API. But you can see I've got this little cache thing here.
So inline, in this API, we've actually recursively resolved the graph. So in this particular case we've inlined the entire graph, which is probably pretty slow but we'll talk about that. So this key right here, for "API communities" corresponds to this cached object, so it's already here, it's recursive. Which means we can do arbitrary queries which would usually be doing network requests, we can do it all against local data. So this really substantially changes the programming model we have, when building UIs.
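The slide is not part of this transcript, so here is a rough sketch of the idea in Python. The field names `root` and `cache`, the URLs, and the community names are assumptions made for illustration, not the actual response format. What it shows is that once every node is inlined under the URL that would have fetched it, following an edge is a dictionary lookup.

```python
# Hypothetical response shape: the root node plus a "cache" map holding
# every node reachable from it, keyed by the URL that would fetch it.
RESPONSE = {
    "root": "/api",
    "cache": {
        "/api": {"communities": "/api/communities"},
        "/api/communities": {"items": ["/api/communities/1", "/api/communities/2"]},
        "/api/communities/1": {"name": "Community A"},
        "/api/communities/2": {"name": "Community B"},
    },
}


def resolve(response, url):
    """Follow an edge by dictionary lookup: no network, no callback."""
    return response["cache"][url]


root = resolve(RESPONSE, RESPONSE["root"])
communities = resolve(RESPONSE, root["communities"])
names = [resolve(RESPONSE, url)["name"] for url in communities["items"]]
print(names)
```

Three levels of traversal, zero requests after the initial response arrived.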
4:30 Let's look at some code of what our UI code might look like, in ClojureScript.
So this is an expression. If you're familiar with what React does, it gives you "UI as a value," so you can just build up an expression that returns a value of DOM. So if you look deep into my DOM, I see, well, there's a :ul, that's a piece of DOM. So in this API, we start with the root of our graph, just some entry point node in the graph.
The resolve method says, "go and make a request and get the next node." So go and request the root node. But the thing is, if we're operating off of a local cache, this isn't asynchronous. This is actually synchronous code and it will invoke the function right away. There's no performance cost, we don't care if it actually made a request or not. Which means it's composable.
So here we say, we have the root, so let's go down and traverse more edges of our graph. So here's another resolve call.
5:37 Again, if we have local data, this did not touch network, it just returns the object we request in the graph or returns the next node right here inline, so we can build up our UI. We can perform arbitrary functional transformations inside of our UI code that normally would have to do IO. So you can build up your app in a pure functional fashion using the combinators we're used to like map and filter and everything. Our components like combo boxes or whatever don't do IO any more because it came out of local data.
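The ClojureScript from the slide is not in this transcript, so what follows is only a loose Python analogue of the walkthrough above, not the Hypercrud API. The `CACHE` contents, the `type` field, and the `community_list` function are hypothetical. Nested lists stand in for the DOM value, with `"ul"` playing the role of `:ul`.

```python
# Hypothetical local cache, already populated before the UI code runs.
CACHE = {
    "/api": {"communities": "/api/communities"},
    "/api/communities": {
        "items": ["/api/communities/1", "/api/communities/2", "/api/communities/3"]
    },
    "/api/communities/1": {"name": "Community A", "type": "blog"},
    "/api/communities/2": {"name": "Community B", "type": "twitter"},
    "/api/communities/3": {"name": "Community C", "type": "blog"},
}


def resolve(url):
    """Synchronous edge traversal: returns the next node right away."""
    return CACHE[url]


def community_list(root_url, wanted_type):
    """Return a nested-list description of a <ul>, built with map and filter."""
    root = resolve(root_url)
    communities = map(resolve, resolve(root["communities"])["items"])
    matching = filter(lambda c: c["type"] == wanted_type, communities)
    return ["ul"] + [["li", c["name"]] for c in matching]


ui = community_list("/api", "blog")
print(ui)
```

Because `resolve` returns a plain value, it can sit inside `map` and `filter` like any other function, which is what makes the UI expression composable.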
Let's talk about the filter predicate. In this API right here, there's actually an enormous amount of data inlined, because it inlined the entire graph. Btw this is the Seattle service that comes with the Datomic install.
What matters is we need to expose a knob to reason about what data gets inlined into our cache and what data doesn't. In this case there's a predicate that you pass into it that says "I want any node of depth 5 inlined," but you could do something really smart there. For example you can say, "When I refresh the application let's keep track of all the requests made" and that turns out to be exactly the data that you need to inline. So you can actually make it so that your app does zero requests for read operations.
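Here is one way that knob could look, sketched in Python under assumptions: the `GRAPH` data, `build_cache`, and the two predicate factories are hypothetical names, and the real predicate signature is not shown in the talk. The sketch walks the graph breadth-first and inlines each node the predicate accepts.

```python
from collections import deque

# Hypothetical graph: each URL maps to the URLs its node links to.
GRAPH = {
    "/api": ["/api/communities"],
    "/api/communities": ["/api/communities/1", "/api/communities/2"],
    "/api/communities/1": ["/api/neighborhoods/7"],
    "/api/communities/2": [],
    "/api/neighborhoods/7": [],
}


def build_cache(root, should_inline):
    """Walk the graph from root, inlining each node the predicate accepts."""
    cache = {}
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in cache or not should_inline(url, depth):
            continue
        cache[url] = GRAPH[url]
        queue.extend((child, depth + 1) for child in GRAPH[url])
    return cache


def max_depth(limit):
    """Predicate: inline any node at most `limit` edges from the root."""
    return lambda url, depth: depth <= limit


def seen_before(recorded_urls):
    """Predicate: inline exactly the nodes requested on a previous run."""
    return lambda url, depth: url in recorded_urls


shallow = build_cache("/api", max_depth(1))
recorded = {"/api", "/api/communities", "/api/communities/2"}
replayed = build_cache("/api", seen_before(recorded))
print(sorted(shallow), sorted(replayed))
```

The depth predicate is blunt; the recorded-requests predicate inlines only what the page actually touched last time, which is the "really smart" variant mentioned above.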
This research is called Hypercrud. Thank you, that's my material.
7:35 The question is, HTTP already has a caching layer with etags and stuff like that, why is this different? So one thing that's different is this response is immutable. You can see in all the URLs generated by this response, there is a notion of time. There's a TX value here. So that's actually straight out of Datomic. Say we just ran this and we didn't have the inline cache, we have a spider that hits all these URLs and it just crawls it using the browser cache to do this. That would work, except it only works because each of these URLs has an immutable value, otherwise you give up consistency.
8:26 The question is, how do you generate this API on top of Datomic, how do you populate Datomic? So this particular example just exposes the Datomic entity graph, so anything you could do at a Datomic REPL, like traverse entities by reference, you can do here. It's just that the Datomic ref type got exposed as URLs. It's not actually coupled to Datomic, it's just making the Datomic queries underneath, so you could implement it on top of whatever data you want as long as it has a notion of immutability.
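A small Python sketch of why the TX value matters. The `?tx=` URL scheme, the `HISTORY` table, and the `get` helper are assumptions for illustration, not the actual URL format from the slide. Since a URL that includes the transaction value always names the same data, a cached entry never goes stale and never needs revalidating.

```python
def node_url(path, tx):
    """Hypothetical URL scheme: the transaction value is part of the URL."""
    return f"{path}?tx={tx}"


# Hypothetical server history: the value of each path as of each tx.
HISTORY = {
    100: {"/api/communities/1": {"name": "Community A"}},
    101: {"/api/communities/1": {"name": "Community A (renamed)"}},
}

browser_cache = {}
network_requests = []


def get(path, tx):
    url = node_url(path, tx)
    if url not in browser_cache:  # miss: one real request
        network_requests.append(url)
        browser_cache[url] = HISTORY[tx][path]
    return browser_cache[url]  # hit: the value behind this URL cannot change


first = get("/api/communities/1", 100)
again = get("/api/communities/1", 100)
newer = get("/api/communities/1", 101)
print(first, newer, len(network_requests))
```

The rename shows up under a new URL rather than by mutating the old one, so every node read at tx 100 is consistent with every other node read at tx 100.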
Okay, one more question: so there are some side effects. There is some amount of async. We've only removed the read async code from our codebase. If you have a side effect, that will issue a request, the request will come back with a new TX value out of Datomic or whatever is enforcing immutability in your web service, and then you'll need to requery from the root of the graph, you'll need to requery the entire application. So that's one request. Ideally you could have it say "I did a side effect, return the new TX value," then you say "Give me the entire graph that I care about so I can repopulate my cache," and of course Datomic makes that fast.
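That write-then-requery cycle could be sketched like this in Python. The `Service` class and its `transact` and `graph_as_of` methods are hypothetical and only imitate an immutable store in memory; this is not Datomic's API.

```python
class Service:
    """Hypothetical immutable web service: every write produces a new tx."""

    def __init__(self):
        self.tx = 1
        self.snapshots = {1: {"/api/posts": []}}

    def transact(self, post):
        """The side effect: store a post and return the new tx value."""
        previous = self.snapshots[self.tx]
        self.tx += 1
        self.snapshots[self.tx] = {"/api/posts": previous["/api/posts"] + [post]}
        return self.tx

    def graph_as_of(self, tx):
        """One request that returns the whole graph the app cares about."""
        return self.snapshots[tx]


service = Service()
cache = service.graph_as_of(service.tx)  # initial page load
old_cache = cache
new_tx = service.transact("hello")       # the one asynchronous write
cache = service.graph_as_of(new_tx)      # requery from the root at the new tx
print(new_tx, cache, old_cache)
```

The old snapshot is untouched by the write, so any UI still rendering from `old_cache` stays internally consistent until the new graph is swapped in.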