Talk: a Hypermedia tool for making web apps in real-time (REST FEST 2018)
This is the transcript of a talk recorded at REST FEST in September 2018, given to an audience of backend developers passionate about Hypermedia APIs.
Hyperfiddle is a hypermedia-based tool that uses all the stuff we've been talking about here at REST FEST to make an app builder. To get you interested, I am gonna jump straight to the video.
So, that's Hyperfiddle. [AUDIENCE: This is your tool?] Yes, this is a startup. We're gonna talk about what makes this possible, and what had to change in the world that wasn't there before.
What does R2D2 have to do with hypermedia? You guys remember this scene from Star Wars? What does he have to do? He has to find Princess Leia. How does he find Princess Leia? He says, okay Deathstar. Let me talk to your uniform data interface. Tell me what services you have. We've got the staff map. We've got the blueprints. We've got the garbage disposal schedule. Let's go to that, and see what actions we can take. What are the hypermedia controls available that we can use to stop the doors from closing?
So, essentially, all R2D2 is doing is following links and forms. We've been talking about this all day at REST FEST, so I'm not gonna go too much into it. But, it's very powerful.
If Hypermedia is so powerful, is it successful?
Do you guys remember Google Data APIs? Back in 2010-11? Back in the day, Google had this full hypermedia thing, and they retired it, without so much as a whimper. They never said anything about it. Now it's completely legacy and deprecated. So, something caused this. Right? They had this great hypermedia system and they threw it away. Why would Google do that? Why isn't hypermedia working for them at scale?
Well, let's take this picture. We could probably break this down to links and forms. Right? In fact, I've drawn the boxes. Let's take this purple box in the middle. It could be a collection of links, and each link could go load each item, each notification. Right? And, each one of these is backed by a database query. Right? But, the problem with this is each time you follow a link, that's a network request. That's slow. That turns the radio on, on your phone.
Facebook does care about abstraction, obviously, because they have this huge product that has to run on thousands of devices. But, they care more about user experience. They care more about performance and that it loads instantly. So, this is why Facebook has like 2,000 engineers, right? It's because they were coding custom backends for each different app — a mobile backend versus a web backend — for each different page. Each of these needs engineers to maintain it, and it's not abstract, but it's fast, and fast matters more.
So, 2014, I think, is the year GraphQL came out. Pete Hunt was an engineer on the React.js team and later the engineering manager of Instagram web. We can see he's deeply acquainted with the problem that the previous slide demonstrates.
The data is not ... how do you query that with a SQL query? How do you generate ... like how do you batch that such that you load exactly, precisely the right data without over-fetching, because that costs money in third world countries on their phones when they're paying per kilobyte? And how do you do it quickly?
So, this is why Facebook needed GraphQL so badly. I think you see this at scale across any hypermedia API. The problems we have when we're trying to make a small API are different than the problems like a large, massive company has, when they have thousands of them.
So, in Hyperfiddle, the queries — they're not quite GraphQL, but they're the same idea. This is called a graph pull. You have this graph, and you pick a point, and you kind of shake it, and you say: starting there, what do I need to pull out of it? So, you can see, first, it can generate a form based on the data. But it also knows, since this is a database query, the schema of this. It knows that :neighborhood/district is of type reference. So, it's able to draw a link there automatically. That's pretty cool.
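A graph pull of roughly this shape can be sketched as a Datomic Datalog query with a pull expression (the exact fiddle query isn't shown in the talk; :district/name is an assumed attribute from the demo's Seattle-style schema):

```clojure
;; Start at neighborhood entities and declare exactly which
;; attributes to pull out of the graph. The nested map traverses
;; the :neighborhood/district reference — the "link" in the data.
[:find (pull ?e [:neighborhood/name
                 {:neighborhood/district [:district/name]}])
 :where [?e :neighborhood/name]]
```

Because :neighborhood/district is declared in the schema as a reference type, the tool can render it either inline (pulled) or as a hyperlink — the same information drives both.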
But the problem is, it needs to be fast. For this to get widespread adoption, it needs to be fast. Otherwise, no one will use it.
So, this is the bad version, where you have slow link following. You can see what motivated the graph style: you just pull deeper into it. And, since it's a reference type, you can kind of see there's an isomorphism between a reference and a link. A reference (in a data structure) lets you get from here to the next thing. Right? How is that different from a link?
The difference is how much time it takes to do it. Reference traversal is about in-memory data structures. So, we're talking on the order of nanoseconds to do a memory lookup. It does take some amount of time, but it's very short. And then link following — we're talking a client-server round trip. So, at least 20 milliseconds with the speed of light, and more likely 100 or 200 with all the computers in between. Then you multiply that by, like, traversing a thousand links to hydrate the one page.
Anyway, graph pull and links are almost the same thing. That's the first interesting thing we take advantage of.
It also happens on the write side. Not only do you have to have a really complicated and precise read side request, you also need to be able to edit any different point of this enormous graph, and then transact that in one transaction. That's a hard problem too. If you're doing hypermedia and modeling your app as links and forms, and the form says, these are the fields you can fill out for this form. And you have this other link that goes to this other form. How do you get those to flatten into one database transaction? It's very difficult, and very complicated.
Simplicity matters. It is not enough to just be powerful. For example, XML is more flexible than JSON. XML can encode different data types that pass through without each intermediate system necessarily understanding them. JSON can't. JSON doesn't even have a date type. You have to do this weird out-of-band schema stuff. JSON is strictly less powerful than XML.
But, XML isn't simple. XML failed because it's so hard. No human looks at XML and says, "yeah, I wanna do this." It's very difficult. Hypermedia has the same problem. We talk about this great abstraction capability, but when your boss says, "this needs to be done yesterday — why isn't this done yet?", the hypermedia approach loses. So, to even make our tool possible, it needs to fit the market need, which means it has to be simple. Simple wins.
And, GraphQL isn't. GraphQL is sort of simple on the query side, but what about all the code they don't show you? They show you this nice graph query, and then, well there's a resolver and then this mutation system, and we have to write all this code. And, it's not the type of code that we're happy about writing. It's not like wiring stuff into the UI and accomplishing business needs. It's batching, and caching, and eventual consistency bugs, and your user is just saying, I submitted the transaction, why am I seeing yesterday's data? Why am I seeing the same notification over and over and over? Remember back a couple of years ago, Facebook had this problem, and Twitter had it too. GraphQL doesn't fix that, it actually makes it worse, because it increases the distance between the application logic and the database. Which layer is introducing the consistency bugs? How many layers are there? You don't know! So, GraphQL is not simple at all, and it requires orchestration code and boilerplate. So, if we're trying to make an abstract tool, first we have to solve the boilerplate.
Let's talk about the write side. This is a way to express a transactional edit of a graph in a very simple way. So, you see I've highlighted in yellow, we've edited this row of the table, this column, the
:neighborhood/name. That edit is line two of the staging area, where you can see all the edits. You can also see we've edited, nested further into the table here. Here you can see we deleted something from this thing, and then deleted something from something else. And, these were all happening at different levels of the tree. But, yet, we can flatten that edit into just a very simple list. This expresses a database transaction, to a graph, just as a list.
If you squint at it, I hope you can convince yourself that by looking at these IDs and these attributes, you could map back out the place that it's supposed to be at. You could look at this list, and you could reconstruct what the view tree should be. So, it's very simple.
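As a rough sketch (the entity ids and some attributes here are made up for illustration), the staging area's flat edit list looks like Datomic-style transaction data:

```clojure
;; Edits made at different levels of the view tree flatten into
;; one list. Each element names the entity, the attribute, and the
;; new (or retracted) value — enough to reconstruct the tree.
[[:db/add     17592186045428 :neighborhood/name "Capitol Hill"]
 [:db/add     17592186045430 :community/category "food"]
 [:db/retract 17592186045432 :community/neighborhood 17592186045428]
 [:db/retractEntity 17592186045440]]
```

The entity ids are what let you map each edit back to its place in the tree.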
But the problem is, no databases work like this! SQL doesn't work like this. MongoDB doesn't work like this. You need to write a mutator layer as something to turn that transaction value into an actual database transaction for an actual database. And, that I think is the root cause of all the complexity we see today.
So, we need a new kind of database.
Datomic came out in 2012, and it came out of the work of Rich Hickey, and out of the discipline of functional programming. What's most interesting about Datomic is that it works a lot like git. So, on the inside, it's a log. And, since there's a notion of time threaded throughout the log, you can clone it to all these different machines and not worry about the machines not agreeing what the answer is. If all the git checkouts are at the same commit, you can ask a question based off any checkout on any machine and get a correct and consistent answer. So, you can basically distribute the queries and not worry about how long a query takes, because you can just throw machines at it, without any loss of consistency.
It turns out that immutability in the database is the primitive you need in order to make a system like I've described work, and work efficiently. Datomic can query trees efficiently. Datomic can also write transactions to graphs efficiently and simply. By the way, since it's almost 2020 now: it's AWS-native, the server is elastic and scales reads to infinity — it can service reads at Black Friday scale.
Datomic is the key ingredient that makes Hyperfiddle possible.
Hyperfiddle is a tool for making web APIs without boilerplate coding. You remember in the video, there was very little code in it. It was mostly just indicating what I want to happen in a declarative way. So, basically you can see, there's obviously a query backing this. And there's some links. And, the links take you to other queries. And, also like this select option here, that's a query too, that's been embedded in this program.
You can see the nesting. It's very important that the UI can map directly to the database without these intermediate layers that add complexity, because they add performance cost, they add bugs, and they just add effort. You can't autogenerate these layers, because they'll perform differently at different points of scale. You end up having to rewrite your backend every year depending on the level of scale you're at. And you have to use different hard-coded optimizations. So, it's critical that the database be able to natively speak the graph query language. You can't be writing adapters. And, it's also critical that that same data structure can be used to generate things. It's metadata. You can generate a view from it.
You can say, okay: attribute :neighborhood/name is a string. Hyperfiddle knows that, because it inspected the database schema. And then the same thing for each different level in this: :community/neighborhood is a reference. :neighborhood/district is a reference. So, this is a very simple system, because it's very flat. There are very few layers. The view is automatic. The query, a human had to provide. We think a high school student who is logic-minded could learn how to do this. Then, all the GraphQL adapters and resolvers and all that stuff — none of that is necessary! You just send this query directly into the database. The database gives the answer, and then we plumb it into the view. It's very simple.
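To give a feel for how a tool can know this from the schema alone (a hedged sketch of the standard Datomic API, not Hyperfiddle's internals), the schema is itself data you can pull out of the database:

```clojure
;; In Datomic the schema lives in the database, so the tool can
;; ask what type an attribute is and pick a widget accordingly.
;; Pull resolves enum references to their idents.
(d/pull db [:db/ident :db/valueType] :neighborhood/district)
;; a ref type  → render as a hyperlink
;; a string type → render as a text input
```

No out-of-band schema document, no adapter layer — the same database that answers the query also describes its own shape.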
In Hyperfiddle, you start by defining the query and your links, but you need to ... your app is more than one query. That Facebook page we showed you is more than one query. So, you actually have to define a lot of these things and weave them together. So links are like reference traversal, a composition mechanic.
This is just a visual example of this. You've got a bigger fiddle, and you've got this new button, which is a hypermedia control, which is backed by this other embedded fiddle. That fiddle has a pull expression or query as well, so that form is automatically rendered from that simple query.
It's not all about queries, though. There's a lot of things that you need to express, like requirements that a real app has. So, one thing you can do is you can insert new things. You can also remove things. So the remove! button here is a hypermedia control that says: remove this whole item. Remove the whole thing. There's also affix and detach. So, that means in a CRUD app, sometimes you don't wanna delete this reference and also kill the child — you just want to sever the reference. Or, you just want to attach one entity to another. I am a male, so I want to point to the record :gender/male, and if I later want to remove that association, I don't want to destroy the notion of something being a male.
Essentially this is it. These seven ideas [ED: now three] are all you need to fully express almost anything a CRUD app needs.
The last thing you need to be able to do is express arbitrary business transactions, like set the post date to today and set the status to true. Generally with just an expression. Again, because it's so flat, the transaction syntax of the Datomic database lets you just construct a list of edits in memory and say: here is the data description of the transaction. And I give it to Datomic, who will execute it. There are no complicated mutations happening. So, it's really the essential simplicity of the Datomic database that glues all this together.
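For instance (a sketch — post-id, :post/date, and :post/status are illustrative names, and this uses the Datomic client-API calling convention):

```clojure
;; "Set the post date to today and set the status to true" is
;; just a list built in memory and handed to the database — no
;; mutation layer or resolver in between.
(d/transact conn
  {:tx-data [[:db/add post-id :post/date   (java.util.Date.)]
             [:db/add post-id :post/status true]]})
```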
Options are just an embedded fiddle with its own query powering this, and we've used progressive enhancement in the view, kind of like HTML, in order to style it as a pick list. This was just done in React.js; the select options renderer is not an API-level concern.
Another thing that a real CRUD app needs is ... Say you've got gender and shirt size. What you're allowed to pick for your shirt size, depends on the value of gender. Normally, we have to write boilerplate code to express this. Right? How do you express this?
Well, because we have this declarative query language, we can just say, well which part of the query do we have to depend on? We can just say I depend on the
:reg/gender. This is all because the query is just data. It's not some opaque thing that's difficult to parse. You don't have to parse anything. You have the query, and you can just talk about the piece of the query that you need, to plumb in the link. So Hyperfiddle automatically generates plumbing.
:hf/iframe, which has been stylized as a select box — you can link a URL to it, and it's got :gender/male in the URL. If you change :male to :female, it'll reload the iframe.
How does progressive enhancement work? Well, it works like HTML!
I've shown you how the API side works, with the links and forms. But, sometimes you need to stylize it, right? Sure, you can code a custom front end, but we also wanna move very fast.
So here's an example of using what's essentially just React.js to stylize this table. A light touch is good, because we don't wanna do a lot of work. We just wanna put this definition list right here so we don't have a table inside of a table, because that's really unwieldy. Then, this is some ClojureScript code which does that with React.
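The actual renderer isn't reproduced in this transcript, but a hedged sketch of the idea in Reagent-style ClojureScript (hiccup) looks something like this — render the nested entity as a definition list instead of a table-in-a-table:

```clojure
;; A tiny view override: value is assumed to be a pulled entity
;; map; each attribute becomes a <dt>/<dd> pair.
(defn district-cell [value]
  [:dl
   (for [[attr v] value]
     ^{:key attr}
     [:<> [:dt (name attr)]
          [:dd (str v)]])])
```

The point is that the override is a small pure function over data the query already produced, not a new backend endpoint.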
One of the interesting things about ClojureScript, as the previous presenter mentioned, is homoiconicity, which means we can validate this as you type it. We can compile it in the browser as you type it. We don't screw up your screen if you type something wrong. It's just very easy. It's very easy to parse, and it's very flexible, so you can build very intelligent tooling without a lot of code.
Sometimes you wanna move even faster than that, because these 17 lines are maybe still a bit complicated. It's not clear what's going on there unless you're a ClojureScript programmer.
So, we have this Markdown thing, which is more for business people. You can list out your form fields using these Markdown extensions, and you just kind of name the piece of the query that you're talking about.
Markdown makes it easy to lay out and order your forms, and attach very simple widget renderers to parts of the form. You can put basically a function here, which is a little renderer just for this little value of the form. If you don't do this, you'll get the default, which is generally an HTML <input> or checkbox. Here we use string renderers to do things like format dates. The point is you can do this in real-time, and it's very fast and snappy. You can even start to do more complicated things. The image thing at the top is just an example of: we've taken the user picture, which is a URL stored as a string in the database, and we render it as an HTML image, and then we just chain some style onto it, because we're in a hurry. And this is what you get.
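A sketch of such a one-off widget renderer (the function shape is assumed; Hyperfiddle's real renderer signature may differ) — render a URL stored as a string as an image, with some style chained on:

```clojure
;; A per-field renderer: the value is just the string from the
;; database; the hiccup decides how it looks.
(defn avatar-renderer [url]
  [:img {:src url
         :style {:width "40px" :border-radius "50%"}}])
```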
Fiddles are stored in the database. This has pros and cons. The pro is you can edit them interactively. The con is the database isn't git. It might not turn out to matter, because since Datomic is an immutable log, and git is an immutable log, they're basically the same thing and they're inherently versioned. So, it might not matter. I don't know. That remains to be seen. We can probably integrate tools.deps to slurp in external ClojureScript, like Datomic Ions does.
A lot of the type of code we write when we write REST clients is orchestrating HTTP requests. So, we saw this week, writing XMLHttpRequest with callbacks and such. In Hyperfiddle, what happens when you click the stage button? It's not really your problem. The stage button says: well, here's the transaction that we constructed in this popover. And then Hyperfiddle figures out what to do. It knows to send it to the database, and it attempts to figure out efficiently which queries need to be updated and which queries don't. And since it's abstracted, it can be optimized over time. It'll get smarter and better without you having to code it.
So, it's totally decoupled from IO. And, it's interesting, because IO is the hard part. Writing callbacks sucks. You forget your error handler. You forget to chain the promises just right, and the thing crashes. This is very hard. Right? Writing data is very easy and very simple. You let the machine deal with the hard stuff.
The other cool thing about Datomic is it's got every buzzword you can think of. So, Hyperfiddle runs on Datomic, and Datomic handles all the AWS. So, this all works at Facebook scale. We haven't validated that, obviously, but it should. There's a lot of work to be done, but it should. So, you can make a Facebook-scale application interactively in the browser, with data plus very small amounts of code — mostly just functions that describe your database transactions — and all this comes for free.
Check out Datomic. It's really ... I spent the last five years of my life studying the work of the man who made this database.
Maybe you're doing something sophisticated and the set of choices that Hyperfiddle makes on your behalf are not gonna work for you. E.g., we don't today support a websocket that reactively updates your thing like a stock ticker. Right? So, if you need to support that, it's extensible. So what does that mean? I told you about the seven verbs we had — :hf/new, :hf/remove, :hf/iframe, etc. Each verb has a specific behavior, and you could swap out our thing and provide new implementations for it. Now you're writing code, obviously, but it's abstract code. Right? You can swap it out, and all your fiddles, and all your software, will all use the new behavior. Or, you can add new things. You might want server-side paging, which we haven't done yet. You could implement a :link/rel with the I/O behavior for that. So, it's totally open to extension.
We think Hyperfiddle can do anything Rails can do. We haven't validated that, but we've done some real apps, and we think it can. It's got all the right hooks to run Clojure functions wherever you'd want to, and the server infrastructure and I/O runtimes are all open source, so you can always just link Hyperfiddle as a library and write code against the interfaces. Or even just use it for your administration dashboards, which is 50% of the UI work in any given application anyway.
One demonstration of this is that Hyperfiddle is bootstrapped. Here we are live editing the Hyperfiddle toolbar, right inside Hyperfiddle. The entire editor works this way. This particular view might be cheating a little bit, we have some legacy views and CSS that haven't been migrated to userland yet.
That's all I've got. Thank you for watching.
AUDIENCE: This is really cool. It's the most sophisticated example of a generic hypermedia browser that I've seen. The one thing I'm not entirely sold on yet is the idea of querying, mostly because that requires that your clients understand your whole graph of your data and data model. So, it's not necessarily discoverable like it is in server architecture. It has to be known ahead of time.
DUSTIN: Hyperfiddle has a client/server architecture. In production your fiddles are baked so the queries can't be changed, so basically equal to any client-server app where the queries are stored in git. In this respect, Hyperfiddle's hyperlink mechanic makes the API discoverable like any other REST service. As far as client-server coupling, our React.js views are certainly coupled to the shape of data they consume, like any other sophisticated user interface. But the views are optional. We do a pretty good job of generating admin user interfaces from metadata, and those are fully discoverable. You can point a third-party Hyperfiddle client at the service and it will work. If the client supports React.js, it can even download the fiddle renderers out of the database and render them. Particularly the markdown views would be straightforward to support in iOS. All of this is discoverable.
By the way, while we do embrace client-server architecture, Hyperfiddle is actually decoupled from this decision, client-server is only one of many possible I/O runtimes that Hyperfiddle can express through plugins, though obviously client-server is a very important one to ship with.
AUDIENCE: One thing that GraphQL is kind of coming into problems with is figuring out how to do caching with stuff like this. Have you thought about adding that?
DUSTIN: So, caching is the hardest and most core problem, we think, and Datomic permits a solution, because Datomic is like git. Git has a commit hash, which is a notion of time wired through it. So, every single change in git is as of a point in time. It means you can cache it perfectly. Datomic also — every question you ask Datomic includes, as part of the question, "at this point in time." Right? "What was the value of my blog as of 4:52 p.m. today" is how the query is submitted. And, that can be cached forever. That's immutable. That can actually be answered out of a CDN.
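Concretely, in the standard Datomic API, every query runs against a database value pinned to a basis in time, which is what makes the answer cacheable forever (a sketch; the instant is illustrative):

```clojure
;; The same question against the same basis always gives the
;; same answer — an immutable, perfectly cacheable result.
(let [db (d/as-of (d/db conn) #inst "2018-09-15T16:52:00")]
  (d/q '[:find ?name
         :where [?e :neighborhood/name ?name]]
       db))
```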
AUDIENCE: I don't understand. It sounds like you're talking about caching on the server side.
AUDIENCE: Oh. That's interesting.
DUSTIN: That's critical in making it work. There's also a trade-off, if you noticed, in the first part of your question, which is: well, I've got this link that's actually very flexible. And then I have this query pull, a kind of flip of the same idea, which is now adding fragility and rigidness to the system. And, can you implement a link out of a reference? You probably could abstract over that. I don't know how to make it, like, transparently fast. There's also data security that comes into play. Like, Facebook says, well, you're not allowed to access that. Links can make that a more challenging thing to solve.
AUDIENCE: I just wanna know more about Datomic. So, essentially, you're saying it's a stream of, like, commits? [Yes.] So, like event sourcing, where you have a sequence of events. [Yes.] So, have you dealt with GDPR? Because you have an immutable history of everything that happened. So, you may have captured personal data. So, if you say it's immutable, how do we remove it when the user requests the removal of his data?
DUSTIN: So, the question is: when you have an immutable log in the database, how does that intersect with the GDPR, which means you have to forget things, and they're in the log? The first thing is that it's unclear — there are different opinions from lawyers on what this actually means, and whether the data has to be actually gone versus just unavailable to the system. I don't know what the deal is with that. The second is that the vendor of Datomic is exploring a feature called excision, which means you surgically remove data from the log, but you don't forget that something happened. To basically say: this is the log, and the point in time hasn't changed. And something happened there, and it's been excised from the history. It also remains to be seen whether that's a sufficient interpretation for GDPR. And this will all evolve in time. Hopefully, Datomic will just solve that for us.
DUSTIN: I was thinking about one of the questions I got yesterday, about the difference between GraphQL and links.
And the problem is on the right — just pretend it's GraphQL, like a graph pull type query. By predefining the references you're allowed to traverse, you've made your application brittle. And that's exactly the problem that hypermedia's supposed to solve: instead of pre-traversing the references, you wanna just show links, right?
So I'm gonna comment out part of the query. And right now, I've got a link here to exactly the same thing. Does that make sense? So this way is more flexible, but the problem is link-following is slower, it drains your battery, it doesn't scale, and it's harder to code.
So my idea was, since we know in the database and in our schema, we know that this is a reference, can that be the same thing as a link? By defining the query here, we've basically said, "This is the data that you're allowed to access. You're not gonna violate security if you traverse these paths. These are here for you if you want them. Now you send me up the query you want, and we'll decide whether we'll fulfill that for you or not." And thus, it would be basically the same thing as hyperlinks.
What do you guys think?
AUDIENCE: Sounds very similar to what I was gonna talk to you about later. [Laughter] So yeah, it sounds like we've got a similar idea. But ... Yeah, I'd like to explore that more.
DUSTIN: Because one of the things that I think was unmentioned in these conversations is the idea of data security. Because we're looking at this from a linked open data perspective. Facebook doesn't care about that, right? Facebook doesn't want you to get hacked, and it doesn't want your data exfiltrated. So are you allowed to go back and see the last thousand posts? Are you allowed to go back and see your advertising history? It's important to them that they say, "This is what you're allowed to query, and this is the pattern you're allowed to query in." And with hyperlinks it's harder to express that, because they're inherently so flexible. So this may be a way to unify those two things, references and hyperlinks.
AUDIENCE: Makes sense. Cool. One thing that I was thinking about, is I wonder if you could do something that's not exactly a query but more of a search, and that way you can, I don't know, on the server side, figure out all the different things. And if those relationships don't exist, then just nothing comes back in the search. But I don't know, it's just a surface level thought.
DUSTIN: I don't understand yet where you're going.
AUDIENCE: Well I'm trying to see if we can create some sort of way of querying different kinds of data without having to rely on the specific connections between those data. So instead of an explicit reference or link between them, it's like a relation. It's almost like a ... Maybe I want information on people, then I want information on cars, and internally however those are gonna match up, maybe in a graph of some sort, then that's figured out and then it sends you back the data. So the data may not come back in a very expected form, like a specific expected format, but you may be able to do something like Steven was talking about with using that AST structure to look at, to parse that data, in a way that doesn't necessarily have to do with what structure comes back.
DUSTIN: One problem we have is we have thousands, millions of data silos, all these different databases, that all have different things. And I know in the semantic web we had ontology mapping and that type of thing. But I don't care about that, I just know this database works like this and this database is this, now I just wanna merge them together. So there's not references between them, but you can ...
AUDIENCE (Mike Amundsen?) So just on that point of multiple databases. We have people whose job right now is creating correlations, right? Raven, who was here with us the first day for the workshop, that's her total job. She's manually spiking that, creating correlations. Some of that must be going on in the learning AI space, right? [Sure.] Something must be starting to figure out how to start to train machines to make correlations, and in fact, Facebook and Google correlate all sorts of disparate data about where I'm shopping and where I'm traveling and everything else, right? So one of the opportunities might have to do with figuring out what's going on there, and make those correlations more automatic rather than more manual. In other words, not just the fact that there's a machine doing it, but maybe there's a machine doing it even when we're not asking about it, so that it's ready for us. I'm not sure if that's what you were thinking about, but I think one of the things that might help solve some of these problems is sometimes you need to solve a problem by changing the problem. Like, "Yeah, let's just put it all in one place and stop worrying about distance." Right? But other times it might be, "Let's see if we can automate some of these elements, and see if that helps us as well." Because I think some of the challenge is gonna be, putting it all in one place is always gonna be a subset, and eventually somebody's gonna say, "Well I wanna put more in there. I wanna put more in there." [Right, yeah.] So we have to also be prepared, it's almost like a bridging thing, but also be prepared for how would we deal with it.
DUSTIN: In the general case, it's not gonna be pretty. You're gonna have all different types of data and all different types of formats, and you still need to be able to merge that together.
AUDIENCE: Right. And that's what we do in our heads all the time. Not same volume, but same technique. So I think it's also worth thinking about it, and that's a way, as well. Or automation could help some of that. Optimization could help some of that, too. So optimizing the correlation might mean that I've got a whole bunch of solar link lists around that I can access rather than the actual data or something. I don't know, just thinking of other ways to start thinking about it.