---
title: Normalized Caching
order: 1
---

# Normalized Caching

In GraphQL, like its name suggests, we create schemas that express the relational nature of our
data. When we create and query against a `Query` type we walk a graph that starts at the root
`Query` type and walks through relational types. Rather than querying for normalized data, in
GraphQL our queries request a specific shape of denormalized data, a view into our relational data
that can be re-normalized automatically.

As the GraphQL API walks our query documents it may read from a relational database and _entities_
and scalar values are copied into a JSON document that matches our query document. The type
information of our entities isn't lost however. A query document may still ask the GraphQL API about
what entity it's dealing with using the `__typename` field, which dynamically introspects an
entity's type. This means that GraphQL clients can automatically re-normalize data as results come
back from the API by using the `__typename` field and keyable fields like an `id` or `_id` field,
which are already common conventions in GraphQL schemas. In other words, normalized caches can build
up a relational database of tables in-memory for our application.

For our apps normalized caches can enable more sophisticated use-cases, where different API requests
update data in other parts of the app and automatically update data in our cache as we query our
GraphQL API. Normalized caches can essentially keep the UI of our applications up-to-date when
relational data is detected across multiple queries, mutations, or subscriptions.

## Normalizing Relational Data

As previously mentioned, a GraphQL schema creates a tree of types where our application's data
always starts from the `Query` root type and is modified by other data that's incoming from either a
selection on `Mutation` or `Subscription`. All data that we query from the `Query` type will contain
relations between "entities", JSON objects that are hierarchical.

A normalized cache seeks to turn this denormalized JSON blob back into a relational data structure,
which stores all entities by a key that can be looked up directly. Since GraphQL documents give the
API a strict specification on how it traverses a schema, the JSON data that the cache receives from
the API will always match the GraphQL query document that has been used to query this data.
A common misconception is that normalized caches in GraphQL store data by the query document somehow,
however, the only thing a normalized cache cares about is that it can use our GraphQL query documents
to walk the structure of the JSON data it received from the API.

```graphql
{
  __typename
  todo(id: 1) {
    __typename
    id
    title
    author {
      __typename
      id
      name
    }
  }
}
```

```json
{
  "__typename": "Query",
  "todo": {
    "__typename": "Todo",
    "id": 1,
    "title": "implement graphcache",
    "author": {
      "__typename": "Author",
      "id": 1,
      "name": "urql-team"
    }
  }
}
```

Above, we see an example of a GraphQL query document and a corresponding JSON result from a GraphQL
API. In GraphQL, we never lose access to the underlying types of the data. Normalized caches can
ask for the `__typename` field in selection sets automatically and will find out which type a JSON
object corresponds to.

Generally, a normalized cache must be able to do two things with a query document like the above:

- It must be able to walk the query document and JSON data of the result and cache the data,
  normalizing it in the process and storing it in relational tables.
- It must later be able to walk the query document and recreate this JSON data just by reading data
  from its cache, by reading entries from its in-memory relational tables.

While the normalized cache can't know the exact type of each field, thanks to the GraphQL query
language it can make a couple of assumptions. The normalized cache can walk the query document. Each
field that has no selection set (like `title` in the above example) must be a "record", a field that
may only be set to a scalar. Each field that does have a selection set must be another "entity" or a
list of "entities". The latter fields with selection sets are our relations between entities, like a
foreign key in relational databases.
Furthermore, the normalized cache can then read the `__typename` field on related entities. This is
called _Type Name Introspection_ and is how it finds out about the types of each entity.
From the above document we can assume the following relations:

- `Query.todo(id: 1)` → `Todo`
- `Todo.author` → `Author`

However, this isn't quite enough yet to store the relations from GraphQL results. The normalized
cache must also generate primary keys for each entity so that it can store them in table-like data
structures. This is for instance why [Relay
enforces](https://relay.dev/docs/guides/graphql-server-specification/#object-identification) that
each entity must have an `id` field. This allows it to assume that there's an obvious primary key
for each entity it may query. Instead, `urql`'s Graphcache and Apollo assume that there _may_ be an
`id` or `_id` field in a given selection set. If Graphcache can't find these two fields it'll issue
a warning, however a custom `keys` configuration may be used to generate custom keys for a given
type. With this logic the normalized cache will actually create the following "links" between its
relational data:

- `"Query"`, `.todo(id: 1)` → `"Todo:1"`
- `"Todo:1"`, `.author` → `"Author:1"`

As we can see, the `Query` root type itself has a constant key of `"Query"`. All relational data
originates here, since the GraphQL schema is a graph and, like a tree, all selections on a GraphQL
query document originate from it.
Internally, the normalized cache now stores field values on entities by their primary keys. The
above can also be said or written as:

- The `Query` entity's `todo` field with `{"id": 1}` arguments points to the `Todo:1` entity.
- The `Todo:1` entity's `author` field points to the `Author:1` entity.

In Graphcache, these "links" are stored in a nested structure per-entity. "Records" are kept
separate from this relational data.
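
To make the default key generation concrete, here is a minimal sketch of how a primary key could be derived from `__typename` and an `id` or `_id` field. The `keyOfEntity` function below is a hypothetical stand-alone helper written for illustration, not Graphcache's internal implementation.

```js
// Hypothetical helper: derive a primary key from `__typename` plus `id` or `_id`.
function keyOfEntity(data) {
  // The root type has a constant key
  if (data.__typename === 'Query') return 'Query';
  // Check `id` first, then `_id`
  const id = data.id != null ? data.id : data._id;
  // `null` means that no key could be generated from the default fields
  return id != null ? `${data.__typename}:${id}` : null;
}

const result = {
  __typename: 'Query',
  todo: {
    __typename: 'Todo',
    id: 1,
    title: 'implement graphcache',
    author: { __typename: 'Author', id: 1, name: 'urql-team' },
  },
};

const todoKey = keyOfEntity(result.todo); // 'Todo:1'
const authorKey = keyOfEntity(result.todo.author); // 'Author:1'
```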

![Normalization is based on types, keys, and relations. This information can all be inferred from
the query document.](../assets/query-document-info.png)

## Storing Normalized Data

At its core, normalizing data means that we take individual fields and store them in a table. In our
case we store all values of fields in a dictionary of their primary key, generated from an ID or
other key and type name, and the field’s name and arguments, if it has any.

| Primary Key            | Field                                           | Value                    |
| ---------------------- | ----------------------------------------------- | ------------------------ |
| Type name and ID (Key) | Field name (not alias) and optionally arguments | Scalar value or relation |

To reiterate we have three pieces of information that are stored in tables:

- The entity's key can be derived from its type name via the `__typename` field and a keyable field.
  By default _Graphcache_ will check the `id` and `_id` fields, however this is configurable.
- The field's name (like `todo`) and optional arguments. If the field has any arguments then we can
  normalize it by JSON stringifying the arguments, making sure that the JSON key is stable by
  sorting its keys.
- Lastly, we may store relations as either `null`, a primary key that refers to another entity, or a
  list of such. For storing "records" we can store the scalars in a separate table.
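
The second point, building a stable field key from a field's name and arguments, can be sketched as follows. Both `stableStringify` and `fieldKey` are hypothetical helper names used only for this illustration. The point is that the same arguments always produce the same key, regardless of the order in which they were written.

```js
// Hypothetical helper: JSON stringify a value with object keys sorted,
// so that the output doesn't depend on property order.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map(key => `${JSON.stringify(key)}:${stableStringify(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// Hypothetical helper: combine a field name with its (optional) arguments.
function fieldKey(name, args) {
  const hasArgs = args != null && Object.keys(args).length > 0;
  return hasArgs ? `${name}(${stableStringify(args)})` : name;
}

fieldKey('author'); // 'author'
fieldKey('todo', { id: 1 }); // 'todo({"id":1})'
// Both of these yield 'todos({"after":"abc","first":10})':
fieldKey('todos', { first: 10, after: 'abc' });
fieldKey('todos', { after: 'abc', first: 10 });
```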

In _Graphcache_ the data structure for these tables looks a little like the following, where each
entity has a record from fields to other entity keys:

```js
{
  links: Map {
    'Query': Record {
      'todo({"id":1})': 'Todo:1'
    },
    'Todo:1': Record {
      'author': 'Author:1'
    },
    'Author:1': Record { },
  }
}
```

We can see how the normalized cache is now able to traverse a GraphQL query by starting on the
`Query` entity and retrieve relations for other fields.
To retrieve "records" which are all fields with scalar values and no selection sets, _Graphcache_
keeps a second table around with an identical structure. This table only contains scalar values,
which keeps our non-relational data away from our "links":

```js
{
  records: Map {
    'Query': Record {
      '__typename': 'Query'
    },
    'Todo:1': Record {
      '__typename': 'Todo',
      'id': 1,
      'title': 'implement graphcache'
    },
    'Author:1': Record {
      '__typename': 'Author',
      'id': 1,
      'name': 'urql-team'
    },
  }
}
```

This is very similar to how we'd go about creating a state management store manually, except that
_Graphcache_ can use the GraphQL document to perform this normalization automatically.

What we gain from this normalization is that we have a data structure that we can both read from and
write to, to reproduce the API results for GraphQL query documents. Any mutation or subscription can
also be written to this data structure. Once _Graphcache_ finds a keyable entity in their results
it's written to its relational table which may update other queries in our application.
Similarly queries may share data between one another which means that they effectively share
entities using this approach and can update one another.
In other words, once we have a primary key like `"Todo:1"` we may find this primary key again in
other entities in other GraphQL results.
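
As a rough illustration of this sharing, the following toy normalizer writes results into two tables of "links" and "records". It is a deliberately simplified sketch and not how _Graphcache_ is implemented: it walks only the JSON data rather than the query document, ignores field arguments and lists, and assumes every entity has an `id`. The `updateTodo` mutation field is a made-up example.

```js
const links = new Map();
const records = new Map();

// Simplification: assumes that every entity has a `__typename` and an `id`
const keyOf = data => `${data.__typename}:${data.id}`;
const isEntity = value =>
  value !== null && typeof value === 'object' && !Array.isArray(value);

function setField(table, entityKey, field, value) {
  if (!table.has(entityKey)) table.set(entityKey, {});
  table.get(entityKey)[field] = value;
}

// Recursively store scalars as "records" and nested objects as "links"
function write(entityKey, data) {
  for (const [field, value] of Object.entries(data)) {
    if (isEntity(value)) {
      const childKey = keyOf(value);
      setField(links, entityKey, field, childKey);
      write(childKey, value);
    } else {
      setField(records, entityKey, field, value);
    }
  }
}

// A query result writes the `Todo:1` entity for the first time...
write('Query', {
  __typename: 'Query',
  todo: { __typename: 'Todo', id: 1, title: 'implement graphcache' },
});

// ...and a later mutation result that contains the same key updates it in place.
write('Mutation', {
  __typename: 'Mutation',
  updateTodo: { __typename: 'Todo', id: 1, title: 'ship graphcache' },
});

links.get('Query').todo; // 'Todo:1'
records.get('Todo:1').title; // 'ship graphcache'
```

Because both results resolve to the key `Todo:1`, any query that reads through `Query.todo` sees the updated title without being refetched.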

## Custom Keys and Non-Keyable Entities

In the above introduction we've learned that while _Graphcache_ doesn't enforce `id` fields on each
entity, it checks for the `id` and `_id` fields by default. There are many situations in which
entities may either not have a key field or have different keys.

As _Graphcache_ traverses JSON data and a GraphQL query document to write data to the cache you may
see a warning from it along the lines of ["Invalid key: [...] No key could be generated for the data
at this field."](./errors.md/#15-invalid-key) _Graphcache_ has many warnings like these that attempt
to detect undesirable behaviour and help us to update our configuration or queries accordingly.

In the simplest cases, we may simply have forgotten to add the `id` field to the selection set of
our GraphQL query document. However, what if the field is instead called `uuid` and our query looks
accordingly different?

```graphql
{
  item {
    uuid
  }
}
```

In the above selection set we have an `item` field that has a `uuid` field rather than an `id`
field. This means that _Graphcache_ won't automatically be able to generate a primary key for this
entity. Instead, we have to help it generate a key by passing it a custom `keys` config:

```js
cacheExchange({
  keys: {
    Item: data => data.uuid,
  },
});
```

We may add a function as an entry to the `keys` configuration. The property here, `"Item"`, must be
the typename of the entity for which we're generating a key. The function may return an arbitrarily
generated key. So for our `item` field, which in our example schema gives us an `Item` entity, we
can create a `keys` configuration entry that creates a key from the `uuid` field rather than the
`id` field.

This also raises a question, **what does _Graphcache_ do with unkeyable data by default? And, what
if my data has no key?**<br />
This special case is what we call "embedded data". Not all types in a GraphQL schema will have
keyable fields and some types may just abstract data without themselves being relational. They may
be "edges", entities that have a field pointing to other entities that simply connect two entities,
or data types like a `GeoJson` or `Image` type.

In these cases, where the normalized cache encounters unkeyable types, it will create an embedded
key by using the parent's primary key and combining it with the field key. This means that
"embedded entities" are only reachable from a specific field on their parent entities. They're
globally unique and aren't strictly speaking relational data.

```graphql
{
  __typename
  todo(id: 1) {
    id
    image {
      url
      width
      height
    }
  }
}
```

In the above example we're querying an `Image` type on a `Todo`. This imaginary `Image` type has no
key because the image is embedded data and will only ever be associated to this `Todo`. In other
words, the API's schema doesn't consider it necessary to have a primary key field for this type.
Maybe it doesn't even have an ID in our backend's database. We _could_ assign this type an imaginary
key (maybe based on the `url`) but in fact if it's not shared data it wouldn't make much sense to
do so.

When _Graphcache_ attempts to store this entity it will issue the previously mentioned warning.
Internally, it'll then generate an embedded key for this entity based on the parent entity. If
the parent entity's key is `Todo:1` then the embedded key for our `Image` will become
`Todo:1.image`. This is also how this entity will be stored internally by _Graphcache_:

```js
{
  records: Map {
    'Todo:1.image': Record {
      '__typename': 'Image',
      'url': '...',
      'width': 1024,
      'height': 768
    },
  }
}
```

This doesn't however mute the warning that _Graphcache_ outputs, since it believes we may have made a
mistake. The warning itself gives us advice on how to mute it:

> If this is intentional, create a keys config for `Image` that always returns null.

This means that we can add an entry to our `keys` config for our non-keyable type that explicitly
returns `null`, which tells _Graphcache_ that the entity has no key:

```js
cacheExchange({
  keys: {
    Image: () => null,
  },
});
```
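
Putting custom keys, the default `id` and `_id` lookup, and embedded keys together, the lookup order described in this section could be sketched roughly like this. The `generateKey` function is a hypothetical helper for illustration only. It assumes that a key returned from the `keys` config is prefixed with the typename in the same way as a default key, and it leaves out the warning entirely.

```js
// Hypothetical sketch of the key lookup order described above
function generateKey(keys, data, parentKey, fieldKey) {
  const customKey = keys[data.__typename];
  // 1. A custom `keys` entry for the typename takes precedence;
  // 2. otherwise fall back to the `id` field, then the `_id` field
  const key = customKey
    ? customKey(data)
    : data.id != null
    ? data.id
    : data._id;
  // 3. No key at all: embed the entity on its parent's field
  if (key == null) return `${parentKey}.${fieldKey}`;
  return `${data.__typename}:${key}`;
}

const keys = {
  Item: data => data.uuid,
  Image: () => null,
};

generateKey(keys, { __typename: 'Todo', id: 1 }, 'Query', 'todo({"id":1})'); // 'Todo:1'
generateKey(keys, { __typename: 'Item', uuid: 'abc' }, 'Query', 'item'); // 'Item:abc'
generateKey(keys, { __typename: 'Image', url: '...' }, 'Todo:1', 'image'); // 'Todo:1.image'
```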

### Flexible Key Generation

In some cases, you may want to create a pattern for your key generation. For instance, you may want
to say "create a special key for every type ending in `'Node'`". In such a case we recommend creating
a small JS `Proxy` to take care of key generation for you and making the keys functional.

```js
cacheExchange({
  keys: new Proxy(
    {
      // Manual key generators for specific types
      Image: () => null,
    },
    {
      get(target, prop, receiver) {
        // `prop` is the typename; skip symbol lookups, which have no `endsWith`
        if (typeof prop === 'string' && prop.endsWith('Node')) {
          return data => data.uid;
        }
        // Types without a manual entry are keyed by `uuid`
        const fallback = data => data.uuid;
        return target[prop] || fallback;
      },
    }
  ),
});
```

In the above example, we dynamically change the key generator depending on the typename. When
a typename ends in `'Node'`, we return a key generator that uses the `uid` field. We still fall back
to an object of manual key generation functions however. Lastly though, when a type doesn't have
a predefined key generator, we change the default behavior from using `id` and `_id` fields to using
`uuid` fields.
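
To see which key generator such a `Proxy` hands out for a given typename, we can build the same object on its own and call it directly. The typenames `UserNode` and `Item` below are made up for this example.

```js
const keys = new Proxy(
  { Image: () => null },
  {
    get(target, prop) {
      if (typeof prop === 'string' && prop.endsWith('Node')) {
        return data => data.uid;
      }
      return target[prop] || (data => data.uuid);
    },
  }
);

// Typenames ending in 'Node' use the `uid` field
const nodeKey = keys.UserNode({ __typename: 'UserNode', uid: 'u1' }); // 'u1'
// `Image` has a manual entry and stays unkeyable
const imageKey = keys.Image({ __typename: 'Image', url: '...' }); // null
// Everything else falls back to the `uuid` field
const itemKey = keys.Item({ __typename: 'Item', uuid: 'abc' }); // 'abc'
```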

## Non-Automatic Relations and Updates

While _Graphcache_ is able to store and update our entities in an in-memory relational data
structure, which keeps the same entities in singular unique locations, a GraphQL API may make a lot
of implicit changes to the relations of data as it runs or have trivial relations that our cache
doesn't need to see to resolve. Like with the `keys` config, we have two more configuration options
to combat this: `resolvers` and `updates`.

### Manually resolving entities

Some fields in our configuration can be resolved without checking the GraphQL API for relations. The
`resolvers` config allows us to create a list of client-side resolvers where we can read from the
cache directly as _Graphcache_ creates a local GraphQL result from its cached data.

```graphql
{
  todo(id: 1) {
    id
  }
}
```

Previously we've looked at the above query to illustrate how data from a GraphQL API may be written
to _Graphcache_'s relational data structure to store the links and entities in a result against this
GraphQL query document. However, it may be possible for another query to have already written this
`Todo` entity to the cache. So, **how do we resolve a relation manually?**

In such a case, _Graphcache_ may have seen and stored the `Todo` entity but isn't aware of the
relation between `Query.todo({"id":1})` and the `Todo:1` entity. However, we can tell _Graphcache_
which entity it should look for when it accesses the `Query.todo` field by creating a resolver for
it:

```js
cacheExchange({
  resolvers: {
    Query: {
      todo(parent, args, cache, info) {
        return { __typename: 'Todo', id: args.id };
      },
    },
  },
});
```

A resolver is a function that's similar to [GraphQL.js' resolvers on the
server-side](https://www.graphql-tools.com/docs/resolvers/). They receive the parent data, the
field's arguments, access to _Graphcache_'s cached data, and an `info` object. [The entire function
signature and more explanations can be found in the API docs.](../api/graphcache.md#resolvers-option)
Since it can access the field's arguments from the GraphQL query document, we can return a partial
`Todo` entity. As long as this object is keyable, it will tell _Graphcache_ what the key of the
returned entity is. In other words, we've told it how to get to a `Todo` from the `Query.todo` field.

This mechanism is far more powerful than this example suggests. Resolvers may be used for other
use-cases as well:

- Resolvers can be applied to fields with records, which means that they can be used to change or
  transform scalar values. For instance, we can update a string or parse a `Date` right inside a
  resolver.
- Resolvers can return deeply nested results, which will be layered on top of the in-memory
  relational cached data of _Graphcache_, which means that they can emulate infinite pagination and
  other complex behaviour.
- Resolvers can change when a cache miss or hit occurs. Returning `null` means that a field’s value
  is literally `null`, which will not cause a cache miss, while returning `undefined` will mean
  a field’s value is uncached.
- Resolvers can return either partial entities or keys, so we can chain `cache.resolve` calls to
  read fields from the cache, even when a field is pointing at another entity, since we can return
  keys to the other entity directly.
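
Two of these use-cases can be sketched in a small `resolvers` object. This is only an illustration: the `updatedAt` field is a hypothetical field on `Todo` that isn't part of the examples above, and the object is defined on its own here so that it can be read in isolation. In an application it would be passed as `cacheExchange({ resolvers })`.

```js
const resolvers = {
  Todo: {
    // Transform a scalar: the cached value is a string, the resolved value a `Date`.
    // `updatedAt` is a hypothetical field used only for this example.
    updatedAt(parent, args, cache, info) {
      return new Date(parent.updatedAt);
    },
  },
  Query: {
    // Return a key instead of a partial entity; `cache.keyOfEntity` generates
    // the key for the given data, e.g. 'Todo:1'
    todo(parent, args, cache, info) {
      return cache.keyOfEntity({ __typename: 'Todo', id: args.id });
    },
  },
};
```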

[Read more about resolvers on the following page about "Local Resolvers".](./local-resolvers.md)

### Manual cache updates

While `resolvers`, as shown above, operate while _Graphcache_ is reading from its in-memory cache,
`updates` is a configuration option that operates while _Graphcache_ is writing to its cached data.
Specifically, these functions can be used to add more updates onto what a `Mutation` or
`Subscription` may automatically update.

As stated before, a GraphQL schema's data may undergo a lot of implicit changes when we send it a
`Mutation` or `Subscription`. A new item that we create may for instance manipulate a completely
different item or even a list. Often mutations and subscriptions alter relations that their
selection sets wouldn't necessarily see. Since mutations and subscriptions operate on a different
root type, rather than the `Query` root type, we often need to update links in the rest of our data
when a mutation is executed.

```graphql
query TodosList {
  todos {
    id
    title
  }
}

mutation AddTodo($title: String!) {
  addTodo(title: $title) {
    id
    title
  }
}
```

In a simple example, like the one above, we have a list of todos in a query and create a new todo
using the `Mutation.addTodo` mutation field. When the mutation is executed and we get the result
back, _Graphcache_ already writes the `Todo` item to its normalized cache. However, we also want to
add the new `Todo` item to the list on `Query.todos`:

```js
import { gql } from '@urql/core';

cacheExchange({
  updates: {
    Mutation: {
      addTodo(result, args, cache, info) {
        const query = gql`
          {
            todos {
              id
            }
          }
        `;
        cache.updateQuery({ query }, data => {
          data.todos.push(result.addTodo);
          return data;
        });
      },
    },
  },
});
```

In this code example we can first see that the signature of the `updates` entry is very similar to
the one of `resolvers`. However, we're seeing the `cache` in use for the first time. The `cache`
object (as [documented in the API docs](../api/graphcache.md#cache)) gives us
access to _Graphcache_'s mechanisms directly. Not only can we resolve data using it, we can directly
start sub-queries or sub-writes manually. These are full normalized cache runs inside other runs. In
this case we're calling `cache.updateQuery` on a list of `Todo` items while the `Mutation` that
added the `Todo` is already being written to the cache.

As we can see, we may perform manual changes inside of `updates` functions, which can be used to
affect other parts of the cache (like `Query.todos` here) beyond the automatic updates that a
normalized cache is expected to perform.

We get methods like `cache.updateQuery`, `cache.writeFragment`, and `cache.link` in our updater
functions, which aren't available to us in local resolvers, and can only be used in these `updates`
entries to change the data that the cache holds.
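
The same approach works in the other direction. As a sketch, assume a hypothetical `removeTodo(id: ID!)` mutation field that isn't part of the schema shown above. Its updater could filter the removed item out of `Query.todos`. The query is passed as a plain string here to keep the snippet free of imports, and the `updates` object would be passed as `cacheExchange({ updates })`.

```js
const TODOS_QUERY = '{ todos { id } }';

const updates = {
  Mutation: {
    // Hypothetical mutation field: `removeTodo(id: ID!)`
    removeTodo(result, args, cache, info) {
      cache.updateQuery({ query: TODOS_QUERY }, data => {
        // The list hasn't been cached yet, so there's nothing to update
        if (!data) return null;
        return {
          ...data,
          todos: data.todos.filter(todo => todo.id !== args.id),
        };
      });
    },
  },
};
```

Returning a new object instead of mutating `data` is a stylistic choice in this sketch; the earlier `addTodo` example mutates and returns the same object.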

[Read more about writing cache updates on the "Cache Updates" page.](./cache-updates.md)

## Deterministic Cache Updates

Above, in [the "Storing Normalized Data" section](#storing-normalized-data), we've talked about how
Graphcache is able to store normalized data. However, apart from storing this data there are a
couple of caveats that many applications simply ignore, skip, or simplify when they implement a
store to cache their data in.

Amongst features like [Optimistic Updates](./cache-updates.md#optimistic-updates) and [Offline
Support](./offline.md), Graphcache supports several features that allow our API results to be more
unreliable. Essentially we don't expect API results to always come back in order or on time.
However, we expect Graphcache to prevent us from making "indeterministic cache updates", meaning
that we expect it to gracefully handle API results that come back delayed or in a random order.

In terms of the ["Manual Cache Updates"](#manual-cache-updates) that we've talked about above and
[Optimistic Updates](./cache-updates.md#optimistic-updates) the limitations are pretty simple at
first and if we use Graphcache as usual we may not even notice them:

- When we make an _optimistic_ change, we define what a mutation's result may look like once the API
  responds in the future and apply this temporary result immediately. We store this temporary data
  in a separate "layer". Once the real result comes back this layer can be deleted and the real API
  result can be applied as usual.
- When multiple _optimistic updates_ are made at the same time, we never allow these layers to be
  deleted separately. Instead Graphcache waits for all mutations to complete before deleting the
  optimistic layers and applying the real API result. This means that a mutation update cannot
  accidentally commit optimistic data to the cache permanently.
- While an _optimistic update_ has been applied, Graphcache stops refetching any queries that contain
  this optimistic data so that it doesn't "flip back" to its non-optimistic state without the
  optimistic update being applied. Otherwise we'd see a "flicker" in the UI.

These three principles are the basic mechanisms we can expect from Graphcache. The summary is:
**Graphcache groups optimistic mutations and pauses queries so that optimistic updates look as
expected,** which is an implementation detail we can mostly ignore when using it.

However, one implementation detail we cannot ignore is the last mechanism in Graphcache which is
called **"Commutativity"**. As we can tell, "optimistic updates" need to store their normalized
results on a separate layer. This means that the previous data structure we've seen in Graphcache is
actually more like a list, with many tables of links and entities.

Each layer may contain optimistic results and have an order of preference. However, this order also
applies to queries. Since queries are run in one order but their API results can come back to us in
a very different order, if we access enough pages in a random order things can sometimes look rather
weird. We may see that in an application on a slow network connection the results may vary depending
on when their results came back.

![Commutativity means that we store data in separate layers.](../assets/commutative-layers.png)

Instead, Graphcache actually uses layers for any API result it receives. In case an API result
arrives out of order, Graphcache sorts the layers by precedence, or rather by when their results
were requested. Overall, we don't have to worry about this, but Graphcache has mechanisms that keep
our updates safe.
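
The idea of ordering layers by request time rather than arrival time can be illustrated with a toy store. This is a conceptual sketch only and makes no claim about Graphcache's internals. The `LayeredStore` class, its method names, and the flat `'Todo:1.title'` keys are all invented for this example.

```js
class LayeredStore {
  constructor() {
    this.order = []; // layer IDs, in the order their operations were requested
    this.layers = new Map(); // layer ID -> Map of written values
  }

  // Called when an operation is sent: reserves the layer's position
  reserve(layerId) {
    this.order.push(layerId);
    this.layers.set(layerId, new Map());
  }

  // Called when a result arrives, which may happen in any order
  write(layerId, key, value) {
    this.layers.get(layerId).set(key, value);
  }

  // Reads prefer the most recently *requested* layer that has a value
  read(key) {
    for (let i = this.order.length - 1; i >= 0; i--) {
      const layer = this.layers.get(this.order[i]);
      if (layer.has(key)) return layer.get(key);
    }
    return undefined;
  }
}

const store = new LayeredStore();
store.reserve('query-1');
store.reserve('query-2');

// The second request's result arrives first, the first one's arrives late
store.write('query-2', 'Todo:1.title', 'newer title');
store.write('query-1', 'Todo:1.title', 'older title');

const title = store.read('Todo:1.title'); // 'newer title'
```

Although the older result was written last, it doesn't overwrite the newer one, because its layer was reserved earlier.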

## Reading on

This concludes the introduction to Graphcache with a short overview of how it works, what it
supports, and some hidden mechanisms and internals. Next we may want to learn more about how to use
it and more of its features:

- [How do we write "Local Resolvers"?](./local-resolvers.md)
- [How to set up "Cache Updates" and "Optimistic Updates"?](./cache-updates.md)
- [What is Graphcache's "Schema Awareness" feature for?](./schema-awareness.md)
- [How do I enable "Offline Support"?](./offline.md)