Grafast introduction

info

This introduction to Grafast assumes that you have a basic understanding of GraphQL, including the concept of resolvers. We highly recommend that you read Introduction to GraphQL, and in particular GraphQL Execution, before reading this document.

The GraphQL specification describes how a GraphQL operation should be executed, talking in terms of layer–by–layer resolution of data using "resolvers." But critical to note is this sentence from the beginning of the specification:

Conformance requirements [...] can be fulfilled [...] in any way as long as the perceived result is equivalent.
─ https://spec.graphql.org/draft/#sec-Conforming-Algorithms

Resolvers are relatively straightforward to understand, but when implemented naively can very quickly result in serious performance issues. DataLoader is one of the approaches suggested to solve the "N+1 problem," but this is only the most egregious performance issue that a naive GraphQL schema may face — there are others such as server–side over–fetching and under–fetching and related issues that can really build up as your schemas and operations get more complex.

Grafast was designed from the ground up to eliminate these issues and more whilst maintaining pleasant APIs for developers to use. To do this, in addition to supporting resolvers for legacy fields, Grafast favors a planning strategy that takes a holistic approach to understanding the incoming operation, and unlocks the potential for significant optimizations not previously achievable without a herculean effort.

Please note that Grafast is not tied to any particular storage or business logic layer — any valid GraphQL schema could be implemented with Grafast, and a Grafast schema can query any data source, service, or business logic that Node.js can query.

info

Currently Grafast is implemented in TypeScript, but we're working on a specification with hopes to extend Grafast's execution approach to other programming languages. If you're interested in implementing Grafast's execution algorithm in a language other than JavaScript, please get in touch!

Plan resolvers

This is just an overview, for full documentation see Plan Resolvers.

In a traditional GraphQL schema each field has a resolver. In a Grafast schema, though resolvers are still supported, you are encouraged to instead use plan resolvers. These plan resolvers are generally small functions, like regular resolvers should be, but instead of being called many times during execution and dealing with concrete runtime values, they are called only once at planning time and they build and manipulate steps which are the building blocks of an execution plan which details all the actions necessary to satisfy the GraphQL request.

Imagine that we have this GraphQL schema:

type Query {
  currentUser: User
}
type User {
  name: String!
  friends: [User!]!
}

In graphql-js, you might have these resolvers:

const resolvers = {
  Query: {
    async currentUser(_, args, context) {
      return context.userLoader.load(context.currentUserId);
    },
  },
  User: {
    name(user) {
      return user.full_name;
    },
    async friends(user, args, context) {
      const friendships = await context.friendshipsByUserIdLoader.load(user.id);
      const friends = await Promise.all(
        friendships.map((friendship) =>
          context.userLoader.load(friendship.friend_id),
        ),
      );
      return friends;
    },
  },
};

In Grafast, we use plan resolvers instead, which might look something like:

const planResolvers = {
  Query: {
    currentUser() {
      return userById(context().get("currentUserId"));
    },
  },
  User: {
    name($user) {
      return $user.get("full_name");
    },
    friends($user) {
      const $friendships = friendshipsByUserId($user.get("id"));
      const $friends = each($friendships, ($friendship) =>
        userById($friendship.get("friend_id")),
      );
      return $friends;
    },
  },
};

As you can see, the shape of the logic is quite similar, but the Grafast plan resolvers are synchronous. Grafast operates in two phases: planning (synchronous) and execution (asynchronous); plan resolvers are called during the planning phase.

See the working example

If you want to explore the two code blocks above, and see them in context including their dependencies, please see the "users and friends" example.

The job of a plan resolver is not to retrieve data, it's to detail the steps necessary to retrieve it. Plan resolvers do not have access to any runtime data, they must describe what to do for arbitrary future data. For example, the User.friends Grafast plan resolver cannot loop through the runtime data with a map function as in the resolver example (since there is not yet any data to loop over), instead it describes the plan to do so using an each step, detailing what to do with each item made available later.

The dollar convention

By convention, when a variable represents a Grafast step, the variable will be named starting with a $ (dollar symbol).

Steps

Steps are the basic building blocks of a Grafast plan; they are instances of a step class, constructed via the function calls in the plan resolver. Step classes describe how to perform a specific action and help plan how to perform the action more efficiently via the lifecycle methods. Grafast provides optimized built–in steps for common needs; it's common that you can get started using just these, but as you go about optimizing your schema further it's expected that you will build your own step classes, in the same way that you'd build DataLoaders in a resolver–based GraphQL API.

If we were to make a request to the above Grafast schema with the following query:

{
  currentUser {
    name
    friends {
      name
    }
  }
}

Grafast would build an operation plan for the operation. For the above query, a plan diagram representing the execution portion of this operation plan is:

%%{init: {'themeVariables': { 'fontSize': '12px'}}}%% flowchart TD classDef path fill:#eee,stroke:#000,color:#000 classDef plan fill:#fff,stroke-width:1px,color:#000 classDef itemplan fill:#fff,stroke-width:2px,color:#000 classDef unbatchedplan fill:#dff,stroke-width:1px,color:#000 classDef sideeffectplan fill:#fcc,stroke-width:2px,color:#000 classDef bucket fill:#f6f6f6,color:#000,stroke-width:2px,text-align:left %% plan dependencies Access6{{"Access[6∈0] ➊ ᐸ2.currentUserIdᐳ"}}:::plan __Value2["__Value[2∈0] ➊ ᐸcontextᐳ"]:::plan __Value2 --> Access6 Load7[["Load[7∈0] ➊ ᐸuserByIdᐳ"]]:::plan Access6 --> Load7 Load10[["Load[10∈0] ➊ ᐸfriendshipsByUserIdᐳ"]]:::plan Access6 --> Load10 __Value4["__Value[4∈0] ➊ ᐸrootValueᐳ"]:::plan __Item14[/"__Item[14∈3] ᐸ10ᐳ"\]:::itemplan Load10 ==> __Item14 Access16{{"Access[16∈3] ᐸ14.friend_idᐳ"}}:::plan __Item14 --> Access16 Load17[["Load[17∈3] ᐸuserByIdᐳ"]]:::plan Access16 --> Load17 %% define steps classDef bucket0 stroke:#696969 class Bucket0,__Value2,__Value4,Access6,Load7,Load10 bucket0 classDef bucket1 stroke:#00bfff class Bucket1 bucket1 classDef bucket3 stroke:#ffa500 class Bucket3,__Item14,Access16,Load17 bucket3 classDef bucket4 stroke:#0000ff class Bucket4 bucket4

Each node in this diagram represents a step in the operation plan, and the arrows show how the data flows between these steps.

Plans can be reused for multiple requests

When the same operation is seen again its existing plan can (generally) be reused; this is why, to get the very best performance from Grafast, you should use static GraphQL documents and pass variables at run–time.

Batched execution

The main concern of most steps is execution. In Grafast all execution is batched, so each of the nodes in the operation plan will execute at most once during a GraphQL query or mutation. This is one of the major differences when compared to traditional GraphQL execution; with traditional resolvers processing happens in a layer–by–layer, item–by–item approach, requiring workarounds such as DataLoader to help reduce instances of the N+1 problem.

When it comes time to execute an operation plan, Grafast will automatically populate the steps whose names begin with __ (e.g. the context and variable values) and then will begin the process of executing each step once all of its dependencies are ready, continuing until all steps are complete.

At planning time a step can add a dependency on another step via const depId = this.addDependency($otherStep);. This depId is the index in the values tuple that the step can use at execution time to retrieve the associated values.

When a step executes, its execute method is passed the execution details which includes:

count — the size of the batch to be executed
values — the values tuple, the values for each of the dependencies the step added
indexMap(callback) — method returning an array by calling callback(i) for each index i in the batch (from 0 to count-1)

The execute method must return a list (or a promise to a list) of length count, where each entry in this list relates to the corresponding entries in values — this should be at least a little familiar to anyone who has written a DataLoader before.

When a plan starts executing it always starts with a batch size (count) of 1; but many things may affect this batch size for later steps — for example when processing the items in a list, the batch must grow to contain each item (via the __Item step). Grafast handles all of these complexities for you internally, so you don't generally need to think about them.

Unary steps

A "unary step" is a regular step which the system has determined will always represent exactly one value. The system steps which represent request–level data (e.g. context, variable and argument values) are always unary steps, and Grafast will automatically determine which other steps are also unary steps.

Sometimes you'll want to ensure that one or more of the steps your step class depends on will have exactly one value at runtime; to do so, you can use this.addUnaryDependency($step) rather than this.addDependency($step). This ensures that the given dependency will always be a unary step, and is primarily useful when a parameter to a remote service request needs to be the same for all entries in the batch; typically this will be the case for ordering, pagination and access control. For example if you're retrieving the first N pets from each of your friends you might want to add limit N to an SQL query — by adding the N as a unary dependency you can guarantee that there will be exactly one value of N for each execution, and can construct the SQL query accordingly (see limitSQL in the example below).

SQL example

Here's a step class which retrieves records matching a given column (i.e. WHERE columnName = $columnValue) from a given table in an SQL database. Optionally, you may request to limit to the first $first results.

export class RecordsByColumnStep extends Step {
  constructor(tableName, columnName, $columnValue) {
    super();
    this.tableName = tableName;
    this.columnName = columnName;
    this.columnValueDepIdx = this.addDependency($columnValue);
  }

  setFirst($first) {
    this.firstDepId = this.addUnaryDependency($first);
  }

  async execute({ indexMap, values }) {
    // Retrieve the values for the `$columnValue` dependency
    const columnValueDep = values[this.columnValueDepIdx];

    // We may or may not have added a `$first` limit:
    const firstDep =
      this.firstDepId !== undefined ? values[this.firstDepId] : undefined;

    // firstDep, if it exists, is definitely a unary dep (!firstDep.isBatch), so
    // we can retrieve its value directly:
    const first = firstDep ? parseInt(firstDep.value, 10) : null;

    // Create a `LIMIT` clause in our SQL if the user specified a `$first` limit:
    const limitSQL = Number.isFinite(first) ? `limit ${first}` : ``;

    // Create placeholders for each entry in our batch in the SQL:
    const placeholders = indexMap(() => "?");
    // The value from `$columnValue` for each index `i` in the batch
    const columnValues = indexMap((i) => columnValueDep.at(i));

    // Build the SQL query to execute:
    const sql = `\
      select *
      from ${this.tableName}
      where ${this.columnName} in (${placeholders.join(", ")})
      ${limitSQL}
    `;

    // Execute the SQL query once for all values in the batch:
    const rows = await executeSQL(sql, columnValues);

    // Figure out which rows relate to which batched inputs:
    return indexMap((i) =>
      rows.filter((row) => row[this.columnName] === columnValues[i]),
    );
  }
}

function petsByOwnerId($ownerId) {
  return new RecordsByColumnStep("pets", "owner_id", $ownerId);
}

Notice that there's only a single await call in this step's execute method, and we already know the step is only executed once per request; compare this single asynchronous action with the number of promises that would need to be created were you to use DataLoader instead.

Not just databases!

The execute method is just JavaScript; it can talk to absolutely any data source that Node.js itself can talk to. Though the example shows SQL you could replace the executeSQL() call with fetch() or any other arbitrary JavaScript function to achieve your goals.

Simplified example

The code above was written to be a simple example; though it works (see full solution using it), it's not nearly as good as it could be — for example it does not track the columns accessed so that only these columns are retrieved, nor does it use lifecycle methods to determine more optimal ways of executing.

(Another thing: it passes the tableName and columnName values directly into SQL — it would be safer to use an escapeIdentifier() call around these.)

Step lifecycle

The execution plan diagram you saw above is the final form of the plan, there were many intermediate states that it will have gone through in order to reach this most optimal form, made possible by Grafast's lifecycle methods.

info

For more information about understanding plan diagrams please see Plan Diagrams.

For a fully working implementation of the above schema, please see the "users and friends" example.

This is just an overview, for full documentation see lifecycle.

All plan lifecycle methods are optional, and due to the always–batched nature of Grafast plans you can get good performance without using any of them (performance generally on a par with reliable usage of DataLoader). However, if you leverage lifecycle methods your performance can go from "good" to ✨amazing🚀.

One of the great things about Grafast's design is that you don't need to build these optimizations from the start; you can implement them at a later stage, making your schema faster without requiring changes to your business logic or your plan resolvers!

As a very approximate overview:

once a field is planned we deduplicate each new step
once the execution plan is complete, we optimize each step
finally, we finalize each step

Deduplicate

Deduplicate lets a step indicate which of its peers (defined by Grafast) are equivalent to it. One of these peers can then, if possible, replace the new step, thereby reducing the number of steps in the plan (and allowing more optimal code paths deeper in the plan tree).

Optimize

Optimize serves two purposes.

Purpose one is that optimize lets a step "talk" to its ancestors, typically to tell them about data that will be needed so that they may fetch it proactively. This should not change the observed behavior of the ancestor (e.g. you should not use it to apply filters to an ancestor — this may contradict the GraphQL specification!) but it can be used to ask the ancestor to fetch additional data.

The second purpose is that optimize can be used to replace the step being optimized with an alternative (presumably more–optimal) step. This may result in multiple steps being dropped from the plan graph due to "tree shaking." This might be used when the step has told an ancestor to fetch additional data and the step can then replace itself with a simple "access" step. It can also be used to dispose of plan–only steps that have meaning at planning time but have no execution–time behaviors.

In the "friends" example above, this was used to change the DataLoader–style select * from ... query to a more optimal select id, full_name from ... query. In more advanced plans (for example those made available through @dataplan/pg), optimize can go much further, for example inlining its data requirements into a parent and replacing itself with a simple "remap keys" function.

Finalize

Finalize is the final method called on a step, it gives the step a chance to do anything that it would generally only need to do once; for example a step that issues a GraphQL query to a remote server might take this opportunity to build the GraphQL query string once. A step that converts a tuple into an object might build an optimized function to do so.

Further optimizations

Grafast doesn't just help your schema to execute fewer and more efficient steps, it also optimizes how your data is output once it has been determined. This means that even without making a single change to your existing GraphQL schema (i.e. without adopting plans), running it though Grafast rather than graphql-js should result in a modest speedup, especially if you need to output your result as a string (e.g. over a network socket/HTTP).

Convinced?

If you're not convinced, please do reach out via the Graphile Discord with your queries, we'd love to make improvements to both this page, and Grafast itself!

If you are convinced, why not continue on with the navigation button below...

Grafast introduction

Plan resolvers​

Steps​

Batched execution​

Unary steps​

SQL example​

Step lifecycle​

Deduplicate​

Optimize​

Finalize​

Further optimizations​

Convinced?​

Plan resolvers

Steps

Batched execution

Unary steps

SQL example

Step lifecycle

Deduplicate

Optimize

Finalize

Further optimizations

Convinced?