Design

The Query Engine (Rust)

The core engine is written in Rust, and is designed to support multiple languages. It is responsible for:

  • Parsing source code into an AST.
  • Creating graphs from source code.
  • Responding to queries from the adapters.
  • Managing the cache.

turbo-tasks

The core engine is based on turbo-tasks, an incremental computation framework written in Rust that supports persistent caching. Note that the cache includes the task graph itself. When you change the code, the affected tasks are re-computed, but the task graph does not have to be rebuilt from scratch because it is restored from the cache. This is uncommon among incremental computation frameworks.
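To make the idea concrete, here is a minimal sketch of a cache that stores both a task's result and the edges it discovered. It is not the turbo-tasks API: `TaskCache`, `CachedTask`, `imports_of`, `parse_imports`, and the toy `import <path>` line format are all hypothetical names invented for illustration, and the cache lives in memory only, so persistence to disk is left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Fingerprint of a file's content, used to decide whether a task is stale.
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Toy "parser" (an assumption for this sketch): every line of the form
/// `import <path>` contributes one outgoing edge.
fn parse_imports(content: &str) -> Vec<String> {
    content
        .lines()
        .filter_map(|line| line.trim().strip_prefix("import "))
        .map(|path| path.trim().to_string())
        .collect()
}

/// One cached node of the graph: the input fingerprint plus its edges.
pub struct CachedTask {
    input_fingerprint: u64,
    pub imports: Vec<String>,
}

/// In-memory cache keyed by file path. `executions` counts real parser runs.
#[derive(Default)]
pub struct TaskCache {
    tasks: HashMap<String, CachedTask>,
    pub executions: usize,
}

impl TaskCache {
    /// Returns the imports of `path`, re-running the parser only when the
    /// content differs from what was seen last time.
    pub fn imports_of(&mut self, path: &str, content: &str) -> &[String] {
        let fp = fingerprint(content);
        let stale = match self.tasks.get(path) {
            Some(task) => task.input_fingerprint != fp,
            None => true,
        };
        if stale {
            self.executions += 1;
            let imports = parse_imports(content);
            self.tasks.insert(
                path.to_string(),
                CachedTask { input_fingerprint: fp, imports },
            );
        }
        &self.tasks[path].imports
    }
}
```

Calling `imports_of` twice with the same content runs the parser once, and the edges come back from the cache on the second call. Only a call with changed content increments `executions` again.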

Caching the task graph

turbo-tasks is designed to support huge codebases. If you work at a large company, you may know that disk I/O becomes the bottleneck for initial startup. Although parsing is very fast and embarrassingly parallel, there are too many files to parse just to draw the dependency graph. Caching the task graph avoids repeating that work on every startup.

node-file-trace

ESM imports and exports are easy to detect automatically, but that is not enough for users. The behavior of a program may depend on other kinds of files. For example, it may read a SQL file and execute it, read a JSON file and parse it, or read a .env file and load the environment variables. Of course, taskend provides APIs to declare dependencies explicitly, but that is not convenient for users. Vercel did a great job creating node-file-trace, a tool that detects all the dependencies of a Node.js program.

We use it. There is already a Rust implementation of it, so we can easily use it from our Rust code.

Dynamically sized compile unit

For ECMAScript, we can easily track all dependencies of a file. For other languages, it is not so easy, so we have various kinds of compile units. For example, for a Rust project using cargo, a crate is a compile unit. The compile unit is the base unit of dependency queries and test execution. This means that if you change a file in a cargo project, all dependent crates will be re-compiled and re-tested. You may think this is inefficient, but it is about correctness, and it is not that bad in practice. For example, the CI time of the SWC project is largely dominated by the time spent on the swc_plugin_runner crate, but that crate rarely changes, even indirectly. So with taskend, we can skip the tests of the swc_plugin_runner crate in most cases, and that is a huge win.
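The "all dependent crates" query can be sketched as a breadth-first walk over reversed dependency edges. This is only an illustration under assumptions, not how taskend implements it: `DepGraph` and `affected_units` are hypothetical names, and the graph is modeled as a plain map from each compile unit to its direct dependencies.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical model: each compile unit (for cargo, a crate name) mapped to
/// the units it depends on directly.
pub type DepGraph = HashMap<String, Vec<String>>;

/// Returns the changed unit plus every unit that depends on it, directly or
/// transitively. These are the units that need to be re-built and re-tested.
pub fn affected_units(deps: &DepGraph, changed: &str) -> BTreeSet<String> {
    // Invert the edges: for each unit, collect the units that depend on it.
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (unit, unit_deps) in deps {
        for dep in unit_deps {
            dependents
                .entry(dep.as_str())
                .or_default()
                .push(unit.as_str());
        }
    }

    // Breadth-first walk from the changed unit along the inverted edges.
    let mut affected = BTreeSet::new();
    let mut queue = VecDeque::new();
    affected.insert(changed.to_string());
    queue.push_back(changed);
    while let Some(unit) = queue.pop_front() {
        if let Some(users) = dependents.get(unit) {
            for &user in users {
                // `insert` returns false for units already visited.
                if affected.insert(user.to_string()) {
                    queue.push_back(user);
                }
            }
        }
    }
    affected
}
```

Any unit that is absent from the returned set is unaffected by the change, so its tests can be skipped. That is the situation described above for a crate that rarely changes, even indirectly.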

Special thanks