
There are different approaches to shared code: libraries, frameworks, plugins, etc. (and "the React way", which we will get to a bit later).

But it turns out, it's very hard to write a good library, and it's very easy to write a bad library.

This problem is amplified nowadays: the general community doesn't understand the dangers of adding too many features to one library and presses similarly inexperienced library authors to add more features to their libraries, which typically has adverse side effects...

How libraries grow into monsters and why it is a problem

Most libraries grow into huge projects that try to support all possible use cases. And surprisingly, these "monsters" have a good chance of becoming popular: a typical user looks for something that supports their particular scenario, so the more features a library provides, the more users will choose it, the more they will advertise it, and so on.

But why is it so bad to have a big library that supports a lot of features? Besides the obvious frontend bundle size considerations, I think one big problem is that when a library grows, its complexity grows too. Growing complexity means:

  1. It becomes harder to read the code and understand what's going on
  2. It becomes harder to troubleshoot issues
  3. It becomes harder to maintain library security
  4. It becomes harder to maintain library performance
  5. And it becomes much easier to make a mistake

Every senior software engineer knows: fighting complexity is not easy, you need to put a lot of effort into it. And for open-source libraries, which are made in spare time, it's even harder.

So the funny conclusion is: popular libraries might be less secure and less performant than smaller ones, and sometimes even than the code you would naively write from scratch yourself.

Example from my experience

I used a popular pan-zoom library in my startup for about a year. It was a big library, over 5000 lines of code, and a very popular one. But I always had issues with it. In our particular setup of a full-screen office map with some elements on top of it, the library was buggy on some devices: for example, on Macs when using the trackpad (with a mouse it worked fine), and also on some mobile devices.

I tried to fight these problems off for many weeks. I tried several different layouts and many different combinations of settings. Unfortunately, nothing worked exactly right. Even when it worked on Macs with the trackpad, it would stop working on Windows in IE, and so on.

So what I did in the end: I wrote the core pan-zoom functionality from scratch (ok, to be completely precise, based on a small gist I found on the internet). The final result was 200 lines of code, and it worked perfectly on ALL devices. Today, after 4 years in production, this solution still works without flaws, and I haven't needed to touch it.

I have a lot of similar examples.

Ways to solve the issue

So, how do we prevent a library from becoming a monster? Off the top of my head, I can think of at least three naïve approaches:

  1. Keep libraries very small and feature-poor.
  2. Create a plugin system and allow extending your library via plugins.
  3. Integrate your library into some existing standard so that it plugs into an existing ecosystem.

If we look closer, though, every one of them has disadvantages.

Keep libraries small

Small libraries rarely accept new features; they mostly accept performance, size and security improvements. They do only one thing, but do it well.

I really like this approach, but small libraries are typically very low-level. For example, there's a library for creating GUIDs, or a library for padding strings... Great, but you need tens of such libraries to "construct" a business-level solution out of them.

A second flavor of small libraries is very niche libraries. For example, let's say there's a JWT library that only supports one algorithm, only supports verification but not signing, and only checks the exp claim. It might be appropriate for some scenarios, but there's no extension mechanism, so if you like this library but also need to check the iss claim, you would have to once again decode and deserialize the payload (which was already done inside the library), and then check the iss claim manually. Double deserialization can be costly in scenarios where performance is important.
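To make this concrete, here's a sketch of the double work; the tiny-jwt-verify package and its verify function are made up for illustration:

var { verify } = require("tiny-jwt-verify") // hypothetical niche library

// The library decodes the payload internally, checks the signature and
// the exp claim, but only returns a boolean:
if (!verify(token, key)) throw new Error("invalid token")

// To also check the iss claim, we have to decode the payload a second time:
var payload = JSON.parse(
    Buffer.from(token.split(".")[1], "base64url").toString()
)
if (payload.iss !== "https://expected-issuer.example") {
    throw new Error("unexpected issuer")
}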

The lack of an extension mechanism is quite inconvenient for the users. Also, creating niche libraries is not very rewarding for library authors, because such libraries cannot become as popular as feature-rich ones.

Plugin system

There are lots of popular libraries that use plugin systems: webpack and Rollup plugins, Express middlewares, PostCSS plugins, ESLint plugins, etc.

Some libraries are essentially collections of plugins with a very small amount of their own code - in this case, a plugin system may well be a very good approach. But in other cases, it doesn't work that well.

The main problem with a plugin system is that, with time, the plugin system itself can become very complex. The number of hooks/handlers inside your code grows, you give more and more control to the plugins, you need to orchestrate plugins and sanitize their outputs, you need to correctly handle different combinations of plugins, you need to ensure that plugins are always compatible with each other, you need to somehow enforce performance and security, etc.

Integrate your library into existing ecosystem

A classical example of what I mean by an existing ecosystem would be Unix pipes. Every Unix/Linux terminal program typically uses pipes to interact with the environment. You get some input and you can produce some output, and usually it's text (so you can read it in the terminal). So if you can read from a pipe and write to a pipe, you're eligible to become a part of the ecosystem. Simple and powerful!
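For example, counting the five most frequent words in a file by chaining standard tools, none of which knows anything about the others:

tr -s ' ' '\n' < input.txt | sort | uniq -c | sort -rn | head -5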

And the application of Unix pipes for the web would be CGI, of course.

So "existing ecosystem" here is a set of standards that you need to conform to in order to become part of the system.

One special case of such an ecosystem could be a framework. For example, you can write a Vue.js component and publish it on GitHub, and people know how to plug it into their application.

But what would be an example of an ecosystem for creating an extensible library? A kind of "framework for frameworks"?... Unfortunately, I don't think such a thing exists yet.

But, there's another way...

The React way!

There are several things that React really got right, in my opinion. For example, JSX, which was initially very controversial (and I didn't like it myself), but I think it's clear now how great it is (and the greatest thing about it, in my opinion, is that it's 100% covered by IntelliSense). Another thing I really like is React's approach to being extended.

I call it "creating a vacuum".

For example, React initially didn't have any mechanism for managing state centrally (nowadays it has Context). You could pass props down the chain of components, and it would work for simple applications, but a central state? Just no such thing. None. Vacuum.

So to fill this vacuum, people created Redux, MobX, Recoil and a bunch of other solutions.

The main thing about those is that they don't really integrate into React; rather, they exist independently. React doesn't expose any APIs for a state manager to connect to, it doesn't require any "middleware registration", nothing. You just put a state manager next to React and start using it. They exist in parallel, they don't touch each other, but they still serve a common goal and form an "integrated whole" together.
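To illustrate, here's a minimal sketch (not how Redux or MobX actually work) of a store that lives entirely outside React. The component connects to it using nothing but generic React features:

function createStore(initialState) {
    let state = initialState
    const listeners = new Set()
    return {
        getState: () => state,
        setState(next) {
            state = next
            listeners.forEach((fn) => fn(state))
        },
        subscribe(fn) {
            listeners.add(fn)
            return () => listeners.delete(fn) // unsubscribe
        }
    }
}

const store = createStore({ count: 0 })

// The store knows nothing about React, and React knows nothing about the store
function Counter() {
    const [snapshot, setSnapshot] = React.useState(store.getState())
    React.useEffect(() => store.subscribe(setSnapshot), [])
    return (
        <button onClick={() => store.setState({ count: snapshot.count + 1 })}>
            Count: {snapshot.count}
        </button>
    )
}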

But what is the secret? How does React achieve this exact kind of vacuum, which is not a lack of a feature, but rather an opportunity to create a feature? And why does nobody else do it this way?

Creating the vacuum

So let's take some libraries that use plugin systems and imagine how they would look if they instead used the React way of creating the vacuum...

Express.js middlewares

With Express, our goal would be to keep the ability to create middlewares, but without explicitly registering them via app.use().

Disclaimer: there's nothing wrong with Express middlewares, I am simply using them to explain the concept.

A normal Express application would look something like this:

var express = require('express')
var bodyParser = require('body-parser')

var app = express()

app.use(bodyParser.json())

app.use(function (req, res) {
    res.setHeader('Content-Type', 'text/plain')
    res.end("Your name is: " + req.body.name)
})

app.listen(3000);

Let's break down what Express does behind the scenes: whenever a new request arrives, Express runs it through all the middlewares it has, and then passes it to the request handler through the router.

But what would be an alternative way to execute a middleware, such that Express is not even aware of it? Right, just execute this same middleware inside of the request handler:

var express = require('express')
var bodyParser = require('body-parser')

var app = express()

var parseJson = bodyParser.json();

app.post("/test", function (req, res) {
    parseJson(req, res, () => {
        res.setHeader('Content-Type', 'text/plain')
        res.end("Your name is: " + req.body.name)
    })
})

app.listen(3000);

And yes, it works just fine, even though we never registered this middleware via app.use():

$ curl -X POST -H 'Content-Type: application/json' \
     -d '{"name":"Andrei"}' localhost:3000/test
Your name is: Andrei

Of course, in its current form, it's not very convenient, especially if we have more than one API :)

But it's very easy to improve. For example, we could do something like this for a start:

var express = require('express')
var bodyParser = require('body-parser')

var app = express()

var parseJson = bodyParser.json();
var postWithBody = (route, handler) => {
    app.post(route, (req, res) => {
        parseJson(req, res, () => handler(req, res))
    });
}

postWithBody("/test", function (req, res) {
    res.setHeader('Content-Type', 'text/plain')
    res.end("Your name is: " + req.body.name)
})

app.listen(3000);

And so on; we could then add error handling and whatever else is needed. And now you're probably thinking: wait, aren't we going to eventually re-implement the middleware engine that is already in Express?

Kind of yes, I agree. But there's a couple of nuances:

  1. We no longer need this functionality to be provided by Express
  2. Very likely, we can reuse exactly the same code with other web server libraries
  3. There's much more flexibility (e.g. you can use different middlewares for different requests, etc.)

Again, there's nothing wrong with Express middlewares, but you can probably see now that we can totally live without that code bundled into Express, which slightly reduces the complexity of Express itself, improves flexibility and interoperability, etc.
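To make point 2 a bit more concrete, here's a minimal sketch of such a reusable composer (withMiddlewares is a made-up helper, and the error handling is deliberately simplistic). Nothing in it is Express-specific; it only relies on the Connect-style (req, res, next) middleware signature:

var withMiddlewares = (middlewares, handler) => (req, res) => {
    var i = 0
    var next = (err) => {
        if (err) {
            res.statusCode = 500
            return res.end("Internal error")
        }
        var mw = middlewares[i++]
        mw ? mw(req, res, next) : handler(req, res) // next middleware, or the handler
    }
    next()
}

// different middlewares for different requests:
app.post("/test", withMiddlewares([parseJson], function (req, res) {
    res.setHeader('Content-Type', 'text/plain')
    res.end("Your name is: " + req.body.name)
}))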

And if the plugin system of the library was more complex, we would get even more benefits.

ESLint plugins

Now, we probably got a bit lucky with Express, because the middlewares can also be run inside of the request handler, and the request handler receives all the data necessary for the middleware execution.

But with, say, ESLint or Babel plugins, where you need to plug your code into the middle of AST processing, we might not be able to do it that easily, right? Let's see...

A simple ESLint plugin looks something like this:

module.exports = {
  rules: {
    "async-func-name": {
      create: function (context) {
        return {
          FunctionDeclaration(node) {
            if (node.async && !/Async$/.test(node.id.name)) {
              context.report({
                node,
                message: "Async function name must end in 'Async'"
              });
            }
          }
        }
      }
    }
  }
}; // this amount of nested brackets almost reminds me of Lisp :D

And then you would also need to add some code to .eslintrc, defining how ESLint will use rules from this plugin.

{
  "parserOptions": {
    "ecmaVersion": 2018
  },
  "rules": {
    "my-eslint-rules/async-func-name": "warn"
  },
  "plugins": ["my-eslint-rules"]
}

Obviously, the trick that we did with Express will not work here.

So we need to think about how we would redesign ESLint so that explicit plugin registration is not needed.

Rules as independent applications

One (naïve) approach to getting rid of plugin registration here would be to create an independent application for each rule. This application would traverse the filesystem, read the files, parse the AST, analyse it, and show the warnings. And we could use some shared libraries for all this common code: parsing the AST, etc.

But of course this won't work well in practice; it's too slow (imagine executing tens of rules against a project with thousands of files - such a setup is actually quite common for ESLint).

So we need to improve. What else can we do?

Unix pipes

Another (random) idea would be to use something like Unix pipes, e.g.

eslint --ext .js,.jsx src/ | async-func-name --warn | another-rule | one-more-rule | ...

In this example, eslint would traverse the filesystem, read the files and stream the result to the rules. Each rule would need to parse the AST, echo its input to the next rule, and drop warnings to stderr...

And actually... we might not need eslint anymore, it can be replaced by standard find:

find /src \( -name "*.js" -o -name "*.jsx" \) -exec cat {} + | rule1 | rule2 | rule3 | ...

Ok, this still means that the rules have to parse the AST many times. To avoid that, we could compile the files to a binary AST format before passing them to the rules.

And finally, we can run rules in parallel, and also "bundle" them into presets, all with standard Unix commands.

find /src \( -name "*.js" -o -name "*.jsx" \) -exec binjs {} + | parallel --pipe --tee {} :::: my-rules-preset.txt

This approach already provides some benefits:

  • The complexity of the ESLint library is greatly reduced. We might still want to ship a handy bash script so that we don't need to remember these shell commands, and maybe a library that rules can use for reading the AST, but both of these components are very generic and can be used for a number of purposes
  • Rules can be used independently in any imaginable scenario

Of course, in practice, this is still quite slow and too resource-consuming. Remember, we would have to run a separate process for each rule, and that would be tens of processes. And overall, it's very Unix-y and not very librar-y anymore :)

Can we do better?

The magical disappearing library

Well, the right approach would be, I think, something like this:

const find = require("find");
const astParser = require("ast-parser");

const rule1 = require("rule1");
const rule2 = require("rule2");
const rule3 = require("rule3");

const rules = [rule1, rule2, rule3];

find.eachfile(/\.jsx?$/, __dirname, async (filePath) => {
    const ast = await astParser.parse(filePath);
    rules.forEach(rule => rule(ast));
});

So yep, pretty similar to Unix pipes, but in JS... ¯\_(ツ)_/¯

The point is, we don't have a library for doing linting; rather, there are completely independent lint rules, plus some utilities: one for traversing the file system and one for parsing a file into an AST. Then, if we have a unified parsed AST format (such as ESTree), we can reuse these utilities for other purposes.
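Under this design, a rule becomes just a plain function over an ESTree AST. Here's a hedged sketch, reusing the async-func-name rule from above (I'm using the generic estree-walker package for traversal, and the warning output format is made up):

const { walk } = require("estree-walker")

// The rule is a plain function: no plugin registration, no lint framework
module.exports = function asyncFuncName(ast) {
    walk(ast, {
        enter(node) {
            if (node.type === "FunctionDeclaration" &&
                node.async && !/Async$/.test(node.id.name)) {
                console.warn("Async function name must end in 'Async': " + node.id.name)
            }
        }
    })
}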

One practical example: now we can combine ESLint and Prettier rules without traversing the file system twice. Also, we can apply certain rules only to JSX files, and so on - so much flexibility!

Also the fact that the library "disappears" kind of makes sense if we think of ESLint as a collection of linting rules.

Conclusion

When you create your next library, think about this. Can you make it really open? Can you create a vacuum, rather than closing off your library by introducing plugins or middlewares? It's kind of a mind shift, but I think the world could become a bit better if people used "the React way" more often :)
