The Ultimate Guide to yarn.lock Lockfiles
The npm ecosystem is a big reason why JavaScript has taken off like a rocket in development communities. The ability to npm install modular bits of code and compose them together has been a massive productivity boost for developers.
However, this modularity introduces its own problems: packages need a way to specify their requirements for what other packages they need to work properly. This is the problem package managers like npm aim to solve.
For a time, npm was really the only solution for JavaScript package management. It worked well enough, but it wasn't perfect. Facebook, for example, ran into a number of issues scaling npm to meet the needs of its impressively large engineering team. In response, they built an alternative, and Yarn was born.
What is yarn.lock?
One of the innovations introduced by Yarn is the lockfile (called yarn.lock). This generated file describes a project's dependency graph: direct dependencies, child dependencies, and so on. It's a one-stop-shop describing everything your project installs when you run yarn install.
The lockfile also acts as a security measure by recording a checksum for each installed package. That way you can be confident some bad guy isn't sneaking in malicious code.
In short, the lockfile contains all information necessary to ensure you're always installing exactly the same dependencies every time on every machine.
This article is your guide to the ins and outs of the yarn.lock lockfile. We will discuss the anatomy of a lockfile entry, best practices for managing your project's lockfile, and why these concepts are important.
However, to best understand the value lockfiles bring, we first need to understand the concept of dependency graphs.
What is a dependency graph?
Throughout this article, we will be using one of my favorite npm packages as an example: @testing-library/react. If you haven't used it before, no problem. We won't be discussing how the library works; just how the project manages dependencies. (That being said, it's one of the best dang testing libraries I've ever used. I highly recommend it!)
If we look at the package.json file for this project, as of writing we see the following dependencies:
"dependencies": {
  "@babel/runtime": "^7.12.5",
  "@testing-library/dom": "^8.5.0",
  "@types/react-dom": "*"
}
This is a list of dependencies @testing-library/react depends on to function properly; without these, parts of the library (or the entire thing) just won't work.
What this list doesn't tell you are the dependencies that these dependencies rely on. If we were to dig into the package.json files for these dependencies, we would find even more dependencies.
In fact, we could build out a graph of dependencies by following the rabbit trail of package.json files. npm.anvaka.com is a tool that visualizes a project's entire dependency graph. If we plug in @testing-library/react we get this visualization:
(You can play around with this dependency graph yourself here.)
While initially it appears @testing-library/react has 3 dependencies, it in fact has 34 total dependencies once you include transitive dependencies (that is, dependencies of dependencies). Taken together, these make up the library's "dependency graph", and it's this graph that the yarn.lock lockfile captures.
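To make the idea of following package.json files concrete, here is a minimal sketch of walking a dependency graph. The registry object below is a made-up, in-memory stand-in (the package names app, a, b, and so on are hypothetical); a real tool would read each package's package.json instead.

```javascript
// Hypothetical registry: package name -> names of its direct dependencies.
// These entries are illustrative only and do not mirror any real package.json.
const registry = {
  app: ['a', 'b'],
  a: ['c'],
  b: ['c', 'd'],
  c: [],
  d: ['e'],
  e: [],
};

// Walk the graph depth-first and collect every package reachable from `root`.
function collectDependencies(root, graph) {
  const seen = new Set();
  const stack = [...(graph[root] || [])];
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue; // shared dependencies are only counted once
    seen.add(name);
    stack.push(...(graph[name] || []));
  }
  return seen;
}

const all = collectDependencies('app', registry);
console.log(registry.app.length); // direct dependencies: 2
console.log(all.size); // direct + transitive dependencies: 5
```

The gap between the two numbers is the same effect we see with @testing-library/react: a short list in package.json, and a much longer list once everything reachable is counted.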
Anatomy of a yarn.lock lockfile
A yarn.lock lockfile describes a project's dependencies as well as its transitive dependencies. Each entry in a lockfile has a similar shape and definition with several important attributes. Let's take a closer look at one of these entries.
In the following image, we have installed @testing-library/react into our own JavaScript project and get the following entry in our yarn.lock:
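In text form, an entry of this kind has roughly the shape of the string in the sketch below, which also pulls the entry apart into the fields discussed in the next sections. Treat the values as assumptions: the resolved version, the URL path, and both PLACEHOLDER hashes are stand-ins rather than values copied from a real lockfile, and the dependency ranges are borrowed from the package.json listing earlier. The parseEntry helper is hypothetical and only handles the fields shown here.

```javascript
// Illustrative lockfile entry. Hash values are placeholders, not real digests.
const entry = `
"@testing-library/react@^12.1.2":
  version "12.1.2"
  resolved "https://registry.yarnpkg.com/@testing-library/react/-/react-12.1.2.tgz#PLACEHOLDER"
  integrity sha512-PLACEHOLDER
  dependencies:
    "@babel/runtime" "^7.12.5"
    "@testing-library/dom" "^8.5.0"
`;

// Strip one pair of surrounding double quotes, if present.
const unquote = (s) => s.replace(/^"(.*)"$/, '$1');

// Hypothetical helper: split one entry into its header, fields, and dependencies.
function parseEntry(text) {
  const result = { requested: [], dependencies: {} };
  let inDependencies = false;
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue;
    const indent = line.length - line.trimStart().length;
    const body = line.trim();
    if (indent === 0) {
      // Header: one or more comma-separated "name@range" requests, ending in ":".
      result.requested = body.slice(0, -1).split(',').map((s) => unquote(s.trim()));
    } else if (indent === 2 && body === 'dependencies:') {
      inDependencies = true;
    } else if (indent === 2) {
      // Simple field such as version, resolved, or integrity.
      inDependencies = false;
      const space = body.indexOf(' ');
      result[body.slice(0, space)] = unquote(body.slice(space + 1));
    } else if (inDependencies) {
      // Nested line under "dependencies:": a package name and a requested range.
      const space = body.indexOf(' ');
      result.dependencies[unquote(body.slice(0, space))] = unquote(body.slice(space + 1));
    }
  }
  return result;
}

const parsed = parseEntry(entry);
console.log(parsed.requested); // dependency name(s) and requested range
console.log(parsed.version); // resolved version
console.log(parsed.resolved); // installation URL
console.log(parsed.integrity); // integrity hash
console.log(parsed.dependencies); // package dependencies
```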
Dependency Name
This is the name and requested version of the dependency as defined in your project's package.json or one of your project's dependencies' package.json. Since yarn.lock is a flattened list of all dependencies that your project needs to run, transitive dependencies are defined at the same level as dependencies your project defines directly.
This line may contain multiple entries if different version ranges of the same package are requested in different package.json files.
For example, our project may directly take a dependency on @testing-library/react@^12.1.2 but one of our project's dependencies has a dependency on @testing-library/react@^12.0.0. In that case, yarn would generate an entry in yarn.lock that starts with something like:
"@testing-library/react@^12.1.2", "@testing-library/react@^12.0.0":
But the rest of the entry would be exactly the same. This is yarn saying, "both of these dependencies can actually use the same version of this dependency." Yarn determines whether two versions can share a resolved dependency via semantic versioning.
A deep dive on semantic versioning is out of scope for this article. What's important from yarn's perspective is that as long as two requested version ranges can be satisfied by the same published version, they can share the resolved dependency. (Aside: npm has a handy semantic version calculator that is great for playing around with this concept!)
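As a rough illustration of that sharing rule, the sketch below checks whether one version satisfies both of the ranges from the example. It is a simplified, hand-rolled caret check (satisfiesCaret is a hypothetical helper, not yarn's implementation), and it ignores pre-release tags and every range operator other than ^.

```javascript
// Parse "1.2.3" into [1, 2, 3]. Pre-release tags and build metadata are not handled.
const parse = (v) => v.split('.').map(Number);

// Compare two parsed versions: negative if a < b, zero if equal, positive if a > b.
const compare = (a, b) => a[0] - b[0] || a[1] - b[1] || a[2] - b[2];

// Does `version` satisfy the caret range `range` (for example "^12.1.2")?
// A caret range allows changes that keep the left-most non-zero number fixed.
function satisfiesCaret(version, range) {
  const v = parse(version);
  const min = parse(range.slice(1));
  if (compare(v, min) < 0) return false; // below the minimum
  if (min[0] > 0) return v[0] === min[0]; // ^1.2.3: same major
  if (min[1] > 0) return v[0] === 0 && v[1] === min[1]; // ^0.2.3: same minor
  return compare(v, min) === 0; // ^0.0.3: exact match only
}

// The two ranges from the example entry above.
const ranges = ['^12.1.2', '^12.0.0'];
const canShare = (version) => ranges.every((r) => satisfiesCaret(version, r));

console.log(canShare('12.1.2')); // true: satisfies both ranges
console.log(canShare('12.0.5')); // false: too low for ^12.1.2
console.log(canShare('13.0.0')); // false: a new major version
```

In a real project you would more likely reach for an existing semver library than write this by hand; the point here is only that one resolved version can serve several requested ranges.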
Resolved Version
The resolved version is the version of a dependency that was actually installed. Before yarn.lock is generated, this is determined by the semantic versioning rules in package.json. This means the resolved version could differ from the number specified in the dependency name.
In the example above, the package.json specifies ^12.1.2. If @testing-library/react releases 12.1.3, the ^ means we are open to installing the new patch version. However, it's important to remember that once a resolved version is recorded in yarn.lock, that will always be the version installed whenever you run yarn install. Your project is locked in to that version.
Installation URL
This is the URL yarn uses to fetch your dependency. By default, this will use registry.yarnpkg.com.
You can also specify a different yarn registry using the --registry CLI flag, or by defining a registry <registry_url> in your .yarnrc file. The latter option will make sure yarn always resolves to the specified registry. The CLI flag will only set the registry value for that one CLI command, meaning you'll need to set the value every time.
For example, you can set your registry to the default npm registry by adding the following line in a .yarnrc file in the root of your project:
registry "https://registry.npmjs.org/"
Changing this in many cases isn't necessary because the yarn registry is a reverse proxy of the npm registry. That means any package that is on the npm registry can also be installed via the yarn registry.
One common use case for changing the registry is if your team has its own internal registry for private packages that shouldn't be shared with the public. In that case, the only way to install such packages is to set the registry value to your team's registry URL.
Integrity Hash
The integrity hash value in a yarn.lock entry is critical to the security of your project. As part of the generation of the yarn.lock file, yarn will compute a hash value for each dependency install based on the contents that were downloaded. Next time you download that dependency, yarn generates the hash again.
If there is a mismatch between the new value and the value stored in yarn.lock, yarn will throw an error that looks like this:
Integrity check failed for <package-name> (computed integrity doesn't match our records, got "<integrity-hash-value>")
And the entire installation will abort.
The reason yarn generates these hashes and compares them at installation time is to prevent potential bad actors from tricking you into installing malicious code.
For example, imagine an author publishes a library at version 1.1.1. This library advertises that it does something simple like adding two numbers together. We add this dependency and verify it does what it says on the box. Perfect!
Yarn will track the dependency in yarn.lock and store a hash of what was installed.
A few weeks later, it turns out the author of this library is a bad guy who adds a script to the library that logs credit card details on any site that uses it. But instead of publishing a new version, they swap out the file stored for version 1.1.1 with this new malicious code.
Next time we go to install this dependency in our project, yarn will go look at the installation URL, download the file contents, and generate a hash. Before now, the hash generated always matched what was in yarn.lock because the downloaded contents were always exactly the same. But now the contents have changed, so the generated hash will be different, causing installation to fail with an error like we saw above.
In this scenario, yarn has saved us from installing and using a library that has been hijacked with malicious code. While not every instance of a hash failing automatically means there is a malicious actor, this is a very important aspect of yarn.lock that will make users aware of some funny business going on and prompt them to investigate further.
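The check described above can be sketched with Node's built-in crypto module. This is an illustration of the idea rather than yarn's actual code: it assumes the integrity value takes the form "sha512-" followed by a base64 digest, and computeIntegrity and verifyIntegrity are hypothetical helper names. The two buffers stand in for a downloaded package before and after tampering.

```javascript
const crypto = require('crypto');

// Build an integrity string ("sha512-" + base64 digest) for some file contents.
function computeIntegrity(contents) {
  const digest = crypto.createHash('sha512').update(contents).digest('base64');
  return `sha512-${digest}`;
}

// Throw if the downloaded contents do not match the recorded value.
function verifyIntegrity(contents, recorded) {
  const computed = computeIntegrity(contents);
  if (computed !== recorded) {
    throw new Error(`Integrity check failed (got "${computed}")`);
  }
}

// Stand-ins for a downloaded package: the original publish and a tampered copy.
const original = Buffer.from('module.exports = (a, b) => a + b;');
const tampered = Buffer.from('module.exports = (a, b) => { /* extra code */ return a + b; };');

// What the lockfile would record at first install.
const recorded = computeIntegrity(original);

verifyIntegrity(original, recorded); // same bytes, same hash: passes silently

let failed = false;
try {
  verifyIntegrity(tampered, recorded); // different bytes: the hashes no longer match
} catch (err) {
  failed = true;
  console.log(err.message);
}
```

Because the hash is derived from the bytes themselves, any change to the published file, malicious or accidental, produces a different value even when the version number stays the same.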
How to fix integrity check failed
The most important first step is to verify that the dependency is still safe to use. Often the best way to do this is to check where the project is hosted (e.g., GitHub) and see if others are seeing the same issue. There will often be a discussion detailing either the mix-up or how the library has been compromised.
Once you've verified the library is still safe, you can uninstall the dependency (which will remove the entry from yarn.lock) and reinstall to add the library back with an updated hash.
Package Dependencies
When you install a dependency, that dependency will often include its own dependencies in its package.json. In the example above, @testing-library/react has two dependencies: @babel/runtime and @testing-library/dom. The yarn.lock entry also tracks which version of each should be requested, expressed as a semantic versioning range.
The package dependencies are a list of dependencies that the package must have available in order to work properly. This is important because we want to share dependencies as much as possible and not duplicate code. Code duplication leads to bloated bundle sizes and unexpected behavior (like duplicate instances of react).
Remember that all entries in yarn.lock are flattened into a single list of dependencies. This means that even if your project doesn't take on a dependency directly, it can still end up installed because the dependencies you do take may require it. yarn.lock tracks all of this in a single file where you can see all of these relationships.
Optional dependencies
A yarn.lock entry may also include optionalDependencies. The yarn docs sum this up nicely:
Optional dependencies are just that: optional. If they fail to install, Yarn will still say the install process was successful.
This is useful for dependencies that won’t necessarily work on every machine and you have a fallback plan in case they are not installed (e.g. Watchman).
An example of this can be found in the jsonfile package. The goals of this package aren't relevant. But if we look at the package's package.json, we see a declaration for graceful-fs as an optional dependency:
"optionalDependencies": {
  "graceful-fs": "^4.1.6"
}
If we look at how graceful-fs is used, we see the following:
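let _fs
try {
  // Prefer graceful-fs when the optional dependency was installed
  _fs = require('graceful-fs')
} catch (_) {
  // Otherwise fall back to Node's built-in fs module
  _fs = require('fs')
}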
The graceful-fs dependency can safely be considered optional because the package will fall back to Node's built-in library, fs, if graceful-fs is not installed.
In summary, yarn will always attempt to install optionalDependencies entries. But if one or more fails (due to an incompatibility with your project, your operating system, or otherwise), yarn will continue installation instead of aborting entirely.
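The install behavior summarized above can be sketched as a small loop. This is a conceptual model only, not how yarn is implemented: installAll and fakeInstall are hypothetical, some-required-dep is a made-up package name, and the installer is told to fail for graceful-fs purely to show what happens.

```javascript
// `installOne` is a hypothetical stand-in for downloading and unpacking a package.
// It is expected to throw when a package cannot be installed on this machine.
function installAll(pkg, installOne) {
  const skipped = [];
  for (const name of Object.keys(pkg.dependencies || {})) {
    installOne(name); // a failure here propagates and aborts the whole install
  }
  for (const name of Object.keys(pkg.optionalDependencies || {})) {
    try {
      installOne(name);
    } catch (err) {
      skipped.push(name); // note the failure and keep going
    }
  }
  return skipped;
}

const pkg = {
  dependencies: { 'some-required-dep': '^1.0.0' }, // hypothetical package
  optionalDependencies: { 'graceful-fs': '^4.1.6' },
};

// Pretend graceful-fs cannot be installed in this environment.
const fakeInstall = (name) => {
  if (name === 'graceful-fs') throw new Error('pretend this failed to install');
};

const skipped = installAll(pkg, fakeInstall);
console.log(skipped); // [ 'graceful-fs' ]
```

The install still finishes, and it is then up to the consuming code (like the try/catch in jsonfile) to cope with the missing package at runtime.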
How to visualize your dependency graph
As mentioned earlier, the yarn lockfile includes all information necessary to describe how your project's dependencies interact with each other. While you can manually follow the dependency chains (or use a visualizer) to figure out why a dependency was included, you can also run a command, yarn why <package-name>, to get a breakdown of the dependency tree.
Here's an example: If we had a brand-new create-react-app project, we see under node_modules there is a folder for lodash even though lodash is not a direct dependency of the project.
Running yarn why lodash outputs something like this:
[1/4] 🤔 Why do we have the module "lodash"...?
[2/4] 🚚 Initialising dependency graph...
[3/4] 🔍 Finding dependency...
[4/4] 🚡 Calculating file sizes...
=> Found "lodash@4.17.21"
info Reasons this module exists
...
- Hoisted from "react-scripts#html-webpack-plugin#lodash"
...
I've omitted some of the output for brevity as the important bits are the shape of the "Hoisted from..." logs. What this is telling us is lodash is installed because:
- html-webpack-plugin requires lodash
- react-scripts requires html-webpack-plugin
- our create-react-app project requires react-scripts directly
Sometimes you might get a "root" dependency that isn't one of your project's direct dependencies. In this case, you can then run yarn why on that root dependency until you get to one of your project's direct dependencies.
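Conceptually, answering "why is this installed?" means finding the chains that lead from your project to the package. The sketch below does that over a tiny hand-written graph shaped like the output above. It is not how yarn why is implemented; my-app is a hypothetical project name and the why function is a hypothetical helper.

```javascript
// Hypothetical dependency graph mirroring the chain reported for lodash.
const graph = {
  'my-app': ['react-scripts'],
  'react-scripts': ['html-webpack-plugin'],
  'html-webpack-plugin': ['lodash'],
  lodash: [],
};

// Depth-first search returning every chain of packages from `root` to `target`.
function why(root, target, deps, path = [root]) {
  if (root === target) return [path];
  const chains = [];
  for (const child of deps[root] || []) {
    if (path.includes(child)) continue; // guard against cycles
    chains.push(...why(child, target, deps, [...path, child]));
  }
  return chains;
}

for (const chain of why('my-app', 'lodash', graph)) {
  console.log(chain.join(' -> '));
}
// my-app -> react-scripts -> html-webpack-plugin -> lodash
```

Read right to left, the printed chain matches the three bullet points: each package is present because the one before it requires it.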
Should you manually modify yarn.lock?
In short: no. The lockfile is a generated file that is managed entirely by yarn. If you try to edit the contents yourself, you run the risk of invalidating the lockfile, possibly causing installation to fail.
For example, as we discussed with integrity hashes, changing a hash to a value that doesn't match the downloaded package will cause yarn to throw an error, because yarn thinks the dependency has been tampered with.
One scenario where you might be tempted to manually "fix" the lockfile is a merge conflict. Rather than trying to do this yourself, let yarn fix it for you automatically.
How to fix yarn.lock merge conflicts
Since yarn 1.0, yarn has had built-in support for automatic resolution of git merge conflicts in the lockfile. While it's possible to fix these merge conflicts yourself, for the majority of cases it's not necessary.
When rebasing your branch and running into a merge conflict, do the following:
- Manually fix conflicts in package.json
- Run yarn install
Yarn will take the fixes made in package.json and determine the correct way to resolve conflicts so that the lockfile reflects the state of the package.json file.
Should you regenerate yarn.lock?
When you run into lockfile merge conflicts, you may be tempted to blow away the yarn.lock file and start from scratch. While this would technically work, you are now taking on the responsibility of updating every package that has an update since you last generated the lockfile.
The issue is with semantic versioning. As discussed earlier, the lockfile makes sure the exact same version of a package is installed every time. Let's look at an example:
- Your project specifies a dependency with version ^1.0.0.
- The first time you generate a lockfile, the resolved version for this package is 1.0.0.
- This package publishes version 1.1.0.
- You regenerate your lockfile, which updates the resolved version to 1.1.0.
In some cases, this may be fine. However, multiply this scenario by every one of your packages (as well as all your dependencies' dependencies and so on) and you could unintentionally be updating 100s of packages at once. That is a lot of verification to do (I hope you have really good tests!).
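The 1.0.0 to 1.1.0 scenario can be modeled in a few lines. This is a simplified sketch, not yarn's resolver: the resolve function and the published list are hypothetical, and the range check only covers caret ranges with a non-zero major version.

```javascript
// Versions of a hypothetical package that have been published so far.
const published = ['1.0.0', '1.1.0'];

// Simplified check for caret ranges with a non-zero major version:
// same major, and not lower than the range's minimum.
const toNumbers = (v) => v.split('.').map(Number);
const compare = (a, b) => a[0] - b[0] || a[1] - b[1] || a[2] - b[2];
function satisfies(version, range) {
  const v = toNumbers(version);
  const min = toNumbers(range.slice(1));
  return v[0] === min[0] && compare(v, min) >= 0;
}

// With a lock entry, install exactly what was recorded. Without one, pick the
// highest published version that satisfies the range.
function resolve(range, available, locked) {
  if (locked) return locked;
  const candidates = available.filter((v) => satisfies(v, range));
  candidates.sort((a, b) => compare(toNumbers(a), toNumbers(b)));
  return candidates[candidates.length - 1];
}

console.log(resolve('^1.0.0', published, '1.0.0')); // 1.0.0 (lockfile kept)
console.log(resolve('^1.0.0', published, undefined)); // 1.1.0 (lockfile regenerated)
```

Nothing in package.json changed between the two calls; the only difference is whether the lock entry was still there.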
Personally, I prefer to be in control of when my packages update. That means:
- Don't regenerate yarn.lock
- Use auto merge conflict resolution
- Be explicit about updating dependencies by only updating through your project's package.json.
- Use exact versions (e.g., don't use ^ or ~). This will make sure that even if you regenerate your lockfile, you'll install the same dependency version (note, this doesn't help with sub-dependencies where other package.json files may specify "loose" versions).
Should you commit yarn.lock?
Every project using yarn should commit the yarn lockfile to source control. The lockfile is the source of truth for telling other developers how to install dependencies for your project.
Without this lockfile, other developers will be at risk of installing the wrong packages. This could lead to any number of incompatibilities where there is a version mismatch and the project won't build or run. You don't want to be stuck saying "works on my machine!".
What about Yarn 2 and beyond?
As of writing, the fact is versions of yarn beyond yarn@1.x have failed to gain traction with the community. As far as I know, many of the concepts covered in this doc are also applicable to newer versions of yarn. But since they have less usage, and I have less experience, any differences between versions are out of scope for this article.
Conclusion
Yarn is a wonderful tool for managing your project's dependencies. It solved a number of issues folks had with npm regarding speed and security by introducing lockfiles. (Though it's worth noting that npm has come a long way since then, with similar features like package-lock.json.)
Because of these enhancements, Yarn has proved to be a big boost to productivity and an essential tool in my developer toolbelt.