Around 7.15am PT on Thursday, July 2nd, we were notified by Jarrett Cruger and Terin Stock of unexpected data in their replication streams. We identified this extra data as scoped modules, including metadata about private modules. We decided this was a critical security issue, and halted the external replication stream at 7.30am. To the maximum extent possible we then purged the private data from the replication stream, and restarted the replica at 8.07am.
We believe the leak first began around 4pm PT on June 26th, and was completely halted by 8.10am on July 2nd.
npm runs a public CouchDB replication endpoint for all metadata in the public npm registry. This allows anyone to run a full replica of the registry, or any subset of it. This is very useful for people running major mirrors, such as our friends at CNPM who make it easier to access npm from inside the Great Firewall; npm Enterprise customers, who frequently use filtered subsets of the registry; and several thousand other replicas, run for businesses, hobbies, alternative npm search sites, and various visualizations of the registry. It's a very useful thing, and it's not going away.
When we launched Registry 2 earlier this year, we created the concept of scoped packages and, within scoped packages, private packages. Scoped packages have a new naming scheme that is incompatible with earlier mirrors, so they were excluded from the replication stream entirely (we plan to create a new replication endpoint that includes public scoped packages in the future).
To create this filtered stream, we turned off "standard" replication from our main CouchDB instances to the public replica and ran a modified follower that filtered out scoped packages and put only the remaining packages into the public replica. On June 26th at 4pm PT, as part of unrelated operations work, we re-ran an older ansible script which inadvertently re-activated standard replication on the public replica. Replication then faithfully duplicated all information that had arrived since it had been switched off, including scoped packages and metadata about private packages.
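To make the filtering step concrete, here is a minimal sketch of the kind of check a filtering follower could apply. It is not npm's actual follower code: the function names and sample rows are hypothetical, and it assumes only that each change row carries the package name in an `id` field and that scoped package names begin with `@`.

```python
from typing import Iterable, Iterator


def is_scoped(package_id: str) -> bool:
    """Scoped package names start with '@', e.g. '@scope/name'."""
    return package_id.startswith("@")


def filter_public_changes(changes: Iterable[dict]) -> Iterator[dict]:
    """Yield only changes for unscoped packages; drop every scoped one."""
    for change in changes:
        if not is_scoped(change["id"]):
            yield change


# Hypothetical sample rows, shaped loosely like entries in a changes feed.
sample_changes = [
    {"seq": 1, "id": "express"},
    {"seq": 2, "id": "@acme/internal-tools"},
    {"seq": 3, "id": "lodash"},
]

public_ids = [change["id"] for change in filter_public_changes(sample_changes)]
print(public_ids)
```

A filter like this only protects the public replica while it is the sole path into it. Standard replication copies documents directly and never passes through the filter, which is why re-enabling it exposed the scoped packages.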
Relative to information that was already in the existing public replication stream or the npm website, the information that leaked was:
What's in a package JSON document? You can see a sample package. It includes:
As you can see, links to the package tarballs are included in this metadata. However, tarballs are delivered by a separate system which continued to successfully check for authorization on downloads, so it was never possible to get the full package tarballs.
In our opinion, the most concerning information leaked is the READMEs, which can include sensitive information about how a package is used. The level of detail in READMEs varies greatly from user to user, however, so many users were unconcerned.
It's important to make a few things clear:
Fundamentally, this was an operational error rather than a design bug. As is usually the case with incidents of this kind, a number of things had to go wrong one after another:
As an immediate step, the replication was halted and the public replica removed from the generic replication automation and put in an isolated role with its own configuration.
Going forward, we are making some longer term operational and process changes:
Please contact support@npmjs.com and we will get back to you immediately. If you believe there is a critical security issue, you can always contact security@npmjs.com.