Public replication endpoint taken offline due to security incident

Incident Report for npm

Postmortem

Around 7.15am PT on Thursday, July 2nd, we were notified by Jarrett Cruger and Terin Stock of unexpected data in their replication streams. We identified this extra data as scoped modules, including metadata about private modules. We decided this was a critical security issue, and halted the external replication stream at 7.30am. To the maximum extent possible we then purged the private data from the replication stream, and restarted the replica at 8.07am.

We believe the leak first began around 4pm PT on June 26th, and was completely halted by 8.10am on July 2nd.

What happened?

npm runs a public CouchDB replication endpoint for all metadata in the public npm registry. This allows anyone to run a full replica of the registry, or any subset of it. This is very useful for people running major mirrors, such as our friends at CNPM who make it easier to access npm from inside the Great Firewall; npm Enterprise customers, who frequently use filtered subsets of the registry; and several thousand other replicas, run for businesses, hobbies, alternative npm search sites, and various visualizations of the registry. It's a very useful thing, and it's not going away.

When we launched Registry 2 earlier this year, we created the concept of scoped packages and, within scoped packages, private packages. Scoped packages have a new naming scheme that is incompatible with earlier mirrors, so they were excluded from the replication stream entirely (we plan to create a new replication endpoint that includes public scoped packages in future).

To create this filtered stream, we turned off "standard" replication from our main CouchDB instances to the public replica and ran a modified follower that filtered out scoped packages and wrote everything else into the public replica. On June 26th at 4pm PT, as part of unrelated operations work, we re-ran an older Ansible script which inadvertently re-activated standard replication on the public replica. Replication then faithfully duplicated everything that had arrived since it had been switched off, including scoped packages.
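
The filtering step described above can be sketched roughly as follows. This is a minimal illustration in Python, not npm's actual follower: the function names are hypothetical, and it assumes that each change carries the package name as its document id and that scoped packages can be recognized by a leading "@" (as in @scope/name).

```python
from typing import Iterable, Iterator


def is_scoped(doc_id: str) -> bool:
    """Return True for scoped package names such as "@scope/name"."""
    # Assumption: a leading "@" is what marks a package as scoped.
    return doc_id.startswith("@")


def public_changes(changes: Iterable[dict]) -> Iterator[dict]:
    """Yield only the changes that may be written to the public replica."""
    for change in changes:
        if is_scoped(change["id"]):
            # Drop every scoped package, whether public or private.
            continue
        yield change
```

A filter like this only protects the replica while it is the sole path into it; once standard replication is switched back on alongside it, unfiltered documents bypass the check entirely.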

What information leaked?

Relative to information that was already in the existing public replication stream or the npm website, the information that leaked was:

all scoped package names, both public and private
full package JSON documents for all scoped packages

What's in a package JSON document? You can see a sample package. It includes:

package version numbers and publication dates
node and npm versions used to publish
shasums of package tarballs
URLs of git repos
README files

As you can see, links to the package tarballs are included in this metadata. However, tarballs are delivered by a separate system which continued to successfully check for authorization on downloads, so it was never possible to get the full package tarballs.

In our opinion, the most concerning information leaked is the READMEs, which can include sensitive information about package uses, etc., although the level of detail in READMEs varies greatly from user to user, so many users were unconcerned.

What didn't leak

It's important to make a few things clear:

This was a security incident, but not the result of a malicious attack
As already mentioned, package tarballs (and therefore source code) were never available
User information such as passwords and billing information was in no way involved (npm user emails are public and have always been listed on the website)
The registry 2.0 system worked as designed; the problem was an operational mistake that re-activated part of registry 1.0

Why did this happen?

Fundamentally, this was an operational error rather than a design bug. As is usually the case with incidents of this kind, a number of things had to go wrong one after another:

The previous replication system was deactivated but not totally disabled
Operational automation was not fully up-to-date, allowing old configuration to reactivate unexpectedly
Network configuration within npm's LAN was insufficiently granular to prevent the public replica accessing private data when misconfigured
Monitoring of replication endpoints did not include checks for private information

How will we prevent this in future?

As an immediate step, the replication was halted and the public replica removed from the generic replication automation and put in an isolated role with its own configuration.

Going forward, we are making some longer term operational and process changes:

We are no longer allowing any manual configuration changes to production; everything must be in automation
We will be regularly running all automation roles to ensure they are always up to date
The public replica will be moved to a separate network zone that is physically unable to access private data
Monitoring and metrics will be enhanced to spot worrisome patterns such as private leaks
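
As an illustration of the kind of check the last item could involve, here is a minimal sketch in Python. It is not npm's monitoring code: the function names are hypothetical, the input is assumed to be an already-fetched listing shaped like a CouchDB _all_docs response (an object with a "rows" array whose entries have an "id"), and scoped packages are assumed to be identifiable by a leading "@".

```python
def find_scoped_ids(all_docs: dict) -> list:
    """Return ids in an _all_docs-style listing that look like scoped packages."""
    rows = all_docs.get("rows", [])
    # Assumption: a leading "@" is what marks a package as scoped.
    return [row["id"] for row in rows if row["id"].startswith("@")]


def check_replica(all_docs: dict) -> None:
    """Raise if the public replica listing contains any scoped package ids."""
    leaked = find_scoped_ids(all_docs)
    if leaked:
        # Report a count and a small sample rather than the full list.
        raise RuntimeError(
            f"{len(leaked)} scoped package(s) in public replica, e.g. {leaked[:5]}"
        )
```

Run on a schedule against the public replica, a check of this shape would raise as soon as a single scoped document appears, rather than relying on downstream consumers to notice.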

I have additional concerns

Please contact support@npmjs.com and we will get back to you immediately. If you believe there is a critical security issue, you can always contact security@npmjs.com.

Posted Jul 06, 2015 - 23:48 UTC

Resolved

Public skimdb is now repaired and back up.

Posted Jul 02, 2015 - 15:14 UTC

Identified

The public skimdb replication endpoint at skimdb.npmjs.com has been taken offline for urgent, unscheduled maintenance work. We are working to restore it as fast as possible.

Posted Jul 02, 2015 - 14:32 UTC