published packages not visible / installable
Incident Report for npm
Postmortem

A quick post-mortem on this bug: As you may have heard, Amazon Web Services' EC2 service is currently doing a fleet-wide reboot. While our architecture is mostly redundant and therefore not affected by the reboots, we do have a single write root which is scheduled to go down over the weekend.

Rather than subject users to the risk of publish downtime, we are in the process of migrating our write root machine to new hardware. Unfortunately, while doing so we accidentally misconfigured our replication topology, which meant that although publishes succeeded, they did not reach all the read-only "leaf" nodes. Fortunately, once identified, the problem was very quick to fix, but it took us about 45 minutes to notice, for the very simple reason that the ops team had taken a break for lunch.

We already monitor replication on all our hosts, but this incident didn't trip those alarms because every host was replicating, just not from the right place. In the short term, we will be adding a new monitoring check that verifies the replication topology itself is intact: that every host is replicating from the source we expect, not merely that it is replicating.
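
To illustrate the kind of topology check described above, here is a minimal sketch in Python. It is not our actual monitoring code. It assumes you have already collected, by whatever means, a mapping from each host to the host it replicates from; the function name `find_topology_errors` and the host names are hypothetical. The check walks each host's upstream chain and reports any host whose chain does not end at the expected write root.

```python
from typing import Dict, List, Optional


def find_topology_errors(
    sources: Dict[str, Optional[str]], write_root: str
) -> List[str]:
    """Return a list of problems; an empty list means the topology is intact.

    `sources` maps each host to the host it replicates from (None if it
    replicates from nothing). How this mapping is gathered is an assumption
    left outside this sketch.
    """
    problems = []
    for host in sources:
        if host == write_root:
            continue  # the write root is the top of the tree
        seen = {host}
        current = host
        # Follow the upstream chain until we reach the write root.
        while current != write_root:
            upstream = sources.get(current)
            if upstream is None:
                problems.append(
                    f"{host}: chain ends at {current}, which has no upstream"
                )
                break
            if upstream in seen:
                problems.append(f"{host}: replication loop through {upstream}")
                break
            seen.add(upstream)
            current = upstream
    return problems


# Hypothetical healthy topology: every leaf traces back to "new-root".
healthy = {"new-root": None, "leaf-a": "new-root", "leaf-b": "leaf-a"}

# Hypothetical misconfiguration similar in spirit to this incident:
# a leaf is replicating happily, but from a host that is no longer the root.
broken = {"new-root": None, "old-root": None, "leaf-a": "old-root"}

healthy_problems = find_topology_errors(healthy, "new-root")
broken_problems = find_topology_errors(broken, "new-root")
```

In the `broken` case every host that has a source is still "replicating", so a per-host replication alarm would stay quiet, while this check flags both `old-root` and `leaf-a` as disconnected from the write root.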

In the longer term, we are moving away from our current single-write-root architecture to remove this single point of failure. This work is already underway and is scheduled to launch by the end of the year.

Posted Sep 25, 2014 - 20:10 UTC

Resolved
All replicas are now back up to date.
Posted Sep 25, 2014 - 20:02 UTC
Update
The configuration error responsible has been identified and is being fixed.
Posted Sep 25, 2014 - 20:00 UTC
Identified
The primary npm, Inc. registry is having issues replicating newly published packages between its write master CouchDB and its read replicas. Packages publish successfully, but are not installable via the npm client because they have not yet been replicated out to the read replicas. We are investigating the situation and hope to have it resolved shortly.
Posted Sep 25, 2014 - 19:44 UTC