February 2017

www search 503
We found some invalid search queries causing increased load on our search cluster. We are adding additional filtering and should be out of the woods.
Feb 16, 14:35-16:00 UTC

January 2017

some readmes are unavailable on npmjs.com
README rendering of new package publications was lagging behind, due to a production database running out of space. We have corrected this issue, and will be adding monitoring to prevent similar issues in the future. The database that ran out of space stored only README information, which has now caught up. No data was lost.
Jan 26, 13:09-16:50 UTC
Increased 503 rates for users in Los Angeles region
Our monitoring indicated increased 503 rates originating from our CDN provider's LAX Point of Presence, starting at ~13:03 UTC up to ~13:40 UTC.
Jan 25, 13:44 UTC
increased origin latency resulting in burst of 503s
Briefly, between 2:54 and 2:58 PM PDT, we saw an increase in origin latency, resulting in increased 503s from the registry. The problem has been resolved and we are investigating the underlying cause.
Jan 13, 23:03 UTC
website not responding
All services back to normal.
Jan 5, 04:24-05:23 UTC

December 2016

Website unavailable
Website is back up and we're working on resolving the root cause of the issue.
Dec 25, 13:42-14:13 UTC

November 2016

Downloads API unavailability
Downloads API is now back online. The unavailability was caused by an outdated SSL certificate being deployed to the machine handling api.npmjs.org traffic, which, due to a misconfiguration, our monitoring did not catch in time.
Nov 22, 10:38-11:12 UTC
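A check of the kind that could flag a certificate before it causes an outage might look like the sketch below. It assumes "outdated" above means expired or close to expiry, and it is not npm's actual monitoring: the function names, the 14-day warning window, and the sample timestamp used in the usage note are all hypothetical. Only `ssl.cert_time_to_seconds` is a real standard-library call.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after: str, now_epoch: float) -> float:
    """Days between now_epoch and a certificate's notAfter timestamp.

    not_after uses the format found in the "notAfter" field returned by
    ssl.SSLSocket.getpeercert(), e.g. "Nov 22 10:38:00 2016 GMT".
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now_epoch) / SECONDS_PER_DAY


def should_alert(not_after: str, now_epoch: float, warn_days: float = 14.0) -> bool:
    # Alert when the certificate has expired or is inside the warning window
    # (warn_days is an illustrative threshold, not a documented npm setting).
    return days_until_expiry(not_after, now_epoch) < warn_days
```

In practice `not_after` would come from the certificate presented by the host being monitored, and `now_epoch` from `time.time()`.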
increased incidence of 404s for recently-published packages
We believe everything is back to normal.
Nov 17, 18:29-23:39 UTC
Elevated 503s from SIN PoP
issue seems to be resolved
Nov 7, 10:35-12:05 UTC

October 2016

download counts API server replaced
We have replaced and tuned up https://api.npmjs.org/downloads/point/last-month, so that weekly and monthly statistics will again return results.
Oct 28, 17:57 UTC
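For readers querying the download counts endpoint named above, a minimal sketch of building the request URL and reading the total follows. The helper names are hypothetical, and the response shape (a JSON object with a `downloads` field) is an assumption based on how the public API is commonly described, not something stated in this incident report.

```python
import json

API_BASE = "https://api.npmjs.org/downloads/point"


def point_url(period: str, package: str = "") -> str:
    """Build a download-count URL for a period such as "last-week" or "last-month".

    With no package name, the URL refers to registry-wide totals.
    """
    url = f"{API_BASE}/{period}"
    return f"{url}/{package}" if package else url


def parse_downloads(body: str) -> int:
    """Extract the download total from a JSON response body.

    Assumes the body is an object with a numeric "downloads" field.
    """
    return int(json.loads(body)["downloads"])
```

Fetching the URL is left to whichever HTTP client you already use; the parsing step is kept separate so it can be tested without network access.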
replicate.npmjs.com replaced
Today we replaced replicate.npmjs.com, the server that provides a public replication endpoint for npm packages. The new server is running on faster hardware, which should help solve stability issues that we have seen over the past several weeks. Users consuming the CouchDB _changes feed should reset their sequence number.
Oct 21, 21:53 UTC
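Resetting the sequence number in a follower script can be sketched as below. CouchDB's `_changes` endpoint accepts a `since` query parameter, and a database's info document reports an `update_seq`; the helper names and the reset heuristic are assumptions for illustration, and the exact database path on the replication host is not specified here.

```python
from urllib.parse import urlencode


def changes_url(db_url: str, since: int = 0) -> str:
    """URL for a CouchDB _changes request starting after sequence `since`."""
    return f"{db_url.rstrip('/')}/_changes?{urlencode({'since': since})}"


def next_since(saved_seq: int, server_update_seq: int) -> int:
    """Pick the sequence number to resume from.

    A saved sequence beyond what the server reports cannot belong to this
    server, so start over from 0; otherwise resume where we left off.
    """
    return 0 if saved_seq > server_update_seq else saved_seq
```

This heuristic only catches a replacement server whose sequence numbers are lower than the saved checkpoint. When an announcement like the one above says the server was replaced, discarding the checkpoint and starting from 0 is the safer choice.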
Connectivity disruption for users in certain areas
Our systems are back to full operation. Local DNS issues are still lingering for some users.
Oct 21, 12:10-20:47 UTC
skimdb.npmjs.com has been replaced
Today we replaced skimdb.npmjs.com, the server that provides a public replication endpoint for un-scoped npm packages. The new server is running on new hardware, which should help solve stability issues that we have seen over the past several weeks. Users consuming the CouchDB _changes feed should reset their sequence number.
Oct 18, 20:56 UTC
Increased 503s for the npmjs.com website
A blip on a package metadata box left zombie connections on a DB box, consuming needed connections & causing 503s on the website. The old connections have been flushed and the service appears to have recovered.
Oct 10, 04:31-05:41 UTC
replicate.npmjs.com outage
Friday afternoon, between the hours of 2PM PDT and 4PM PDT, we experienced a partial outage of the registry's scoped replication endpoint. All systems are back online, and we have added additional monitoring to detect this category of failure faster in the future.
Oct 7, 23:17 UTC

September 2016

Increased 503 rate on tarball downloads served from Sydney
The 503 rate has returned to nominal levels.
Sep 28, 04:59-05:45 UTC
Increased 503 rates for European users
Our monitoring reports that error rates returned to base levels and both the website and the registry should be accessible to European users as usual.
Sep 14, 07:46-08:08 UTC
elevated rates of 500s from the registry
50x error rates for the registry appear to be back to normal.
Sep 2, 18:12-18:49 UTC

August 2016

Increased 503 rates for European users
We've replaced affected hardware and European users should be going through our European infrastructure again.
Aug 28, 14:58-17:51 UTC
public-skimdb is offline while we investigate a configuration error
skimdb.npmjs.com is back in service. The sequence number advertised by couchdb has changed. Your follow scripts might need to be updated.
Aug 24, 20:22-23:35 UTC
"fs" unpublished and restored
For a few minutes today the package "fs" was unpublished from the registry in response to a user report that it was spam. It has been restored. This was a human error on my (@seldo's) part; I failed to properly follow our written internal process for checking if an unpublish is safe. My apologies to the users and builds we disrupted. More detail: the "fs" package is a non-functional package. It simply logs the words "I am fs" and exits. There is no reason it should be included in any modules. However, something like 1000 packages *do* mistakenly depend on "fs", probably because they were trying to use a built-in node module called "fs". Given this, we should have deprecated the module instead of unpublishing it, and this is what our existing process says we should do. If any of your modules are depending on "fs", you can safely remove it from your dependencies, and you should. But if you don't, things will continue to work indefinitely.
Aug 23, 20:34 UTC
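Removing a mistaken "fs" dependency from a manifest can be sketched as follows. This is an illustrative helper, not an npm tool: the function name is hypothetical, and it only touches the standard `dependencies`, `devDependencies`, and `optionalDependencies` maps of a package.json.

```python
import json


def drop_fs_dependency(package_json_text: str) -> str:
    """Return package.json text with any "fs" entry removed from dependency maps."""
    manifest = json.loads(package_json_text)
    for field in ("dependencies", "devDependencies", "optionalDependencies"):
        deps = manifest.get(field)
        if isinstance(deps, dict):
            # pop with a default so manifests without "fs" pass through unchanged
            deps.pop("fs", None)
    return json.dumps(manifest, indent=2) + "\n"
```

Code that calls `require("fs")` keeps working after the entry is removed, because Node.js resolves its built-in module rather than the registry package.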
Increased 502 and 503 rates for the website
At 7:55 UTC our monitoring alerted us to increased 502 and 503 rates for the website. We've determined the culprit to be a stuck Node.js process and restarted it at 8:12 UTC, which fixed the issue immediately. We will continue investigating the root cause of this problem.
Aug 19, 10:20 UTC
Intermittent website timeouts and 503s
Starting at 22:06 UTC our monitoring started reporting increased 503 and timeout rate originating from the website. We detected what we think was an accidental mis-use of an endpoint that was creating extraordinary load. We have blocked the IP responsible and are contacting the user involved. This was resolved at 23:15 UTC.
Aug 15, 22:52 UTC
[Scheduled] skimdb.npmjs.com IP address change & sequence number jump
Cutover is complete. DNS will take time to propagate, so the older hardware is still serving the unscoped public registry database and will continue to do so until Monday.
Aug 6, 02:01-02:13 UTC
skimdb.npmjs.com unavailable
The new skimdb.npmjs.com should now be fully accessible.
Aug 5, 11:39-15:04 UTC
Intermittent website timeouts and 503s
Starting at 11:13 UTC our monitoring started reporting increased 503 and timeout rate originating from the website caused by increased crawler traffic. While investigating the issue we discovered that some of our services weren't configured to load balance across database servers, which caused one of them to become overloaded due to increased traffic. We reconfigured our services to alleviate that and saw error and timeout rates drop to base levels around 12:46 UTC.
Aug 3, 13:19 UTC

July 2016

Increased 503 rates
Our monitoring indicates that no more bursts were registered after 14:23 UTC. We are still investigating the root cause.
Jul 26, 14:45-16:44 UTC
Increased 503 rates
We confirmed that 503 rates have dropped down to base levels for an extended period of time. No root cause has been determined yet, but we will continue investigating.
Jul 25, 13:19-17:08 UTC
Slew of 503s on npmjs.com
The npmjs.com website was down for up to 15 minutes today, between 10:30 AM PT and 10:45 AM PT. This was due to a web spider, which slammed the website with a 500% spike in requests in the span of that time. We have identified the bot and banned its user agent.
Jul 20, 18:10 UTC
Reported 502s serving tarballs to some regions
Issues with tarball serving should be resolved now. A writeup of the outage will be made available on the npmjs.org blog.
Jul 6, 13:38-17:01 UTC

June 2016

Some couchdb hosts are behind
The couchdb in question rebooted; registry package metadata should be up to date.
Jun 16, 22:38-22:46 UTC
Sydney EC2 outage disrupting registry traffic
We were seeing an increased number of 503s in Sydney POPs as a result of an ongoing EC2 outage.
Jun 5, 07:12 UTC
Access cache issues preventing access to private modules
Flushing the cache appears to have resolved the issue.
May 31, 21:45 - Jun 1, 00:10 UTC

May 2016

No incidents reported for this month.

April 2016

Errors with authenticated actions
This incident has been resolved.
Apr 26, 00:12-01:12 UTC
package tarballs availability in us-west
metadata has been cleaned and failed versions have been removed where possible.
Apr 15, 20:54 - Apr 16, 02:00 UTC
garbled package metadata delivered to old npm clients
We believe all package data with incorrect caching headers has been replaced in cache.
Apr 6, 20:15 - Apr 7, 03:36 UTC

March 2016

Partially-failed publications earlier today
Failed versions have been cleaned up from metadata for nearly all of the packages affected. We were unable to repair 23 newly-published packages and will be in contact with the publishers.
Mar 25, 18:39-22:57 UTC
Increased publish error rates
We've removed affected hardware from the publish path, and are working on adding additional capacity. Publishes should be back to normal. Our apologies.
Mar 25, 16:43-16:48 UTC
Increased 502 rates for publishes
We've confirmed that the issue is now fixed, and that publishes work across the board again. Our apologies.
Mar 24, 18:33-19:20 UTC
increased publish latency
We've confirmed that the issue is now fixed, and that publishes are back to normal. Our apologies.
Mar 24, 17:16-17:46 UTC
Website and private package installation issues
The failed hardware has been replaced and services have been switched back into redundant mode.
Mar 24, 00:43-02:52 UTC
Increased 503 rate from our Australia-based tarball servers
Error rates have returned to normal.
Mar 17, 21:47-22:04 UTC
Increased 504 rates for tarballs
The servers in question have been removed from rotation. Access has been confirmed.
Mar 17, 18:58-20:11 UTC
elevated 503s for json from the SJC PoP
This incident has been resolved.
Mar 11, 15:24-16:16 UTC
503s on installs
The problem has been resolved.
Mar 10, 21:17-21:35 UTC
elevated 404 rates for scoped modules
From 10:17am PST to 10:39am PST today the registry was returning 404s for some scoped module tarballs. We rolled out a configuration change to our CDN and served an incorrect version format in package.json files. The incident was resolved by rolling back to the previous configuration. 404 rates are now back to normal.
Mar 9, 19:16 UTC
Increased registry 503 rates
We've now resolved this issue for all the packages. We will be publishing a post-mortem regarding this issue. Our apologies!
Mar 2, 02:15-04:33 UTC

February 2016

Investigating 503 rate on registry
The issues with the POP appear to be resolved.
Feb 22, 21:29-22:16 UTC
Difficulties with our CDN
Registry 503 responses were elevated because of errors with one of the points of presence of our CDN provider. They identified & fixed the incident. (You can read more details in their incident report here: https://status.fastly.com/incidents/gjgcmfljjdpk .) Registry responses should be back to normal.
Feb 18, 20:22-21:15 UTC
Increased WWW 503 rates for some users
We started observing short influxes of increased 503 rates for some WWW users starting at around 10:30 UTC today, up to 15:00 UTC today. This was identified as a misconfiguration and has since been fixed, with additional steps being taken to prevent this class of problem in the future. Our apologies.
Feb 18, 03:13 UTC
Publish failures
Publications are back to normal.
Feb 4, 23:09 - Feb 5, 00:33 UTC

January 2016

503s on www and auth failures on CLI
Our authentication services were briefly overloaded by an external event, leading to auth failures on package pages and the command line. Monitoring detected the incident and it was resolved after approximately 30 minutes of instability.
Jan 30, 08:10 UTC
Elevated 503 rates for users in Europe
The registry was returning elevated 503 rates to our users in Europe from 7:00 UTC to 13:00 UTC due to a misconfiguration of one of the package servers. Our apologies!
Jan 28, 23:16 UTC
Intermittent scoped module installation failures from US East
The registry was returning elevated rates of 503s from a specific origin host that US East users were likely to be routed to. This was due to a misconfiguration that caused requests to an internal service to be rejected. This configuration has been corrected. 503 rates are back down to normal.
Jan 27, 17:31 UTC
Intermittent publish failures
This incident has been resolved.
Jan 22, 03:06-04:38 UTC
Elevated 503 rates for users in Europe
The configuration problem that caused this issue is now solved.
Jan 17, 13:46-14:07 UTC

December 2015

elevated 503 and error rates on registry and website
Registry and website services should now be back to normal.
Dec 31, 21:47-22:56 UTC

November 2015

Networking issues in AWS us-east-1
This incident has been resolved.
Nov 5, 23:05-23:10 UTC

October 2015

503s on www
Our primary cache server for the website failed. The site is supposed to fail over to the spare cache server automatically, but due to a bug this didn't happen. Instead, we manually failed over to the spare server. The website was totally unavailable for 15 minutes from 15:47 to 16:02 Pacific Time. Our apologies!
Oct 21, 23:13 UTC
503s to European users
Networking issues caused users in Europe to receive higher numbers of HTTP 503 responses (caused by network timeouts to servers in the US) from 15:31 to 16:02 GMT today. These issues are now resolved.
Oct 18, 16:11 UTC
Elevated 503 rates on WWW
From 06:46 to 07:31 UTC the website was returning elevated rates of 503 errors. The issue was resolved by a configuration fix and the website is now operating correctly.
Oct 16, 07:40 UTC
Public replication endpoint under heavy load
This migration is now complete and all is well.
Oct 2, 18:00-21:09 UTC

September 2015

Empty tarballs responses
Cache invalidation is complete.
Sep 16, 15:25-16:29 UTC
Elevated 503 rates in Australia
Network issues have been resolved. We continue to work on longer-term architectural changes to be more resilient to trans-pacific network problems.
Sep 14, 00:19-21:42 UTC

August 2015

Older packages temporarily unavailable
A security upgrade released yesterday accidentally prevented the registry from serving ~200 very old packages whose package data was incomplete. These packages combined account for less than 0.02% of registry downloads, so their errors did not trigger our monitoring overnight. Following user reports, we corrected the error this morning. The packages were unavailable for a period of ~14 hours, from 5pm PT 2015-08-04 to 7am PT 2015-08-05. Even though the absolute numbers involved are small, the absence of these packages severely disrupted some users and we apologize. We are putting additional auditing into place around releases to ensure this kind of accident cannot occur in future.
Aug 5, 18:50 UTC

July 2015

External replication endpoint is unavailable
All views are now up to date. Please contact support@npmjs.com if you have any issues getting your mirrors back in sync. We have initiated some engineering work to make recovery from this kind of hardware failure much faster in future.
Jul 31, 00:51-20:53 UTC
www is down
We have corrected our health checks. Sorry about that!
Jul 11, 02:34-03:29 UTC
Public replication endpoint taken offline due to security incident
Public skimdb is now repaired and back up.
Jul 2, 14:32-15:14 UTC
503s on www
We are going to take more extensive action in the next few days, but the site remains stable, so resolving this incident.
Jul 1, 02:51-03:15 UTC

June 2015

Issues with auth
Underlying cause has been identified; we are going to put better checks in place to catch this rare network event faster.
Jun 21, 23:39 - Jun 22, 00:43 UTC
Elevated 503s on registry
New code rolled out to the registry failed under load and was rolled back; there were 5 minutes of highly elevated 503s worldwide from 10.36am to 10.41am Pacific Time. We are making changes to our rollout process to roll out new code more slowly to prevent a recurrence of this kind of error.
Jun 16, 17:57 UTC

May 2015

Packages failed to display on website
A bad rollout caused package pages on production to fail to render for a few minutes. The change has been rolled back. No data was corrupted and no information was lost. Some package pages are currently still failing to display but will correct themselves once data falls out of cache in 5 minutes.
May 22, 23:39 UTC
Temporary loss of publishing
This incident has been resolved.
May 20, 17:18-20:22 UTC
503s on user profile pages
This problem has now been resolved.
May 15, 19:24-19:28 UTC
Elevated 503s for users near Amsterdam
A single machine testing a kernel upgrade was responsible for the unusual errors. It has been pulled out of rotation.
May 11, 15:26-16:07 UTC

April 2015

lag in displaying package meta-information on website
This incident has been resolved.
Apr 16, 23:56 - Apr 17, 00:12 UTC

March 2015

DDoS attack on our CDN
Our CDN continues to monitor, but given several hours of normal performance we are resolving this incident. We'll re-open it if the situation deteriorates again.
Mar 27, 22:03 - Mar 28, 08:53 UTC
Timeouts on some package tarballs
For 2 hours, from 8am to 10am Pacific time, approximately 2% of requests for package tarballs were timing out (HTTP 503). This issue has now been corrected, but we are conducting a more detailed investigation into the root cause.
Mar 21, 17:31 UTC
Brief website outage
The npmjs.com website was briefly offline at around 15:30 PDT today. The redis service used for caching was unavailable for a short time and the site was most distressed about the absence. It has received counseling.
Mar 20, 23:32 UTC
www downtime
The website was sporadically available over the course of about 50 minutes this morning as a result of a major network problem in Amazon's AWS us-west-2 region. Unlike the registry itself, the website is not completely redundant across AWS east and west regions. We have long-term plans in place to improve this design and we are continuing to work towards them.
Mar 20, 15:09 UTC
Registry outage
A misconfiguration caused a production push to the registry to fail, leading to 120 seconds of registry unavailability from 13:34 - 13:36 Pacific Time before the change was rolled back. We have modified our push process to catch this form of error in future.
Mar 19, 21:19 UTC
Download Counts are Down
Download counts are back!
Mar 13, 21:07 - Mar 19, 21:13 UTC

February 2015

No incidents reported for this month.

January 2015

No incidents reported for this month.

December 2014

cert errors on registry
The second rollback was successful and all machines are back to correct configuration.
Dec 12, 00:09-01:49 UTC
503s on www
We have deployed a hot fix for the issue and 503s have been resolved. Download counts may not be available for all packages. We are working on a longer-term solution. Update: we have deployed a better fix and download counts are once again available for all packages.
Dec 11, 01:25-01:32 UTC

November 2014

Elevated 503s for users near Washington, D.C.
From 0040 to 0110 PT (0840 to 0911 UTC) our CDN Point of Presence near Washington D.C. experienced networking issues that led to the registry serving 503s. Up to 1.5% of registry responses were 503 for the first 10 minutes, then a much smaller percentage (~0.1%) until the problem was entirely resolved. Only users physically close to Washington D.C. were affected.
Nov 19, 09:19 UTC
Sporadic 404s on Some Packages
A misconfigured machine was identified and pulled out of rotation, and 404 rates are back to normal levels as of 30 minutes ago. Our apologies.
Nov 3, 22:03 - Nov 4, 02:03 UTC

October 2014

No incidents reported for this month.

September 2014

published packages not visible / installable
All replicas are now back up to date.
Sep 25, 19:44-20:02 UTC
Scoped packages published to registry
Although the npm client supports scoped packages, the public npm registry does not yet fully support them. We put code in place to prevent packages with scopes being accidentally pushed to the registry before we had completed the necessary work, but a bug in that code allowed 2 scoped packages to be published. This broke some down-stream follower applications -- we expected that would happen, which is why we hadn't intended to allow scoped packages to publish yet. The bug that allowed the 2 packages in has been fixed, and those 2 packages deleted. If you run a follower, you should confirm that you are up to date and skip both the revs including the scoped packages and if necessary also the revs deleting them. Our apologies for this incident. Scoped packages are coming to the public registry later this year, but will require explicitly switching to a new upstream host to avoid this kind of incompatibility.
Sep 24, 18:53 UTC
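For follower applications, skipping the affected revisions can be approximated by filtering change rows on the package name, as in the sketch below. It assumes each row from the CouchDB `_changes` feed is a dict with an `id` field holding the package name, and that scoped names take the form `@scope/name`. The helper names are hypothetical.

```python
from typing import Dict, List


def is_scoped(package_name: str) -> bool:
    """Scoped npm package names look like "@scope/name"."""
    return package_name.startswith("@") and "/" in package_name


def unscoped_changes(rows: List[Dict]) -> List[Dict]:
    """Drop change rows whose document id is a scoped package name.

    Deletion rows carry the same id as the original publish, so both the
    revs adding a scoped package and the revs deleting it are skipped.
    """
    return [row for row in rows if not is_scoped(row["id"])]
```

Filtering by name rather than by specific revision means the follower does not need to know which two packages were involved.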
503 Responses in IAD Region
The bad machine in the IAD POP has been taken out of rotation, and the errors have stopped.
Sep 10, 23:52 - Sep 11, 00:10 UTC
Elevated 503s affecting central/eastern USA
This incident has been resolved.
Sep 9, 05:12-05:33 UTC
Website 503s to European users
From 3:25pm to 4:14pm Pacific time, visitors to our website served by our CDN's European point of presence were receiving 503s. Most of the failed requests were for website static assets, though other failures affected search results. The issue was resolved by our CDN provider.
Sep 4, 23:48 UTC

August 2014

Website Browse/Front Page Views Incorrect
This incident has been resolved.
Aug 28, 23:44 - Aug 29, 05:05 UTC
Download API unavailable
The API at api.npmjs.org/downloads/ was unavailable for 1 hour. There was no associated website downtime.
Aug 27, 20:23 UTC
Registry 503s to Australian/Asia/Pacific users
From 1330 to 1501 UTC today, customers served by our Australian CDN point of presence (users in Australia and nearby parts of southeast Asia and the Pacific) were receiving 503s as a result of an issue with that PoP. At peak, this affected 5% of active users. It was resolved by our CDN provider.
Aug 23, 15:56 UTC
Elevated 503s on www.npmjs.org
From 4-5pm Pacific Time, the website was sending 503s to at most 3% of users. This was due to network problems on one of the www servers. That server was removed from rotation and full service was restored.
Aug 22, 00:35 UTC

July 2014

No incidents reported for this month.

June 2014

more Fastly 503s
Fastly have pulled the affected data center out of service manually, and all services are now working as expected. We apologize for the interruption of service.
Jun 12, 07:09-07:28 UTC
503s on registry and www
Fastly resolved the issues with the San Jose point of presence and both www.npmjs.org and registry.npmjs.org have recovered.
Jun 12, 03:31-04:01 UTC
Registry and www outage 1203 to 1420 UTC
Registry was mostly down for 60 minutes and partially down for another 17 minutes starting at 5.03AM Pacific (1203UTC). The root cause was a failure in our caching provider (Fastly), which meant that instead of about 5-10% of normal requests hitting our servers, 100% of requests did. We have a lot of servers, but this 5x-10x spike in load was too much for them, and they simultaneously overloaded and could not serve enough requests to keep up, which meant that about 90% of requests failed. This cache failure at the CDN was a human-caused accident, and Fastly have already given us a detailed explanation of what happened and changes they are putting in place to avoid it being possible in future. We are working with them on further changes we can make to our architecture to withstand this kind of event in future, including cache configuration changes and additional hardware on our side to be able to withstand a sudden burst of traffic like this. www was also down for the length of the outage; www is a client of the registry just like everybody else, so when the registry is down it cannot serve package information.
Jun 4, 14:25 UTC

May 2014

Registry 500s
From 2118 to 2220 UTC yesterday the registry was serving 500s for ~1% of requests. This was a networking issue affecting a single geographic area (the American midwest and Texas). We apologize for not notifying you of this outage at the time; for most of the affected period, errors were below our alerting threshold, so we did not immediately realize the duration of the outage.
May 29, 16:11 UTC
fastly 503s
All of the essential services are back online and working, and 100% of packages are available.
May 27, 20:37-21:28 UTC

April 2014

www downtime
The website was unavailable for 10 minutes as a result of a deployment error; we pulled a box out of rotation for upgrade without realizing another box was already out of rotation for maintenance. We are enhancing our deployment automation to automatically avoid this kind of error in future. In addition, there was a caching error at Pingdom, which significantly over-reported the length of the outage. We are discussing this problem with their support staff.
Apr 28, 23:16 UTC
www outage
The npmjs.org website was unavailable for about 10 minutes this morning while our master Redis server failed to save its data to disk. This is a repeat of the previous outage on April 20. In response to that outage, we had already built a replacement Redis host which will not exhibit this problem, and are switching over to it presently. While this switch occurs, we've temporarily changed our Redis configuration to not crash when it fails to save to disk, and resized the problematic instance up once again.
Apr 23, 18:37 UTC
[Scheduled] Manta maintenance
The scheduled maintenance has been completed.
Apr 21, 20:00-22:00 UTC
Website Outage
The npm website was down for approximately 30 minutes unexpectedly. The master Redis instance was failing to persist session data to disk. We resized the box to give it more RAM and disk size. We will be adding additional monitoring to avoid this situation in the future.
Apr 20, 15:36 UTC
Reduced Performance - Web and Registry
No further issues since the last update. Calling this one resolved.
Apr 17, 01:12-02:59 UTC