4/10/2014 Processing Delay Postmortem

Yesterday, from about 2:30pm PDT until 4:55pm PDT, we experienced a service degradation that caused our customers to see processing delays of up to about 2 hours. While no data was lost, alerts were not being sent and new data was not appearing in the rollbar.com interface. Customers instead saw notices about the delay on the Dashboard and Items pages.

We know that you rely on Rollbar to monitor your applications and alert you when things go wrong, and we are very sorry that we let you down during this outage.

The service degradation began following some planned database maintenance, which we had expected to have no significant impact on service.

The Planned Maintenance

We store all of our data in MySQL in a master-master/active-passive configuration. Yesterday we needed to add partitions to our largest table - a routine procedure. Normally, this process takes about 15 minutes, during which time customers experience small delays in data processing. This process generally goes unnoticed by customers. However, this time something caused the database to load new data extremely slowly which, in turn, caused the outage.

Timeline

  • 2:06pm - We began the planned maintenance by promoting the passive master to be the new active master.

  • 2:29pm - The planned maintenance was complete.

    All connections to our old active master were closed and the new active master was getting new data and processing it.

  • 2:40pm - It became apparent that new data was being loaded and processed very slowly.

    We turned off our data loaders to decrease any contention in the database.

  • 2:41pm - We began profiling the slow worker.

  • 2:47pm - We tested a theory that a single recurring item was causing most of the slow processing.

  • ~2:50pm - We noticed that ping times to the new active master were 1-5 milliseconds - an order of magnitude slower than normal.

  • 2:52pm - We turned off replication from the passive to the active database, which seemed to help data loading, but not by much.

  • ~2:50pm - 3pm

    • The ALTER to add partitions completed on the passive database.

      • Up to this point, there were no connections on the passive database.
    • We decided to switch back to the previous active master but quickly reverted after finding that our passive database was missing a significant number of rows.

      • MySQL replication was 0 seconds behind.
      • It was unclear how the passive database thought it was caught up but was missing data.
  • ~3:15pm - We identified the slowest portion of the slow worker, which happened to be unused code. We removed it and deployed to all workers.

    • This got processing back to normal speeds and allowed us to begin catching back up.
  • ~3:30pm - We turned the data loaders back on.

  • 4:30pm - The worker responsible for making new occurrences appear in the interface was caught up, but notifications were still delayed.

  • 4:55pm - Everything was caught back up and we were back to 0 delay.

Follow-on Tasks

We have two open questions:

  1. Why did the data loaders slow down when we switched to the new active master?
  2. How did the databases get out of sync?

We have some theories as to why the data loaders slowed down so much, but we are not sure. It could have been the number of concurrent processes trying to load data into the same table. It could also have been something about the disk layout or cache on the new active master. We plan to investigate serializing loads in general and/or slowly ramping up load after future maintenance.

To determine why our databases became out of sync, we wrote a script to tell us the exact moment when they diverged. Once it completes we will find the coordinates in the new active master’s binlogs that correspond with the point in time where the databases became out of sync, then restart replication on the passive master using those coordinates.
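The divergence-finding script can be sketched as a binary search over range checksums: compare a checksum of the same row range on both databases, and narrow down to the first row that differs. This is an illustrative sketch with toy in-memory data, not our actual script; the checksum helpers are hypothetical.

```javascript
// Sketch: find the first index where two replicas diverge, given a
// function that checksums the row range [lo, hi) on each database.
// Invariant: the range [lo, hi) is known to contain a divergence.
function findDivergence(lo, hi, checksumA, checksumB) {
  while (hi - lo > 1) {
    var mid = Math.floor((lo + hi) / 2);
    if (checksumA(lo, mid) !== checksumB(lo, mid)) {
      hi = mid;   // divergence is in the lower half
    } else {
      lo = mid;   // lower half matches; look in the upper half
    }
  }
  return lo;      // first index whose row differs
}

// Toy data standing in for the two databases' rows.
var dbA = [10, 20, 30, 40, 50];
var dbB = [10, 20, 30, 99, 50];  // diverges at index 3
var sumRange = function(rows) {
  return function(lo, hi) {
    var s = 0;
    for (var i = lo; i < hi; i++) s += rows[i];
    return s;
  };
};
var firstBadId = findDivergence(0, dbA.length, sumRange(dbA), sumRange(dbB));
```

The appeal of the binary search is that it needs only O(log n) checksum queries instead of comparing every row on both sides.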

Conclusion

We take downtime very seriously and we want to be as transparent as possible when it happens.

We are sorry for the degradation of service and we are working on making sure it doesn’t happen again. If you have any questions, please don’t hesitate to contact us at support@rollbar.com.


Heartbleed Bug Response

Updated 4/9 7:30pm

What is Heartbleed?

CVE-2014-0160, known as “Heartbleed”, is a bug in OpenSSL versions 1.0.1 through 1.0.1f that allows a remote attacker to read private memory on the target server. It has existed for almost 2 years. More info can be found here: http://heartbleed.com/

With this vulnerability, an attacker can:

  • Get the private key for your domain’s SSL cert
  • Decrypt all current and past SSL traffic to/from all affected machines

If this sounds bad, it is. Most sites on the Internet are affected.

Are you affected?

Probably. If your web server or load balancer runs on Linux and you’ve updated your packages anytime in the last 2 years, you are more than likely affected.

To check your OpenSSL version, run openssl version -a.

Check out http://filippo.io/Heartbleed/ to test your servers for the vulnerability.

How We Responded

We learned of CVE-2014-0160 at around 4:50pm on 4/7 and immediately began our response. We completed the most important fix (patching OpenSSL) within about an hour, and have been working over the past 24 hours on related issues.

Here is a timeline of what we’ve done since the vulnerability was announced:

  • 4/7 - 3:01pm - Ubuntu Security Announcements email

    Subscribe to this list here

  • 4/7 - 4:50pm - Began updating our load balancers with the fix. All servers patched by 6pm.

    We’re running nginx on Ubuntu 12.04. Updating is as simple as:

      apt-get update
      apt-get upgrade
      openssl version -a  # should show that it was built on April 7, 2014
      service nginx restart
    

    The above didn’t work for us on the first try because our servers were talking to a mirror that hadn’t updated to the latest packages (after all, they were only a couple hours old). Changing the domain in each line in /etc/apt/sources.list to archive.ubuntu.com and then running apt-get update again solved this.

  • 4/7 - 11pm - rollbar.com and api.rollbar.com SSL certs were rekeyed

    We use DigiCert for our SSL certs. The process was quick and easy.

  • 4/7 - 11:10pm - Previous rollbar.com and api.rollbar.com SSL certs revoked

    In order to prevent a possible man-in-the-middle attack we had Digicert revoke our old certs.

  • 4/7 - 11:30pm - ratchet.io and submit.ratchet.io rekey requested

    We still support our old domain, ratchet.io, which uses Network Solutions SSL certs.

  • 4/8 - 11:50am - All rollbar.com cookies were invalidated, forcing users to re-auth

    Since an attacker could have accessed our customers’ cookies, we changed the secret key that we use to encrypt cookies. This invalidated all logged-in users’ sessions.

  • 4/8 - 12:30pm - 2:25pm - All third-party tokens and keys were regenerated and deployed

    We use services like Stripe, Mailgun, and Heroku; all required new keys to be generated.

  • 4/8 - 3:30pm - ratchet.io and submit.ratchet.io certs were rekeyed and deployed

  • 4/8 - 5:30pm - Published this blog post and added in-app notifications to change passwords and cycle access tokens

Update 4/9

Thanks to this post on security.stackexchange, we additionally patched our application and compute servers (everything that can make outgoing HTTPS requests). This was started at 3:45pm and completed at 4:45pm. The attack surface here is much lower, as it requires creating a Rollbar account and setting up a webhook that points to the attacker’s malicious server. We audited our logs and confirmed that there has been no such suspicious activity.

Recommended actions for Rollbar Customers

  • Change your password
  • Cycle any access tokens you have used (create and start using a new one, then disable or delete the old one).

    • For projects, go to the project dashboard, then Settings -> Project Access Tokens. Most customers will need to do this.
    • For accounts, go to Account Settings -> Account Access Tokens. Most customers will not need to do this.

Note for Heroku Users

If you’re using Rollbar through Heroku, we’ve already started the process of cycling your access tokens. We’ve created new tokens and updated them in your Heroku config. You should update the token in any other locations (e.g. development environments, or anywhere it might be hardcoded) and then disable/delete the old tokens.

Closing notes

This was painful, but we’re thankful to the security researchers who discovered and responsibly disclosed this issue, and to the security teams at Ubuntu and elsewhere who quickly released patched packages.

If you have any questions, please don’t hesitate to contact us at support@rollbar.com.


Connecting Rollbar with PagerDuty

Using Rollbar with PagerDuty is now a lot more seamless. PagerDuty provides SaaS IT on-call schedule management, alerting, and incident tracking. With our new integration, you can automatically send issues found by Rollbar into incidents in PagerDuty.

We have a few customers using it already. Here’s what Richard Lee, CTO at Polydice, a mobile development studio, has to say:

“With Rollbar’s PagerDuty integration, we’re able to get notified as soon as errors are detected, and avoid possible downtime for our customers. This powerful combination has become a must-have tool for us.” — Richard Lee, CTO at Polydice

Integrating Rollbar with PagerDuty is easy; just create a new Generic API System in PagerDuty, and then link it in Rollbar’s Notification settings. See our docs for detailed instructions.


Resolving Rollbar Items in Versions

We just rolled out a new feature to help track which versions/revisions errors are resolved in. When resolving items within Rollbar, you have the option of entering a revision or version number. If one is entered, it will appear in the item’s status history to let anyone looking at the item better understand specifically when it was fixed.

This version can be combined with the new code_version parameter in the configuration options of the latest versions of our notifiers. It can be set to a numerical value (e.g. 1, 24, 300), a semantic version value (e.g. 1.0.3, 2.9), or a Git revision SHA. Here are examples of how to set this parameter in our JavaScript and Ruby notifiers:

In the JavaScript snippet:

_rollbarParams = {
    // ... other configuration
    "client.javascript.code_version": "bdd2b9241f791fc9f134fb3244b40d452d2d7e35"
}

In your rollbar-gem configuration:

Rollbar.configure do |config|
    # ... other configuration
    config.code_version = 'bdd2b9241f791fc9f134fb3244b40d452d2d7e35'
end

The other notifiers have a very similar top-level code_version configuration setting. See the notifier READMEs for more info.

If you resolve an item within Rollbar in a certain version and are also specifying a code_version for your code, we will use both of these values to decide whether or not to reactivate the item.

For example, say you have a bug in version 1.0 of your app. The bug is fixed and will be deployed to users in version 1.1, but that won’t happen for a few days. You can resolve the Rollbar item associated with this bug now, but also specify that the resolved version is 1.1. You will no longer get reactivation notifications for this item until occurrences of this item with a code_version >= 1.1 come in.
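
The reactivation decision above amounts to comparing the incoming occurrence's code_version against the version the item was resolved in. A sketch of how such a comparison might work for numeric and dotted versions (illustrative only; our actual logic may differ):

```javascript
// Compare two dotted version strings: returns -1, 0, or 1.
function compareVersions(a, b) {
  var pa = String(a).split('.').map(Number);
  var pb = String(b).split('.').map(Number);
  var len = Math.max(pa.length, pb.length);
  for (var i = 0; i < len; i++) {
    var x = pa[i] || 0;     // missing components count as 0, so 1.1 == 1.1.0
    var y = pb[i] || 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Reactivate only when the occurrence's code_version is at or beyond
// the version the item was resolved in.
function shouldReactivate(occurrenceVersion, resolvedInVersion) {
  return compareVersions(occurrenceVersion, resolvedInVersion) >= 0;
}
```

So an occurrence from 1.0.5 stays quiet, while one from 1.1 or later reactivates the item.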

If you connect Rollbar with GitHub, this process will also work with Git SHAs. We’ll query the GitHub API to determine whether one commit is a parent of the other.

Auto-resolving items in GitHub commits

You can now also include Rollbar item tags in your GitHub commit messages to automatically resolve them in the correct revision when deploying. Just include one of the following strings in your commits for each item you want to resolve:

  • fix $ref
  • fixed $ref
  • fixes $ref
  • resolve $ref
  • resolved $ref
  • resolves $ref
  • close $ref
  • closed $ref
  • closes $ref

Where $ref is one of the following item tags:

  • Full item URL, e.g. https://rollbar.com/item/123456789
  • Item ID, e.g. rb#123456789
  • Short item ID, e.g. rb#22. This appears at the top left of an item page.

Then execute a deploy by hitting the deploy API endpoint. The items referenced in any of the commit messages of the deploy will be resolved using the respective revision of that commit.
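
The commit-message scanning can be sketched with a regular expression. This is illustrative only; our actual parser also accepts the other tag forms above, such as full item URLs:

```javascript
// Match "fixes rb#123", "closed rb#22", etc. in a commit message and
// return the referenced item ids. Sketch; the real parser also accepts
// full item URLs like https://rollbar.com/item/123456789.
var REF_RE = /\b(?:fix(?:e[sd])?|resolve[sd]?|close[sd]?)\s+rb#(\d+)/gi;

function extractItemRefs(message) {
  var ids = [];
  var m;
  while ((m = REF_RE.exec(message)) !== null) {
    ids.push(m[1]);
  }
  REF_RE.lastIndex = 0;  // reset since the regex is global and reused
  return ids;
}

var refs = extractItemRefs('Fixes rb#22 and resolves rb#123456789. Refactor rb#7 untouched.');
// refs is ['22', '123456789'] -- rb#7 has no resolving verb, so it is skipped
```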

You can start tracking and resolving errors in Rollbar by signing up for free.


Ad-hoc error reporting with Rollbar CLI

We just coded up a quick tool to send Rollbar messages from the command line. It’s useful for quick, one-off monitoring scripts that you don’t have time to instrument with one of our notifiers.

To install, just pip install rollbar and you’re done.

e.g. Tracking all 5xx responses as WARNINGs from HAProxy

tail -f /var/log/haproxy.log | awk '{print $11,$0}' | grep '^5' | awk '{$1="";print "warning",$0}' | rollbar -t $ACCESS_TOKEN -e production -v

e.g. Watch failed login attempts

tail -f /var/log/auth.log | grep -i 'Failed password' | awk '{print "error user ",$11,"failed auth from ",$13}' | rollbar -t $ACCESS_TOKEN -e ops

More info on how to install and use it can be found here.


jQuery Error Instrumentation

Today we are releasing a new feature for our JavaScript notifier that should make tracking down errors much easier if you use jQuery 1.7 and above. The new functionality comes in a separate JS plugin snippet that should be placed right below where jQuery is loaded. Here is the first version of the plugin:

<script>
!function(r,n,e){var t={"notifier.plugins.jquery.version":"0.0.1"};n._rollbar.push(
{_rollbarParams:t});r(e).ajaxError(function(r,e,t,u){var o=e.status;var a=t.url;
n._rollbar.push({level:"warning",msg:"jQuery ajax error for url "+a,jquery_status:o,
jquery_url:a,jquery_thrown_error:u,jquery_ajax_error:true})});var u=r.fn.ready;
r.fn.ready=function(r){return u.call(this,function(){try{r()}catch(e){
n._rollbar.push(e)}})};var o={};var a=r.fn.on;r.fn.on=function(r,e,t,u){
var f=function(r){var e=function(){try{return r.apply(this,arguments)}catch(e){
n._rollbar.push(e);return null}};o[r]=e;return e};if(e&&typeof e==="function"){
e=f(e)}else if(t&&typeof t==="function"){t=f(t)}else if(u&&typeof u==="function"){
u=f(u)}return a.call(this,r,e,t,u)};var f=r.fn.off;r.fn.off=function(r,n,e){
if(n&&typeof n==="function"){n=o[n];delete o[n]}else{e=o[e];delete o[e]}
return f.call(this,r,n,e)}}(jQuery,window,document);
</script>

The source can be found on GitHub here.

The snippet wraps jQuery’s ready(), on() and off() functions so that any passed-in handlers are wrapped in try/catch blocks that automatically report errors to Rollbar. This lets us collect the full stack trace with line and column numbers for each frame, instead of just the last frame with only a line number. When combined with source maps, this makes debugging JavaScript errors much more doable.
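
Unminified, the wrapping technique looks roughly like this. It is a simplified sketch of what the plugin does; `reportError` stands in for pushing the error to `_rollbar`:

```javascript
// Simplified sketch of how the plugin wraps handlers. reportError is a
// stand-in for n._rollbar.push(e) in the real snippet.
var reported = [];
function reportError(e) { reported.push(e); }

function wrapHandler(fn) {
  return function() {
    try {
      // Call the original handler; the caught error keeps its full
      // stack trace, including line and column numbers per frame.
      return fn.apply(this, arguments);
    } catch (e) {
      reportError(e);
      return null;
    }
  };
}

var handler = wrapHandler(function() {
  throw new Error('boom');
});
handler();  // the error is reported instead of propagating uncaught
```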

The new snippet also adds a handler to ajaxError() to automatically report any jQuery AJAX errors such as 404s and 500s to Rollbar. If you don’t want this, add the following option to your base snippet’s _rollbarParams:

"notifier.plugins.jquery.ignoreAjaxErrors": true

You can start tracking errors in Rollbar by signing up for free. Or read more in the docs.


JavaScript and Source Maps in a Django App

It’s pretty well known that every web app needs frontend JavaScript these days to provide the best possible user experience. You’ll probably have a bunch of JavaScript files that your users need to load, and since we all care about performance, minifying and compressing these files is a must. But what happens when it comes time to debug issues in these minified files? Stack traces will be more or less useless. How do we solve this problem?

JavaScript source maps solve this problem. They allow you to map a point in a minified file back to the unminified source, making it possible to actually identify and fix issues encountered in a production environment.
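
Under the hood, a source map’s mappings field encodes those position pairs as Base64 VLQ segments. Here is an illustrative decoder sketch; real consumers should use a library such as Mozilla’s source-map rather than this:

```javascript
var BASE64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decode one Base64 VLQ segment into its numbers. Each 6-bit character
// carries 5 data bits plus a continuation bit (bit 5); the low bit of
// the assembled value is the sign.
function decodeVlqSegment(segment) {
  var values = [];
  var value = 0;
  var shift = 0;
  for (var i = 0; i < segment.length; i++) {
    var digit = BASE64.indexOf(segment.charAt(i));
    value += (digit & 31) << shift;
    if (digit & 32) {          // continuation bit set: more chars follow
      shift += 5;
    } else {                   // last char of this value
      var negative = value & 1;
      value >>= 1;
      values.push(negative ? -value : value);
      value = 0;
      shift = 0;
    }
  }
  return values;
}

var zeros = decodeVlqSegment('AAAA');  // [0, 0, 0, 0]
var big = decodeVlqSegment('gB');      // [16]: continuation bit in play
var neg = decodeVlqSegment('D');       // [-1]: low bit is the sign
```

Each decoded quad is a delta: generated column, source file index, original line, and original column relative to the previous mapping.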

Below I have outlined a simple guide to setting up source map generation and usage in a sample Django app. You’ll learn how to generate source maps for minified files, how to debug errors that happen in those files, and what’s required to get this working in your production environment.

Local Debugging with Source Maps

Say you have a simple Django app with the following directory structure:

...
app/
    ...
    views.py
    static/
        js/
            site.js (containing various models and functionality used in your app)
            jquery.js (unminified)
            util.js
    templates/
        index.html

site.js would have the following code:

var aFunction = function() {
    var a = b;  // b is undefined, so calling aFunction() throws a ReferenceError
}

App = {};
App.errorCausingFunction = function() {
    aFunction();
}

views.py would just contain one view that renders index.html, and here is what index.html would look like:

...
<script src="static/js/site.js"></script>
<script src="static/js/jquery.js"></script>
<script src="static/js/util.js"></script>
....
<script>
    App.errorCausingFunction();
</script>
...

Let’s minify. Start by installing UglifyJS2:

npm install -g uglify-js

Here is an example command, run from app/, that will generate minified JavaScript:

uglifyjs static/js/site.js static/js/jquery.js static/js/util.js --output static/js/all.min.js

Here we are using uglifyjs to minify three JS files, site.js, util.js and jquery.js, into all.min.js.

Update your index.html to include only all.min.js:

<script src="static/js/all.min.js"></script>

Now let’s navigate to index.html and see what happens:

Not the most helpful stack trace. All the line numbers are 1, and if this were a much larger file with more generic function/variable names, it would be completely useless in helping you debug where things went wrong.

Let’s introduce source map functionality to our app.

Modify your minification command to look like this:

uglifyjs static/js/site.js static/js/jquery.js static/js/util.js --output static/js/all.min.js --source-map static/js/all.min.map --source-map-url /static/js/all.min.map

Here we are adding two new options, --source-map and --source-map-url. UglifyJS2 will now generate the source map as static/js/all.min.map, and will append a comment to the end of the minified file containing the URL of the source map on your site, in this case /static/js/all.min.map. Note: you may need to modify the comment at the end of all.min.js to read //# instead of //@, as //# is a relatively new convention.

Now navigate to your app in Chrome with Developer Tools open. If everything is set up right, Chrome will automatically translate the frames in the stack trace to the unminified equivalents, like so:

Note that the filenames and line numbers now refer to the original source code, instead of the minified source.

Production Debugging with Source Maps

The above process is all fine and dandy for errors encountered on your local machine, but what if you want to keep track of errors encountered by your users in real-time?

Here at Rollbar, we have recently reworked our error processing pipeline to support the application of source maps on JavaScript errors. Here’s how you would get Rollbar hooked up and reporting from your production environment:

  1. Create a Rollbar account

  2. Follow the instructions to insert the Rollbar JavaScript notifier snippet into your base template

  3. Modify the snippet configuration to signal that source maps should be used:

_rollbarParams = {
  // ... other params ...
  // set this to 'true' to enable source map processing
  "client.javascript.source_map_enabled": true,
  // provide the current code version, i.e. the git SHA of your javascript code.
  "client.javascript.code_version": "bdd2b9241f791fc9f134fb3244b40d452d2d7e35"
}
  4. Make sure your minified files link properly to publicly accessible source maps using the sourceMappingURL comment:
// ... minified js file contents ...
//# sourceMappingURL=<url for source map>

Now, when your app sends Rollbar an error report, Rollbar will automatically attempt to download source maps defined in your minified files and apply them to stack frames located in these files.

Here is an example of the source map application process in action with an unminified stack trace that you would see in Rollbar:

Notice the unminified source filenames with relevant line and column numbers.

Automating things

It’s a bit annoying to have to minify everything every time you change one of your JavaScript files. We have a small script set up here in our dev environments that uses macfsevents to listen for JavaScript file changes. When such events occur, we check whether only the JavaScript files we care about are affected. If so, we run an uglifyjs command on all the JavaScript files to generate minified sources and source maps.

You can even go one step further by making an API call to Rollbar to upload your source map as part of your deploy process. This API endpoint also accepts source file uploads for files referenced by the source map, giving us the ability to print out the unminified source code for each frame in the stack trace. For example:

Here’s a sample command you could use to upload a source map and source file to our API:

curl https://api.rollbar.com/api/1/sourcemap \
-F access_token=aaaabbbbccccddddeeeeffff00001111 \
-F version=bdd2b9241f791fc9f134fb3244b40d452d2d7e35 \
-F minified_url=http://127.0.0.1:8005/static/js/all.min.js \
-F source_map=@app/static/js/all.min.map \
-F app/static/js/site.js=@app/static/js/site.js

The last param is a mapping of source file path to actual source file contents. The path would need to match the one defined in your source map, generated by your minification tool.

More info

Check out the documentation for more info about integrating your JavaScript and source maps with Rollbar. Rollbar also integrates with your Python, Rails, PHP and Node.js based backends.

Contact us at team@rollbar.com if you have any questions, and be sure to follow @rollbar for more updates regarding new releases!


Debug Production Errors in Minified JS with Source Maps and Rollbar

Rollbar just got a much-requested feature: Source Maps support for JavaScript. If you minify your JavaScript code in production, this will make debugging production errors much easier. This feature is now live for all accounts.

What Are Source Maps?

If you minify your JavaScript code (e.g. using UglifyJS2 or the Closure Compiler), it gets harder to debug errors. Stack traces reference line/column numbers in the minified code instead of the original source code.

Source Maps were designed to resolve this; they provide a mapping back from the minified line/column numbers to the original code. Chrome and Firefox have tools to use them in development, but what about errors that happen in production?

Source Maps and Rollbar

Rollbar can now map stack traces that reference minified code back to the original source files, lines, and column numbers. Here’s what a stack trace might have looked like before:

Here’s the de-minified version:

We’ll also use the de-minified stack trace in our grouping algorithm, which should result in more useful grouping.

Getting this set up

To get started, you’ll need to make a change to _rollbarParams in the on-page JavaScript snippet. Add the following two parameters:

_rollbarParams = {
  // ... existing params ...
  // set this to 'true' to enable source map processing
  "client.javascript.source_map_enabled": true,
  // provide the current code version, i.e. the git SHA of your javascript code.
  "client.javascript.code_version": "bdd2b9241f791fc9f134fb3244b40d452d2d7e35"
}

Next, either:

  • Add a sourceMappingURL comment at the end of your minified file pointing to the source map, or
  • Upload the source map (along with all source files) separately, as part of your deploy process

The second option is a bit more involved, so please see our docs for more details.

Caveats

All of this relies on having a stack trace with line and column numbers. Unfortunately, browser support for column numbers is inconsistent. As of today, this will work in Chrome, Firefox, and IE10+, and only for caught errors reported like this:

try {
  doSomething();
} catch (e) {
  _rollbar.push(e);
}

Uncaught errors (reported via window.onerror) don’t have column numbers in any browser we’re aware of, so they can’t be de-obfuscated. For best results, catch all your exceptions so you don’t fall back to the top-level error handler.

Happy debugging, and please don’t hesitate to contact us (team@rollbar.com) if you have any questions.


Async node.js API server testing

This post is about how we built the test suite for our API server at Rollbar, and some of the tricks and gotchas we ran into along the way. We wanted a test suite that tested not only the API logic but also the underlying code, namely Express and the Connect middleware we use. If our API server was going to break, we wanted to know before we deployed it to thousands of customers sending millions of requests per day.

Testing is super important. If you don’t want to test, this probably won’t be very helpful or interesting.

We use Vows. Why not Mocha?

Mocha is by far the most widely used testing framework for Node.js apps. So why didn’t we use it? There were two main reasons: Vows was the first thing I found when Googling “nodejs async testing”, and the syntax of Mocha tests felt like another language and less like code. Mocha tests are more readable, but that benefit was overshadowed by the need to remember all of the new, special-case methods that Mocha injects.

//Mocha
[1,2,3].indexOf(5).should.equal(-1);

vs

//Vows
assert.equal([1,2,3].indexOf(5), -1);

There’s something that bothered me about the former. I didn’t like how the library used a bunch of magic to enable something this small/strange.

Mocha has a lot of awesome features but none that were important enough for me to switch.

A simple Vows test

Vows works just as you’d expect it to, except when it doesn’t. More on that later…

var vows = require('vows');
var assert = require('assert');
vows.describe('testmodule').addBatch({
  "call username() with a valid user id": {
    topic: function() {
      // Kick off the async call; Vows passes the results to the tests below.
      username(42, this.callback);
    },
    "and verify username is correct": function(err, username) {
      assert.isNull(err);
      assert.isString(username);
      assert.equal(username, "cory");
    }
  }
}).export(module, {error: false});

The above test will make sure that the function username() calls its callback with (null, "cory").

Note that we use this.callback since everything is assumed to be async and we use {error: false} when we export the batch. More on those later.

Check out the Vows website for better examples.

Useful design patterns (I swear this will be short)

We’ve found a few idioms and conventions that have been super helpful. Without going too much into design patterns and architecture, here are a few tips that have made writing tests super-easy; almost enjoyable—almost.

Separate your view logic from your API business logic

Your server’s views should have one job: to marshal data from the request/socket/carrier pigeon and provide it to your API library.

Any error checking done in your views should be to make sure the types provided to your API library are correct.

Make every function you write use a callback.

This is super-important for refactoring and adding new features. If you find yourself wanting to add a feature that requires I/O in a code path that was assumed to be completely synchronous, you’ll need to refactor the hell out of your code to make it work. Don’t bother. Make everything take a callback. Embrace async!

Make the first argument to every callback be an optional error.

This is how the Node.js developers do it, and I agree. It makes for more boilerplate code, but it forces you to keep error handling in mind while developing. Writing defensive code is more important than writing fewer lines of code.
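
The convention looks like this. It's a generic sketch, not code from our API server, and it's synchronous here for brevity; real code would do async I/O:

```javascript
// Error-first callback convention: every callback takes (err, result),
// and err is checked before anything else.
function username(userId, callback) {
  if (typeof userId !== 'number') {
    return callback(new Error('userId must be a number'));
  }
  callback(null, 'cory');   // success: the first argument is null
}

var goodErr, goodName, badErr;
username(42, function(err, name) {
  goodErr = err;    // null on success
  goodName = name;
});
username('oops', function(err) {
  badErr = err;     // an Error on failure; no result argument
});
```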

This will also make testing much, much easier with Vows. How? Read on…

Testing the API server, for reals

Definitely write tests and exercise your API library directly but don’t stop there. Fork a process, start your API server up in it and start firing requests at it using Vows.

testcommon.js:

exports.initTestingAppChildProc = function(config, promise) {
  // ... Setup temporary config file
  // ... Get path to your main app.js
  // ... Initialize the api library

  // fork a child process to start the api server
  var args = [configPath, 'test'];
  var appProc = fork(appJsPath, args);

  // This is used to tell if our API server died during its
  // initialization.
  var pendingCallback = true;

  appProc.on('message', function(message) {
    if (message == 'ready') {
      pendingCallback = false;

      // This is how we know our API server is ready to 
      // receive requests. The message is emitted in the
      // API server once it's ready to receive requests.
      promise.emit('success', null, appProc);
    }
  });
  appProc.on('exit', function(code, signal) {
    if (pendingCallback) {
      var msg = 'child process exited before callback';
      console.error(msg);
      promise.emit('error', new Error(msg));
    }
  });
  // A ChildProcess doesn't emit 'SIGTERM'; handle it on the test process
  // itself so the child is cleaned up when the tests are terminated.
  process.on('SIGTERM', function() {
    appProc.kill();
    process.exit();
  });
};

In our API server:

// initialize the API and start the web server when it's ready
api.init(config, function(err) {
  if (err) {
    log.error('Could not initialize API: ' + err);
    process.exit(1);
  } else {
    // Start up the server
    var httpServer = app.listen(port, host, function() {
      log.info('API server is ready.');
      log.info('Listening on ' + host + ':' + port);

      // Use the "ready" message to signal that the server is ready.
      // This is used by the test suite to wait for the api server
      // process to start up before sending requests.
      if (process.send) {
        process.send('ready');
      }
    });
  }
});

tests/routes.project.js:

vows.describe('routes.project').addBatch({
  // Provides a reference to the api server child process
  'Start up an API server': {
    topic: function() {
      var promise = new events.EventEmitter();
      common.initTestingAppChildProc(config, promise);
      return promise;
    },
    teardown: function(err, childProc) {
      var callback = this.callback;
      var shutdown = function() {
        api.shutdown(callback);
      };
      if (childProc) {
        childProc.on('exit', function(code, sig) {
          shutdown();
        });
        childProc.kill();
      } else {
        shutdown();
      }
    },
    'and get a valid project': {
      topic: function(err, childProc) {
        common.apiGet(url('api/1/project/',
            {access_token: config.test.validEnabledReadAccessToken}), this.callback);
      },
      'returns 200 OK': common.assertStatus(200),
      'returns JSON': common.assertJsonContentType(),
      'fast local response time': common.assertMaxResponseTime(20),
      "returns a valid api response": common.assertValidApiResponse(),
      "has a result key in the JSON response": common.assertJsonHasFields(['result']),
      "there's no api error": common.assertNoApiError(),
      'all of the deploy fields are available': common.assertJsonHasFields(db.projectFields(),
          'result'),
      'cross-check account id with api.getAccount': {
        topic: function(err, resp, body) {
          var project = body.result;
          api.getAccount(project.account_id, this.callback);
        },
        'verify the account is not null': function(err, account) {
          assert.isNull(err);
          assert.isObject(account);
        }
      }
    }
  }
}).export(module, {error: false});

There is a lot happening in these tests.

  • We use promises to notify our test when the API server is ready.
    • Documentation for using promises with Vows can be found here.
    • I’m not completely on-board with the Promise design pattern but it seemed like the easiest way to get this working. Mostly, I needed an event to be fired when there was an error that caused the API server process to shut down.
  • We use a Vows teardown function to shut down the API server process.
  • We use our API library to help test our API server.
    • We cross-check our API server’s response by using our API library directly.
  • We use Vows macros for reusable tests on all API requests.
    • We also make use of Vows contexts even though there are none in this example.
    • Documentation for macros and contexts is here.

Gotchas

Never, ever, ever throw an uncaught exception in a Vows topic. It makes debugging impossible. I’ve wasted hours looking through my API library for a bug only to find that I had a silly bug in my topic.

Always use export(module, {error: false}) in your Vows batches. This option isn't really described in the Vows docs; I had to find it in the source. Without it, Vows inspects the first argument to each test to see whether it's an error, and may call your test functions with a different set of arguments depending on whether the first parameter to the topic's callback is an Error. It's completely strange, magical, and confusing.

Testing without mock objects means you need a real database, which means you probably need real-ish data to test with. This is tough. We chose to maintain an SQL fixture for the database that we have to update whenever the schema changes. It's a bit clunky, but it works. I'm open to suggestions if anyone knows of a better way.

Wrapping up…

We use CircleCI to run all of these tests and are really happy with their service. It's fast and easy to set up. It also comes with all of the systems our API server uses, like MySQL, Beanstalkd, and Memcache, pre-installed. This gets us closer to testing in a production-like environment than would otherwise be possible.

Hopefully you were able to glean some useful tips from our experience at Rollbar. We love building tools for devs like you!

Add me on Twitter @coryvirok. Follow @rollbar for more updates.

Moment of zen

 OK » 497 honored (33.232s)

May Release Roundup

Here’s a roundup of what’s new at Rollbar in the month of May.

Big Features

We revamped our notifications system, and added integrations with a bunch of new services. Rollbar now works with Asana, Campfire, Flowdock, Github Issues, Hipchat, JIRA, Pivotal Tracker, and Trello, as well as any arbitrary system via a Webhook. See the announcement blog post for more details.

Small Features

  • You can now customize how occurrences are grouped. This first release allows you to define rules for occurrences that should always be grouped together. See the documentation: Custom Grouping Rules. An in-depth post on how to use this is coming soon.

  • There’s now a “Download CSV” link at the bottom of the Items page, which will let you download a CSV of what you see on the page. Note that this information is also available via our API.

  • You can now sort the Items page by Total Occurrences or Unique Users, in addition to Last Occurrence. Click on the column headers to change the sort.

  • Links to files in Github now point to the appropriate revision, when this information is available. We’ll use one of the following (trying each in order):

    • the value of server.sha
    • the value of server.branch, if it looks like a SHA
    • the revision from the last deploy before the first occurrence of the item

Library Updates

Ruby

We released rollbar-gem versions 0.9.11 through 0.9.14. The changes include a fix for use with Rails 4, a concurrency bugfix, better support for JSON requests, and the ability to include custom metadata with all reports. See the full changelog for details. To upgrade, change the rollbar line in your Gemfile to:

gem 'rollbar', '~> 0.9.14'

We also contributed a fix to resque-rollbar to force use of synchronous mode when reporting Resque failures (instead of async mode, which doesn’t play nicely with Resque).

Python

pyrollbar gained a feature and is now at version 0.5.7. See the changelog for details.

Bug Fixes

  • Fixed an issue where pressing the back button would sometimes cause Chrome to render one of our JSON responses as if it were HTML
  • Fixed a bug where removed email addresses could not be re-added

Documentation Updates

More is on the way. Stay tuned! And don’t forget to send us any feedback: team@rollbar.com — we love hearing from you.