Rollbar vs. Airbrake

Why are Airbrake users switching to Rollbar?

We're seeing quite a few former Airbrake users switch to Rollbar. Why are they doing that? We're glad you asked...

This article covers a few key areas where we think Rollbar stands apart. It's kind of long, so feel free to skim. There's also a big comparison table at the end.

Reliability and Site Speed

We aren't regular Airbrake users ourselves, but a frequent refrain we see on Twitter and hear in conversations has been dissatisfaction with Airbrake's uptime and site speed. We've taken this to heart and done a lot of work to make Rollbar reliable and fast. We're not perfect, but we're proud of the work we've done and it's paying off.

Reliability

Our team has built systems at scale (50M users, 200+ servers) before and we're applying what we've learned to make Rollbar rock-solid. Here are a few things to note:

  • We are hosted on Google Cloud Platform. Our primary data center where our VMs are located is in the Iowa region, and we utilize Google’s network of over 100 points of presence across the globe.
  • The API tier is separate from the Web tier. It's on its own hardware, currently in 4 of our datacenters: Dallas, San Jose, Singapore, and Amsterdam. We use Google's IP addresses to deliver across PoPs and route traffic to the closest datacenter worldwide.
  • The pipeline from the API to alerts and the dashboard is fully asynchronous. If our primary database goes down, payloads will queue up but won't be lost. API servers write to local files, which are loaded in the background to the 'raw' database. A separate system processes items in the raw database into the main database and enqueues alerts. Still other systems actually send the alerts and build the various summary views used through the interface.
  • For the Web tier, we rely on Google's PoP network across the globe for load balancing and redundancy.
  • Each of our database layers (raw, main, UUID) is set up with one master and two replicas. We take a snapshot backup from one of the replicas every other hour and test all backups every hour.
  • All of this is monitored extensively both internally (Sensu) and externally (Pingdom). Our status site shows the status as seen by Pingdom. At least one engineer is on-call at all times.
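
The "write to local files, load in the background" step described above can be illustrated with a small sketch. This is not Rollbar's actual code: the class names, the JSON-lines spool format, and the in-memory array standing in for the 'raw' database are all assumptions made for illustration.

```ruby
require "json"
require "tmpdir"

# Hypothetical API-tier writer: accepting a payload only appends to a local
# file, so it keeps working even if the database is unreachable.
class SpoolWriter
  def initialize(path)
    @path = path
  end

  # Append one payload as a single JSON line.
  def accept(payload)
    File.open(@path, "a") { |f| f.puts(JSON.generate(payload)) }
  end
end

# Hypothetical background loader: moves spooled payloads into the raw store.
class SpoolLoader
  def initialize(path, raw_store)
    @path = path
    @raw_store = raw_store
  end

  # Load everything spooled so far, then empty the file.
  # Returns the number of payloads loaded.
  # (A real loader would rotate files rather than truncate in place, to
  # avoid losing payloads written while the drain is running.)
  def drain
    return 0 unless File.exist?(@path)

    lines = File.readlines(@path, chomp: true).reject(&:empty?)
    lines.each { |line| @raw_store << JSON.parse(line) }
    File.write(@path, "")
    lines.size
  end
end

# Small demonstration: payloads arrive first, the load happens afterwards.
def spool_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, "spool.jsonl")
    writer = SpoolWriter.new(path)
    raw_store = [] # stand-in for the raw database

    writer.accept("level" => "error", "message" => "boom")
    writer.accept("level" => "warning", "message" => "slow query")

    loaded = SpoolLoader.new(path, raw_store).drain
    [loaded, raw_store]
  end
end
```

The point of the design is that accepting a payload never depends on the database being up; only the background drain does.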

Site Speed

The key to our current snappiness is keeping everything in memory. The primary databases are each beefy boxes with 208 GB of RAM and terabytes of SSDs. Raw occurrence data is stored on regular disks and fronted by memcache.

If you have more questions about our infrastructure, please email us at team@rollbar.com. We're happy to answer questions and we'll share that knowledge here.

Logging things that aren't exceptions

Rollbar was built from the ground up to support general-purpose logging as well as the common special case of exception logging. This means:

  • The key abstraction is called an "Item" (not very descriptive, we know), not an "Error"
  • Items have a severity level, ranging from "debug" to "critical". This can be set in the API payload, and can be changed later in the interface.
  • Occurrences (which are aggregated into Items) can have a 'trace' (if they're an exception) or a 'message' (if they're a generic log message).
  • Notification preferences can be configured per-level. By default, we'll notify you about errors and criticals but not warnings or lower.
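
As a rough illustration of per-level notification, here is a minimal sketch of the default described above (notify on errors and criticals, stay quiet for warnings and lower). The constant and method names are hypothetical, and the full level list assumes "info" sits between "debug" and "warning"; it is not taken from Rollbar's code.

```ruby
# Severity levels from least to most severe (assumed ordering).
LEVELS = %w[debug info warning error critical].freeze

# Returns true when an item at `level` should trigger a notification,
# given the lowest level that notifies (`threshold`, "error" by default).
def notify?(level, threshold = "error")
  rank = LEVELS.index(level.to_s)
  raise ArgumentError, "unknown level: #{level}" unless rank

  rank >= LEVELS.index(threshold)
end
```

Lowering the threshold to "warning" would be the equivalent of opting in to warning notifications in the preferences.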

Want to record a non-exception warning condition, like not being able to connect to an external API? Rollbar.warning("Could not connect to Acme API.", :some_data => @its_value). Have an exception that you want to keep an eye on, but isn't really an error? Mark it as a Warning in the interface.

If you're familiar with logging frameworks, you'll feel right at home here. We use this ourselves for everything from logging business events ("user logged in") to form validation errors to slow queries.

Search

Want to find exceptions that occurred in a particular file? In a date range? Warnings on a particular host? That contain the word "hello"? Rollbar can do all of this and more.

The search index is updated in real-time and runs on the fantastic Sphinx Search.

Graphs Everywhere

Debugging complex issues is often about finding patterns, and often the best way to see them is visually. The Rollbar dashboard includes graphs of occurrence and new item counts, with deploys overlaid (so you can see how deploys affect error rates). There are also inline graphs (sparklines) showing the rates of each individual item on the Dashboard and Search pages.

On the Item Detail page, there are graphs of occurrence counts by hour and day (again with deploys overlaid), a table of occurrences by parameter that you can scan for patterns, and graphs showing the Browser and OS breakdowns.

A few cool features

Replays

If an error happened on a GET request, we have a Replay button that will re-run the same request (URL, headers, cookies and all) so you can instantly reproduce it.
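
To make the idea concrete, here is a minimal sketch of rebuilding a GET request from recorded data using Ruby's standard Net::HTTP. The shape of the `recorded` hash (`:method`, `:url`, `:headers`, `:cookies`) is an assumption for illustration and is not how Rollbar stores requests.

```ruby
require "net/http"
require "uri"

# Build a Net::HTTP request that mirrors a recorded GET request.
# `recorded` is a hypothetical hash with :method, :url, :headers, :cookies.
def build_replay_request(recorded)
  unless recorded[:method].to_s.upcase == "GET"
    raise ArgumentError, "only GET requests are replayed in this sketch"
  end

  uri = URI.parse(recorded[:url])
  request = Net::HTTP::Get.new(uri)

  # Copy the recorded headers onto the new request.
  recorded.fetch(:headers, {}).each { |name, value| request[name] = value }

  # Cookies travel as a single "Cookie" header of name=value pairs.
  cookies = recorded.fetch(:cookies, {})
  unless cookies.empty?
    request["Cookie"] = cookies.map { |k, v| "#{k}=#{v}" }.join("; ")
  end

  [uri, request]
end

# Sending it would look like this (needs network access, so it is not run here):
#   uri, request = build_replay_request(recorded)
#   Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
#     http.request(request)
#   end
```

Restricting the sketch to GET reflects the condition above: a GET can usually be re-run without side effects, which is what makes a one-click replay reasonable.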

Person Tracking

Airbrake has the basics here (recording which user was affected by a particular error), but Rollbar takes it to the next level. You can see the history of events seen by each user, as well as a list of all your users we have any data for.

One customer we know of is using this for debugging complex JavaScript issues: they use Rollbar.debug to log debug messages, and then view the results in the Rollbar interface.

Flexible notification rules

You can set up notification rules for any of our supported channels (see the list in the table below), filtered by environment, severity level, string match on the exception class+message, and string match on any of the filenames in the traceback.
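
The filters listed above can be pictured as a simple rule matcher. This is a sketch under stated assumptions, not Rollbar's implementation: the `Rule` struct, its field names, the item hash keys, and the level ordering (including "info") are all hypothetical.

```ruby
# Severity levels from least to most severe (assumed ordering).
RULE_LEVELS = %w[debug info warning error critical].freeze

# One notification rule. A nil filter means "match anything".
Rule = Struct.new(:channel, :environment, :min_level,
                  :title_match, :filename_match, keyword_init: true) do
  # item is a hash with :environment, :level, :title (exception class plus
  # message) and :filenames (files appearing in the traceback).
  def matches?(item)
    return false if environment && item[:environment] != environment

    if min_level
      item_rank = RULE_LEVELS.index(item[:level]) || 0
      return false if item_rank < RULE_LEVELS.index(min_level)
    end

    return false if title_match && !item[:title].include?(title_match)

    if filename_match
      return false if item[:filenames].none? { |f| f.include?(filename_match) }
    end

    true
  end
end

# Returns the channels whose rules match the item, in rule order.
def channels_for(item, rules)
  rules.select { |rule| rule.matches?(item) }.map(&:channel).uniq
end
```

For example, a rule with `channel: "PagerDuty", environment: "production", min_level: "critical"` would stay silent for a production item at level "error", while a rule with only `filename_match: "billing"` would fire for any item whose traceback touches a billing file.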

Comparison Table

If you were looking for a big comparison table, this is for you:

Feature | Rollbar | Airbrake
Basics
Hosting provider | Google Cloud | AWS
Data model | Exceptions and log messages | Exceptions only
SSL | | Paid plans only
Pricing model | Per-occurrence; optional rate limit | Pick a rate limit
On-premises available
Rate limits | Configurable per access token | Set by plan
Users | Unlimited | Varies by plan
Projects | Unlimited | Varies by plan
Teams | Unlimited |
Official Libraries
Ruby / Rails
Python / Django
PHP
Node.js
JavaScript
Flash
iOS (Beta)
Android (Beta)
Managing Errors
Severity levels
Auto-resolve old items
"Resolved" state
"Muted" state
Batch actions (set status, level)
Resolve from the item list
Finding Errors
Dashboard with reports and graphs
Search by status
Search by date, level
Search by title, hostname, context
Daily summary emails
Understanding Errors
Affected user counts
Graphs of rate over time
Browser and OS graphs
Table of occurrences by parameter
Basic person tracking
History by person
Replay button
Non-project code optionally hidden in stack traces
Collaboration
Comments on items
@-mentions in comments
Deploy Tracking
Record when you've deployed
Auto-resolve items on deploy
History of deploys and commits in each
GitHub Integration
Link from stack traces to file in GitHub
Available GitHub sign-in and project sync
Deploy history includes code commits
"Suspect Deploy" analysis
Notification Channels
Email
Asana
Campfire
Flowdock
GitHub Issues
HipChat
Lighthouse
JIRA
PagerDuty
Pivotal Tracker
Slack
Sprintly
Trello
Webhook
Miscellaneous
Free trial
Team-based permissions
RSS feed

We tried to be as unbiased as possible, but if any of the above is inaccurate, please let us know: team@rollbar.com