Engineering Blog Posts

Technical details and challenges of building Error Merging

Written By Rivkah Standig June 19th, 2017

Hopefully you've had the chance to try out our latest feature, error merging. We've heard a lot of positive feedback from our users. They're especially excited to be able to easily merge and un-merge related errors. We thought it would be useful to share how the Rollbar team made this happen from a technical standpoint. If you're interested in the nitty-gritty of how we implemented error merging, read on.

I interviewed Todd Dampier, one of the engineers here at Rollbar who was instrumental in making error merging possible, about what was involved in engineering this feature.

Read more

Increasing max-open files for beanstalkd

Written By Cory Virok February 28th, 2015
Quick tip: If you are running out of file descriptors in your Beanstalkd process, use /etc/default/beanstalkd to set the ulimit before the init script starts the process.
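A process inherits its open-file limit from whatever started it, which is why the limit has to be raised before the init script launches beanstalkd. As a small, Unix-only sketch for checking what limit a process actually ended up with, Python's standard `resource` module can report it (the helper names here are illustrative):

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def fmt(value):
    """Render a limit, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


soft, hard = nofile_limits()
print("soft limit:", fmt(soft), "| hard limit:", fmt(hard))
```

The soft limit is the one the process hits when it runs out of descriptors; it can be raised up to the hard limit without extra privileges.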
Read more

Missing daily summary emails

Written By Cory Virok July 21st, 2014
We just rolled out a fix for missing daily summary emails. The bug was introduced last week when we refactored a bunch of our email code. As a result, some projects did not receive their daily email for the previous 24 hours.
Read more

Post-mortem for website assets outage

Written By Brian Rue June 6th, 2014

We had an issue from late last night through this morning where many users were not able to use the rollbar.com website because CSS and JavaScript assets were not loading in some regions. This post will cover what happened, its cause, why we didn't notice it sooner, and the changes we're making going forward.

Read more

Security patch for the recent CCS Injection Vulnerability

Written By Cory Virok June 5th, 2014
For the security-conscious folks out there: we just finished patching our load balancers with the latest security updates.
Read more

Processing Delay Postmortem

Written By Brian Rue and Cory Virok April 11th, 2014

Yesterday from about 2:30pm PDT until 4:55pm PDT, we experienced a service degradation that caused our customers to see processing delays of up to about 2 hours. While no data was lost, alerts were not being sent and new data was not appearing in the rollbar.com interface. Customers instead saw alert notices about the delay on the Dashboard and Items pages.

We know that you rely on Rollbar to monitor your applications and alert you when things go wrong, and we are very sorry that we let you down during this outage.

The service degradation began following some planned database maintenance, which we had expected to have no significant impact on service.

Read more

Heartbleed Bug Response

Written By Brian Rue and Cory Virok April 8th, 2014

Updated 4/9 7:30pm

What is Heartbleed?

CVE-2014-0160, known as “Heartbleed”, is a bug in OpenSSL 1.0.1 through 1.0.1f that allows a remote attacker to access private memory on the target server. It has existed for almost 2 years. More info can be found here: http://heartbleed.com/

With this vulnerability, an attacker can:

  • Get the private key for your domain’s SSL cert
  • Decrypt all current and past SSL traffic to/from all affected machines

If this sounds bad, it is. Most sites on the Internet are affected.

Read more

JavaScript and Source Maps in a Django App

Written By Sergei Bezborodko August 2nd, 2013

It’s pretty well known that every web app needs frontend JavaScript these days to provide the best possible user experience. You are probably going to have a bunch of JavaScript files that need to be loaded by your users for that to happen, and since we all care about performance, minifying and compressing these files is an absolute must. But what happens when it comes time to debug issues in these minified files? Stack traces will be more or less useless. How do we solve this problem?

JavaScript source maps solve this problem. They allow you to map a point in a minified file back to the unminified source, making it possible to actually identify and fix issues encountered in a production app environment.
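To make the mapping idea concrete, here is a minimal sketch of how the "mappings" string of a version 3 source map can be decoded. It assumes the standard Base64 VLQ encoding used by that format; the function names and the tiny mappings string are made up for illustration and do not come from the guide itself.

```python
# Base64 alphabet used by the source map v3 "mappings" field.
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_vlq(segment):
    """Decode one Base64 VLQ segment into a list of signed integers."""
    values, acc, shift = [], 0, 0
    for ch in segment:
        digit = B64.index(ch)
        acc += (digit & 31) << shift      # low 5 bits carry data
        if digit & 32:                    # continuation bit (0x20): more digits follow
            shift += 5
        else:
            # The lowest bit of the finished number is the sign.
            values.append(-(acc >> 1) if acc & 1 else acc >> 1)
            acc, shift = 0, 0
    return values


def decode_mappings(mappings):
    """Return (gen_line, gen_col, source_index, orig_line, orig_col), all 0-based."""
    out = []
    src = orig_line = orig_col = 0        # these deltas accumulate across lines
    for gen_line, group in enumerate(mappings.split(";")):
        gen_col = 0                       # generated column resets on every line
        for segment in filter(None, group.split(",")):
            fields = decode_vlq(segment)
            gen_col += fields[0]
            if len(fields) >= 4:          # segment points back at a source position
                src += fields[1]
                orig_line += fields[2]
                orig_col += fields[3]
                out.append((gen_line, gen_col, src, orig_line, orig_col))
    return out


def original_position(entries, gen_line, gen_col):
    """Find the closest mapping at or before (gen_line, gen_col) on that line."""
    best = None
    for entry in entries:
        if entry[0] == gen_line and entry[1] <= gen_col:
            if best is None or entry[1] > best[1]:
                best = entry
    return best[2:] if best else None


entries = decode_mappings("AAAA,CAAC;AACA")   # tiny hand-made example string
print(original_position(entries, 0, 5))       # (source_index, orig_line, orig_col)
```

A stack frame from minified code supplies the generated line and column; the lookup returns the index into the map's "sources" list plus the original line and column.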

Below I have outlined a simple guide for setting up source map generation and usage in a sample Django app. You’ll learn how to generate source maps for minified files and debug errors that happen in these files, and get a quick overview of what’s required to get this working in your production environments.

Read more

Async node.js API server testing

Written By Cory Virok July 12th, 2013

This post is about how we built our test suite for our API server at Rollbar and some of the tricks and gotchas we ran into along the way. We wanted to build a test suite that not only tested the API logic, but also the underlying code, namely the Express and the Connect middlewares we use. If our API server was going to break, we wanted to know before we deployed it to thousands of customers and millions of requests per day.

Testing is super important. If you don’t want to test, this probably won’t be very helpful or interesting.

Read more

Taking UNIQUE indexes to the next level

Written By Brian Rue March 29th, 2013

You’ve probably seen unique constraints somewhere – either in Rails’ validates :uniqueness, Django’s Field.unique, or a raw SQL table definition. The basic function of unique constraints (preventing duplicate data from being inserted) is nice, but they’re so much more powerful than that. When you write INSERT or REPLACE statements that rely on them, you can do some pretty cool (and efficient) things that you would’ve had to do multiple queries for otherwise.

This post covers unique indexes in MySQL 5.5. Other versions of MySQL are similar. I’m not sure about Postgres or other relational databases but presume they’re similar-ish as well.
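As a small illustration of letting a unique index do the work, the sketch below counts occurrences with a single statement per event instead of a SELECT followed by an INSERT or UPDATE. It uses SQLite (version 3.24 or later) from Python's standard library purely so the example is self-contained; the MySQL counterpart of the `ON CONFLICT ... DO UPDATE` clause is `INSERT ... ON DUPLICATE KEY UPDATE`. The table and column names are invented for the example.

```python
import sqlite3


def record_occurrence(conn, fingerprint):
    """Insert a new row, or bump the counter if the fingerprint already exists."""
    conn.execute(
        """
        INSERT INTO item (fingerprint, occurrences) VALUES (?, 1)
        ON CONFLICT(fingerprint) DO UPDATE SET occurrences = occurrences + 1
        """,
        (fingerprint,),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        fingerprint TEXT NOT NULL UNIQUE,   -- the unique index the upsert relies on
        occurrences INTEGER NOT NULL
    )
    """
)

for fp in ["abc", "abc", "def"]:
    record_occurrence(conn, fp)

rows = conn.execute(
    "SELECT fingerprint, occurrences FROM item ORDER BY fingerprint"
).fetchall()
print(rows)
```

Because the database resolves the conflict itself, two writers recording the same fingerprint at the same time cannot both insert a row.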

Read more

Post-mortem from last night's outage

Written By Brian Rue January 11th, 2013

tl;dr: from about 9:30pm to 12:30am last night, our website was unreachable and we weren’t sending out any notifications. Our API stayed up nearly the whole time thanks to an automatic failover.

We had our first major outage last night. We want to apologize to all of our customers for this outage, and we’re going to continue to work to make the Rollbar.com service stable, reliable, and performant.

What follows is a timeline of events, and a summary of what went wrong, what went right, and what we’re doing to address what went wrong.

Read more

Using a Request Factory in Pyramid to write a little less code

Written By Brian Rue September 7th, 2012

At Rollbar.com, we’ve been using Pyramid as our web framework and have been pretty happy with it. It’s lightweight and mostly stays out of our way.

Pyramid doesn’t have a global request object that you can just import[1], so it makes you pass around request wherever you need it. That results in a lot of library code that looks like this:

# lib/helpers.py
def flash_success(request, body, title=''):
    request.session.flash({'body': body, 'title': title})

and a lot of view code that looks like this:

# views/auth.py
@view_config(route_name='auth/login')
def login(request):
    # (do the login...)
    helpers.flash_success(request, "You're now logged in.")
    # (redirect...)

That is, we end up with a lot of function calls that pass request as their first argument. Wouldn’t it be nicer if we could attach these functions as methods on request itself? That would save a few characters every time we call them, and let us stop thinking about whether request is the first or last argument. Pyramid facilitates this by letting us provide our own Request Factory:
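A minimal sketch of the idea, not the post's own code: the helper becomes a method on a request subclass. To keep the example runnable without Pyramid installed, `BaseRequest` and `FakeSession` below are stand-ins invented for illustration. In a real app the subclass would extend `pyramid.request.Request` and be registered through the `request_factory` argument of `Configurator`.

```python
class FakeSession:
    """Stand-in for a Pyramid session; only the flash queue is modelled."""

    def __init__(self):
        self._queue = []

    def flash(self, message):
        self._queue.append(message)

    def pop_flash(self):
        messages, self._queue = self._queue, []
        return messages


class BaseRequest:
    """Stand-in for pyramid.request.Request (assumption for this sketch)."""

    def __init__(self):
        self.session = FakeSession()


class AppRequest(BaseRequest):
    """Custom request class: helpers live here instead of in lib/helpers.py."""

    def flash_success(self, body, title=''):
        self.session.flash({'body': body, 'title': title})


def login(request):
    # (do the login...)
    request.flash_success("You're now logged in.")
    # (redirect...)


request = AppRequest()
login(request)
print(request.session.pop_flash())
```

The view no longer imports a helpers module or passes request along; every view receives an instance of the custom class.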

Read more

Writing a simple deploy script with Fabric and @roles

Written By Brian Rue August 16th, 2012

I first heard about Fabric a couple years ago while at Lolapps and liked the idea of:

  • writing deployment and sysadmin scripts in a language other than Bash
  • that language being Python, which we used everywhere else

but we already had a huge swath of shell scripts that worked well (and truth be told, Bash isn’t really that bad). But now that we have a clean slate for Rollbar, Fabric it is.

I wanted a simple deployment script that would do the following:

  1. check to make sure it’s running as the user “deploy” (since that’s the user that has ssh keys set up and owns the code on the remote machines)
  2. for each webserver:
       a. git pull
       b. pip install -r requirements.txt
       c. in series, restart each web process
  3. make an HTTP POST to our deploys API to record that the deploy completed successfully
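As a rough illustration of that order of operations (plain Python, not the author's Fabric script), the sketch below takes the command runner and the deploy-recording call as parameters, so it assumes nothing about how commands reach the servers. The function name, the host and process names, and the restart command are placeholders. With Fabric, the per-host loop would normally be handled by the library, for instance through the `@roles` decorator named in the title.

```python
import getpass


def deploy(webservers, web_processes, run, record_deploy, user=None):
    """Run the deploy steps in order.

    run(host, command) and record_deploy() are supplied by the caller.
    """
    user = user or getpass.getuser()
    if user != "deploy":
        raise RuntimeError("deploys must run as 'deploy', not %r" % user)
    for host in webservers:
        run(host, "git pull")
        run(host, "pip install -r requirements.txt")
        for name in web_processes:          # one at a time, never in parallel
            # Placeholder command: the post does not say how processes restart.
            run(host, "restart-web-process %s" % name)
    record_deploy()                         # e.g. an HTTP POST to a deploys API


# Dry run: collect the commands instead of executing them.
log = []
deploy(
    webservers=["web1", "web2"],
    web_processes=["web-a", "web-b"],
    run=lambda host, cmd: log.append((host, cmd)),
    record_deploy=lambda: log.append(("api", "POST deploy")),
    user="deploy",
)
for entry in log:
    print(entry)
```

Recording the deploy only after every host has finished means a failure partway through never reports success.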

Here’s my first attempt:

Read more

