June 14th, 2017 • By Mike Smith
I'm eager to share an insightful interview our friends at Changelog recently did with Andrew Childs, CTO at Clubhouse and Rollbar power-user. We're big supporters of the Changelog podcast and we asked them to help us produce a handful of interviews with our customers. It's a fun project that lets us pull back the curtain and learn more about our customers' processes for handling errors and deploying code. Read. Listen. Enjoy!
Featured in this interview: Adam Stacoviak, Founder & Chief Editor at Changelog, a podcast on software development and open source. Andrew Childs, CTO of Clubhouse, an easy-to-use project management tool for software teams.
Adam: Andrew, let's start off with you telling me a bit about Clubhouse.
Andrew: Clubhouse is a software company based in New York, building project management software, specifically for software teams that are looking for something simple and flexible but gives them a little bit more visibility into what the whole team is doing and where the team is going at a high level.
Adam: How important is error tracking to Clubhouse?
Andrew: It's really important, because our software isn't perfect. You could see if you looked at our Rollbar, we have a constant stream of errors coming in that need to be fixed. People are leaving these windows open for weeks and they've got all these third-party extensions loaded, and those extensions are doing things to the page, rewriting things in the DOM. You're in a very hostile environment, and you can only do so much to guard against that locally when you're testing things yourself.
Adam: What's your policy on errors? Do you have a no-error policy?
Andrew: Not really. The front-end team here is a small team and we don't have a lot of formal processes in place yet.
Adam: Does Rollbar help you organize that better then? Since you have a small team, Rollbar's awesome because it helps you visualize those things and categorize things and keep track of those errors better than maybe having a slightly larger team where you can squash bugs faster?
Andrew: Yeah, absolutely. It does a good job of combining similar JavaScript errors. Like, if multiple people are having the same issue, it will combine all those. There's a good overview of how many people an error is affecting. If it's just one single user who's on Linux, running an old version of Firefox, then we know; or if it's something that's affecting everybody.
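As a rough illustration of the grouping Andrew describes, the sketch below collapses similar error messages into one group and counts how many distinct users each group affects. This is not Rollbar's actual grouping algorithm; the `ErrorReport` shape, the `fingerprint` function, and its normalization rules are assumptions made up for this example.

```typescript
// Hypothetical shape of one reported error (not Rollbar's real payload).
interface ErrorReport {
  message: string;
  userId: string;
}

interface ErrorGroup {
  fingerprint: string;
  occurrences: number;
  affectedUsers: string[]; // distinct user ids
}

// Replace the variable parts of a message (quoted strings, numbers)
// so that similar messages collapse to the same key.
function fingerprint(message: string): string {
  return message
    .replace(/'[^']*'|"[^"]*"/g, "<str>")
    .replace(/\d+/g, "<num>")
    .trim()
    .toLowerCase();
}

function groupReports(reports: ErrorReport[]): Record<string, ErrorGroup> {
  // Object.create(null) avoids collisions with inherited keys like "constructor".
  const groups: Record<string, ErrorGroup> = Object.create(null);
  for (const report of reports) {
    const key = fingerprint(report.message);
    let group = groups[key];
    if (group === undefined) {
      group = { fingerprint: key, occurrences: 0, affectedUsers: [] };
      groups[key] = group;
    }
    group.occurrences += 1;
    if (group.affectedUsers.indexOf(report.userId) === -1) {
      group.affectedUsers.push(report.userId);
    }
  }
  return groups;
}

// Example: three reports of the same kind of error from two users,
// plus one unrelated error from a third user.
const groupedReports = groupReports([
  { message: "Cannot read property 'owner' of undefined", userId: "u1" },
  { message: "Cannot read property 'name' of undefined", userId: "u2" },
  { message: "Cannot read property 'owner' of undefined", userId: "u1" },
  { message: "Request 4021 timed out", userId: "u3" },
]);
```

With this input, the three "Cannot read property" reports land in one group affecting two users, which is the "one user or everybody" signal Andrew mentions.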
Adam: It helps you power test too, I assume, which is important at your stage.
Andrew: Yeah, absolutely. Whenever we get a new error that comes in, it goes to our Slack channel so we're notified right away whenever we have a new issue. The page on Rollbar that has the error information is nicely laid out. Since they've got source-map support, we can actually see exactly where in the code - even though our code is minified, we can see exactly which line is causing the issue.
Adam: Does it deminify your code then?
Adam: The help of Rollbar is obviously in tracking errors, wondering what's going on; having an application that actually works is the entire point, right? So what was the process like for monitoring, tracking, fixing bugs before you started to work with Rollbar? Did you use something else? What were the tools that you might have used?
Andrew: We started out using actually Google Analytics to basically track bugs, but Google Analytics isn't built for error tracking at all, really; it's built for counting pageviews and unique visitors, conversions and things like that. That didn't work, so we were looking around, we found a tool called Sentry which did a little bit better. It gave us some stack traces and it gave us a little bit more information, but it was not exactly working very well for us. So we started looking around and we found Rollbar, switched over to that in probably about an hour.
Adam: What were some of the challenges you faced in general and how did it just not work right for you, and ultimately what was the thing that made you feel like "I need to find another solution because this isn't working"?
Andrew: Previous tools provided some information about the exact error that we were getting, but we weren't able to really drill down and see who it was affecting and how often, and there were just some usability issues around the other products we tried.
Adam: It sounds like there was a lack of visibility into the details. You need access to specifics: "Hey, if you're gonna tell me my application is broken in this area, then give me this information. Who else has had this issue? Is this issue reoccurring? What code is tied to it?", things like that.
Andrew: Actually I just found the story that I had from November of 2014 to switch over from Sentry to Rollbar. You can basically send anything to Rollbar, you can send arbitrary data, and I think at the time the other tools were lacking and not as robust.
Adam: Right. So not only just tracking the error, but tracking things like customer name, location in the world, things that matter to you when you track errors. Kind of like customizability.
Andrew: Yeah, I mean the thing is that we could not only send errors, but also track user interactions and start looking at a specific person to see what chain of interactions led up to this specific error happening.
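To make the idea of "a chain of interactions" concrete, here is a minimal sketch of a bounded interaction trail that gets attached to an error report as arbitrary extra data. It does not call the Rollbar SDK; `createTrail`, `buildErrorPayload`, and the payload field names (`person`, `codeVersion`, `custom`) are hypothetical names chosen for this example, and a real integration would pass the resulting object to whatever reporting call the error tracker provides.

```typescript
interface Interaction {
  action: string;
  at: number; // timestamp in milliseconds
}

interface InteractionTrail {
  record(action: string): void;
  snapshot(): Interaction[];
}

// Keep only the most recent `limit` interactions so the payload stays small.
function createTrail(limit: number): InteractionTrail {
  const items: Interaction[] = [];
  return {
    record(action: string): void {
      items.push({ action: action, at: Date.now() });
      if (items.length > limit) {
        items.shift(); // drop the oldest entry
      }
    },
    snapshot(): Interaction[] {
      return items.slice(); // copy, so callers cannot mutate the trail
    },
  };
}

// Hypothetical payload shape: who hit the error, which build they were
// running, and what they did just before it happened.
interface ErrorPayload {
  message: string;
  person: { id: string };
  codeVersion: string;
  custom: { interactions: string[] };
}

function buildErrorPayload(
  err: Error,
  userId: string,
  codeVersion: string,
  trail: InteractionTrail
): ErrorPayload {
  return {
    message: err.message,
    person: { id: userId },
    codeVersion: codeVersion,
    custom: {
      interactions: trail.snapshot().map(function (i) { return i.action; }),
    },
  };
}

// Example: four interactions recorded, only the last three are kept.
const demoTrail = createTrail(3);
demoTrail.record("loaded page");
demoTrail.record("clicked story");
demoTrail.record("opened story");
demoTrail.record("updated owner");

const demoPayload = buildErrorPayload(
  new Error("Cannot read property 'owner' of undefined"),
  "user-42",
  "build-abc123",
  demoTrail
);
```

Capping the trail is a design choice for this sketch: the last few actions are usually the relevant ones, and an unbounded list would grow for as long as the tab stays open.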
Adam: What were some of the challenges you were facing related to error tracking? You said you've got a small team, so those are the ones that are kind of tracking errors, it seems, unless I'm wrong. But talk to me about some of the challenges faced in general when trying to track errors and trying to monitor for errors, and ultimately trying to build a better product.
Andrew: It's a small team working on the web app, and on the client-side we don't have a QA team, so it's us doing all the QA and all the firefighting when errors happen. There is definitely a tension between having a rigorous integration test suite and having test-run everything, and then on the other hand having a barebones integration test suite and fixing things quickly in production. Obviously, you don't want errors to be happening all the time... Basically, every error that happens, you lose credibility in the face of your customer, and what Rollbar lets us do is to very quickly react when an error actually happens, and keep errors from becoming huge issues for us as a company.
Adam: No matter how good your engineers are, no matter how many cups of coffee you drink that day, no matter how alert you are, you're gonna have errors. Everyone writes errors, so it's important to not only find errors and fix them fast, but be able to really understand why the error came about. Can you talk a bit about the Rollbar interface, how you can look at an error, see different customers it might have interacted with, maybe actually unfurl the actual code there - can you talk to me about that piece there?
Andrew: Let's say I'm looking at a specific error - I can see the exact stack trace for that error; because we had source-maps enabled, we can see exactly what line caused the issue, not just line 6000, column 20. We can actually look at the stack trace and know exactly what the problem is, which is great. On top of that, we can see who it has affected, how often it's happened over the last 60 days (which is great) and with any alert we can pass in arbitrary data, so we can pass in basically a list of interactions that led up to that point in time. User clicked on story, opened story, updated the owner, and if that was the last thing in the list of actions, then we know it's possible that it was related to that.
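For readers unfamiliar with how a source map turns "line 6000, column 20" into a real file and line, the sketch below shows the core lookup in simplified form. Real source maps store their mappings in a compact Base64 VLQ encoding and are normally read with a library; here the mappings are assumed to be already decoded into a sorted array, and the `Mapping` shape, file names, and positions are invented for illustration.

```typescript
// One decoded mapping segment: a position in the minified bundle and the
// original source position it came from. (Simplified, hypothetical shape.)
interface Mapping {
  generatedLine: number;
  generatedColumn: number;
  source: string;
  originalLine: number;
  originalColumn: number;
}

function comparePosition(
  aLine: number, aColumn: number, bLine: number, bColumn: number
): number {
  return aLine !== bLine ? aLine - bLine : aColumn - bColumn;
}

// `mappings` must be sorted by generated position. A segment covers
// everything from its start up to the next segment on the same line, so we
// binary-search for the last segment that starts at or before the target.
function originalPositionFor(
  mappings: Mapping[], line: number, column: number
): Mapping | null {
  let lo = 0;
  let hi = mappings.length - 1;
  let best: Mapping | null = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const candidate = mappings[mid];
    if (comparePosition(candidate.generatedLine, candidate.generatedColumn, line, column) <= 0) {
      best = candidate;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  // Only accept a segment that starts on the same generated line.
  return best !== null && best.generatedLine === line ? best : null;
}

// Example: a minified bundle that is one long line.
const exampleMappings: Mapping[] = [
  { generatedLine: 1, generatedColumn: 0, source: "src/app.ts", originalLine: 1, originalColumn: 0 },
  { generatedLine: 1, generatedColumn: 120, source: "src/story.ts", originalLine: 10, originalColumn: 2 },
  { generatedLine: 1, generatedColumn: 480, source: "src/story.ts", originalLine: 42, originalColumn: 4 },
  { generatedLine: 1, generatedColumn: 900, source: "src/owner.ts", originalLine: 7, originalColumn: 0 },
];

// A stack frame at minified line 1, column 500 falls inside the third segment.
const resolvedFrame = originalPositionFor(exampleMappings, 1, 500);
```

In this example the frame resolves to `src/story.ts`, line 42, which is the kind of answer that makes a minified stack trace readable.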
Adam: So it's definitely given you a trail of breadcrumbs to follow, much better than obviously just knowing an error occurred and the piece of code it touched. You actually have some other information that you're able to send to Rollbar through the way you interface with it as you see fit. It's something that is customizable to your needs.
Adam: You've got a small team, right? Speed matters, so being able to reduce the amount of time it takes for you to investigate an error, maybe even fix the error that day and ship out to production and solve the problem once and for all... Talk to me a bit about examples of reduced time and how important that is. Do you have any particular examples where because of Rollbar you were able to track the error and then see the code quickly and fix the error that day, potentially?
Andrew: Absolutely. It actually just happened last week, we did a deploy and a couple minutes later everybody got a new alert in Slack, and because we had source-maps enabled, I knew exactly what the problem was and we had it fixed and deployed within five minutes. It's not like that every time, but it happens often enough that it's a huge benefit for us.
Adam: Talk to me about how important it is to have a tool like that available to you. Like you said, it doesn't happen every time, but it's a possibility that it can not only track the error, but it can also give you the insight to be able to use your own brain as an engineer - look at the code, figure it out and actually within five minutes solve a potentially big problem. How important is that to you?
Andrew: It's really important. Actually we not only fix the error, but we can contact the user directly and say, "We noticed that you're trying to do this one thing... It's fixed now." Rollbar lets us be proactive and fix impactful issues that could be a disaster for us in terms of how people perceive us and the quality of our product. It basically guards us against that.
Adam: I imagine a small team like yours is pretty customer-focused because you're a startup. You're obviously counting every user experience you get, wanting that user experience to be positive. Since you're able to send arbitrary data to Rollbar and actually see which customer got hit with an error, how has that changed how you interface with customers on a support level, even if they haven't said, "Hey, I have this error", because Rollbar told you and they didn't tell you through support? How has that changed how you focus on and deliver customer happiness?
Andrew: Yes, it can be really difficult to investigate and figure out what is going on on a customer's screen, and pretty much the first thing that we do is go to Rollbar and take a look at the error feed. Since Rollbar lets us drill down and see specific people, we can see what, if anything, that user's been doing, to give us an idea of what possibly could have gone wrong. So it's definitely helped with the initial conversation that we have, where we don't have to say "What browser are you using?" - we know that already. We know exactly what version of the code they're looking at, which is great, because we can do a deploy, but a user might have that tab open for two weeks and be running two-week-old code.
I hate to admit how many errors we've had to fix, but there've been many cases where we've been able to fix whatever issue the user reported within minutes, if not before the report of the error, because they were still looking at the old version of the page, so all we had to do was say "Just refresh your page, it's already been fixed." Their response is almost always like, "Wow, you guys are amazing."
It's pretty great to be able to take what should be a crappy situation where we have an error and turn it into a success story where the customer is actually really happy with us.
Adam: Now be honest with me... Don't say this because I'm asking you to, but based on this entire conversation, could you build a successful version of Clubhouse, keep customers happy, do all the things we've talked about in this conversation without Rollbar?
Andrew: It's very safe to say that we wouldn't be where we are right now if we weren't using Rollbar.
Adam: That's awesome. Thanks so much for your time today, Andrew, I appreciate it.
Thank you to Andrew and the team at Clubhouse for sharing this level of insight and for leading by example when it comes to error tracking and working to maintain error-free experiences for their users.
If you haven’t already, sign up for a 14-day free trial of Rollbar and let us help you take control of your client-side and server-side application errors.