You can now rename/edit your error titles. Fix ugly long titles. Hover over, click, edit, and save.
Supercharge your issue and error tracking workflow when you connect your Rollbar and Bitbucket accounts. New Items in Rollbar will instantly create Issues in your Bitbucket repo, or you can create and link Issues with the click of a button within Rollbar.
Go to your project's Settings, then Notifications, and select Bitbucket Issues from the list of channels. Click 'Connect with Bitbucket' to grant Rollbar access to your account.
From here, you can choose which repository to use and add, edit, or remove rules for Issues to be created automatically.
Like magic, your Rollbar error items and details now show up in your Bitbucket repo. Success!
Prefer to create Issues by hand? You can create an Issue directly from the error Item page in Rollbar, or link to an Issue that already exists. This works alongside the automatic rules, or you can remove the rules for full manual control.
We're working toward full Bitbucket support to match what we offer for GitHub: Issues, Source Control, and Authentication. We know Rollbar users who rely on Bitbucket in their workflows are rejoicing. :)
Let us know if you have any feedback or questions. We're here to help.
Deploy and enjoy!
Daily, Hourly, New Errors, and Trend graphs are now clickable. You can find and fix bugs even faster, and in fewer clicks. :D
Common usability feedback we get from our users:
"Sure would be nice if I could click the dashboard bar graphs and sparklines to quickly see what caused a spike in error events."
Couldn't agree more. We love aggregating data and we love it clickable. So we enabled it!
The following are now clickable in the project Dashboard:
Trends are also clickable on the Items page. For reference, Trends are the small inline graphs also called 'sparklines'.
When viewing a specific error item, the Last 60 Minutes, Hours, and Days are now clickable and aggregate error data by your selection.
We're excited to get this feature out the door. It removes a lot of friction in navigating Rollbar. One of many UI and UX improvements to come. :)
Log in today and click through your data.
Don't have a Rollbar account? No worries, you can give our Live Demo a try and 'click all the things'. ;)
Deploy and enjoy!
The infrastructure behind most modern web applications includes an assortment of tools for collecting server and application metrics, logging events, aggregating logs, and providing alerts. Most systems are made up of a collection of best-in-class tools and services, selected and deployed over time as team members arrive and depart, needs change, the system grows, and new tools are introduced. One of the challenges web development and operations teams face is collecting and analyzing data from these disparate sources and systems and then piecing together what’s happening by looking at multiple reports and dashboards.
Two common pieces in this puzzle are Logstash and Rollbar.
Logstash (and the Kibana web interface, both of which are heavily supported by and integrated with Elasticsearch) lets you collect and parse logs, store them in a central location, search and explore the data via the Kibana UI, and output events to other services. Logstash provides a powerful tool for taking logs in many different formats, converting them into JSON events, then routing and storing those events.
Rollbar collects errors from your application, notifies you of those errors, and analyzes them so you can more efficiently debug and fix them. With a few lines of code or config changes to your application, you can make errors, complete stack traces, trends, and affected user reports accessible via your Rollbar dashboard. Like Logstash, Rollbar collects and analyzes events represented in JSON.
By connecting Logstash and Rollbar, you can not only centralize and analyze your system and application logs, but also improve error tracking and simplify debugging by providing context to developers looking at errors generated by their code.
For Rollbar users, Logstash allows you to collect errors from external applications and ship them to Rollbar, where they'll appear on the same dashboard as your application errors. Database and web server errors, for example, can be passed along to Rollbar to help developers determine whether the error is due to a bug, database performance issue, or operational issue with the web server.
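Whether it arrives via Logstash or directly, an occurrence is ultimately just a JSON document POSTed to Rollbar's item API. Here is a minimal Python sketch of what that payload looks like; the token and message below are placeholders, and this is an illustration of the envelope shape rather than a replacement for a Logstash output:

```python
import json

def build_rollbar_item(access_token, level, message, environment="production"):
    # Hypothetical helper: wrap a log line in the envelope Rollbar's
    # item endpoint expects. Token and message are placeholder values.
    return {
        "access_token": access_token,
        "data": {
            "environment": environment,
            "level": level,  # e.g. "error", "warning", "info"
            "body": {"message": {"body": message}},
        },
    }

payload = build_rollbar_item(
    "POST_SERVER_ITEM_TOKEN", "error",
    "upstream timed out (110: Connection timed out)")
print(json.dumps(payload, indent=2))
```

A Logstash output (or any HTTP client) would POST this JSON to the item endpoint on api.rollbar.com.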
To get started...
Quick tip: If you are running out of file descriptors in your Beanstalkd process, use /etc/default/beanstalkd to set the ulimit before the init script starts the process.
# file: /etc/default/beanstalkd
BEANSTALKD_LISTEN_ADDR=127.0.0.1
BEANSTALKD_LISTEN_PORT=11300
START=yes
BEANSTALKD_EXTRA="-b /var/lib/beanstalkd -f 1"
# Should match your /etc/security/limits.conf settings
ulimit -n 100000
Lots of resources online tell you to update your /etc/security/limits.conf and /etc/pam.d/common-session* settings to increase the maximum number of available file descriptors. However, the default beanstalkd installation on Ubuntu 12.04+ uses an init script that starts the daemon with start-stop-daemon, which does not apply your system settings when setting the process's ulimits. Just add the ulimit line to your defaults file and you're good to go!
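To confirm the new limit actually applied, check /proc/&lt;pid&gt;/limits for the beanstalkd process after a restart. As a generic illustration of what's being raised, any Unix process can read its own file-descriptor limits with Python's resource module (this is a general sketch, not beanstalkd-specific):

```python
import resource

# Soft limit: the cap currently enforced on this process.
# Hard limit: the ceiling the soft limit may be raised to
# without extra privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file soft limit: {soft}, hard limit: {hard}")
```

If the soft limit still shows the old value for the daemon, the ulimit line in /etc/default/beanstalkd did not take effect.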
You can now set up notifications for every time an error occurs. Previously, specific error notifications were only available for New Items and 10^th Occurrences. Notification Rules are available for all Channels (Email, Slack, HipChat, Trello, PagerDuty).
Ever wanted to assign error items to other team members in Rollbar? Of course you have. Now you can. It's a pretty straightforward enhancement, but here's an overview.
On the error item details page, there's an "Assigned to" dropdown with the members of your team. Once assigned, we'll send an email to that team member letting them know you assigned that specific item to them, including a link and details. They'll automatically be added as a 'watcher' for that item and will receive notifications about any comments and updates.
Assignment events will be listed in the item history section, so you can see who assigned it to whom, when.
To quickly find items assigned to yourself or others on your team, search 'assigned:me', ‘assigned:username’, or 'assigned:unassigned' on the Items page.
We're excited to get this out into the wild. Especially for some of the larger teams using Rollbar. Let us know what you think and how we can make it better for you and your team.
Node.js has a built-in debugger that you can start in running processes. To do this, send a SIGUSR1 signal to the running process and connect a debugger. The one big caveat is that the debugger only listens on the local interface.
The following are instructions for debugging Node.js applications running in your company's private network from your laptop, through a bastion host.
# On the production host: tell the Node.js process to start its debugger (listens on localhost:5858)
prod-host $> kill -s SIGUSR1 <pid>
# On the production host: forward the private interface's port 8585 to the local debugger port
prod-host $> ssh -N -q -L <private-ip>:8585:localhost:5858 <private-ip>
# On your laptop: tunnel local port 5858 through the bastion to the production host's private interface
laptop $> ssh -N -q -L 5858:<private-ip>:8585 <username>@<bastion-host>
In PyCharm, create a remote debug configuration that connects to 127.0.0.1 using port 5858, name it, and save.
At this point your laptop is connected to your local SSH tunnel, which is connected to your production host's private network interface, which is tunneled to your production host's local interface and your Node.js process.
PyCharm → local SSH tunnel → bastion host → production host private network → production host localhost → Node.js
Set some breakpoints in PyCharm and watch as your production process waits for you to step through your app.
Note: If you'd rather use the command line instead of PyCharm just run the node debugger from your laptop:
laptop $> node debug localhost:5858
Sometimes PyCharm will just not connect to the running process on your production machine. Try restarting each of the SSH tunnels.
RQL now includes a basic library of string functions. You can use these to slice and group your data in arbitrary ways. For example, "email domains with the most events in the past hour":
SELECT substring(person.email, locate('@', person.email)), count(*)
FROM item_occurrence
WHERE timestamp >= unix_timestamp() - 3600
  AND person.email IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
The new functions: concat, concat_ws, lower, upper, left, right, substring, locate, length, char_length. The functions are implemented to be compatible with MySQL; see the RQL docs for details.
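For clarity on the query above: locate('@', person.email) returns the 1-based position of the '@', and substring(str, pos) reads from that position to the end, so the grouped value includes the leading '@'. The same extraction written in Python:

```python
def email_domain(email):
    # email.index('@') is the 0-based position of '@'; slicing from it
    # keeps the '@' itself, matching substring(email, locate('@', email)).
    return email[email.index('@'):]

print(email_domain("user@example.com"))  # -> @example.com
```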
Yesterday from 2:20am PST until 10:22am PST, we experienced a service degradation that caused our customers to see processing delays reaching nearly 7 hours. While no data was lost, alerts were not being sent and new data was not appearing in the rollbar.com interface during this time.
We know that you rely on Rollbar to monitor your applications and alert when things go wrong, and we're very sorry that we let you down during this outage. We'd like to share some more details about what happened and what we're doing to prevent this kind of issue from happening again.
When data is received by our API endpoints (api.rollbar.com), it hits a load balancer which proxies to an array of "API servers". Those servers do some basic validation (access control, rate limiting, etc.) and then write the data to local disk. Next, a separate process (the "offline loader") loads these files asynchronously into our database clusters. Then, a series of workers process the raw data into the aggregated and indexed form you see on rollbar.com, and send alerts for any triggered events. This system is designed for reliability first and performance second.
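The write-to-disk-first design described above can be sketched in a few lines: the API server appends each validated event to a local spool file, and a separate loader drains it later, so a database outage delays processing rather than dropping data. A toy Python illustration of the pattern (the filenames and event shapes are invented for the example):

```python
import json
import os
import tempfile

def spool(path, event):
    # API-server side: append one JSON event per line to local disk.
    # This succeeds even when the downstream database is unavailable.
    with open(path, "a") as f:
        f.write(json.dumps(event) + "\n")

def drain(path):
    # Offline-loader side: read back every spooled event for loading
    # into the database once it is reachable again.
    with open(path) as f:
        return [json.loads(line) for line in f]

path = os.path.join(tempfile.mkdtemp(), "spool.jsonl")
spool(path, {"type": "occurrence", "id": 1})
spool(path, {"type": "deploy", "id": 2})
events = drain(path)
print(len(events))  # -> 2
```

The tradeoff, as the incident below shows, is that anything which blocks the loader holds up every event queued behind it on that server.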
When occurrence processing latency exceeds 30 seconds, we show an in-app notification that processing is behind. This is calculated as follows:
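The exact calculation isn't reproduced here, but conceptually, processing latency is the age of the oldest occurrence that has been received but not yet processed. A rough sketch under that assumption (not Rollbar's actual code):

```python
def processing_latency(unprocessed_received_ts, now):
    # Latency is how long the oldest unprocessed occurrence has waited.
    # With nothing queued, processing is fully caught up.
    if not unprocessed_received_ts:
        return 0
    return now - min(unprocessed_received_ts)

# Occurrences received at t=1000, 1030, 1055; it's now t=1090.
print(processing_latency([1000, 1030, 1055], now=1090))  # -> 90
```

With a 30-second threshold, the example above (90 seconds) would trigger the in-app notification.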
The API tier primarily receives three kinds of data: occurrences (the "item" endpoints), deploys, and symbol mapping files (source maps, Android Proguard files, and iOS dSYMs). Currently, all three of these are loaded by the same offline loader process, to different database clusters depending on the type of data.
At about 2:00am PST, a node in the database cluster that stores the symbol mapping files ran out of disk space. Unfortunately, this did not set off any alerts in our monitoring system because the disk space alert had been previously triggered and acknowledged, but not yet resolved.
At about 2:20am PST, the next symbol mapping file arrived on one of the API servers and, since the database server was out of disk, it could not be loaded. This caused other files on that API server--containing occurrences and deploys--to not be loaded either. At this time, a processing delay first appeared in the Rollbar interface, and some (but not all) data was delayed. Over the next several hours, the delay continued to rise (as data on some API servers was not processed at all) and the percent of data that was delayed also rose (as more API servers encountered the same problem).
At 8:25am PST, a Rollbar engineer started work for the day and noticed a support ticket about the processing delay. He immediately escalated to a second engineer who began investigating. At 8:40am PST, a third engineer joined and updated status.rollbar.com to say that we were investigating the issue.
At 9:05am PST, we identified the immediate problem: the symbol mapping files were blocking occurrences from being loaded. We began mitigating by moving those files aside to allow the higher-priority occurrence data to load. This began the recovery process, but created a backlog at the first level of the processing pipeline, causing all data to be delayed (instead of just some).
At 9:11am PST, we identified disk space as the root cause, and resolved this a few minutes later. At 9:35am PST, we updated status.rollbar.com to state that we had identified the issue.
At 9:55am PST, processing latency hit a peak of about 25,000 seconds. We updated status.rollbar.com with our estimate of 36 minutes to full recovery.
At 10:43am PST, processing was fully caught up. status.rollbar.com was updated a minute later.
Once our team became aware of the issue, we were able to identify and fix it relatively quickly (40 minutes from awareness to identification, with the fix immediately afterwards). Recovery was relatively fast as well, given the length of the backlog (1 hour 38 minutes to recover from 7 hours 45 minutes of backlog).
It took far too long for us to notice this issue, however: our automated monitoring failed to alert us, and we only discovered the issue via customer reports.
We've identified and planned a number of improvements to our processes, tools, and systems to address what went wrong. Here are the highlights:
We're very sorry for the degradation of service yesterday. We know that you rely on Rollbar for critical areas of your operations and we hate to let you down. If you have any questions, please don't hesitate to contact us at firstname.lastname@example.org.