Transforming Engineering at Rollbar

December 18th, 2019 • By Francesco Crippa

It's been a busy year at Rollbar! While many of the new features we built got the attention they deserved (press, blog posts, conferences), the underlying work to transform and modernize our platform has been less visible.

With this blog post I'd like to share the exciting initiatives we launched this past year and give due credit to all the engineers who worked tirelessly to get us where we proudly are today.

To really understand the scale of the transformation and the reasons behind it, it's important to understand why and where all of this started.

A year ago our Engineering team was less than half its current size, our User Experience was struggling to keep pace with modern design and, at the same time, our business was growing at a steady double-digit rate.

The number one priority that every customer would have underlined was availability! The more Rollbar was adopted by teams deeply integrating it into their CI/CD pipelines through APIs and WebHooks, the more critical our ability to process errors in real time became for our customers.

It was clear that to be able to match the forecasted demand, some parts of our architecture needed to change.

At the same time it became evident that to win in the Error Monitoring space, we needed to be absolutely best in class at avoiding "noise" across our platform. We needed a way to maintain our leadership in providing the best classification, automatic merging, and fingerprinting in the industry.

And so an ambitious and visionary plan was made. It involved hiring (a lot!), acquiring and re-architecting.

BigData, AI and SameBug acquisition

While most of our competitors rely almost exclusively on a light, client-side form of digital fingerprinting, Rollbar has invested since its origin in a more substantial server-side real-time data pipeline to process each error.

This allowed us to be the first company in our space to experiment with, and realize the value of, AI applied to our data processing pipeline. To accelerate our investments in this area, we decided to acquire a Hungarian company with years of experience in classification and substantial IP for the automatic discovery of fundamental error patterns.

SameBug joined Rollbar in March 2019, bringing incredible energy and unmatched experience in AI and classification. The newly formed Intelligence team immediately enrolled in the "Artificial Intelligence Graduate Certificate" at Stanford University and set the direction for years to come: automatic classification by error root cause.

Less than a year later, as the first tangible outcome of this investment, our production pipeline can steadily rely on a freshly designed Apache Spark integration, which has allowed us to dramatically decrease the number of Singular Instances (errors that happen only once) observed by our users.
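
To make the idea of Singular Instances concrete, here is a minimal sketch of how server-side fingerprinting can fold near-identical errors into one group. It is an illustration only, not Rollbar's actual classification logic and not the Spark integration: the normalization rules, the `fingerprint` and `singular_instances` helpers, and the sample errors are all hypothetical.

```python
import hashlib
import re
from collections import Counter

# Hypothetical normalization rules: volatile tokens become placeholders
# so that messages differing only in IDs or names share a fingerprint.
_VOLATILE = [
    (re.compile(r"'[^']*'"), "<str>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<hex>"),
    (re.compile(r"\d+"), "<num>"),
]


def fingerprint(exc_class, message):
    """Return a stable hash of the exception class and normalized message."""
    normalized = message
    for pattern, placeholder in _VOLATILE:
        normalized = pattern.sub(placeholder, normalized)
    return hashlib.sha1(f"{exc_class}:{normalized}".encode("utf-8")).hexdigest()


def singular_instances(errors, key):
    """Count groups that contain exactly one error under the given grouping key."""
    counts = Counter(key(error) for error in errors)
    return sum(1 for count in counts.values() if count == 1)


# Made-up sample data for the illustration.
ERRORS = [
    ("KeyError", "user 1041 not found"),
    ("KeyError", "user 77 not found"),
    ("TimeoutError", "request to 'billing' timed out after 30s"),
    ("TimeoutError", "request to 'search' timed out after 5s"),
    ("ValueError", "invalid literal"),
]

# Grouping on the raw message treats every error as unique.
naive = singular_instances(ERRORS, key=lambda e: e)
# Grouping on the normalized fingerprint merges the near-duplicates.
grouped = singular_instances(ERRORS, key=lambda e: fingerprint(*e))
```

In this toy data set, grouping by raw message leaves every error as its own group, while grouping by fingerprint leaves only the `ValueError` on its own. A production classifier would need far richer signals (stack frames, root-cause patterns) than these three regular expressions.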

Zero Touch Deploys

Since most of our codebase was relatively monolithic, our CI/CD pipeline was relatively simple and straightforward.

But the progressive scaling limits presented by a monolithic database convinced us to reconsider our approach. Instead of optimizing for faster cycle time and a simplified architecture, we shifted to optimizing for scalability. We rethought our platform as a distributed system and started writing new micro-services.

With this new approach in place, we also needed to modernize our runtime environment in order to leverage the newly introduced architecture. Kubernetes became our primary deployment target and Spinnaker replaced our internally made Continuous Delivery orchestrator.

Our Kubernetes production environment heavily leverages Google Cloud managed services, allowing us to reduce the overall investment in proprietary, internally built platforms and to maximize the return on our successful migration to Google Cloud Platform.

The Metamorphosis of Kafka

One of the competitive advantages that distinguishes Rollbar from its competitors is its ability to process the entire data set (not just a sample of it) in real-time streams.

To avoid any loss of data during maintenance of our pipeline, we designed specific off-line loaders to make sure our endpoints would be able to continue accepting and caching new data even in the case of major outages of all the processing components.
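
The "keep accepting and caching during an outage" pattern can be sketched as follows. This is a simplified illustration under stated assumptions, not Rollbar's implementation: `SpoolingIngest`, `FlakyPipeline`, and the newline-delimited JSON spool file are hypothetical, and the `send` callable stands in for whatever client forwards payloads to the processing pipeline.

```python
import json
import os
import tempfile


class SpoolingIngest:
    """Always accept payloads: forward them when possible, else spool to disk."""

    def __init__(self, send, spool_path):
        self._send = send  # hypothetical callable; raises ConnectionError when down
        self._spool_path = spool_path

    def accept(self, payload):
        try:
            self._send(payload)
            return "forwarded"
        except ConnectionError:
            # Pipeline is unavailable: cache the payload locally instead of dropping it.
            with open(self._spool_path, "a", encoding="utf-8") as spool:
                spool.write(json.dumps(payload) + "\n")
            return "spooled"

    def replay(self):
        """Off-line loader: resend spooled payloads in order, keep what still fails."""
        if not os.path.exists(self._spool_path):
            return 0
        with open(self._spool_path, encoding="utf-8") as spool:
            pending = [json.loads(line) for line in spool if line.strip()]
        sent = 0
        remaining = []
        for index, payload in enumerate(pending):
            try:
                self._send(payload)
            except ConnectionError:
                remaining = pending[index:]  # stop here to preserve ordering
                break
            sent += 1
        with open(self._spool_path, "w", encoding="utf-8") as spool:
            for payload in remaining:
                spool.write(json.dumps(payload) + "\n")
        return sent


class FlakyPipeline:
    """Test double for the processing pipeline; toggle `up` to simulate an outage."""

    def __init__(self):
        self.up = True
        self.received = []

    def __call__(self, payload):
        if not self.up:
            raise ConnectionError("pipeline unavailable")
        self.received.append(payload)


pipeline = FlakyPipeline()
spool_file = os.path.join(tempfile.mkdtemp(), "ingest.spool")
ingest = SpoolingIngest(pipeline, spool_file)

first = ingest.accept({"id": 1})   # pipeline up: forwarded directly
pipeline.up = False
second = ingest.accept({"id": 2})  # outage: spooled to disk
third = ingest.accept({"id": 3})
pipeline.up = True
replayed = ingest.replay()         # loader drains the spool in order
```

The sketch ignores concerns a real system would have to handle, such as concurrent writers, spool size limits, and duplicate delivery if the process crashes midway through a replay.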

The stateless nature of Kubernetes pushed us to rethink our approach to data persistence, introducing a secondary channel, built on Kafka, to feed our processing pipeline. While this is still in its very early stages, Kafka is providing a reliable backbone for decoupling many of our components and micro-services in a way that matches our reliability requirements.

Micro-services, new UX and Single Page Apps

A major redesign of Rollbar's UX had been in the air for a while, but it never had the right priority in our roadmap to make it happen.

A year ago we focused our investments on upgrading the full set of our functionality to become "multi-project", allowing our customers to better deploy Rollbar in an increasingly common micro-service scenario.

And this became the centerpiece of our UX redesign: a new experience where the user chooses the level of granularity required, from a single crash report to a high-level analysis of the whole system.

With a clear definition of what would drive the engagement of our users, we started redesigning key elements of our web app, from the whole navigation system to the general approach to filtering and views, allowing for much fresher and smoother interactivity.

Under the hood, the Engineering team began a major refactor to change the fundamental architecture of our web application, moving towards a "Single Page App" approach.

It's incredible what can be done in such a short amount of time. It hasn't been an entire year yet since we started this major initiative, and here we are, with a brand new User Experience, a fresh UI, and a new foundation for our future product releases.

A new DEV environment

It's interesting to see how all the different pieces of the puzzle tend to fit very well with each other in this story.

A common request from our Engineering team was to make our development environment much faster and less resource draining for their laptops.

With our production runtime moving to be fully containerized and with the renewed need to run new services in our local environments, the move to Docker was clear, quick and effective.

Besides drastically improving speed and performance, this new development environment also helped in a different area: onboarding! With the organization's steady growth rate, this approach dramatically simplified the onboarding procedures for new employees.

The ability of a company to always find the time to modernize its own internal tools and upgrade its own processes is probably the best indicator of its long-term potential, and I'm really glad our team found the right levers to complete this transition smoothly and in record time!

Service Ownership

With a general shift to distributed architectures, another shift had to happen: transforming our Engineers from Developers into Owners!

Engineers love to fully own autonomous components: when they're empowered to do so, they bring consistency and care for the details, and they maintain a high standard of quality.

At the beginning of the year our Ops team was the only team on call. Today all our engineers in all our locations are on call, fully responsible for the health of our platform and empowered to make any interruption an immediate priority for their teams.

Today, they fully own the design, maintenance and operations of all our services.

What's Next?

It truly has been an incredible year at Rollbar. And as so often happens, a lot of parallel stories finally converge toward a common and powerful ending. All the small pieces of the puzzle are starting to play a much bigger role in the overall system.

But this was all done just in preparation for the next step. The real game has yet to start.

A year from now, the majority of our employees will have been at the company for less than a year. We will spend more time with people we haven't met yet than with the ones with whom we shared the past few years building what Rollbar is today.

While exciting and inspiring, the next chapter of our story is surely going to present new and unexpected challenges. My confidence that we'll be able to overcome them comes from the fantastic time we've all experienced together in 2019.

I'm really thankful for what our teams have done, starting with the love, care and dedication they all express in everything they do.

I hope this article shines some light on all the incredible things they've accomplished and on the reasons I'm so proud to be part of this adventure.

See you all on the next level!

Francesco

Father, Maker and VP of Engineering at Rollbar
