In a typical setup, we need a reliable way to track down errors. Think about it: we write the code, so we generate the errors. We should therefore know about an error before a user reports it.
While tracking logical errors can be a technical challenge, tracking exceptions is not. In short, we want to move from this:
We deploy code → Customer sees the error → Customer verifies it's an error → Customer reports it to support → Support reports it to the tech team → Debug → Resolve → Deploy → Update the customer.
to this:
We deploy code → Customer sees the error → The error is reported automatically to the tech team as a bug, with all the necessary debug info (browser, OS, mobile device, stack trace, line number in the code, number of affected users, number of occurrences, etc.) → Resolve → Deploy.
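To make "all the necessary debug info" concrete, here is a minimal, stdlib-only Python sketch of what an error-tracking client collects when an exception happens: the exception type and message, the stack frames with file, function and line number, and the OS. The `build_event` function and its field names are hypothetical and are not Sentry's actual event schema. Browser and mobile-device details would come from a frontend client, so they are left out here.

```python
import platform
import traceback
from typing import Any


def build_event(exc: BaseException, user_id: str) -> dict[str, Any]:
    """Collect debug info for one exception into a plain dict (hypothetical event shape)."""
    # extract_tb walks the traceback attached to the exception, outermost frame first.
    frames = [
        {"filename": fs.filename, "function": fs.name, "lineno": fs.lineno}
        for fs in traceback.extract_tb(exc.__traceback__)
    ]
    return {
        "type": type(exc).__name__,
        "message": str(exc),
        "frames": frames,
        # The innermost frame (last in the list) is where the exception was raised.
        "culprit_line": frames[-1]["lineno"] if frames else None,
        "os": platform.system(),
        "user": user_id,
    }


def divide(a: int, b: int) -> float:
    return a / b


try:
    divide(1, 0)
except ZeroDivisionError as exc:
    event = build_event(exc, user_id="user-42")
    print(event["type"], event["message"], event["culprit_line"])
```

A real client would then send this event to the Sentry server instead of printing it. The user who reported nothing still shows up in the event, which is what allows "number of users affected" to be counted.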
We don't want to leave it to the customer to report an error. We want to be proactive, never reactive.
We use a tool called Sentry. It's an open-source tool that supports capturing exceptions and errors in most languages and platforms. Its dashboard lists exceptions and lets us prioritize bugs by the number of affected users, among other criteria. It keeps email notifications infrequent by smartly aggregating events into issues*. Going further, we can assign issues to the right people, create JIRA tickets, link Zendesk tickets, etc.
* Event: a single occurrence of an exception. If an event is new, a new issue is created for it. If a matching event has been seen before, the event is added to the existing issue.
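The event-to-issue aggregation can be sketched as grouping by a fingerprint. This is a simplified illustration, not Sentry's actual grouping algorithm (which is more involved and configurable). It assumes an issue is identified by a hash of the exception type plus the file and function name of each stack frame. `Issue`, `IssueTracker` and `fingerprint` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Issue:
    fingerprint: str
    title: str
    event_count: int = 0
    users: set[str] = field(default_factory=set)


def fingerprint(exc_type: str, frames: list[tuple[str, str]]) -> str:
    """Hash the exception type plus (file, function) of each frame.

    Line numbers are deliberately left out, so an unrelated edit that shifts
    code up or down does not split one issue into two.
    """
    key = exc_type + "|" + "|".join(f"{f}:{fn}" for f, fn in frames)
    return hashlib.sha1(key.encode()).hexdigest()


class IssueTracker:
    def __init__(self) -> None:
        self.issues: dict[str, Issue] = {}

    def record(self, exc_type: str, frames: list[tuple[str, str]], user: str) -> tuple[Issue, bool]:
        """Add one event; return its issue and whether that issue is new."""
        fp = fingerprint(exc_type, frames)
        is_new = fp not in self.issues
        if is_new:
            self.issues[fp] = Issue(fp, title=exc_type)
        issue = self.issues[fp]
        issue.event_count += 1
        issue.users.add(user)
        return issue, is_new

    def by_affected_users(self) -> list[Issue]:
        """Issues ordered by number of distinct affected users, highest first."""
        return sorted(self.issues.values(), key=lambda i: len(i.users), reverse=True)


tracker = IssueTracker()
_, new1 = tracker.record("KeyError", [("app.py", "checkout")], user="alice")
_, new2 = tracker.record("KeyError", [("app.py", "checkout")], user="bob")
_, new3 = tracker.record("TimeoutError", [("db.py", "query")], user="alice")
```

Here the two `KeyError` events land in one issue, so only two issues (and at most two notifications) come out of three events. Sorting by distinct users mirrors the "prioritize by number of affected users" view.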
I'm a developer. What am I supposed to do?
Register on your company's hosted Sentry instance. Browse to Projects and Teams and join the teams you're interested in. Manage your notifications, and DO NOT MARK THEM AS SPAM. Create a new project for each of your microservices. To embed Sentry in your project, refer to https://docs.sentry.io/clients/ and make the necessary changes in your logger.
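"Make the necessary changes in your logger" usually means attaching a handler that turns error-level log records into events. The official clients ship their own logging integration. For Python, the SDK is `sentry_sdk`, initialized with `sentry_sdk.init(dsn=...)`, so follow the linked docs for the real setup. The sketch below only shows the mechanism with the standard `logging` module. `CollectingHandler` is a hypothetical stand-in that keeps events in a list instead of sending them to a Sentry server.

```python
import logging


class CollectingHandler(logging.Handler):
    """Hypothetical stand-in for an error-tracking client's logging handler.

    Records at ERROR or above become event dicts appended to `self.sent`;
    a real client would send them to the Sentry server instead.
    """

    def __init__(self) -> None:
        super().__init__(level=logging.ERROR)
        self.sent: list[dict] = []

    def emit(self, record: logging.LogRecord) -> None:
        event = {
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            "module": record.module,
            "lineno": record.lineno,
        }
        # logger.exception() attaches the active exception as record.exc_info.
        if record.exc_info and record.exc_info[0] is not None:
            event["exception_type"] = record.exc_info[0].__name__
        self.sent.append(event)


logger = logging.getLogger("orders-service")
handler = CollectingHandler()
logger.addHandler(handler)

logger.info("order created")  # below ERROR, so the handler ignores it

orders: dict[int, str] = {}
try:
    name = orders[17]
except KeyError:
    logger.exception("failed to load order %s", 17)
```

Existing `logger.error(...)` and `logger.exception(...)` calls keep working unchanged. Only the handler setup changes, which is why the integration is done in the logger rather than at every call site.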
I'm a support team member. How does this help me?
We won't have to call customers to ask what they did before reaching the error page. Whenever a support ticket comes in, you can quickly check whether Sentry has a related issue (for example, by searching for the license code) and go to the right person to get it fixed. Developers should also be able to identify issues faster than before, since each issue already carries debug information.
What else can I do with this tool?
The tool lets you generate reports and set triggers based on various metrics. If all you want is an alert whenever something goes wrong with Flipkart, you can configure that. If you want a report such as how many UC Browser users are facing issues, we can get that too. Also take this opportunity to find legacy code that had always been working (or so we thought) and fix it.
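Conceptually, a trigger is a rule over the event stream, and a report is an aggregation over tagged events. The sketch below illustrates both: a threshold alert that fires when more than `limit` matching events arrive within a sliding time window, and a count of distinct affected users per browser. It is not Sentry's alerting implementation; in Sentry these are configured in the UI. The `client` and `browser` tag names, the `ThresholdAlert` class and `affected_users_by_browser` are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class ThresholdAlert:
    """Hypothetical rule: fire when more than `limit` events whose tag `tag`
    equals `value` arrive within the last `window_seconds`."""

    def __init__(self, tag: str, value: str, limit: int, window_seconds: float) -> None:
        self.tag, self.value = tag, value
        self.limit, self.window = limit, window_seconds
        self.times: deque[float] = deque()

    def observe(self, event: dict) -> bool:
        """Feed one event (assumed in timestamp order); return True if the rule fires."""
        if event.get("tags", {}).get(self.tag) != self.value:
            return False
        now = event["timestamp"]
        self.times.append(now)
        # Drop timestamps that have slid out of the window.
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) > self.limit


def affected_users_by_browser(events: list[dict]) -> Counter:
    """Count distinct users per browser tag, e.g. how many UC Browser users hit errors."""
    pairs = {(e["tags"].get("browser"), e["user"]) for e in events}
    return Counter(browser for browser, _ in pairs)


rule = ThresholdAlert(tag="client", value="flipkart", limit=2, window_seconds=60)
stream = [
    {"timestamp": 0, "user": "u1", "tags": {"client": "flipkart", "browser": "UC Browser"}},
    {"timestamp": 10, "user": "u2", "tags": {"client": "flipkart", "browser": "Chrome"}},
    {"timestamp": 20, "user": "u1", "tags": {"client": "flipkart", "browser": "UC Browser"}},
    {"timestamp": 200, "user": "u3", "tags": {"client": "other", "browser": "UC Browser"}},
]
fired = [rule.observe(e) for e in stream]
report = affected_users_by_browser(stream)
```

The rule fires on the third matching event, the first time more than two arrive within 60 seconds. The report counts `u1` only once for UC Browser even though `u1` produced two events, because it counts users rather than events.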
Technical details, please?
You'll almost never need this, but it's good to know. Sentry is a Django application. We use Postgres as the database and Redis as the in-memory store. Background worker processes (Celery) take care of sending emails, cleaning up events, etc. It's open source and considered one of the cleanest open-source codebases. We run it on an Ubuntu VM, and we plan to move it to Docker in a couple of months.