One of my younger brothers, who is also a friend, joined a new company. The brand was old, but they had restarted operations, so they were still hiring. They did not have a technology team yet (the engineers were due to join the following month), but they had been given the task of building the ticketing solution for one of the biggest cultural events in the country. I offered to help them build the application.
It was a three-day event, and registration would be open for five days. Every day, sixty thousand unique people could register. When I first heard the requirement, it sounded like a single-page, single-operation application. Then I went into the details, and there were some challenging parts:
- Registration would start at a set time of day and stop automatically once the daily quota was full.
- Users would enter their phone number and get an OTP; the registration would be complete only after they verified it. The OTP was valid for 5 minutes.
- Users would be able to download a three-page PDF containing their tickets for the three days, each page with its own barcode, and the same PDF would also be sent to them by email.
- Users would get a confirmation SMS after registering.
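The OTP and quota rules above can be sketched as follows. This is a minimal, hypothetical TypeScript version that assumes a single process with in-memory storage; with several application instances, the pending OTPs and the daily count would have to live in a shared store such as a database. `issueOtp`, `verifyOtp` and `resetDay` are illustrative names, and the start-time check is left out for brevity.

```typescript
import { randomInt } from "node:crypto";

// Hypothetical in-memory store; not shared between processes.
const OTP_TTL_MS = 5 * 60 * 1000; // an OTP is valid for 5 minutes
const DAILY_QUOTA = 60_000;       // unique registrations allowed per day

interface PendingOtp {
  code: string;
  expiresAt: number; // epoch milliseconds
}

const pendingOtps = new Map<string, PendingOtp>();
const registeredToday = new Set<string>(); // a Set keeps registrations unique per phone

// Issue a 6-digit OTP, or return null if the quota is full or the number already registered.
function issueOtp(phone: string, now: number = Date.now()): string | null {
  if (registeredToday.size >= DAILY_QUOTA || registeredToday.has(phone)) return null;
  const code = randomInt(0, 1_000_000).toString().padStart(6, "0");
  pendingOtps.set(phone, { code, expiresAt: now + OTP_TTL_MS });
  return code; // in practice this would be sent by SMS, not returned to the client
}

// Registration completes only if the code matches, has not expired, and quota remains.
function verifyOtp(phone: string, code: string, now: number = Date.now()): boolean {
  const pending = pendingOtps.get(phone);
  if (!pending || now > pending.expiresAt || pending.code !== code) return false;
  if (registeredToday.size >= DAILY_QUOTA) return false; // quota filled in the meantime
  pendingOtps.delete(phone); // an OTP can be used only once
  registeredToday.add(phone);
  return true;
}

// Hypothetical daily reset, e.g. run by a scheduler before the next day's opening time.
function resetDay(): void {
  pendingOtps.clear();
  registeredToday.clear();
}
```

Checking the quota again inside `verifyOtp` matters because many people can hold valid OTPs at the moment the last ticket is taken.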
There's more to it: as they were a partner of the show, they were not getting paid for the application, so the authorities wanted to keep the total cost (including server cost) under $100.
So, I had a few problems to solve:
- Send a lot of SMS in the shortest possible time.
- Send a lot of emails.
- Generate three barcodes and a three-page PDF synchronously for every registration.
- Keep the server cost low.
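For the first two problems, one common pattern (not necessarily what was used here) is to cap how many SMS or email requests are in flight at once, so a burst of registrations queues up instead of overwhelming the provider or the server. Below is a minimal TypeScript sketch; `sendWithLimit` and the `send` callback are hypothetical, and the real bulk SMS or Mailgun call would go inside `send`.

```typescript
// Hypothetical sender signature; the real SMS or email client call would go here.
type SendFn<T> = (item: T) => Promise<void>;

// Send every item with at most `limit` requests in flight at any moment.
async function sendWithLimit<T>(
  items: T[],
  limit: number,
  send: SendFn<T>,
): Promise<{ sent: number; failed: T[] }> {
  let next = 0; // index of the next item to pick up
  let sent = 0;
  const failed: T[] = [];

  // Each worker pulls the next item until the list is exhausted.
  // Reading and incrementing `next` happens synchronously, so two workers never take the same item.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const item = items[next++];
      try {
        await send(item);
        sent++;
      } catch {
        failed.push(item); // keep failures for a later retry instead of crashing
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { sent, failed };
}
```

The limit is the knob: it trades total send time for how much pressure lands on the provider and on the server's memory and sockets at once.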
I built the backend with Node.js and the frontend with AngularJS. We approached one of the local bulk SMS providers for sending SMS and bought a Mailgun subscription for sending email.
The server was a $20 DigitalOcean server in Singapore. Development took three days. The application was deployed on that server and scaled to 12 instances using pm2.
The application went live, and the first day's registration started. I thought, "Well, people won't jump on it like crazy, right?" Man, was I wrong. We watched the visitor counter climb in Google Analytics, and ten minutes into the registration the visitor count was over 100k. The server was literally crying.
Within the next ten minutes, the whole system crashed after roughly 10k registrations. The daily quota was about six times that, but the application could not take the huge load. I was not prepared either; I didn't think 100k+ people would stay awake in the middle of the night to get a ticket for a cultural event. We started getting phone calls from the event organiser, and everyone was having a hard time explaining the problem.
I thought the bottleneck was outgoing SMS, but it was actually PDF generation. I had used a Node.js library that drives PhantomJS (WebKit) to generate the PDFs. Although I created only one WebKit instance per application instance, it was eating all the memory. WebKit being WebKit.
I had to rewrite the whole PDF generation part with a lighter library (PDFKit) in a very short time. We also moved to a ~$60 DigitalOcean instance and increased the number of application instances to 18.
After a couple of hours of downtime, the application was back up, and the visitor counter started climbing again. It was 5 in the morning, and more than 80k people tried to register.
The first day went fine, and we thought the rest of the days would go fine as well. The next day I went to a designer's event. The organisers had changed the ticketing schedule again: instead of midnight, they would start distributing tickets at noon. When the registration automatically opened at noon, I was having chicken biriyani at the designer's event. Suddenly I got a call from the organiser: the registration site was inaccessible. I tried to figure out the reason and found an abnormal traffic surge. Traffic was not even reaching the server; DigitalOcean was rejecting it. It was a DDoS attack, and entirely my mistake. I should have used Cloudflare. I started configuring Cloudflare while sitting in a nearby coffee shop, but I was having a hard time with this simple task, thanks to the tension. Sabbir helped us configure the Cloudflare DNS, and the site was accessible again.
Then came another challenging part. It took only half an hour to fill the daily quota. Many people called the organiser to complain that they could not get a ticket. We had a hard time convincing the organiser that all the tickets really had been taken in such a short time. They had expected a much smaller audience this time, and they also thought there must be a technical problem we were hiding. They had never seen tickets run out within half an hour.
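To put that half hour in perspective, here is a rough back-of-envelope rate. It assumes the 60,000 daily registrations arrived evenly over the 30 minutes and that each one needed the two SMS (OTP and confirmation), three barcodes, one three-page PDF and one email described earlier. OTPs sent to people who never finished registering would push the SMS figure higher.

$$
\frac{60\,000\ \text{registrations}}{30 \times 60\ \text{s}} \approx 33.3\ \text{registrations/s}
$$

$$
2 \times 33.3 \approx 66.7\ \text{SMS/s}, \qquad 3 \times 33.3 \approx 100\ \text{barcodes/s}
$$

That is also roughly 33 PDFs and 33 emails every second, sustained for the whole half hour.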
As it turned out, that year the largest number of people were able to come to the event. The event was a huge success, and the ticketing system worked fine.
I had to use every bit of resource and try every kind of optimisation to make it work. The experience taught me a lot.