In today’s world of technology, apps help us carry out our day-to-day tasks effectively. The success of any app depends heavily on one main factor: the end-user experience.

Many things enhance the end-user experience, and user notification is one of them. It becomes even more important for a company like Cashfree Payments, which operates in the payments space and handles 20 million+ API requests a day. This blog explains how we built a Notification Service that improves notification delivery rates while remaining scalable.

Why Do We Need a Centralised Notification Service?

Over the years, Cashfree Payments has launched multiple innovative tech products such as Payouts, Cashgram and SoftPoS. These applications deal with large volumes of user payment data, and it is very important for them to notify users in near real time about the status of the operations they perform.

  1. A centralised microservice such as a notification service abstracts the common user notification requirements away from the various internal services.
  2. Having a notification service allows us to add new notification modes easily. It also lets us add or update notification service vendors without affecting specific applications.
  3. As Cashfree Payments grows and starts operating outside India, we will need the ability to handle internationalisation. In such cases, a centralised notification service becomes important so that all notification-related changes can be made in a single service.
  4. In general, as the organisation’s use cases expand, we can add more sophistication to the notification service, for example priorities, scheduling and better reporting.

How Have We Built Our Notification Service?

We built our notification service with scalability and notification success rate as the primary success criteria. As of today, the service handles approximately 3,000 requests per minute, and this number is growing every day.

We currently support four different modes to send the notification:

  1. SMS: An SMS notification is sent to the end user’s mobile number.
  2. Email: An email notification is sent to a list of email addresses.
  3. WhatsApp: A WhatsApp notification is sent to the end user’s mobile number.
  4. Webhook: The end user shares a webhook URL, and we notify the user by triggering that URL.

Architecture Diagram

Major components and their responsibilities:

  1. Kafka: Kafka allows us to decouple the rate at which requests arrive from the rate at which they can be delivered to the destination. Requests can also be buffered if the destination is down.
    In our case, Kafka will be used mainly to perform a couple of tasks:
    1. To help internal teams publish a notification request on a particular topic that the notification service can process.
    2. To publish the notification response from the notification service that internal teams can consume.
  2. Centralised Redis: We use a Redisson (Redis client for Java) delayed queue to retry failed notifications with an exponential backoff strategy. Kafka could also handle retries, but we chose Redis when we started; over time, we plan to move the retry use case to Kafka as well.
  3. MySQL: We currently publish the notification response to the response Kafka topic. We will also persist this data in a MySQL database, which will let end users query notification responses over a period of time.

Below is the typical happy path flow of how a notification is delivered to the end user:

  1. Various internal applications publish to the notification service Kafka topic.
  2. The notification service listens to the Notification Request Kafka topic and parses each incoming message into a notification request, which carries the mode of notification (e.g. SMS, Webhook, Email, WhatsApp), the notification template (we use the Mustache template system) and other notification-related details.
  3. Our notification service uses the Spring WebFlux framework, whose concurrency model lets us handle a growing number of requests with relatively few threads. Once the notification request is ready, it is sent to the implementation for its mode of notification (e.g. SMSImpl, EmailImpl, WebhookImpl or WhatsAppImpl). We have multiple service providers for each notification mode.
  4. We choose a service provider, forward the notification request to it and attach a callback to receive the delivery status.
  5. We publish the notification response to the notification response Kafka topic, which our internal teams can then consume to get the status of the notification.
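
To make steps 2 and 3 more concrete, here is a minimal sketch of how a parsed request might be rendered and routed to a per-mode implementation. It is an illustration under assumptions, not our production code: the class and method names (DispatchSketch, NotificationRequest, NotificationSender, render, dispatch) are hypothetical, the senders only tag the message instead of calling a provider, the code is synchronous rather than reactive, and the tiny {{placeholder}} renderer is a simplified stand-in for a real Mustache engine.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: every name in this class is hypothetical.
class DispatchSketch {

    enum Mode { SMS, EMAIL, WHATSAPP, WEBHOOK }

    // Parsed form of a message read from the notification request topic.
    static final class NotificationRequest {
        final Mode mode;
        final String template;
        final Map<String, String> values;

        NotificationRequest(Mode mode, String template, Map<String, String> values) {
            this.mode = mode;
            this.template = template;
            this.values = values;
        }
    }

    // One implementation per mode, standing in for SMSImpl, EmailImpl and so on.
    interface NotificationSender {
        String send(String message);
    }

    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{\\s*(\\w+)\\s*\\}\\}");

    // Simplified stand-in for Mustache: replaces {{name}} with its value,
    // or with an empty string when no value is supplied.
    static String render(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = values.getOrDefault(matcher.group(1), "");
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    private final Map<Mode, NotificationSender> senders = new EnumMap<>(Mode.class);

    DispatchSketch() {
        // Fake senders that only tag the message; real ones would call a provider.
        senders.put(Mode.SMS, message -> "SMS:" + message);
        senders.put(Mode.EMAIL, message -> "EMAIL:" + message);
        senders.put(Mode.WHATSAPP, message -> "WHATSAPP:" + message);
        senders.put(Mode.WEBHOOK, message -> "WEBHOOK:" + message);
    }

    // Renders the template, then routes the message by notification mode.
    String dispatch(NotificationRequest request) {
        NotificationSender sender = senders.get(request.mode);
        if (sender == null) {
            throw new IllegalArgumentException("No sender for mode " + request.mode);
        }
        return sender.send(render(request.template, request.values));
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "Asha");
        values.put("amount", "500");
        DispatchSketch service = new DispatchSketch();
        System.out.println(service.dispatch(new NotificationRequest(
                Mode.SMS, "Hi {{name}}, your payout of INR {{amount}} is complete.", values)));
    }
}
```

Keeping the senders in a map keyed by mode means a new notification mode only needs a new enum constant and one more sender registration; the dispatch logic itself does not change.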

Having a happy path workflow running is good. But while building large enterprise applications, it is extremely important to identify the failure cases and have flows to handle them.

Below are some of the failure cases in notification delivery and how we handle them:

Retrying a Failed Notification

We have a centralised delayed queue, to which we push a notification request whenever the delivery status reports a failure. Below is the detailed workflow:

  1. A failed notification is sent to the delayed queue with an initial delay based on the notification mode.
  2. Notification retry workers keep polling this queue and retry each request they pick up. If the request fails again, we apply the exponential backoff strategy and send the request back to the queue. We repeat this process for a limited period, again based on the notification mode.
  3. If the notification still fails after all the retries, we mark the request as failed and send an alert to internal systems.
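
The retry schedule described above can be sketched as a small per-mode policy. This is a hedged illustration rather than our actual implementation: the class and method names (RetryBackoffSketch, Policy, nextDelay) are hypothetical, the initial delays and retry limits are made-up values, and doubling the delay on each attempt is just one common form of exponential backoff. In the real service, the computed delay would be handed to the delayed queue when the request is pushed back.

```java
import java.time.Duration;
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only: names, delays and retry limits are assumptions.
class RetryBackoffSketch {

    enum Mode { SMS, EMAIL, WHATSAPP, WEBHOOK }

    // Per-mode retry settings: first delay and maximum number of retries.
    static final class Policy {
        final Duration initialDelay;
        final int maxRetries;

        Policy(Duration initialDelay, int maxRetries) {
            this.initialDelay = initialDelay;
            this.maxRetries = maxRetries;
        }
    }

    private final Map<Mode, Policy> policies = new EnumMap<>(Mode.class);

    RetryBackoffSketch() {
        // Made-up values; the real delays and limits are not stated in this post.
        policies.put(Mode.SMS, new Policy(Duration.ofSeconds(5), 3));
        policies.put(Mode.EMAIL, new Policy(Duration.ofSeconds(30), 4));
        policies.put(Mode.WHATSAPP, new Policy(Duration.ofSeconds(5), 3));
        policies.put(Mode.WEBHOOK, new Policy(Duration.ofSeconds(10), 5));
    }

    // Delay before retry number `attempt` (1-based): initialDelay * 2^(attempt - 1).
    // Returns empty once retries are exhausted, meaning the request should be
    // marked as failed and an alert raised.
    Optional<Duration> nextDelay(Mode mode, int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        Policy policy = policies.get(mode);
        if (attempt > policy.maxRetries) {
            return Optional.empty();
        }
        return Optional.of(policy.initialDelay.multipliedBy(1L << (attempt - 1)));
    }

    public static void main(String[] args) {
        RetryBackoffSketch retry = new RetryBackoffSketch();
        // Print the schedule for one mode until retries run out.
        for (int attempt = 1; ; attempt++) {
            Optional<Duration> delay = retry.nextDelay(Mode.WEBHOOK, attempt);
            if (!delay.isPresent()) {
                System.out.println("Retries exhausted after " + (attempt - 1) + " attempts");
                break;
            }
            System.out.println("Retry " + attempt + " after " + delay.get().getSeconds() + "s");
        }
    }
}
```

Returning an empty Optional for an exhausted request keeps the "give up and alert" decision in one place, so the retry workers only need to check whether a next delay exists.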

Replaying a Failed Notification

The notification response is written back to the Kafka response topic.

  1. Our internal teams can consume the notification response.
  2. If there is a failure, then our team can send the same request back to the notification service and the service will try to send the notification.

How Do We Trace Notification Status/logs?

There are two ways in which we can now track the notification status:

  1. Subscribing to the Notification Response topic which will provide the notification status related details.
  2. Notification status will be published to our internal logging system which users can query to get the details.

How Do We Monitor Notification Service Health?

  1. We have Micrometer metrics enabled for all the critical flows in the Notification Service.
  2. We have internal dashboards which read the metrics and help us monitor the system closely.
  3. We have an internal alerting setup which will alert the notification team whenever any of the set thresholds are breached.

Future Scope

  1. We are building a data persistence layer to persist the notification data in the database for a certain interval of time.
  2. APIs that let internal applications get the notification status and replay a notification.
  3. Support for new notification modes such as Slack.
  4. Provide ways for apps to replay a failed notification.

We have a lot of data from this service and we are planning to utilise this data to derive useful insights.

Does this sound exciting and intriguing to you? Then we have some exciting opportunities lined up for engineers like yourself!
