Building a serverless business intelligence (BI) cloud service -- providing a BI service with serverless components
serverless and data stack (5)
In the previous post, we covered the first half of how to build a serverless BI cloud service, “basic building blocks for a BI server”. In this post, we cover the second half: how to evolve a BI stack into a serverless architecture.
Instead of talking about theory, we will use an open source BI code base and discuss concretely how to build a working BI service with serverless components.
Why might you want to build a BI service using serverless components?
A Business Intelligence (BI) stack is an important stack that presents a data project in an intuitive, easy-to-understand way. Imagine a CEO asks for some data in order to understand the top line and bottom line of the business; a dry table may not be what they really want. Instead, some form of data visualization gives them a way to present and interpret the data intuitively.
A BI stack is typically the ‘front-end’ of a data project. In practice, a BI stack can often solve many internal data presentation problems, ranging from finance to IT ticket trend analysis.
Yes, you may choose to write a solution from scratch with a data visualization tool to present the data, but then every new data presentation request requires new code;
Yes, you may choose to subscribe to a commercial BI service, such as Looker or Tableau, to fulfill the need, but some of these commercial offerings may not fit the bill.
If your org has to build some kind of data visualization and data understanding solution internally, then building a BI service that needs little operational maintenance (a serverless architecture enables NoOps) may be a very attractive option.
A quick intro to Redash
Redash is an open source Business Intelligence (BI) stack. Here are a few facts to help you understand its current status:
Redash is developed mainly in Python and JavaScript, under the BSD 2-Clause license;
Redash, the company, was acqui-hired by Databricks in mid-2020 (the community has remained active since the acquisition);
There was a hosted Redash cloud SaaS service at app.redash.io, but the team has announced that “We've decided to discontinue Hosted Redash effective November 30, 2021.” (understandable, because the team was acquired by Databricks to work on its internal features);
You can try it on your laptop with its Docker image;
High-level steps to evolve Redash into a service that operates in a serverless manner
We can serve the web tier server component with AWS Lambda + Amazon API Gateway. Specifically, we can set up API Gateway in {proxy+} mode, which forwards every request path to the same Lambda function, so that we can use the existing code out of the box;
Instead of hosting our own task queue service, we can use Upstash, a serverless Redis service;
Instead of running a serverful database solution such as RDS or provisioned Aurora, we can use Aurora Serverless;
NOTE: Redash works with PostgreSQL as its metadata DB and query result cache DB, but does not work with MySQL. As of this post (9/14/2021), we can only use Aurora Serverless v1, because Aurora Serverless v2 does not provide PostgreSQL compatibility yet; once Aurora Serverless v2 supports PostgreSQL, we should use v2.
The controller component handles control tasks such as putting scheduled query tasks into the queue; we can run this component with AWS Fargate;
The query worker component runs the actual queries and writes the results to the cache DB; we can run this component with AWS Fargate as well;
NOTE: this layer does the heavy lifting for the entire BI stack, so we will need an autoscaling mechanism backed by a monitoring service such as Amazon CloudWatch.
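As a rough illustration of the autoscaling note above, here is a minimal sketch of a scaling rule in Python. It assumes the number of queued query jobs is already available (for example, published as a custom CloudWatch metric) and that the result would be applied as the desired task count of the Fargate worker service. The function name and the defaults (jobs_per_worker, min_workers, max_workers) are hypothetical values chosen for illustration; they do not come from Redash or AWS.

```python
import math


def desired_workers(queued_jobs, jobs_per_worker=5, min_workers=1, max_workers=10):
    """Return how many query workers to run for the current queue depth.

    All parameters are illustrative assumptions:
    - queued_jobs: number of query jobs currently waiting in the queue
    - jobs_per_worker: backlog one worker is expected to absorb
    - min_workers / max_workers: bounds that keep capacity and cost predictable
    """
    if jobs_per_worker <= 0:
        raise ValueError("jobs_per_worker must be positive")
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")

    # Scale proportionally to the backlog, rounding up so no job is left uncovered.
    needed = math.ceil(max(queued_jobs, 0) / jobs_per_worker)

    # Clamp to the configured bounds.
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    for depth in (0, 12, 1000):
        print(depth, "queued ->", desired_workers(depth), "workers")
```

Keeping the rule a pure function makes it easy to test and tune before wiring it to real metrics; the bounds are what stop a burst of queries from translating into an unbounded bill.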
The beauty of using the above serverless components is that the long-term operational and maintenance cost is very low once the solution starts working.
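To make the first step more concrete, here is a minimal sketch of how a Lambda function behind an API Gateway {proxy+} resource could hand each request to an existing WSGI application (Redash's web tier is a Flask app, and Flask apps speak WSGI). It assumes the REST API proxy event shape (httpMethod, path, headers, queryStringParameters, body, isBase64Encoded). The names demo_app, make_handler, and lambda_handler are hypothetical, and demo_app is only a stand-in for the real application. This is an illustration of the idea, not Redash's own deployment code; in practice an existing open source WSGI adapter for Lambda is likely the more robust choice.

```python
import base64
import io
import sys
from urllib.parse import urlencode


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real WSGI application."""
    text = "{} {}?{}".format(
        environ["REQUEST_METHOD"], environ["PATH_INFO"], environ["QUERY_STRING"]
    )
    body = text.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def make_handler(wsgi_app):
    """Wrap a WSGI app as a Lambda handler for API Gateway proxy events."""

    def handler(event, context):
        raw = event.get("body") or ""
        if event.get("isBase64Encoded"):
            body = base64.b64decode(raw)
        else:
            body = raw.encode("utf-8")
        headers = event.get("headers") or {}

        # Translate the proxy event into a WSGI environ (see PEP 3333).
        environ = {
            "REQUEST_METHOD": event["httpMethod"],
            "SCRIPT_NAME": "",
            "PATH_INFO": event["path"],
            "QUERY_STRING": urlencode(event.get("queryStringParameters") or {}),
            "SERVER_NAME": headers.get("Host", "localhost"),
            "SERVER_PORT": "443",
            "SERVER_PROTOCOL": "HTTP/1.1",
            "CONTENT_LENGTH": str(len(body)),
            "wsgi.version": (1, 0),
            "wsgi.url_scheme": "https",
            "wsgi.input": io.BytesIO(body),
            "wsgi.errors": sys.stderr,
            "wsgi.multithread": False,
            "wsgi.multiprocess": False,
            "wsgi.run_once": False,
        }
        for name, value in headers.items():
            key = name.upper().replace("-", "_")
            if key == "CONTENT_TYPE":
                environ["CONTENT_TYPE"] = value
            elif key != "CONTENT_LENGTH":
                environ["HTTP_" + key] = value

        captured = {}

        def start_response(status, response_headers, exc_info=None):
            # The legacy write() callable is not supported in this sketch.
            captured["status"] = status
            captured["headers"] = response_headers

        chunks = wsgi_app(environ, start_response)
        try:
            payload = b"".join(chunks)
        finally:
            if hasattr(chunks, "close"):
                chunks.close()

        # Return text when possible; fall back to base64 for binary payloads.
        try:
            out, is_b64 = payload.decode("utf-8"), False
        except UnicodeDecodeError:
            out, is_b64 = base64.b64encode(payload).decode("ascii"), True

        return {
            "statusCode": int(captured["status"].split()[0]),
            # Simplification: repeated headers (e.g. Set-Cookie) collapse here.
            "headers": dict(captured["headers"]),
            "body": out,
            "isBase64Encoded": is_b64,
        }

    return handler


# Entry point that the Lambda function would be configured to call.
lambda_handler = make_handler(demo_app)
```

Because the {proxy+} resource sends every path to the same function, the application's own router keeps deciding which view handles which URL, which is why the existing code can stay largely untouched.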
Last: find an opportunity to serve the data needs of your org while developing your serverless computing skill-set
Building a BI solution with serverless components is a wonderful engineering exercise for familiarizing yourself with the serverless building blocks introduced in this post.
A BI stack is a common stack that provides a lot of value by presenting data in an intuitive way, so there are likely many internal needs where it can show huge value to your employer; this means you can identify the opportunity and propose a solution for which you drive the technical roadmap;
As a result, you may easily find an opportunity to step up and propose a solution with the open source BI stack mentioned here, aligning your personal technical growth with your company’s interests;
In this way, you not only develop a serverless skill-set but also accelerate your career in your organization.
One more thing…
If you have any questions while working through this exercise, please join this serverless cloud computing community; if we find enough interest, we can form a cohort and do the exercise together as a group.