Building a serverless business intelligence (BI) cloud service -- basic building blocks for a BI server
serverless and data stack (4)
We are going to discuss how to build a serverless business intelligence (BI) stack in two posts.
This is the first half: a brief introduction to the basic functional building blocks of a BI service. In the next post, the second half of this discussion, we will look at how to build a serverless BI service by slightly modifying the code base of Redash, an open source BI tool. If you enjoy reading this post, I’d appreciate it if you showed your interest so that I get some positive feedback 😀.
A quick intro to BI software
Business intelligence software vendors tend to say that BI software can solve every problem related to understanding data. For example, here are some quotes from vendors’ websites:
Tableau helps people see and understand data. Our visual analytics platform is transforming the way people use data to solve problems.
Looker serve up real-time dashboards for more in-depth, consistent analysis. Access to trustworthy data enables teams to collect fresh results for more precise reporting.
Roughly speaking, BI software gives data teams and their customers the final asset of a data project, be it a report or a dashboard. To software engineers, presenting that asset through some kind of data visualization is probably the most noticeable trait of such software.
Lately, the term “embedded analytics” seems to have brought business intelligence software new attention. Just imagine: if the assets of a data project could be embedded seamlessly into a CRM or another type of SaaS platform, they could reach audiences that did not use or consume them before.
Some examples of open source BI software and their architecture building blocks
Other than commercial BI vendors such as Tableau and Looker, there are several very usable open source BI software alternatives:
Apache Superset (the venture funded commercial company is Preset)
Metabase (the venture funded commercial company is Metabase)
Redash (the engineering team has been acquired by Databricks, but the online service is here)
Here we introduce the common architecture and building blocks shared by these open source BI tools.
These BI software stacks commonly contain the following components:
web tier servers that take customers’ requests;
a task queue service that relays query tasks between user requests and the actual query engine;
a database service that stores metadata, which in business intelligence software typically covers users, groups, access privileges, query task owners…;
query workers that run the outbound queries against a database or data warehouse and then store the results in the query result cache engine👇;
query result cache engine: a database service (or another type of storage engine) used for storing query results.
Note: there are other helper services, such as a mailer service used for email notification delivery, but we leave them out of the diagram for now.
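To make the flow between these components a bit more concrete, here is a minimal Python sketch. It is not how Superset, Metabase, or Redash are actually implemented; every name in it (`submit_query`, `run_worker`, `fetch_result`, the `orders` table) is hypothetical. It assumes an in-process `queue.Queue` standing in for the task queue service, plain dictionaries standing in for the metadata database and the query result cache, and an in-memory SQLite database standing in for the customer's data warehouse.

```python
import hashlib
import queue
import sqlite3

# Stand-ins for the real services (assumptions for illustration only).
metadata = {"queries": {}}   # metadata database: query text and task owners
task_queue = queue.Queue()   # task queue service
result_cache = {}            # query result cache engine


def make_data_source():
    """Hypothetical data warehouse: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("east", 10), ("west", 25), ("east", 5)],
    )
    return conn


def submit_query(user, sql):
    """Web tier: record the task in metadata and enqueue it for a worker."""
    # The task id is derived from the SQL text so that it can double as a cache key.
    task_id = hashlib.sha256(sql.encode()).hexdigest()[:12]
    metadata["queries"][task_id] = {"owner": user, "sql": sql}
    if task_id not in result_cache:
        task_queue.put(task_id)
    return task_id


def run_worker(data_source):
    """Query worker: drain the queue, run each query, store the result in the cache."""
    while not task_queue.empty():
        task_id = task_queue.get()
        sql = metadata["queries"][task_id]["sql"]
        result_cache[task_id] = data_source.execute(sql).fetchall()
        task_queue.task_done()


def fetch_result(task_id):
    """Web tier: serve a result from the cache, or None if it is not ready yet."""
    return result_cache.get(task_id)


source = make_data_source()
task_id = submit_query(
    "alice",
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region",
)
print(fetch_result(task_id))  # None: no worker has picked up the task yet
run_worker(source)
print(fetch_result(task_id))  # [('east', 15), ('west', 25)]
```

The point of the sketch is the decoupling: the web tier never talks to the data warehouse directly. It only writes to the queue and reads from the cache, which is why the workers can be scaled (or, in a serverless setup, replaced) independently of the web tier.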
Next post…
Using Redash as a concrete example, we are going to discuss how we might build a serverless BI stack on AWS. With a serverless architecture, a BI service provider’s infrastructure and maintenance costs may be significantly reduced, so that it can offer a cheaper service and reach a bigger target audience.
If you are interested, please stay tuned…