Star schema is the most common approach to data modeling — primarily made up of fact and dimension tables.
Why you need data modeling
You break your analytics data up into:
Data Engineers create many of these fact and dimension tables on top of your raw production data in your data warehouse, which enables analysis, reporting, and BI.
Star schema was created because storage was expensive and fact and dimension tables make it easy to aggregate large amounts of data into smaller chunks. With the advances in data warehousing, storage is now super cheap, BUT we’re still using fact and dimension tables.
Companies are different now. Data sources and formats change frequently and companies use data in very different ways.
Your data, and your ability to answer questions rapidly and accurately is your competitive advantage. Therefore, for a competitive company, the number of data questions grows exponentially.
Unfortunately, a star schema is bad at handling change.
When deploying a star schema, data engineers have to imagine all the questions you want to ask (facts/measures) and all the ways you want to ask those questions (dimensions) and it’s truly impossible to capture all of it.
The problem is when you look behind the curtain of each fact and dimension table you’re met with a cryptic 1000 line query. This query is full of crucial assumptions that were necessary when the table was generated months, if not years ago.
And when your CEO asks for a new dimension on the dashboard, no one wants to decode that 1000 line query. It’s too risky. So you end up duplicating the logic to create a similar reporting table with the new dimension and then leave the old table with all the old assumptions intact (no risk). But, now that you have a similar table, you’ve separated the source of truth. There are two tables with similar data, but because of slightly different assumptions dashboards won’t match.
There is a right way to use a star schema, but you have to spend weeks planning out every change before deploying to production. And this is for EVERY table and EVERY change! Then data teams can’t move fast enough to keep up with all the new data questions being asked on a daily basis.
That’s right. The assumptions that created star schema are no longer relevant.
Listen, before you do that, go talk to companies that have had a star schema for more than two years. They will tell you the struggles they’ve experienced, and the dangers of this approach: the teams that are being blocked by not being able to get the data they need.
No! Actually, it’s funny because I’ve thought about what would I do if Narrator doesn’t take off. I’ve seen the alternative. With Narrator, every 1 analyst was supporting 8 companies. When I was at my last company we had 40 people supporting only 1 company and answering much fewer questions using a star schema.
That’s why I want everyone to try Narrator. Once you experience it, there’s no going back!