自助服务仪表板

自助服务仪表板
Self-Serve Dashboards

原始链接: https://briefer.cloud/blog/posts/self-serve-bi-myth/

通过“下拉菜单和复选框”或文本到 SQL 接口等工具实现自助式商业智能 (BI) 的概念面临着多项挑战，使其真正为非技术用户提供自助服务。工程师和数据科学家常常希望简化 BI 的使用会导致广泛采用，但现实描绘了一幅不同的图景 - 他们反而成为唯一的用户，从而违背了初衷。尽管另有声明，大多数自助式 BI 提供商都将 SQL 隐藏在抽象层之下。然而，主要问题不仅仅是 SQL；还有 SQL。其原因在于用户缺乏对底层数据的理解。无论某人如何轻松地编写 SQL 查询，他们仍然缺乏对数据含义、来源、计算方法、解释技巧和验证技术的洞察，导致整个练习徒劳无功。相反，重点应该转向对非技术人员进行特定数据集的教育，包括含义、处理和可视化方面。虽然此类培训需要投资和耐心，但与不断更新和维护技术娴熟的个人破译的文档相比，它提供了长期的好处。最终，自助式 BI 的目标不应该是完全独立于 IT 员工，而应该是简化部门之间的协作和沟通。通过为技术人员提供语言模型集成、交互式笔记本环境和协作数据探索平台等现代工具，企业可以最大限度地提高效率，同时确保整个组织的可访问性。

本文讨论了让没有广泛数据理解或技术专业知识的个人访问和使用数据所面临的挑战。作者领导数据工程和数据科学团队超过 15 年，他对去中心化数据访问、自助仪表板以及低代码商业智能 (BI) 和文本转 SQL 等人工智能驱动工具表示担忧/可视化。尽管尝试解决该问题（例如使用这些工具），但核心问题仍然存在：缺乏对数据的理解和信任。作者认为，仅靠 SQL 并不能提供解决方案，因为有大量的人可以执行查询，但却误解了数据及其结构。在承认数据可视化和快速洞察的重要性的同时，作者强调，在尝试获得有价值的见解之前，主要重点应该放在数据管理实践和理解数据上。作者认为，直觉和领域专业知识仍然至关重要，而不是盲目依赖数据，并鼓励技术和业务人员之间的协作。作者的结论是，挑战在于解决数据的复杂性和上下文，而不是查询数据的方法。文本包括个人经验和观点，还参考了先前的技术和方法，为数据管理和使用的演变提供了丰富的视角。

原文

Sales pitches are the only place where “self-serve dashboards" work. In the real world, it's a different story.

That story usually starts with an engineer or data scientist who's frustrated because they spend too much time writing queries and preparing dashboards for business people. They think that if they make BI easy enough, everyone will be able to “self-serve”, but that rarely ever happens.

What actually happens is that engineers and data scientists end up being the only people using the "self-serve" BI tool, which, ironically, makes it not self-serve at all.

Why self-serve BI doesn't work

SQL is the only self-serve BI tool. Still, most "self-serve" BI vendors don't want to admit it, so they find ways to reinvent the wheel and disguise SQL as something else. Maybe that's what customers want to hear, or maybe they genuinely believe it, I'm not sure.

Either way, the problem with most approaches to "self-serve" BI is that they consider SQL to be the only barrier to business stakeholders querying data. That's not true.

Let's say you could wave a magic wand and make writing SQL queries as easy as writing an email. Do you think business stakeholders would suddenly start querying data? The answer is no.

Even if these people knew how to write SQL queries, they wouldn't understand the semantics of the data they're querying. They wouldn't know what the data means, where it comes from, or how it's calculated. They wouldn't know how to interpret the results, or how to validate them. In other words, writing the SQL query is actually the easy part.

Now, let's look at two attempts to make BI "self-serve" and see whether the hypothesis above holds.

Attempt 1: The conventional "dropdowns and checkboxes" approach

First, let's take a look at an example of a conventional "self-serve" BI interface which uses dropdowns and checkboxes as a way to make queries more accessible.

The first thing to notice is that this interface is just an attempt at what I call "SQL-by-mouse". I might be just an old grumpy person, but I don't see how this is any better than writing SQL. In fact, it's worse because it's slower, less reliable, more limited, and not generalizable to other tools.

In any case, I'll concede one thing: some non-technical people are just scared of any monospaced writing with syntax highlight, so this interface might be a good first step.

Anyway, let's ignore all these problems and just assume that this interface is the best thing since htmx. Let's assume that it's 100x easier to use than SQL.

Even if that were the case, your CFO, for example, wouldn't just go ahead and use these dropdowns and checkboxes to query data. The reason they wouldn't use it is that they don't have the context to understand the data they're querying, and they'll probably be insecure about the results, which they won't know how to validate because they don't understand how the data got there in the first place.

Now, I've got a question for you: if your CFO doesn't use this interface, who will? That's right, you will. And guess what's the first thing you'll do when you get to these dropdowns? Correct, you'll immediately look for somewhere to type in SQL. Oh, the irony.

Attempt 2: The text-to-SQL approach

Remember that magic wand that makes writing SQL queries as easy as writing an email? That's what text-to-SQL tools are. Nonetheless, they're not enough, just as I explained earlier.

Again, the problem is not the technology itself. In fact, LLMs are almost too effective at translating natural language into SQL. They will find a way to generate a query for any question you ask, even if that question doesn't make sense, which is exactly the problem.

On the other hand, a technical person would notice that the question doesn't make sense, and they would ask for more context. They would ask for details about the business person's hypothesis and the problem at hand. Then, they would explain what type of data is available, and work with the business person to formulate a precise and useful question.

Again, SQL is not the problem.

In any case, I do think LLMs could be the actual solution to self-serve BI, but not in its current form. For them to work, they'd need to be fed with more context, and they need to get better at expressing uncertainty and asking for more information.

What actually works

If we assume that the problem with self-serve BI is not SQL, but the context and semantics of the data, then it follows that the solution is to teach people about the data they're querying, regardless of interface.

The problem with this type of training is that it takes time, and most business stakeholders need answers this week, not next quarter.

Even in the best-case scenario, where companies train business stakeholders to understand the data they're querying, these people will need time to keep up with the changes in the database's schema, data models, and ETL processes.

Additionally, documenting all this knowledge generates significant overhead for the technical team, and it quickly gets out-of-date.

So what's the true solution to self-serve BI? The answer is simple: not to make BI self-serve for non-technical people. Instead, the solution is to make technical people support business stakeholders and help them do it more efficiently, using better tools.

The definition of a better tool may vary, but I have a few suggestions.

The first suggestion is to give LLMs to technical people, not business stakeholders. Even though text-to-SQL is not quite there in terms of understanding context and semantics, technical people already have context and semantics, so why not let them use it? That way, a single technical person can serve more business stakeholders, and do it faster.

The second suggestion is to give technical people more flexible tools. Instead of giving them dropdowns, or a SQL-only interface, let them play around with data using Python, R, or any other tool they're comfortable with. This way, they don't need to be passing data back and forth between different tools, and they're not limited by the capabilities of a BI tool.

The third and final suggestion is to make it easier for technical people to share their work. Notebooks and internal data applications are notoriously bad at that because they require others to deal with containers, dependencies, and infrastructure. It's a lot of overhead for a business stakeholder who just wants to see a chart. Instead, we need tools that have all the building blocks of a data application, but that are as easy to share as a Google Doc.

I'm obviously biased because I'm building a data tool for technical people, and it does exactly these three things. But what other choice do I have? I'd rather build tools that work than tools that don't. And, most importantly, I'd rather not pretend that the problem is something it's not.