(comments)

Original link: https://news.ycombinator.com/item?id=39511676

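Based on the conversation, developers seem to hold differing views of Apache and its related software. Some feel that Apache hosts several critically important projects, or projects that provide significant value in their fields, while others feel that the organization's offerings have become less relevant over time. Specific projects mentioned include Kafka and Airflow, along with the non-Apache Metabase. Not every Apache project is thriving, but some commenters acknowledge that Apache, as an umbrella organization, keeps declining projects on life support until the right effort revives them. Ultimately, opinions of Apache and its affiliated software depend largely on individual experience and priorities.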

Original text
Apache Superset (apache.org)
593 points by tosh 20 hours ago | 168 comments

Had a very good experience with Superset.

Superset allowed us to replace Tableau, and we're not looking back.

Took me a while to figure out how to embed it into my app using the Superset Embedded SDK.

Superset Embedded SDK - "Embedded SDK allows you to embed dashboards from Superset into your own app, using your app's authentication. Embedding is done by inserting an iframe, containing a Superset page, into the host application."

https://github.com/apache/superset/tree/master/superset-embe...

Superset is based on a very high-quality and well-maintained chart library, ECharts

https://echarts.apache.org/examples/en/#chart-type-linesG

Community Roadmap

https://github.com/apache/superset/projects?query=is%3Aopen

Huge respect to Preset.io and its team for contributing to the project and keeping it in great shape

https://preset.io/blog/

Superset's source code is very easy to read and understand, and as a result it's possible to implement some advanced caching techniques to reduce the load on charts.
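
To make the caching remark a bit more concrete, here is a minimal sketch of one such technique: a time-bounded cache keyed by a hash of the query text, so repeated chart loads inside the TTL skip the database. This is not Superset's actual caching code. `QueryCache`, `run_query` and the injectable `clock` are hypothetical names picked for illustration.

```python
import hashlib
import time
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Hypothetical TTL cache for chart query results (not Superset's API)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(sql: str) -> str:
        # Only outer whitespace is trimmed; the query text itself is left
        # untouched so that distinct string literals never share a key.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def get_or_compute(self, sql: str, run_query: Callable[[str], Any]) -> Any:
        key = self.key_for(sql)
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl_seconds:
            return hit[1]  # still fresh: the database is not touched
        result = run_query(sql)  # miss or expired: run and remember
        self._store[key] = (now, result)
        return result
```

A dashboard with ten charts over the same query would then hit the database once per TTL window instead of ten times per page load.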

No BI is perfect.

Watching Superset for years gives me confidence the project will work as it's supposed to down the road, and eventually some of its packages can be reusable for all kinds of visualizations and data hacking.

Our main approach to visualisation is to start with ECharts and a simple Reactjs wrapper, spin off Superset onto a subdomain for power users, and later see which one works better. The same look gives a very pleasant experience.



We use ECharts in our open source BI tool (Evidence) and it's a great library. Has helped us build a declarative syntax for viz which can be version controlled (https://evidence.dev)

Previous HN discussion: https://news.ycombinator.com/item?id=35645464 (97 comments)



Evidence looks cool, and I evaluated it some time back. The docs say the pages are all pre-rendered for all possible combinations. Is that still the case? If so, and I have a date filter, is it going to pre-render all possible dates?


Looks great!

Reminds me of Obsidian DataView but with charts https://github.com/blacksmithgu/obsidian-dataview

This whole idea of having data, visualisations and a knowledge base in one private offline place is very appealing



We're fans of Obsidian! DataView looks cool - love the ability to define the tables in code inline in the markdown. That's similar to how we inline DuckDB WASM SQL queries in markdown: https://docs.evidence.dev/core-concepts/queries/


I love Obsidian.

The Markdown Markup typing experience is just so good compared to e.g. Slack, Reddit and other markdown-esque tools



How do you deal with data visibility and permissions? I mean, most tables have data that should only be seen by a specific user or group ID, and that layer is usually handled by the application. It would be awesome to expose the power of Superset for users, but I imagine creating the security layer would be a pain.


You can use row-level security, or specify RBAC with pretty much any SQL query.
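
For anyone wondering what row-level security boils down to, here is a rough sketch of the idea: every query is wrapped so that a per-role filter is applied before any rows come back. It is an illustration under stated assumptions, not how Superset implements it. The `RLS_RULES` mapping, the role names and the `orders` table are all made up, and the wrapped query has to expose the column the filter refers to.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical role -> filter clause mapping. In a real deployment this would
# live in the BI tool's security settings, not in application code.
RLS_RULES: Dict[str, str] = {
    "emea_analyst": "region = 'EMEA'",
    "apac_analyst": "region = 'APAC'",
}


def apply_rls(sql: str, role: str) -> str:
    """Wrap a query so only rows matching the role's filter are returned."""
    # Fail closed: a role without a rule gets a filter that matches nothing.
    clause = RLS_RULES.get(role, "1 = 0")
    return f"SELECT * FROM ({sql}) AS rls_scoped WHERE {clause}"


def run_as(conn: sqlite3.Connection, sql: str, role: str) -> List[Tuple]:
    return conn.execute(apply_rls(sql, role)).fetchall()


def demo_connection() -> sqlite3.Connection:
    # Tiny in-memory table standing in for real application data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "EMEA", 10.0), (2, "APAC", 20.0), (3, "EMEA", 5.0)],
    )
    return conn
```

The point of the design is that the user writes whatever SQL they like and the filter is bolted on outside their control, so the application no longer has to be the only security layer.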


I have this question too




eCharts is awesome. We moved from plotly after using it for several months to echarts at https://github.com/openobserve/openobserve and are super happy.


Had good results with echarts. With Superset, not so much: complicated to install, lost all dashboards after an update, cryptic error messages, and custom queries were meh, so we decided to use views in Postgres. The project with Superset was finished successfully, but the time spent was a multiple of what something like Power BI would have taken.

All in all, not very innovative, but a much-needed open source version of a traditional BI tool. Definitely something to follow and to use in temporary, not too demanding use cases. And hopefully a future replacement for Tableau or Power BI.



I’d like to see these types of apps start offering SVG embedding of things like graphs. Frames are such a pain.


Bokeh is an option in the frontend-viz space that puts out pretty solid SVG for statically-rendered charts, while also having the option of more Tableau-like interactive functionality with input fields, dynamic filters, etc. Might be a decent option for you?

Their interactive "embedded-mode" avoids iframes too... but it's built with web components, so you wind up in shadow-DOM hell if you want to do anything dynamic on the view's contents.



That's probably not trivial, but it seems plausible. The beauty of open source is that you can help contribute this if you're fired up about it!


I have no experience with Superset. Can you elaborate on a few points where you see it excel beyond Tableau?


I don't want to start a rant against Tableau. It's a powerhouse. It's great, superior software. But when we optimized costs and compared the total cost of ownership, the opportunity to stop paying for a Tableau Server license made us vote in favor of Superset and a mix of Reactjs+Echarts widgets.

https://www.tableau.com/products/server

If you have the money and a dedicated team of data analysts who are already familiar with Tableau, there's no need to torture them with other tools.



Honestly it's so hard to compare Tableau and Superset. Tableau has every feature and bell / whistle imaginable. But it's heavy, desktop oriented, and pricey.

Superset is lightweight and open source, but only has 5% of the features. So it really depends what you need!



> Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.

I tried Superset a few years back, and maybe it's changed since then, but intuitive is about the last thing I'd use to describe it. Things which I could figure out in a few minutes on any other BI tool literally took me hours of searching. It didn't help that they decided to rename core concepts at some point so half the online documentation made no sense anymore. Others at those companies who tried it at the time said similar things.



I also found Superset unintuitive to use and set up. I settled on standing up Metabase because it was so simple to get started with, since it can be launched as a single jar. The business users loved it and so did I, and administration with a Postgres backend instead of the internal H2 database was a breeze.


Metabase is great. It truly is a BI tool. Superset is more of a visualization platform, which works great if you have engineers building reports. Less good if you expect more junior analysts to be super productive.


We ran into the same exact issue with Superset not being intuitive, just for a different audience that is more technical. Also went with Metabase, which is good and easy to use. It lacks a few chart types, but over the past year quite a few changes and bug fixes have been landing consistently.


My experience with Superset was the opposite. It's easy to install using containers. You can have it up and running and connected to ClickHouse in a few minutes. I also found the internal design pretty intuitive--the SQL query lab is much easier than Grafana's editor.

I like Grafana too, but there's basically no isolation between your query and the SQL database at least in the Altinity Grafana plugin for ClickHouse which is the main one I use.



I had the same experience. Featurewise Superset looked better, but after wasting a couple of hours trying to install it, I just gave up.

Instead I installed Metabase in 5 minutes tops: spin up an EC2 instance, wget the jar, and java -jar. I've never looked back.

The only thing that turns me off is that it's implemented in an obscure language. At one point I wanted to add some custom postprocessing to an API (given an SQL query, get some python/pandas postproc command from an SQL comment and execute it on the returned table), but the language used is just not for me (some Lisp dialect)



Clojure is not particularly obscure


I just took a look at Metabase.

https://www.metabase.com/demo

Demo is nice.



Yep, we've really liked Metabase for embedding in our platform.


Let’s be honest, intuitive is the last word we’d use to describe most Apache projects.


They are doing pretty well in that it is even clear what the project is about. Good luck figuring that out within 30 seconds of hitting the average Apache project homepage.


There seem to be dozens of Apache "Big Data" projects that all look kinda the same unless you are a Big Data person.


Even if you are a data person. The ASF doesn't mind overlap between projects [1], it spreads its bets and lets the market choose the winners.

1. https://www.apache.org/foundation/how-it-works/#incubator



Had a similar experience with Superset. A few others have mentioned Metabase and I agree it's better, but if you're looking for a different approach to data, check out Definite (https://www.definite.app/). It's a "data stack in a box". A few things we're doing differently:

1. Built-in data warehouse - We spin up a duckdb database for you to load data to

2. 500+ connectors - You don't need to buy a separate ETL and you can pull in all your data (e.g. Postgres, Stripe, HubSpot, Zendesk, etc.) automatically

3. Semantic layer - Define dimensions, measures, and joins in one place. We have pre-built models for all the sources we support (e.g. the Stripe model already has measures for MRR, churn, etc.)

4. Simple BI - Build a table with the data you want and generate visuals off that table

I'm [email protected] if you have any questions.



I've just been playing with superset. I'd have to agree. Things which are easy in SQL are... disturbingly hard or nonobvious in superset.

And the documentation is sparse at best.



It wasn’t fast either when I used it.

What it was though, was riddled with dozens of Python runtime errors and innumerable glitches.

Metabase is where it’s at.



It’s more intuitive than the open source alternatives but is not as intuitive as tableau and others.


Metabase is more intuitive. Also, being unintuitive isn't great but not the worst thing. A project not even realizing that (and thinking the exact opposite) is much much worse. Unintuitive can be fixed with PRs over time. Delusional project leadership cannot.


Are there better alternatives?


Full fledged BI tools like Superset and Metabase are amazing for their intended use cases.

But they may be overkill if your primary use case is to infrequently build semi-interactive reports for non-technical end users and your needs are mostly covered by standard graphs and tables, especially if you are familiar with SQL and have access to the underlying data source. Two nifty utilities I have found very useful for the latter kind of use case are SQLPage and Evidence.

They make it very convenient to whip out some SQL and convert that to a neat professional looking web ui that can be forwarded to an end user. In case of Evidence it is a statically generated site, and in case of SQLPage it is a web app that connects to a live database.

SQLPage: https://sql.ophir.dev/

Evidence: https://evidence.dev



You can query Wikipedia's internal database by using its superset instance.

https://superset.wmcloud.org

https://phabricator.wikimedia.org/T169452

Back then, I used this to generate some custom statistics

https://github.com/altilunium/wikiidmon



I love Superset.

I've been running it in production since 2017, at two jobs, the current one a big corporation.

Best general-purpose, database-backed dashboarding system out there. I would never pay for Tableau or PowerBI.

Same for Airflow.



Same for Airflow? I’m not sure I understand what you mean.


They were both made by Airbnb and then open-sourced, which is the similarity I assume they meant


They were also more specifically authored by the same individual!


Maxime, the original author of Airflow/Superset, is also the CEO of Preset (where I work), so he/we are still working on Superset every day :)


Unfortunately, information about new releases is not available on the Superset website, but only at Preset.io: https://preset.io/blog/superset-3-0-release-notes/


Related. Others?

Open source Business intelligence platform made with Python - https://news.ycombinator.com/item?id=29368664 - Nov 2021 (49 comments)

Apache Superset 1.1 - https://news.ycombinator.com/item?id=27439939 - June 2021 (28 comments)

The Apache Software Foundation Announces Apache Superset as a Top-Level Project - https://news.ycombinator.com/item?id=25905277 - Jan 2021 (1 comment)

Apache Superset is an enterprise-ready business intelligence web application - https://news.ycombinator.com/item?id=21133931 - Oct 2019 (7 comments)



Can one run Python scripts in Apache Superset like one can do with PowerBI: https://pycaret.gitbook.io/docs/learn-pycaret/official-blog/...


Superset is powerful, but I wonder why they don't fix "papercuts", e.g., misaligned pixels on a spinner, or inability to copy a value from a table's cell, or non-monospace font for numbers in a table, etc. There are hundreds of small annoyances in the product.


We try! We also accept PRs and Issues if there are things bugging people, of course. It's always a balancing act between building some new feature that people are clamoring for, or fixing those cosmetic issues that always crop up.


We use Metabase heavily at work. However, where all these tools seem to fall down is organization around the hundreds of dashboards and questions. I wish it had a built-in wiki or something to build out more navigation. Anyone know of any good ways to do that?


100% agree.

One thing that helps is hooking metabase up to its own database and building queries on your queries, e.g.:

    select *
    from report_card 
    where dataset_query ilike '%' || {{query}} || '%'
(You can also join in metadata like the author, when it was last run, etc.)
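
If you want to try the same idea from a script rather than from inside Metabase, a minimal sketch could look like the following. It uses an in-memory SQLite table as a stand-in for the Metabase application database, models only the `name` and `dataset_query` columns, and the rows are invented. SQLite's `LIKE` is case-insensitive for ASCII, so it roughly plays the role of Postgres `ilike` here. `build_demo_db` and `find_cards` are hypothetical helper names.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    # Stand-in for the application database; schema and rows are assumptions.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report_card (name TEXT, dataset_query TEXT)")
    conn.executemany(
        "INSERT INTO report_card VALUES (?, ?)",
        [
            ("Weekly signups", "SELECT count(*) FROM users WHERE created_at > :start"),
            ("Revenue by plan", "SELECT plan, sum(amount) FROM Invoices GROUP BY plan"),
            ("Open tickets", "SELECT * FROM tickets WHERE status = 'open'"),
        ],
    )
    return conn


def find_cards(conn: sqlite3.Connection, fragment: str) -> List[str]:
    """Return names of saved questions whose query text contains the fragment."""
    rows = conn.execute(
        "SELECT name FROM report_card "
        "WHERE dataset_query LIKE '%' || ? || '%' "
        "ORDER BY name",
        (fragment,),  # bound parameter, so the fragment is never spliced into SQL
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `find_cards(conn, "invoices")` on the demo data finds the one question that touches the `Invoices` table, regardless of case.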

We also try really hard to keep the Collection directory structure clean and consistent. But it's still really hard.



Mhmm this gives me an idea.. what if I could "group" metabase sql queries by "similarity" (either of results or of the query itself)

Another option could be to use an LLM to summarize, tag and group queries for better discoverability.
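
A cheap first pass at the "group by similarity of the query itself" idea, before reaching for an LLM, might be plain string similarity. The sketch below uses `difflib.SequenceMatcher` from the standard library with a greedy grouping rule. The 0.8 threshold and the helper names are arbitrary assumptions, and comparing only against the first member of each group is a simplification, not a proper clustering algorithm.

```python
from difflib import SequenceMatcher
from typing import List


def normalise(sql: str) -> str:
    # Collapse whitespace and case so formatting differences do not count.
    return " ".join(sql.lower().split())


def group_similar(queries: List[str], threshold: float = 0.8) -> List[List[str]]:
    """Greedy grouping: a query joins the first group whose first member
    is at least `threshold` similar, otherwise it starts a new group."""
    groups: List[List[str]] = []
    for query in queries:
        for group in groups:
            ratio = SequenceMatcher(
                None, normalise(query), normalise(group[0])
            ).ratio()
            if ratio >= threshold:
                group.append(query)
                break
        else:
            # No existing group was close enough.
            groups.append([query])
    return groups
```

Two saved questions that differ only in a filter value would land in the same group, which is often exactly the duplication you want to surface.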



Has anyone tried both this and Metabase? I've used Metabase in a few projects and I find it very nice. This seems more powerful, perhaps?

Is it worth it for BI on small datasets?



Yes, I am at a company using Metabase, but I have a decent amount of experience with Superset (albeit from many years ago).

The reason we chose Metabase was that it had table joins, while Superset doesn't (unless it has added them since I used it). It also looks a bit sleeker. But I strongly prefer Superset; I found that with Metabase I had to turn a lot of things off to make it usable (Let me see "the_table" not "The Table"!), I was constantly annoyed at the opacity around models vs "questions", etc. and every time I wanted to change a question Metabase insisted on creating a new one instead. The real issue here was when we wanted to swap out the data source for a lot of questions but there was no clean way to do so without MB just creating new questions.

Also, Metabase doesn't have serialization unless you pay them AND you self-host, (if I'm self hosting then what exactly am I paying for?) and that's pretty annoying. https://www.metabase.com/docs/latest/installation-and-operat....

But it does let you join tables. Sometimes that's enough to make MB worth dealing with.



Superset lets you join tables within the same database. If you want to do cross-DB joins, we have a new (beta) in-memory meta-DB that lets you do this, but we generally see and recommend people using things like Trino for this.
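
As a toy illustration of what joining across databases in memory means (not Superset's meta-DB and not Trino, just the general idea), the sketch below pulls rows from two independent SQLite databases and does a hash join in Python. The table names, columns and the `make_store` / `cross_store_revenue` helpers are all assumptions made for the example.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple


def make_store(ddl: str, insert: str, rows: List[tuple]) -> sqlite3.Connection:
    # Each call creates a separate in-memory database, i.e. a separate "store".
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert, rows)
    return conn


def cross_store_revenue(crm: sqlite3.Connection,
                        billing: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Inner-join customers (store 1) with invoices (store 2) in memory."""
    # Build side: customer id -> name, fetched from the first store.
    names: Dict[int, str] = dict(crm.execute("SELECT id, name FROM customers"))
    totals: Dict[str, float] = defaultdict(float)
    # Probe side: stream invoices from the second store and look up each id.
    for customer_id, amount in billing.execute(
        "SELECT customer_id, amount FROM invoices"
    ):
        if customer_id in names:  # rows without a match are dropped
            totals[names[customer_id]] += amount
    return sorted(totals.items())
```

This works for small data but moves every row through the client, which is a big part of why a dedicated engine is usually recommended for the real thing.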


Nice! When was that added?


Is that new? Last time I checked, this was the major downside of Superset


The "model" vs "question" thing is really annoying as there's no real difference from the user's perspective, and it's easy to accidentally convert a model back to a question without noticing when you publish something. You notice when you try to drill into the chart. There's a lot of annoying manual labor in metabase, e.g. I want to filter something into 10 different charts and I need to duplicate it 10 times and change a filter on each one. Still yeah joins are nice. A non-bugged aggregate count/sum as a window function would be nicer.


Thanks! Very detailed answer.

I've found the weird "make it easy" mindset a bit annoying with Metabase too. The whole questions, nice table names...

I'll give Superset a try in my next project I think.



>I was constantly annoyed at the opacity around models vs "questions"

Yeah, somewhere along the line Metabase decided to get opinionated on "self-serve". I imagine it works well for some teams and companies, but for the tech-oriented, it's annoying.

I prefer my BI tools to be platforms that make for easy charting and cross-filters, while I build and control the models behind the scenes with a tool like dbt.



Metabase is a bit more user-friendly to be honest than Superset. Superset has a WAY more liberal license, so it's ideal for people who want to customize Superset and build data apps.


Metabase is great, I use it with an Oracle database.


Is Superset a decent tool if you're just a single person doing data analysis? Say I have a handful of sqlite databases, and just want to be able to develop some queries / charts. I was looking into Tableau / Power BI / Superset, and all of them seemed pretty heavyweight for a single user, and none of them seemed super easy to get set up locally.

Any recommendations for a good piece of software for the single user case? Or a more convenient way to run the heavyweight tools?



If you are doing data analysis I don't think any of the 3 pieces of software you mentioned are going to be that helpful.

I see these products as tools for data visualization and reporting i.e. presenting prepared datasets to users in a visually appealing way. They aren't as well suited for serious analytics.

I can't comment on Superset or Tableau, but I am familiar with Power BI (it has been rolled out across my org), and the types of statistics you can do with it are fairly rudimentary. If you need to do anything beyond summarizing (counts, averages, min, max, etc.), it is not particularly easy.

For data analysis I use SAS or R. This software allows you to do things like multivariate regression, time series forecasting, PCA, cluster analysis, etc. There is also plotting capability.

Both these products are kind of old school, I've been using them since early 2000's, the "new school" seems to be Python. Pretty much all the recent data science people in my organization use Python. Particularly Pandas and libraries like Seaborn (https://seaborn.pydata.org/).

The "power" users of Power BI in my organization tend to be finance/HR people, with use cases like drilling down into cost figures or interactively presenting KPIs and other headline figures to management.



Tableau is the best, most powerful, most mature of the three, most feature complete and easiest of the three. I think they give you a 30 day trial.

This is a single user application, unless you make it part of your built application.



> This is a single user application

K8s installation instructions: https://superset.apache.org/docs/installation/running-on-kub...

RBAC configuration: https://superset.apache.org/docs/security/#rest-api-for-user...



Superset isn't a single user application?


Ah, sure


I found Superset difficult to use when I explored it a couple of years back[1], not sure whether this is the same case now.

[1] https://blog.adnansiddiqi.me/create-your-first-sales-dashboa...



Does it have horizontal bar charts nowadays?


Used Superset back in 2016 and 2020; both times we chose Metabase for our clients' BI dashboards and Superset for our internal dashboard. Superset is nice and easy to modify and extend, but not as user-friendly as Redash or Metabase. But after the author launched Preset, it seems to have improved a lot with the company's effort. It looks to me like the best way for OSS to advance is to have a company dedicated to improving it.


One thing to keep in mind with BI software is that the users are often very different than, well, those individuals that prefer to use mutt as an email client.

Many, or most, users for a BI tool will be operations, product managers, and business management who simply will not find the interface to be intuitive, responsive, or well designed. At least that's my experience.



We've built a Kubernetes Operator for Apache Superset at Stackable: https://github.com/stackabletech/superset-operator/

It's part of our Open Source Data Platform. It's one of the few open source BI tools out there, and there are not a lot of alternatives in this space. We generally like it.



Can vouch for Superset. I use it in a couple of my companies and love it.


Tried installing it, locally in a Python Virtual Env.

Apparently installation will not work with Python 3.12, due to the removal of distutils.

Does anyone have any method to install this?
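
A quick way to tell whether an interpreter is affected, sketched under the assumption that the only concern is the standard-library `distutils` module (which was deprecated by PEP 632 and is gone from the standard library as of Python 3.12). The function names and messages are hypothetical, and this does not check anything about Superset's own requirements.

```python
import sys
from typing import Sequence


def stdlib_has_distutils(version: Sequence[int] = sys.version_info) -> bool:
    """True for interpreters older than 3.12, which still ship distutils."""
    return tuple(version[:2]) < (3, 12)


def preflight_message(version: Sequence[int] = sys.version_info) -> str:
    major, minor = version[0], version[1]
    if stdlib_has_distutils(version):
        return f"Python {major}.{minor}: distutils is in the standard library."
    return (
        f"Python {major}.{minor}: no standard-library distutils; "
        "an older interpreter or a container-based install may be easier."
    )


if __name__ == "__main__":
    print(preflight_message())
```

Running it inside the virtual env before `pip install` at least tells you which situation you are in.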



Maybe try the Docker installation to keep the dependencies off your system:

https://superset.apache.org/docs/installation/installing-sup...



I wish more projects had guided tour videos that demonstrated the power of the tool in the hands of an expert user. Not "get started" but "why should I care".

Wes McKinney used to have an excellent 5 minute introduction to pandas in this genre.



You can check this out. This is a Preset Demo, but shows quite a bit of Superset within Preset (which offers multiple instances of Superset as "Workspaces") https://www.youtube.com/watch?v=V0HwGnC1rU8


This might be what you're looking for: https://www.youtube.com/watch?v=kGfUIOK87V8


I saw that video on the website. It isn't narrated or captioned as to what the user is trying to accomplish


We use this at my ginormous employer in order to give devs limited access to production data.


Maybe you can't say who, but I'm sure curious. Add yourself to this page if you can: https://github.com/apache/superset/blob/master/RESOURCES/INT...


Does anybody know why Superset started trending today? Is there a major release?


There is a major release on the horizon (4.0) and there were just a couple of patch releases for the 3.x variants. I'm surprised to see it trending too, but I'm happy about it. More people need to know that Open Source BI is here, and here to win.


Is there more than this single HN submission?


For my last employer, I set up Superset for a number of our clients to show all sorts of heavily customized marketing analytics dashboards, web performance graphs, project management burndown reports, you name it. As with another commenter's experience, we also got a client to replace Tableau with it, and not look back. Such a great product.


Here is a fantastic video made by Soumil Shah, using MinIO+Hudi+StarRocks+Superset. It is amazing to have an interactive query experience on a data lake directly! https://www.youtube.com/watch?v=JkKBzrQTKx0


Thanks for sharing, it's so exciting to see so many OSS BI frameworks


I recently discovered Apache Superset. I would love to use it in our product. Does anybody know if it is possible to integrate it into an existing product? I am mostly curious about hooking up its authentication system to our own authentication system, which is based on auth in ASP.NET Core 8.


>Took me a while to figure out how to embed it into my app using the Superset Embedded SDK.

Superset Embedded SDK - "Embedded SDK allows you to embed dashboards from Superset into your own app, using your app's authentication. Embedding is done by inserting an iframe, containing a Superset page, into the host application."



Neat. I have to admit I about had a heart attack reading "Superset" as "Sunset" at first. I've become too jaded about stuff being shut down and announced on HN. Very pleasantly surprised when I read correctly and clicked through to see it's about data analytics.


Bummer that it can't pull data from JSON APIs, which Redash can do.


It should be possible (have not tried myself):

https://preset.io/blog/accessing-apis-with-superset/

"Shillelagh (ʃɪˈleɪlɪ) is a Python library and CLI that allows you to query many resources (APIs, files, in memory objects) using SQL. It's both user and developer friendly, making it trivial to access resources and easy to add support for new ones"

https://github.com/betodealmeida/shillelagh



love superset, but one thing that I would love to see is to make it easier for dashboards/charts to use a dynamic table that the user can select.

we have multiple tenants + developer instances of our warehouse. to reuse the same dashboard in this setup we need to create at least 3 virtual datasets, plus wrangle a bunch of boilerplate Jinja.



Wow, those Apache guys have so many projects. Of course, they've been at it for years, starting with the Apache web server, then Tomcat, etc., and also, many projects were first developed outside and then handed over to them, for whatever reasons.


And sometimes projects are handed to them to die. The way they (mis)handle OpenOffice is unforgivable.


Interesting, did not know.

In what way, any details?

Not been tracking that or using OpenOffice for a while.



How would you compare Superset with PowerBI for analytics and CSS integration? Trying to develop features and advanced analytics capabilities into an app?


You can style dashboards with CSS as much as you'd like, though there are some limitations (canvas/webGL elements). I wrote a whole blog post on it: https://preset.io/blog/customizing-superset-dashboards-with-...

If you want to style the whole application, you can fork the repo and go bananas. If you're looking for theming, there's more to be done yet on that front, and I wrote an article on that too: https://preset.io/blog/theming-superset-progress-update/



It's been a few years since I evaluated superset. Did they ever resolve drilldown (filter for one chart on a page, populate to all charts)?


Yep... there's Drill By, which is more flexible than drill-down. Rather than having to specify a strict hierarchy of drilling "levels" you can pick columns, hierarchical or otherwise, to drill into.


This looks like grafana, right? Why would I use this instead of grafana?


They're both washboarding apps, and while I'm sure they each have panel types the other doesn't yet support, I don't think that's intrinsic. The differentiation as I see it, is that Superset is designed to craft SQL queries and visualize the results. The query builder is probably where this shows the most.

To make it more concrete -- coworkers tell me Grafana doesn't work so well with Apache Druid, while Superset supports it quite well.



*dashboarding, yikes


I thought this was some jargon I didn't know haha.


I love Grafana but Grafana doesn't really support non-time-series visualization that well.


Why is that, though? I'd think that there'd be some plugins/extensions for Grafana that could do this. Grafana could then become the next PowerBI/Tableau/Superset killer eventually.


grafana is built more for operational and timeseries data, but not so optimal for complex analytical queries. Ex: up-to-second data on cpu load on a host.

superset is the flip side of grafana; not good for up-to-second updates, but good for complex queries. Also, non-time series stuff. Ex: Which customer groups bought which products for all time?



You can’t trivially plug grafana in front of any SQL database, and grafana is more about graphing/plotting (usually time series).


You can actually plug grafana in front of any SQL database, but I'm not sure it's a good idea.


Much more focused on interactive slicing and dicing of data, rather than mostly following a few pre-defined time-series, as is the focus of Grafana.

As such, closer to an open source replacement for PowerBI.



The fundamental difference is that Grafana isn't great at cross referencing data in different data sources. (I love Grafana and I pay for the Cloud version.)


I found that running TrinoDB in a docker container and adding the trino plugin to grafana was very straightforward. TrinoDB feels magical sometimes, except that the SQL syntax they use seemed awkward IIRC. Also, there are inexplicable performance problems with certain queries that require trying subtly different SQL queries until it snaps out of it.


How does it compare to Kibana + Elasticsearch?


A big thing here is that Superset and most of the other BI tools can connect directly to databases, which are commonly the source of truth or data warehouse in some businesses. Secondly, Elastic have focused on other operational areas such as security, observability, and indexing / search. Kibana can do some dashboarding on those areas and its UI is nice, but Superset and similar tooling are more suited for BI purposes.


Has anyone worked with it who can compare it with Redash?


Well Redash got acquired so development stopped, biggest difference between Superset & Redash. Preset.io supports Superset still


Redash development slowed down for sure, but it's not looking abandoned. It's just that I've been using it for some time now, so I'm wondering if there is anything feature-wise that could justify the switch.


Surprised no one has mentioned hex yet. There was a post on the yc internal forum today about data stacks and a lot of founders mentioned they liked hex. I hadn't heard too much about them before but they looked interesting for someone (me) who typically prefers something closer to a jupyter notebook and simple stacks.


Superset is absolutely phenomenal. I really hope Microsoft eventually releases all of their customizations they made to it internally to the OS community someday.

https://www.youtube.com/watch?v=RY0SSvSUkMA

https://github.com/apache/superset/discussions/20094



Is this capable of performing efficient JOINs across non-homogeneous data-stores?


We use https://cube.dev/ as intermediate layer between data warehouse database and Superset (and other "terminal" apps for BI like report generators). You define your schema (metrics, dimensions, joins, calculated metrics etc) in cube and then access them by any tool that can connect to SQL db


Superset would be on my shortlist if I had to use something else, but the join limitations were part of why I passed.


Should it? If you really need that, join the different sources with TrinoDB (or any related managed service like AWS Athena) and connect it to Superset.


It’s common for business questions to only be answerable with a join over a few different stores.

I think Athena can only query data on S3?



This looks really good! How does it compare to Tableau?


Well, it's free! Or significantly cheaper even if you opt to use Preset to run a hosted/managed/compliant version of it, and not have to deal with config/security/upgrades/migrations. This article is a year old, but it might help a bit: https://preset.io/blog/apache-superset-vs-tableau/


I remember using Superset in 2017 or so; I was forced to by a manager who would not pay for off-the-shelf software. I also made a few open source contributions to fix some bugs, and it was a disaster. A huge rat's nest of Python. It might have changed in the last few years; I'm surprised it's still active


It's definitely come a long way since 2017! It's improved markedly in terms of functionality and performance. It looks much prettier now as well.


Does anyone know how it compares to Looker?


No built-in thick semantic layer, compared to Looker.

I wrote about Superset's semantic layer here: https://preset.io/blog/understanding-superset-semantic-layer...

One popular option is to use dbt or Cube for the semantic layer and pair with Superset: https://preset.io/blog/announcing-presets-ui-integration-wit... and https://preset.io/blog/open-source-looker-cube-superset/



The lack of a semantic layer and the join limitations are what made me pass on Superset, but that was a couple of years ago, so maybe those features have been added.

I built my own semantic layer instead. I use this in production in my company but obviously use at your own risk as it's a one-man show.

https://github.com/totalhack/zillion



How does this compare to Jupyter notebooks and the ecosystem around that? Do the use cases overlap, or are they completely different things?


In my experience, people with a business-related background have an easier time learning how to use BI tools (this is true even if Superset may be less user-friendly than commercial products like Tableau). Jupyter is an interactive computing platform based on notebooks and cells, which is more useful for data scientists/engineers whose needs might exceed the capabilities of a SQL interface.


Generally what you get when VentureCapital/PrivateEquity buys out Redash.io, messes up end users in the process and spits it out a few years later, leaving users confused as to where it stands in the BI tools landscape.


So it's irritating to me that this is ranking #1 on HN (why is it, btw?) I just pulled the trigger on a large data gathering project using Metabase, and feel a bit hampered by the limitations in terms of charts and plugins... but I considered Superset first, and after a lot of thought I decided that almost everything I've ever worked with that was run by the Apache foundation turned out to be semi-abandoned disasterware over time. In fact I wasn't even sure if Superset was still an active project or if it just looked like one, in the way e.g. no one bothered to pull the OpenOffice website offline.

So now that I picked Metabase, Superset is topping HN for no apparent reason. Why?



Because we (the FBI Surveillance Van) saw that you picked Metabase, called our shady French-accented overlord, and he told us to dump it.


I thought your outrageous French accent just meant you're going to taunt him a second time.


He's working on that accent. https://www.youtube.com/watch?v=Z6oeAdemFZw


I knew it!!


I have the opposite experience. Lots of good stuff is hosted by the Apache Foundation, such as Kafka, Maven, Cassandra, Camel, the Tika project, Superset, and Solr, but I will admit they had more relevance 10 years ago. And I don't think there are many organizations that keep open source projects alive longer than Apache does.


I think that there is an active company behind Superset called Preset.

https://preset.io/

I don't think it's semi-abandoned. I had a brief interaction with the project in my previous job, and I found the community and the company to be reasonably engaged and responsive.



"semi-abandoned disasterware"

Hmm. I suppose all open source looks that way if it doesn't get regular funding/attention.

Apache does house a lot of abandonware. They had some relevance as recently as 6-7 years ago but they've been largely replaced by nginx I think. That being said, I view them like the local soup-kitchen - important to have and maintain, but not where I want to go for a 5-star meal.



Any time I hear "Apache Foundation" my stomach turns as I hesitate to ask my next question: "What we are trying to use from them is built on Java, right?"


That would be anything hosted by the Eclipse Foundation. Either Java-based or abandonware or sometimes both.


Apache hosts many, many projects, some good, some bad, some abandoned, some fucking great.


The Apache foundation is way larger than just the server


Yes, I agree. However, a lot of their forward-facing projects seem to be effectively abandonware (few people interested in contributing, more popular competing solutions based on forks, or just no longer relevant).

These projects don't give the Apache Foundation an appearance of importance or relevance; rather, they make it look rundown.



That's how open-source abandonware is supposed to work though: the idea is that whenever a (for-profit) company produces something that it can't afford to run anymore but also can't afford to shut down without damaging its customer relationships, it open-sources the project and gives it to an open-source foundation for stewardship and repo hosting. Yes, it's where software goes-to-die-a-long-death, but it also gives some people hope, and the possibility of giving it a new life in the future. Currently, the Apache Foundation is the go-to place for that, and it benefits everyone considering the alternatives are worse.

Obviously the main "alternative" is for the original company to simply shut down the product/service, which can do irreparable harm to a company when it has high-profile customers who are utterly dependent on a service.

Another alternative is to use an open-source foundation that's directly managed by the original company, which is what Microsoft did with its DotNet Foundation ( https://dotnetfoundation.org/ ) - and while Microsoft's legal team ensures the foundation is "legally" independent, in practice we know all the significant shots are being called from within Microsoft-proper; but it does give us some modest reassurances that .NET won't suddenly return to being closed-source overnight.

Another alternative is to not open-source it and to instead sell it off to another company that can maintain it while still being profitable - this is what Adobe did with Flash: they sold it all off to Samsung because their Harman division wanted to continue using Flash for embedded/automotive UX work. This approach can work, but doesn't benefit the wider ecosystem the way that open-sourcing does - and something something shareholder value and return-on-investment by selling rather than writing-it-off...

What companies won't do is let any of their engs that are passionate about a project split-off from the company to run and maintain it, le sigh.



I would consider Airflow, Spark and Flink to be their forward facing projects, and they are all very actively developed.


The Apache Foundation also takes on projects that are literally abandoned. It acts as an umbrella that takes over hosting a project for commercial actors that can no longer develop it but want to at least give existing users an open source (Apache License) version of the software to continue with and depend on.


Apache Airflow, Kafka, Spark, ECharts, and many others are still going strong! It really depends on the project to be honest.


Yeah, I'm in a similar thought process. I've been burned multiple times by Apache, will not touch ever again.


> topping HN for no apparent reason

I think the HN algo is pretty easily manipulated. I worked at a startup that had an effective process to get things to the front page



> I think the HN algo is pretty easily manipulated. I worked at a startup that had an effective process to get things to the front page

That sounds (potentially) sleazy. If you think it's a technique that HN could potentially defend against, I encourage you to explain it to [email protected].



> That sounds (potentially) sleazy.

Pretty sure it's as simple as posting in your general slack channel "@here we posted a new article to HN, go upvote and write a comment"



Maybe it's a YC startup.


AFAIK YC startups don't get any more boost on the front page than normal posts.


I used Metabase at my last gig (CTO @ e-commerce, 30+ users) and it was well-received and dare I say even a bit adored. It was the only self-hosted tool I'd receive after-hours text messages about going down that someone urgently needed back up for some task due tomorrow.

Business users loved the self-serve query builder, and it wasn't uncommon to walk around the office and see Metabase up on someone's screen. My CEO absolutely loved it, and used it daily, including to put together data for board decks.

None of my users cared about visualizations, and lived in tabular data. This included finance, marketing, merchandising, operations, and executives (CEO/COO/CFO). The only people that lamented the limited visualization were analysts. Power users did all their day-to-day work in Excel or other tools anyway, such as managing marketing spend or inventory allocations.

Metabase was great for dashboards and self-service (ad-hoc). 10/10 would deploy again.



> almost everything I've ever worked with that was run by the Apache foundation turned out to be semi-abandoned disasterware over time

Can you name a few examples?



Ivy, NetBeans, OpenOffice, Shiro, and Solr all jump out at me from this list:

https://projects.apache.org/projects.html?name

These are all projects that were once (more) relevant but seem to have become rather niche (for the first three, for example, the dominant competitors are Gradle, JetBrains/VS Code, and Google Docs/LibreOffice).

Most of these projects (like the massive commons listings) are either used by some Java library somewhere (meaning their success/relevance is tied to the usage of Java), or are obscure enough that they are no longer used widely and so suffer from lack of interest.

There are gems in this list, to be sure, but if you just run into half-maintained projects all the time you're not likely to associate good things with the Apache name?



Here's the list: https://projects.apache.org/projects.html?name

OpenOffice is probably the most famous (it still has the name, but it is dead; LibreOffice is the real "active" fork).

And the things in the "Attic" are officially dead - https://projects.apache.org/committee.html?attic and many more projects should be there.



I think it's a great feature to have explicit lifecycle for open source projects.

Lots of other projects just die silently and/or you are unsure of the status.

Here you at least have a chance to revive them if you like as there is always an overarching organisation.



The problem really is that some Apache projects are actually alive (Apache itself, apparently Superset, Groovy, etc) and some appear alive at first glance.

More things should move into the Attic, like OpenOffice.



Well, OpenOffice, as I said. Cordova is/was a hot mess (with some nice pioneering features, just really not well maintained imo, and it felt like quicksand to build even a small app on). Then there's the sort of long, slow death of Flex (now Royale?). Apache seems like where software no one loves anymore goes to die.


I suppose it depends on the projects you're using. For many developers, their primary exposure to the Apache Foundation is through projects like Maven and Kafka, and those certainly don't feel dead.


The Apache Software Foundation is just an umbrella organization to keep things on life support till someone can apply sufficient motive force to resurrect them. I think that's really valuable. Lots of projects there have had that effort applied to them and kept going.


> Everything I've ever worked with that was run by the Apache foundation turned out to be semi-abandoned disasterware over time.

Amen brother.






