Matadisco – Decentralized Data Discovery

Original link: https://matadisco.org/

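## Matadisco: Open and Decentralized Data Discovery

Matadisco is a new open-source network built on AT Protocol. It addresses the problem of discovering open data that is scattered across many siloed repositories. It separates data *discovery* from data *storage*, letting anyone publish metadata "pointers" to datasets as lightweight records, whatever metadata format they use (STAC, DataCite, and so on).

**How it works:** "Producers" write these records to the network, and "consumers" read them to build custom data portals. This decentralized approach avoids central control and lets communities manage their catalogues independently while taking part in wider discovery.

Key features include cryptographic signatures for record integrity, flexibility in the choice of metadata standard, and a simple schema that requires only a resource link and a publication date. Matadisco aims to make valuable datasets, such as satellite imagery, climate models, and genomic sequences, easy to find and access, encouraging collaboration and innovation in open data. It is still experimental, and community contributions are welcome.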


Original text

An open, decentralized network for data discovery. Publish metadata about any dataset to AT Protocol. Build community portals. Find what matters.


Open data is only as useful as it is discoverable

Petabytes of satellite imagery, climate models, and genomic sequences sit in public repositories — yet finding the right data means navigating dozens of siloed portals, each with different interfaces, APIs, and blind spots.

If you generate a derived dataset or clean up an existing one, there's often no way to make it findable. Government portals decide what gets published. Aggregators are centralized. Community contributions get lost.


How Matadisco works

Matadisco separates data discovery from data storage. Three pieces work together:

AT Protocol

Matadisco is built on AT Protocol, an open social protocol. Every record is cryptographically signed. No single entity controls the network and all components are open source and can be self-hosted.

Producers

Write Matadisco records to a PDS (Personal Data Server). A record is a lightweight pointer to metadata — a link, an optional preview, and a timestamp — so the schema works with any metadata standard: STAC, DataCite, IIIF, RSS, and more. A producer typically watches an existing catalogue or data source and publishes records automatically.
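As a rough sketch of the producer side, the snippet below builds a record from a metadata link and wraps it in a request body for AT Protocol's `com.atproto.repo.createRecord` endpoint. The collection name `cx.vmx.matadisco` and the field names come from the schema on this page. The helper names, the DID, and the STAC URL are hypothetical placeholders, and authentication and the HTTP call itself are left out.

```python
from datetime import datetime, timezone

COLLECTION = "cx.vmx.matadisco"  # NSID of the Matadisco lexicon


def build_record(resource, published_at=None, preview=None):
    """Build a Matadisco record dict (hypothetical helper).

    Assumes published_at, if given, is a timezone-aware UTC datetime.
    """
    published_at = published_at or datetime.now(timezone.utc)
    record = {
        "$type": COLLECTION,
        "resource": resource,
        "publishedAt": published_at.isoformat().replace("+00:00", "Z"),
    }
    if preview is not None:
        record["preview"] = preview  # optional, e.g. a thumbnail
    return record


def create_record_body(repo_did, record):
    """Body for POST /xrpc/com.atproto.repo.createRecord on the producer's PDS."""
    return {"repo": repo_did, "collection": COLLECTION, "record": record}


record = build_record(
    "https://example.org/stac/items/scene-123.json",  # placeholder metadata URL
    published_at=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
)
body = create_record_body("did:plc:example", record)  # placeholder DID
```

A real producer would run something like this in a loop over new entries in the catalogue it watches, then send each body to its PDS with an authenticated session.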

Consumers

Read records from the network via a PDS or Jetstream, filter for what's relevant, and present them as a web-based portal for users. A satellite imagery portal, a scientific data hub, a cultural heritage archive — each built in about 100 lines of code.
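The filtering step of a consumer might look roughly like the sketch below. It assumes events arrive as JSON strings in the shape Jetstream emits for commits (`kind`, `did`, and a `commit` object with `operation`, `collection`, and `record`); check those field names against the Jetstream documentation before relying on them. The WebSocket connection is omitted, and the function names and sample events are made up for illustration.

```python
import json

COLLECTION = "cx.vmx.matadisco"


def extract_record(raw_event, collection=COLLECTION):
    """Return (did, record) for a newly created Matadisco record, else None."""
    event = json.loads(raw_event)
    if event.get("kind") != "commit":
        return None  # identity and account events carry no record
    commit = event.get("commit", {})
    if commit.get("operation") != "create":
        return None
    if commit.get("collection") != collection:
        return None
    return event.get("did"), commit.get("record")


def select(raw_events, keep=lambda record: True):
    """Yield (did, record) pairs that pass a portal-specific filter."""
    for raw in raw_events:
        hit = extract_record(raw)
        if hit is not None and keep(hit[1]):
            yield hit


def _commit(did, collection, record):
    """Build a fake commit event for the demo below."""
    return json.dumps({
        "did": did,
        "kind": "commit",
        "commit": {"operation": "create", "collection": collection, "record": record},
    })


sample = [
    _commit("did:plc:producer1", COLLECTION,
            {"resource": "https://example.org/stac/item.json",
             "publishedAt": "2025-01-01T12:00:00Z"}),
    _commit("did:plc:someone", "app.bsky.feed.post", {"text": "hello"}),
    _commit("did:plc:producer2", COLLECTION,
            {"resource": "https://example.org/feed.rss",
             "publishedAt": "2025-01-02T08:00:00Z"}),
    json.dumps({"did": "did:plc:someone", "kind": "identity"}),
]

# A portal that only cares about STAC metadata (naive substring filter).
hits = list(select(sample, keep=lambda r: "stac" in r.get("resource", "")))
```

The `keep` callback is where a portal expresses its focus: a satellite imagery portal and a cultural heritage archive could share everything else and differ only in that predicate and in how they render the results.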


The schema

The Matadisco record is defined as an ATProto Lexicon. In MLF syntax:

cx.vmx.matadisco


record matadisco {
    publishedAt!: Datetime,
    resource!: Uri,
    preview: {
        mimeType!: string,
        url: Uri,
    },
}

Only resource and publishedAt are required. The preview is optional: for satellite imagery it's a thumbnail, for articles a summary, for podcasts an audio snippet.
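To make the required and optional fields concrete, here is a minimal validation sketch. It is not the official lexicon validator. It assumes that `!` in the listing marks a required field, so `mimeType` is required only when a `preview` is present, and it uses loose checks for URIs and datetimes. The function names and example records are hypothetical.

```python
from datetime import datetime
from urllib.parse import urlparse


def _is_uri(value):
    """Loose check: a string with a URI scheme such as https."""
    return isinstance(value, str) and bool(urlparse(value).scheme)


def validate(record):
    """Return a list of problems; an empty list means the record looks valid."""
    problems = []
    if not _is_uri(record.get("resource")):
        problems.append("resource must be a URI")
    try:
        # Accept a trailing "Z" on Python versions where fromisoformat does not.
        datetime.fromisoformat(str(record.get("publishedAt")).replace("Z", "+00:00"))
    except ValueError:
        problems.append("publishedAt must be an ISO 8601 datetime")
    preview = record.get("preview")
    if preview is not None:
        if not isinstance(preview, dict):
            problems.append("preview must be an object")
        else:
            if not isinstance(preview.get("mimeType"), str):
                problems.append("preview.mimeType is required when preview is present")
            if "url" in preview and not _is_uri(preview["url"]):
                problems.append("preview.url must be a URI")
    return problems


ok = validate({
    "resource": "https://example.org/item.json",
    "publishedAt": "2025-01-01T12:00:00Z",
})
bad = validate({
    "resource": "https://example.org/item.json",
    "preview": {"url": "https://example.org/thumb.png"},
})
```

Here `ok` is empty, while `bad` reports the missing `publishedAt` and the missing `preview.mimeType`.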



See it in action

The matadisco-viewer streams new ATProto records in real time and renders them. Currently showing Copernicus Sentinel-2 satellite imagery:

[Image: Sentinel-2 satellite image preview from the Matadisco viewer. Caption: Sentinel-2 L2A scene, with links to its metadata and the full-resolution image (253 MiB).]

Producers & Consumers

Producers write records into the network; consumers read and display them. The prototype demonstrates both roles.

Because records flow through an open network, institutions manage their catalogues independently while participating in shared discovery.


Prior art & influences


Get started

Matadisco is experimental — things may break or change. That also means there's room to shape it. Here's how to get involved:


What's next

  • Image-based sources like GLAM collections using IIIF
  • Non-image sources — podcasts, research datasets, publications
  • Schema evolution informed by real-world use across different domains

Publish records under your own namespace, build a portal for your community, or propose changes to the schema. We'd love to hear from anyone working in open data, metadata standards, or scientific infrastructure.
