New OSM file format: 30% smaller than PBF, 5x faster to import

Original link: https://community.openstreetmap.org/t/new-osm-file-format-30-smaller-than-pbf-5x-faster-to-import/137151

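## GOB: a faster format for working with large OSM datasets

OpenStreetMap (OSM) data keeps growing and is becoming harder to manage. To address this, the GeoDesk toolkit introduces **GOB (Geo-Object Bundle)**, a new file format designed to make working with OSM data faster and easier. A GOB is essentially a compressed Geo-Object Library (GOL) with its indexes stripped.

On average, GOB files are **half the size of a GOL and 30% smaller than a PBF**, and importing one is **5 times faster** than building a GOL from a PBF: on modern hardware, a planet-size GOB loads into a GOL in just 3 minutes. The advantage is especially pronounced on systems with limited RAM.

GOB files are **tiled**, which makes it easy to extract regional subsets and makes the format well suited to archiving and distribution. The format is supported by **GOL Tool 2.1**, which adds commands for saving a GOL as a GOB and for loading a GOB into a GOL.

**Important:** GOBs are snapshots and **do not store history or metadata**, so they are not suitable for editing, but are a good fit for archiving and distribution. The format is still under development; experiments are under way to improve compression and to enable direct download and import from a URL.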

## New OpenStreetMap File Format Summary

A new OpenStreetMap (OSM) file format, dubbed GOB, is gaining attention for its potential performance improvements. Initial reports indicate it’s **30% smaller and 5x faster to import** than the current PBF format. Discussion centers on the current lack of a formal specification, although details are emerging.

Users highlight the importance of efficient spatial data formats, contrasting the sluggish performance of KMZ files in QGIS with the speed of formats like FlatGeobuf. The conversation also branches into related GIS challenges, including **meshing LiDAR point clouds** (with suggestions like 3dbag.nl and Meshroom) and the benefits of using Postgres for large datasets.

A key question is whether GOB will gain widespread adoption, which hinges on support from essential libraries like **libosmium and GDAL**. The new format also aims to address inefficiencies in the current OSM data model regarding coordinate resolution.

Original post

The OSM dataset is huge, and keeps growing every day. Great news, of course, but sometimes the sheer volume can be overwhelming – there are just gobs and gobs of data!

Hence, we created GOB (“Geo-Object Bundle”), a new file format that makes tackling OSM data faster and easier. It’s a companion format to our now-familiar Geo-Object Library (essentially, a tightly-compressed GOL with its indexes stripped).

To support this new format, GOL Tool 2.1 has two new commands: save, which exports a GOL as a GOB, and load, which imports GOBs into a GOL. (Of course, like all of the GeoDesk Toolkit, the GOL Tool is free & open-source.)

  • GOB files are on average half the size of a GOL, and 30% smaller than PBFs.

  • Importing a GOB is 5 times faster than building a GOL from a PBF. A modern system loads a planet-size GOB into a GOL in 3 minutes. The speed advantage grows more pronounced on memory-constrained machines: gol build starts paging heavily with less than 32 GB of RAM, whereas gol load requires minimal resources (even a decade-old laptop loads the whole planet in under an hour).

  • GOBs are organized into tiles, so it’s easy to extract regional subsets (basically at file-copy speed) and stitch them back together; that makes GOB a convenient format for archiving and distributing geodata.
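As a rough sanity check, the load-time figures above can be turned into a throughput estimate. This is only a back-of-the-envelope sketch: it assumes the 46.0 GB planet GOB size from the statistics further down, decimal gigabytes, and that the 3-minute load and the 5x factor describe the same machine, none of which the post states explicitly.

```python
# Back-of-the-envelope figures derived from numbers quoted in the post.
# Assumptions (not stated in the post): decimal GB, same machine for both claims.
PLANET_GOB_GB = 46.0   # planet GOB size from the size statistics
LOAD_MINUTES = 3.0     # quoted time to load a planet-size GOB
SPEEDUP = 5.0          # "5 times faster than building a GOL from a PBF"


def load_throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average rate at which GOB data is consumed during a load, in MB/s."""
    return size_gb * 1000.0 / (minutes * 60.0)


def implied_build_minutes(load_minutes: float, speedup: float) -> float:
    """Build time implied by the speedup factor, in minutes."""
    return load_minutes * speedup


print(f"load throughput: {load_throughput_mb_per_s(PLANET_GOB_GB, LOAD_MINUTES):.0f} MB/s")
print(f"implied build time: {implied_build_minutes(LOAD_MINUTES, SPEEDUP):.0f} min")
```

Under these assumptions the load would be consuming GOB data at a few hundred MB/s, which hints at why a fast disk matters more than RAM for gol load.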

The image above shows some of the tiling structure, which mimics that of tile renderers. On the left, the smallest squares are zoom 6; the right shows the most granular level (zoom 12). A typical planet GOB has about 60,000 tiles.
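To make the zoom levels concrete, here is a minimal sketch of the standard "slippy map" (XYZ) tile calculation used by tile renderers. Whether GOB numbers its tiles exactly this way is an assumption; the post only says the structure mimics that of tile renderers, and the function names here are purely illustrative.

```python
import math


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Column/row of the Web Mercator tile containing a point (standard XYZ scheme).

    Assumption: this is the common tile-renderer scheme, not GOB's documented layout.
    """
    n = 2 ** zoom  # tiles per axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern/southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_at_zoom(zoom: int) -> int:
    """Total number of tiles in a complete grid at the given zoom level."""
    return 4 ** zoom


print(tiles_at_zoom(6), tiles_at_zoom(12))
print(lonlat_to_tile(9.4, 47.6, 12))  # a point near the Bodensee
```

A complete zoom-12 grid would have far more tiles than the roughly 60,000 quoted, which is consistent with the mix of zoom levels described above (coarse tiles where data is sparse, fine tiles where it is dense).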

Below are some size statistics for the planet file and popular regional extracts (without metadata):

Region       PBF      GOL      GOL vs. PBF  GOB      GOB vs. PBF
Planet       65.4 GB  93.6 GB  +43.1%       46.0 GB  -29.7%
California   1.18 GB  1.59 GB  +35.0%        770 MB  -36.5%
France       4.54 GB  5.89 GB  +29.7%       2.84 GB  -36.3%
Germany      4.29 GB  5.92 GB  +38.0%       2.67 GB  -37.5%
Italy        1.96 GB  2.63 GB  +34.0%       1.34 GB  -31.6%
Japan        2.13 GB  2.91 GB  +36.1%       1.34 GB  -37.0%
Poland       1.84 GB  2.72 GB  +47.6%       1.29 GB  -29.7%
Switzerland   487 MB   634 MB  +30.1%        311 MB  -36.2%

Dense, well-mapped areas tend to compress best as GOB. Less complete regions are below average in terms of GOB’s size advantage (GOBs for Brazil and China are only 23% smaller).
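The percentages in the size statistics appear to be relative to the PBF size. A small sketch of that arithmetic, using a few rows copied from the table, is below. Because the published sizes are rounded, recomputed values can differ from the published percentages by a fraction of a point or more.

```python
def percent_vs_pbf(pbf_gb: float, other_gb: float) -> float:
    """Size of another format relative to the PBF, as a signed percentage."""
    return (other_gb / pbf_gb - 1.0) * 100.0


# (PBF, GOL, GOB) sizes in GB, copied from the table in the post.
SIZES_GB = {
    "Planet": (65.4, 93.6, 46.0),
    "Italy": (1.96, 2.63, 1.34),
    "Japan": (2.13, 2.91, 1.34),
}

for region, (pbf, gol, gob) in SIZES_GB.items():
    print(
        f"{region:8s} GOL {percent_vs_pbf(pbf, gol):+.1f}%  "
        f"GOB {percent_vs_pbf(pbf, gob):+.1f}%"
    )
```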

Just like GOLs, GOBs don’t store:

  • metadata (timestamp of last edit, changeset, username, etc.)

  • history (each GOB is a snapshot of the OSM dataset)

Therefore, GOB is not intended for editing, but for archival and distribution.

You will need GOL Tool 2.1 or above (download).

To export a GOL as a GOB:

gol save <gol-file> [<gob-file>]

If <gob-file> is omitted, it uses the same base name as the GOL. The .gol and .gob extensions are optional.
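The naming rule can be illustrated with a short sketch. This is not the GOL Tool's actual code, just one reading of the rule described above; resolve_names is a hypothetical helper, and how the tool treats names that already contain a dot is not covered by the post.

```python
from pathlib import Path
from typing import Optional


def resolve_names(gol_file: str, gob_file: Optional[str] = None) -> tuple[str, str]:
    """Apply the documented defaults: optional extensions, GOB named after the GOL.

    Hypothetical helper for illustration only.
    """
    gol = Path(gol_file)
    if gol.suffix == "":
        gol = gol.with_suffix(".gol")  # .gol extension is optional
    if gob_file is None:
        gob = gol.with_suffix(".gob")  # same base name as the GOL
    else:
        gob = Path(gob_file)
        if gob.suffix == "":
            gob = gob.with_suffix(".gob")  # .gob extension is optional
    return str(gol), str(gob)


print(resolve_names("world"))
print(resolve_names("world", "bodensee"))
```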

To limit the export to a specific area, use the --area (-a) option. You can specify a (multi)polygon as WKT, GeoJSON or simple coordinates (lon,lat pairs, rings are closed automatically), either directly or as a file. If no file extension is given, .wkt is assumed.

For example:

gol save world bodensee -a 9.55,47.4,8.78,47.66,9.01,47.88,9.85,47.58,9.82,47.46 

exports the tiles covering the region around the Bodensee (Lake Constance).
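For reuse, the same area could be kept in a .wkt file instead of being typed on the command line. The sketch below converts a flat list of lon,lat pairs into a WKT polygon and closes the ring explicitly. The helper name coords_to_wkt is hypothetical; only the lon,lat ordering and the automatic ring closing come from the post.

```python
def coords_to_wkt(coords: str) -> str:
    """Turn 'lon,lat,lon,lat,...' into a single-ring WKT POLYGON.

    Hypothetical helper; it mirrors the 'simple coordinates' form of --area.
    """
    tokens = [t.strip() for t in coords.split(",")]
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of values (lon,lat pairs)")
    for t in tokens:
        float(t)  # raises ValueError if a value is not numeric
    points = list(zip(tokens[0::2], tokens[1::2]))
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    if points[0] != points[-1]:
        points.append(points[0])  # close the ring, as the tool does automatically
    ring = ", ".join(f"{lon} {lat}" for lon, lat in points)
    return f"POLYGON (({ring}))"


bodensee = "9.55,47.4,8.78,47.66,9.01,47.88,9.85,47.58,9.82,47.46"
print(coords_to_wkt(bodensee))
```

Saving the printed text as bodensee.wkt would then allow the area to be passed by name, since .wkt is assumed when no extension is given.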

To import tiles into a GOL:

gol load <gol-file> [<gob-file>]

As with save, if <gob-file> is omitted, the base name of the GOL is used. If the GOL does not exist, it is created. To load just a specific region, restrict it with the -a option.

gol load japan -a shikoku

loads tiles from japan.gob into japan.gol (creating it if it doesn’t yet exist), but only those intersecting the area defined in shikoku.wkt.
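When scripting several extracts, it can help to assemble these command lines programmatically. The following is a minimal sketch that uses only the arguments shown in this post (save, load, the optional GOB name and -a); gol_command is a hypothetical helper, not part of the GOL Tool.

```python
import shlex
from typing import Optional


def gol_command(
    action: str,
    gol_file: str,
    gob_file: Optional[str] = None,
    area: Optional[str] = None,
) -> list[str]:
    """Build an argument list for 'gol save' or 'gol load'.

    Hypothetical helper; it only uses options that appear in the post.
    """
    if action not in ("save", "load"):
        raise ValueError("action must be 'save' or 'load'")
    argv = ["gol", action, gol_file]
    if gob_file is not None:
        argv.append(gob_file)
    if area is not None:
        argv += ["-a", area]  # coordinates, or the name of an area file
    return argv


print(shlex.join(gol_command("load", "japan", area="shikoku")))
```

The returned list can be handed to subprocess.run(argv, check=True) on a machine where GOL Tool 2.1 or above is installed and on the PATH.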

This is still a work in progress, so the format may change. I’m experimenting with different compression algos beyond zlib to make it even tighter and faster (zstd didn’t yield any significant gains). I’m also in the process of enabling gol load to download a GOB directly from a URL and build the GOL in the background, which would bring the wall-clock import time to zero.

As always, questions and feedback are welcome! Please stop by on GitHub and @[email protected].
