Nearby peer discovery without GPS using environmental fingerprints

Original link: https://www.svendewaerhert.com/blog/nearby-peer-discovery/

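## Shimmer: privacy-preserving nearby device discovery

Shimmer is a new way to discover nearby devices *without* exposing precise location or communicating directly. It compares observed environments cryptographically, such as WiFi networks, Bluetooth beacons, or even shared interests, using locality-sensitive hashing (LSH). Rather than broadcasting the networks they see, devices create "fingerprints" that indicate a shared environment. The process builds a MinHash signature from the observed data, then uses LSH to group similar signatures into "buckets". Matching buckets indicate that devices are near each other. Data is encrypted before it is announced to a "rendezvous" server, so the server cannot learn specific network details, only which devices share an environment.

While Shimmer offers a privacy-focused alternative to geolocation, it has challenges. The rendezvous server *can* learn which IP addresses are near each other, and an environment can be spoofed once it has been observed and replicated. Potential mitigations include rotating identifiers (such as Google's Eddystone-EID) and decentralized rendezvous options. Shimmer is implemented on libp2p and offers features such as Private Set Intersection and sketches with configurable expiry. Potential applications include location-based gaming, conference networking, and IoT provisioning.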

Original article

Your phone can see dozens of WiFi networks right now. So can mine. If we're in the same area, we're probably seeing many of the same networks. Can we use that overlap to discover each other without either of us revealing which networks we actually see? Without directly communicating in that particular environment?

This is the core idea behind Shimmer: devices discover nearby peers by comparing their environments cryptographically, without disclosing the raw details. Instead of broadcasting "I see NetworkA, NetworkB, NetworkC," devices create fingerprints using locality-sensitive hashing. Similar environments produce matching fingerprints; different ones don't.

Crucially, we never reveal our location: only that we're observing the same one.

The technique isn't limited to WiFi. It works for any shared observations: Bluetooth beacons, common interests, cell towers, anything categorical.

And while there are real security considerations (which I'll discuss later), the approach offers an interesting alternative to geolocation-based proximity detection. Below, I'll walk through how it works with an interactive demo, then cover the implementation details and trade-offs.

You can find a full implementation on GitHub.

How it works

Step 1: MinHash - Creating Similarity Fingerprints

MinHash is a technique that creates a compact "fingerprint" of a set. Similar sets produce similar fingerprints, which is exactly what we need for proximity detection. (source)
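To make the idea concrete, here is a minimal MinHash sketch in TypeScript. It is an illustration under my own assumptions, not Shimmer's actual code: the helper names (`seededHash`, `minHash`, `estimateSimilarity`) are hypothetical, and the seeded 32-bit hash is a simple stand-in for whatever hash family a real implementation would use.

```typescript
// Illustrative MinHash; all names here are hypothetical, not Shimmer's API.

// Seeded 32-bit hash: FNV-1a style mixing followed by a murmur3-style finalizer.
// Each seed acts as a different hash function.
function seededHash(input: string, seed: number): number {
  let h = (Math.imul(seed + 1, 0x9e3779b1) ^ 0x811c9dc5) | 0;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// For each of the k hash functions, keep the smallest hash seen over the set.
function minHash(items: string[], k: number): number[] {
  const signature = new Array<number>(k).fill(0xffffffff);
  for (const item of items) {
    for (let i = 0; i < k; i++) {
      const h = seededHash(item, i);
      if (h < signature[i]) signature[i] = h;
    }
  }
  return signature;
}

// The fraction of matching slots estimates the Jaccard similarity of the sets.
function estimateSimilarity(a: number[], b: number[]): number {
  let same = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] === b[i]) same++;
  }
  return same / a.length;
}

const aliceSig = minHash(['CafeGuest_WiFi', 'HomeNetwork_5G', 'Library', 'Office'], 128);
const bobSig = minHash(['CafeGuest_WiFi', 'HomeNetwork_5G', 'Library', 'Gym'], 128);

// The true Jaccard similarity of these two sets is 3/5; the estimate should land near it.
console.log(estimateSimilarity(aliceSig, bobSig));
```

The signature depends only on the set's contents, not on the order in which networks were observed, which is what lets two devices compute comparable fingerprints independently.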

💡 Note: In real applications, you'd combine WiFi SSID, BSSID (MAC address), and bucketed signal strengths for more robust fingerprinting. This demo uses only SSIDs for simplicity.

Step 2: LSH - Creating Collision Buckets

Locality-Sensitive Hashing (LSH) divides the MinHash signature into bands. Each band is hashed to create a "bucket" where similar signatures are likely to collide, meaning they'll produce the same hash for at least one band. (source)
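The banding step can be sketched roughly as follows. This is a simplified illustration: `hashBand` and `lshBuckets` are hypothetical names, and the 32-bit FNV-1a band hash is my own stand-in, not necessarily what Shimmer uses.

```typescript
// Illustrative LSH banding; names and hash choice are assumptions.

// Hash one band (its index plus its rows) into a bucket id with FNV-1a.
// Including the band index keeps equal rows in different bands apart.
function hashBand(bandIndex: number, rows: number[]): string {
  let h = 0x811c9dc5;
  for (const value of [bandIndex, ...rows]) {
    for (let shift = 0; shift < 32; shift += 8) {
      h ^= (value >>> shift) & 0xff;
      h = Math.imul(h, 0x01000193);
    }
  }
  return `${bandIndex}:${(h >>> 0).toString(16).padStart(8, '0')}`;
}

// Split the signature into equal-sized bands and hash each one.
function lshBuckets(signature: number[], bands: number): string[] {
  if (signature.length % bands !== 0) {
    throw new Error('signature length must be divisible by the number of bands');
  }
  const rowsPerBand = signature.length / bands;
  const buckets: string[] = [];
  for (let band = 0; band < bands; band++) {
    const rows = signature.slice(band * rowsPerBand, (band + 1) * rowsPerBand);
    buckets.push(hashBand(band, rows));
  }
  return buckets;
}

// Two toy signatures that agree only in their first band (values 11, 22).
const sigA = [11, 22, 33, 44, 55, 66, 77, 88];
const sigB = [11, 22, 99, 44, 55, 10, 12, 88];
const bucketsA = lshBuckets(sigA, 4);
const bucketsB = lshBuckets(sigB, 4);
const sharedBuckets = bucketsA.filter((bucket) => bucketsB.includes(bucket));
console.log(sharedBuckets.length); // 1: only band 0 collides
```

A single shared bucket is enough for two devices to count as candidates, which is why an entire band has to match rather than individual signature slots.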

Step 3: Encryption and Announcement

Now we encrypt our peer information using each preImage as the encryption key and announce to the rendezvous server using the publicTags as indexes.
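A rough sketch of this step, using Node's built-in `node:crypto` module. Several things here are my assumptions rather than Shimmer's documented behavior: that the publicTag is a one-way hash of the preImage, that the key is derived from the preImage with a plain SHA-256 (a real system would more likely use a proper KDF such as HKDF), the AES-256-GCM cipher choice, the preImage string format, and the helper names `publicTagOf`, `keyOf`, `seal` and `unseal`.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from 'node:crypto';

interface SealedRecord {
  iv: string;      // hex-encoded nonce
  authTag: string; // hex-encoded GCM authentication tag
  data: string;    // hex-encoded ciphertext
}

// Assumption: the server only ever sees this one-way hash of the preImage.
function publicTagOf(preImage: string): string {
  return createHash('sha256').update('tag:' + preImage).digest('hex');
}

// Assumption: a domain-separated SHA-256 of the preImage serves as the 32-byte key.
function keyOf(preImage: string): Buffer {
  return createHash('sha256').update('key:' + preImage).digest();
}

// Encrypt peer information so that only holders of the preImage can read it.
function seal(preImage: string, peerInfo: string): SealedRecord {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', keyOf(preImage), iv);
  const data = Buffer.concat([cipher.update(peerInfo, 'utf8'), cipher.final()]);
  return {
    iv: iv.toString('hex'),
    authTag: cipher.getAuthTag().toString('hex'),
    data: data.toString('hex'),
  };
}

// Returns the plaintext, or null when the preImage is wrong or the record was altered.
function unseal(preImage: string, record: SealedRecord): string | null {
  try {
    const decipher = createDecipheriv('aes-256-gcm', keyOf(preImage), Buffer.from(record.iv, 'hex'));
    decipher.setAuthTag(Buffer.from(record.authTag, 'hex'));
    const plain = Buffer.concat([decipher.update(Buffer.from(record.data, 'hex')), decipher.final()]);
    return plain.toString('utf8');
  } catch {
    return null;
  }
}

const preImage = 'wifi|band-0|3f9a1c27'; // hypothetical preImage format
const peerInfo = '/ip4/203.0.113.7/tcp/4001';
const announcement = { publicTag: publicTagOf(preImage), record: seal(preImage, peerInfo) };
console.log(announcement.publicTag);
```

Because the tag and the key are derived separately from the same preImage, the server can index records by tag without being able to decrypt them.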

Step 4: Peer Discovery and Decryption

Finally, let's simulate another peer (Bob) discovering Alice's announcement. If Bob observes a similar WiFi environment, they'll generate matching publicTags and be able to decrypt Alice's peer information.
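From the rendezvous side, discovery is essentially a lookup by tag. The toy class below is a hypothetical in-memory stand-in (`ToyRendezvous` is my name, not one of Shimmer's classes), and the records are opaque strings standing in for encrypted payloads.

```typescript
// Toy in-memory rendezvous; illustrative only.
class ToyRendezvous {
  private records = new Map<string, string[]>();

  // Store an opaque (encrypted) record under a public tag.
  announce(publicTag: string, sealedRecord: string): void {
    const existing = this.records.get(publicTag) ?? [];
    existing.push(sealedRecord);
    this.records.set(publicTag, existing);
  }

  // Return records only for tags that some peer has announced.
  discover(publicTags: string[]): Map<string, string[]> {
    const hits = new Map<string, string[]>();
    for (const publicTag of publicTags) {
      const found = this.records.get(publicTag);
      if (found) hits.set(publicTag, [...found]);
    }
    return hits;
  }
}

const rendezvous = new ToyRendezvous();

// Alice announces one record per LSH tag (tag values are made up).
const aliceTags = ['tag-a1', 'tag-a2', 'tag-shared'];
for (const tag of aliceTags) rendezvous.announce(tag, `alice-record-for-${tag}`);

// Bob's environment overlaps enough that one of his tags matches.
const bobTags = ['tag-b1', 'tag-shared', 'tag-b3'];
const hits = rendezvous.discover(bobTags);
console.log([...hits.keys()]); // ['tag-shared']
```

Bob would then try to decrypt each returned record with the preImage behind the matching tag. A peer whose tags do not overlap gets nothing back, and would lack the key even if it obtained the record some other way.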

You may notice that a significant overlap is required to produce the same LSH tags. Sometimes it works with two networks overlapping, and sometimes it doesn't. This is due to the probabilistic nature of LSH. Tweaking the k-value of MinHash and the b-value (bands) of LSH can potentially improve the overlap detection. I have to do some more testing to find good values for each modality.
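The standard LSH banding analysis gives a feel for how k and b interact. Assuming the k MinHash values are split evenly into b bands of r = k / b rows (an assumption about the layout, though it is the usual one), two sets with Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b. The function name below is hypothetical.

```typescript
// Probability that two sets with Jaccard similarity `similarity` collide in
// at least one band, assuming k MinHash values split evenly into `bands` bands.
function collisionProbability(similarity: number, k: number, bands: number): number {
  const rowsPerBand = k / bands;
  const oneBandMatches = Math.pow(similarity, rowsPerBand);
  return 1 - Math.pow(1 - oneBandMatches, bands);
}

// Example with k = 128 and 32 bands, i.e. 4 rows per band.
for (const s of [0.2, 0.5, 0.8]) {
  console.log(s, collisionProbability(s, 128, 32).toFixed(3));
}
```

With these parameters the curve is steep: roughly a 5% chance of a match at 20% overlap, about 87% at 50% overlap, and close to certainty at 80%. Fewer rows per band shift the threshold toward lower overlap, at the cost of more accidental matches.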

Why?

To be fair, I built this because it seemed interesting, not because I had a killer use case in mind. But thinking about it afterward, here are some scenarios where this approach might make sense:

Location-based multiplayer gaming: AR games or location-based apps could use this to automatically discover players in the same physical space: a park, convention center, or festival grounds. Instead of manually joining lobbies or sharing room codes, devices just detect "who else is here" through environmental fingerprints, then automatically form teams or enable shared AR content. It could just make the experience more seamless while still keeping privacy in mind.

There's probably also something here for conference networking (matching attendees by both proximity and shared interests) or IoT provisioning (sensors fingerprinting their zone to auto-configure without manual setup), though I haven't thought those through as deeply.

Features

Private Set Intersection

While LSH tags are probabilistic, peers can perform Private Set Intersection (PSI) after discovery to get exact similarity scores without disclosing their complete sets.

Multiple Modalities

The system works with any categorical data, not just WiFi networks: shared interests, Bluetooth networks, cell towers, or any modality you define. Each can have its own configuration (k-value, bands, epoch interval).

Epoch-Based Expiry

Sketches expire automatically based on configurable intervals (5 minutes, 10 minutes, etc.). Tags are withdrawn from the rendezvous server on expiry (depending on the implementation), preventing stale data and limiting tracking windows.
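One plausible way to implement epochs, sketched under my own assumptions: wall-clock time is divided into fixed windows, the window number is the epoch, and a sketch expires when its window ends. The interval string format is inferred from the `'5m'` and `'1h'` values in the configuration example further down; `parseInterval`, `epochOf` and `expiresAt` are hypothetical helpers, not Shimmer's API.

```typescript
// Parse interval strings such as '30s', '5m' or '1h' into milliseconds.
function parseInterval(text: string): number {
  const match = /^(\d+)([smh])$/.exec(text);
  if (!match) throw new Error(`unsupported interval: ${text}`);
  const unitMs: Record<string, number> = { s: 1000, m: 60000, h: 3600000 };
  return Number(match[1]) * unitMs[match[2]];
}

// The epoch is the index of the time window that contains `nowMs`.
function epochOf(nowMs: number, intervalMs: number): number {
  return Math.floor(nowMs / intervalMs);
}

// A sketch made during `epoch` expires when that window ends.
function expiresAt(epoch: number, intervalMs: number): number {
  return (epoch + 1) * intervalMs;
}

const intervalMs = parseInterval('5m');
const currentEpoch = epochOf(Date.now(), intervalMs);
console.log(currentEpoch, new Date(expiresAt(currentEpoch, intervalMs)).toISOString());
```

If the epoch number were also mixed into the tag derivation (again an assumption), tags from one window could not be linked to tags from the next, which is one way to limit the tracking window.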

Rendezvous Options

Three implementations are available: in-memory (for testing), an HTTP server (with encrypted records), and DHT-based (fully decentralized, but lacking some features; see below).

libp2p Integration

It plugs into libp2p as a service and provides automatic connection establishment, RTT measurement and libp2p peerStore integration.

If you don't like libp2p, the Sketcher and Rendezvous classes capture the core concept and don't rely on libp2p.

```typescript
import { createLibp2p } from 'libp2p';
import { shimmer, httpRendezvous } from 'shimmer';

// Register Shimmer as a libp2p service, with one sketcher config per modality.
const node = await createLibp2p({
  services: {
    shimmer: shimmer({
      rendezvous: httpRendezvous('https://rendezvous.example.com'),
      sketcherConfig: {
        wifi: { k: 128, bands: 32, epochInterval: '5m' },
        interests: { k: 64, bands: 16, epochInterval: '1h' }
      }
    })
  }
});

// Sketch the currently observed WiFi networks, then look for peers in a similar environment.
await node.services.shimmer.sketch('wifi', ['CafeGuest_WiFi', 'HomeNetwork_5G']);
const peers = await node.services.shimmer.discover('wifi');
```

Security considerations

There are many things to consider here, and ultimately it boils down to "What's the threat model?". I don't have watertight solutions for every security aspect. Depending on your answer to that question, there may not even be an issue. The least I can do is expand on some security considerations I've come across.

One issue that comes up a lot in geolocation-based systems is that geolocation is easily spoofed. There are many mobile apps that let users fake their GPS coordinates, and any app requesting the location will receive those fake coordinates. To prove that a device is really present at or near a location, I came up with the idea of using environmental observations instead of GPS.

However, this also comes with a weakness: once an environment has been observed, it's easy for an attacker to keep pretending to be in that environment (as long as the environment remains unchanged). For WiFi networks this is often the case. While building this I found out about a Google project, "Eddystone-EID", that uses a beacon with rotating identifiers. This would cause an environmental change at every epoch and would make old knowledge of the environment useless, preventing such a spoofing attack. At least that's the gist as far as I understood.

The rendezvous server is another big one. Yes, the server does not learn the contents of the encrypted records. However... the server does learn which IP announces a tag, and which IP queries for it. This alone could allow any rendezvous server to learn about the proximity of IP addresses. Here are some potential mitigations:

Use Tor to talk to any rendezvous server
Oblivious HTTP (OHTTP) uses two proxies to separate IP knowledge from content knowledge
A DHT as a rendezvous: no single node knows the full set of announced tags, but this comes with a whole other set of DHT-related issues: Sybil attacks, high mobile energy use, ...
Cryptographically Generated Addresses (IPv6): Instead of having a rendezvous server, have no rendezvous server! Addresses would be generated based on the LSH tags. The issue with this is that the network prefix still somehow has to be communicated. So not really a fully viable option.
Direct communication within the environment: Shocker! I know. The whole point however was to avoid this by design.

libp2p DHT

Shimmer includes a DHT-based rendezvous implementation using libp2p's kadDHT. It converts each publicTag to a CID and announces it through the Content Provider API.

Current limitations:

Records are neither encrypted nor signed (see issue)
kadDHT nodes retain Provider records for 24-48 hours, which doesn't align well with configurable epoch expiry windows

By contrast, the HTTP rendezvous implementation uses encrypted records (decryption requires the preImage) but doesn't use signatures. Signed records wouldn't add much value here I think: we could use them to prevent spam, but keypair generation costs nothing. For peer authentication, successfully decrypting with the correct preImage already proves the peer shares our environment. 🤷

Android location permissions

One thing worth mentioning: on Android, WiFi scanning has required location permissions since Android 6.0 (2015). Google considers the list of visible networks sufficient to triangulate your position using databases of known access point locations. So even though this method doesn't expose raw SSIDs to the rendezvous server, users still need to grant location access to the app itself. Just something to keep in mind if you're building this for mobile. It might cause UX friction or conflict with privacy expectations.

Geospatial indexing

When spoofing a geolocation is not a concern, and you just want to rendezvous based on a geographical location, there are existing hierarchical geospatial indexing systems. With these, areas of various sizes anywhere on the globe are identified using a code. This code can then simply serve as a rendezvous point.

H3 - Uber's hexagonal hierarchical geospatial indexing system. Divides the world into hexagonal cells at multiple resolutions, allowing efficient proximity queries. Example cell
Geohash - Encodes geographic coordinates into short alphanumeric strings. Locations with shared prefixes are geographically close. Example: u15
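For comparison, here is a compact version of the standard Geohash encoding. The resulting string could serve directly as a rendezvous tag when spoofing is not a concern. The function name is my own, and the coordinates are a commonly cited Geohash example.

```typescript
const GEOHASH_ALPHABET = '0123456789bcdefghjkmnpqrstuvwxyz';

// Standard Geohash: alternately bisect longitude and latitude, emitting one
// bit per step and one base-32 character for every five bits.
function geohashEncode(lat: number, lon: number, precision: number): string {
  let latMin = -90, latMax = 90;
  let lonMin = -180, lonMax = 180;
  let hash = '';
  let bitCount = 0;
  let charIndex = 0;
  let useLongitude = true; // the first bit always refines longitude

  while (hash.length < precision) {
    if (useLongitude) {
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { charIndex = (charIndex << 1) | 1; lonMin = mid; }
      else { charIndex = charIndex << 1; lonMax = mid; }
    } else {
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { charIndex = (charIndex << 1) | 1; latMin = mid; }
      else { charIndex = charIndex << 1; latMax = mid; }
    }
    useLongitude = !useLongitude;
    bitCount++;
    if (bitCount === 5) {
      hash += GEOHASH_ALPHABET[charIndex];
      bitCount = 0;
      charIndex = 0;
    }
  }
  return hash;
}

const cell = geohashEncode(57.64911, 10.40744, 5);
console.log(cell); // "u4pru"
```

Truncating the string widens the area: `u4p` covers everything whose hash starts with those three characters, so the rendezvous radius can be tuned by choosing the prefix length.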

Anyway

I built this mostly because it was an interesting problem. The core idea works: you can detect proximity through environmental fingerprints without GPS. But there are real trade-offs. The rendezvous server potentially learns IP proximity. Spoofing is easy if someone captures your environment once. And on Android, you need location permissions anyway. Whether that matters depends entirely on your threat model.

Curious to hear about other use cases!