## On Having a Data Object

Original link: https://www.natemeyvis.com/on-having-a-data-object/

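## Rethinking the Data-Object Pattern

Creating a class to manage interactions with a specific part of the database (for example, a "Hat" class for all hat-related data) is very common: frameworks such as Django support it, and it is prevalent at large companies. On the surface it simplifies data access through "plain old objects," but the pattern often introduces hidden complexity.

The author argues against this approach and highlights three key problems: **contextual differences**, **differing access needs**, and **class bloat**. Different parts of an application often need subtly different representations of the same data (a hat object needs a SKU in one context and a price in another), which forces compromises on the single data object. In addition, semantically similar queries may require different implementations (consistency versus speed), which leads to messy, over-parameterized functions.

Finally, these central objects inevitably grow huge and become hard to test, maintain, and understand. Rather than consolidating modules, the pattern often *adds* complexity. The author advocates a more modular approach, managing persistence within specific modules even if that means some duplication, to get clearer, more accurate, and ultimately more maintainable code.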

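This Hacker News discussion centers on the drawbacks of using "data objects" (classes traditionally used to hold data) in software development, especially in contrast with more functional approaches built on simple data structures such as maps.

The core argument, drawn from a linked article, is that modeling a rich domain with many potentially combinable facts using data objects leads to bloated code that is hard to refactor. Each caller needs a specific set of data, which results in either huge objects or a proliferation of specialized classes.

Alternatives raised in the discussion include functions that operate on shared data structures (such as maps in Clojure), along with techniques such as namespace-qualified keywords and "spec" to maintain structural guarantees. Some commenters worried that this approach loses structural safety and increases testing complexity; others countered that explicit separation through views (such as `UserLoginView`) can mitigate these problems.

Ultimately, the conversation highlights a trade-off: data objects offer familiarity and are easy to pick up but can become unwieldy, while the alternatives demand more discipline and may introduce complexity of their own.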
Original text

Here's a common pattern:

  1. Carve out a chunk of your persistence layer--e.g., everything storing information about hats.
  2. Create a class for managing interactions with that chunk of the persistence layer.
  3. Use that class whenever you need to interact with that part of the persistence layer.
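
To make the three steps concrete, here is a minimal sketch of the pattern in Python. It is only an illustration: the `Hat` dataclass, the `HatStore` class, the `hats` table, and its columns are hypothetical names chosen for this example, and an in-memory SQLite database stands in for whatever persistence layer you actually use.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Hat:
    """One row of the hypothetical hats table (step 1: the carved-out chunk)."""
    sku: str
    name: str
    price_cents: int


class HatStore:
    """Step 2: a single class that owns every interaction with hat storage."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS hats "
            "(sku TEXT PRIMARY KEY, name TEXT, price_cents INTEGER)"
        )

    def add_hat(self, hat: Hat) -> None:
        self.conn.execute(
            "INSERT INTO hats (sku, name, price_cents) VALUES (?, ?, ?)",
            (hat.sku, hat.name, hat.price_cents),
        )

    def get_hats(self) -> list[Hat]:
        rows = self.conn.execute(
            "SELECT sku, name, price_cents FROM hats ORDER BY sku"
        ).fetchall()
        return [Hat(*row) for row in rows]


# Step 3: every caller goes through HatStore instead of writing SQL itself.
store = HatStore(sqlite3.connect(":memory:"))
store.add_hat(Hat(sku="FED-001", name="Fedora", price_cents=4500))
store.add_hat(Hat(sku="BWL-002", name="Bowler", price_cents=5200))
hats = store.get_hats()
```

Callers see `get_hats()` and plain `Hat` objects rather than queries, which is the appeal the rest of this post pushes back on.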

People love this pattern. Everyone likes the (apparent) simplicity of plain old objects, and many engineers also think that those objects bring abstraction benefits.[1]

Django and other frameworks enforce this pattern as a basic design principle. It's a motivation for the many ORMs out there. And almost any big company has a standard mechanism for defining and generating classes of this sort.

This data-object pattern has so much infrastructure and so much consensus supporting it that it can seem like the only reasonable option. But often it's not, and you would be better off using an alternative, for a few reasons:

1. You should often be using different objects in different contexts.

Parts of the codebase that initially seem to require identical objects--e.g., because those objects both concern the same real-world item--often require subtly different objects. But the data-object pattern requires you to use the same object for both of them.

The domain-driven design ("DDD") way to say this is: bounded contexts require different models. But accepting this point requires no ideological commitments. A Hat object can properly include a representation of a SKU in some contexts but not others; a representation of a discount or price in some contexts but not others; CAD artifacts in some contexts but not others; and so on.

I've never seen a good way to use the same object across contexts. You can mark the SKU field as Optional, but the semantics are wrong: in any given context the field is usually not optional at all but either required or irrelevant, so you have to write extra code for validation. (Also, these objects tend to be hard to test for the same reasons decorated functions are, and it can be tricky to work around this.)

Meanwhile, if you have different Hat objects in your hat-ordering and hat-creation modules, those objects can behave exactly as those contexts require. Those objects might look similar, and parts of them might feel untidily repetitive, but the benefits of accuracy usually outweigh those of consolidation.
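
A small sketch of the contrast, under some assumptions: `SharedHat`, `OrderableHat`, `HatDesign`, and the two `order_line` functions are hypothetical names invented for this example, not code from any real system. The shared object has to type its fields as `Optional` and validate at the point of use, while the context-specific objects state what each module actually requires.

```python
from dataclasses import dataclass
from typing import Optional


# Shared data object: sku and price are typed Optional, which misstates
# the domain. No caller treats them as genuinely optional.
@dataclass
class SharedHat:
    name: str
    sku: Optional[str] = None
    price_cents: Optional[int] = None
    cad_file: Optional[str] = None


def order_line_shared(hat: SharedHat, quantity: int) -> tuple[str, int]:
    # The ordering code must re-check what the type should have guaranteed.
    if hat.sku is None or hat.price_cents is None:
        raise ValueError("ordering requires a SKU and a price")
    return hat.sku, hat.price_cents * quantity


# Context-specific objects: each field is either required or simply absent.
@dataclass(frozen=True)
class OrderableHat:
    """A hat as the (hypothetical) hat-ordering module sees it."""
    name: str
    sku: str
    price_cents: int


@dataclass(frozen=True)
class HatDesign:
    """A hat as the (hypothetical) hat-creation module sees it."""
    name: str
    cad_file: str


def order_line(hat: OrderableHat, quantity: int) -> tuple[str, int]:
    # No validation branch: the type already says a SKU and price exist.
    return hat.sku, hat.price_cents * quantity
```

`OrderableHat` and `HatDesign` both repeat the `name` field, which is the untidy repetition mentioned above; in exchange, neither module has to guess which fields are populated.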

2. The pattern pushes you to treat different access patterns as the same.

Semantically analogous operations ("what are all the orders for this hat SKU?") can require importantly different implementations. Sometimes you need strongly consistent reads, and sometimes you don't; sometimes you'll be planning to filter or query the result further; and so on.

You certainly can make your orders_for_hat() function accept various flags and parameters to satisfy the requirements of its various callers. But this tends to (i) be messy and (ii) break encapsulation. Very often, you're still implementing multiple units of functionality--precisely what you were trying to avoid!--but in a clunkier, more bug-prone way.
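
Here is a rough sketch of how that plays out. Everything in it is an assumption made for illustration: the `Order` type, the `PRIMARY` and `REPLICA` lists (stand-ins for a strongly consistent store and a lagging read replica), the flag names, and the two module-specific functions.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Order:
    order_id: int
    hat_sku: str
    quantity: int


# Hypothetical stand-ins for a primary store and a possibly stale replica.
PRIMARY = [Order(1, "FED-001", 2), Order(2, "BWL-002", 1), Order(3, "FED-001", 5)]
REPLICA = PRIMARY[:2]  # lags behind: order 3 has not replicated yet


# One function for every caller: each flag leaks one caller's needs into
# the shared signature, and the return type now depends on a flag.
def orders_for_hat(sku, consistent=True, lazy=False, min_quantity=None):
    source = PRIMARY if consistent else REPLICA
    matches = (o for o in source if o.hat_sku == sku)
    if min_quantity is not None:
        matches = (o for o in matches if o.quantity >= min_quantity)
    return matches if lazy else list(matches)


# Module-specific alternatives: each states its own access pattern.
def orders_for_fulfillment(sku: str) -> list[Order]:
    """Fulfillment wants a strongly consistent, fully materialized list."""
    return [o for o in PRIMARY if o.hat_sku == sku]


def orders_for_dashboard(sku: str) -> Iterator[Order]:
    """A dashboard tolerates stale reads and filters further downstream."""
    return (o for o in REPLICA if o.hat_sku == sku)
```

The flagged version still contains both implementations; it just hides them behind parameters that every caller has to understand.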

3. The classes get huge and painful.

The most important objects you deal with--e.g., the Hat object in hat-management software--will wind up with tons of code.

That's not all bad. As a system matures, it needs to keep track of a lot of things; the real world is messy. Huge Hat classes are, in part, a sign that you've figured out a lot of the little details you need to represent. But if all those little details are part of a single big class you're importing everywhere, you're compounding necessary difficulties with unnecessary ones. As I've argued, you'll be telling yourself lies with your type system; making your methods know too much about their callers (and, often, your callers know too much about the methods); and making it all hard to test.

This makes for a bad time. I've worked in many code bases where the central persistence-facing objects were huge (in one memorable case, well over 2,000 lines). I've never seen it go well.

The promise of the data-object pattern is to replace N modules with 1. But if the N things you're "replacing" are actually necessary, and you'll be implementing more or less damaged versions of those N things whether or not you use the data-object pattern, then you're going not from N to 1 but from N to N + 1.

Moreover, that "+ 1" tends to punch above its weight. So, for example, if the object is being automatically generated according to some DSL in some build process, a lot can go wrong. But that's a different post.

For now, remember that the data-object pattern is neither a law of nature nor a universal best practice. A clean, module-specific persistence-management layer is often the best available tool.

[1] Not all the data objects you'll see with this pattern are "plain old" objects according to the Fowler et al. sense of "POJO." The point is simply that people want to call get_hats() more than they want to look at SQL or a DynamoDB query. (Also, even senior-level colleagues will advocate for this pattern because it "gives us a POJO," whether or not that's formally accurate.)
