The Enterprise Context Layer

Original link: https://andychen32.substack.com/p/the-enterprise-context-layer

## Democratizing Enterprise Knowledge: The Enterprise Context Layer

Building a comprehensive, self-updating company knowledge base, an “Enterprise Context Layer” (ECL), is surprisingly achievable. Despite the hype around complex solutions such as knowledge graphs, a recent experiment suggests that 1,000 lines of Python and a GitHub repository are enough. The challenge is to move beyond simple document retrieval and *understand* a company’s nuances: product disambiguation, release semantics, internal processes, and conflicting information, a holistic view that existing solutions struggle to deliver. The key is to have AI agents map every aspect of the organization (products, people, processes) and, crucially, *cite sources* for every claim, producing a traceable, verifiable knowledge base that optimizes for accuracy and context rather than readability. The experiment ran 20 agents that generated 6,000 commits across 1,020 files, mapping everything from customer journeys to feature-flag inventories. The resulting system outperformed existing retrieval systems and could even identify sensitive questions best routed to specialist teams. This is a practice rather than a product: a foundational layer for all of a company’s internal AI agents, built and maintained by machines, and increasingly within reach as LLMs improve. The envisioned future shifts from building custom AI agents to drawing on a shared, machine-maintained context layer for all organizational knowledge.

Hacker News: 11 points, submitted by zachperkel 1 hour ago.

Original article

It’s trivially simple to build the Enterprise Context Layer: the central intelligence that encompasses all knowledge for your company, is able to answer any questions, and self-updates.

The founders and VCs will tell you it’s the next trillion-dollar company. The SaaS players will try everything to convince you that they, and they alone, can solve it all. They’ll throw around words like knowledge graphs, ontologies, semantic layers, taxonomies, etc.

But what if I told you that all you need is 1,000 lines of Python + a GitHub repo?

Six months ago, my task was simple: build a bot that helps our GTM reps answer customer questions. Things like “will X feature be available next quarter?”, “how are you different from Y competitor?”, and “what are your data retention policies?”

These sound simple. They are not.

Take “will X feature be available next quarter.” Four things have to go right for an AI to answer this correctly.

1. Product disambiguation. You need to understand what our product is and what the feature even refers to. People ask “does X feature exist” and they’re usually talking about our flagship product, inbound email security. But sometimes they’re not. We have several products that overlap, and internal names don’t always match external ones. Just getting this part right requires a well-tuned RAG system.

2. Release semantics. Does “available” mean early access or general access? Is it gated to specific customers or regions? What about EU customers, FedRAMP customers? The AI would need to know to ask a clarifying question like “where are you based?” This trips up even good retrieval systems.

3. Roadmap process. At Abnormal, we have specific processes around sharing roadmap items (e.g., signing an NDA before showing the roadmap). If there are escalations, PMs, engineers, and sometimes execs get involved. How does the AI decide whether to answer, ask for an NDA, or escalate? No system handles this today.

4. Source conflicts. Some of this is documented. Some isn’t. Some documentation conflicts with other documentation. A PM might announce something is “coming soon” but the feature keeps slipping. Even humans can’t reliably parse all of this.

As far as I can tell, the best solutions can address one, maybe two, of these. But not all four (and if your company is more complex, there are certainly more than four).

How do you solve this? It seems like you’d need a team of people sitting in a room just maintaining some set of rules to enforce consistency (we and many other companies have tried, and always fail).

For reference, we recently started using Glean, and they solve some of this by combining context graphs + trace learning + per-customer embedding models + calibration loops, etc. And it’s really, really good (I personally used it a lot when onboarding). They have some of the best engineers (ex-Google search infra) working on this, and they meaningfully advanced a problem that Microsoft, Atlassian, and others couldn’t solve within their own software.

But at the end of the day, retrieval and synthesis solve fundamentally different problems. Glean is excellent at finding documents, arguably the best in the world at it. What it doesn’t do is synthesize organizational context: the judgment calls, the institutional memory, the “this question is actually dangerous and you shouldn’t answer it” knowledge.

Simple question: “When a customer churns, how long until their data is deleted?”

A retrieval system will find the best-matching document and confidently return “customer data is deleted after X days.” Sometimes it will pull up more detail and say it depends on the product. If you are a rep and you answer with this, you will get slapped on the wrist.

Here’s the actual answer: don’t answer unless you absolutely know what you are doing. Otherwise route to the security team.

A retrieval system searches for the best document to match a query. And if the ideal document doesn’t exist and reps lean heavily on a suboptimal one (boosting its ranking), that one document gets cited every. single. time.

Glean has industry-leading retrieval heuristics, with hundreds of engineers staffed on ML pipelines to automatically map taxonomies, build context graphs, and fine-tune ranking. They can infer that when a customer asks “does Abnormal have X feature”, they usually mean the Email Security product; they can infer that a mention of “Abby” in a Slack thread is the name of our “AI Data Analyst” solution.

And yet, retrieval alone still doesn’t “understand” your company.

If all you do is document fetching, you would never be able to arrive at the conclusion that: “This question is extremely sensitive and reps have already gotten it wrong multiple times. This should be routed to our security team” or “this PM always announces features before they are fully shipped, let’s not give customers a committed answer right now.”

To solve this, you would need a much more representative context layer: a space with the capacity to map all possible objects, relationships, politics, and behavior across R&D, GTM, Legal, HR, etc.

Sounds impossible, right?

The answer: agents with basic document search tools, plus access to a GitHub repo in which to build the context layer.

This is the prompt I used:

You are an Enterprise Context Layer (ECL) Agent that builds and maintains internal mental models, the reasoning frameworks our experts use, not just raw facts.

It serves to document everything related to our product/people/process/etc.

Some (non-exhaustive by any means) dimensions of understanding:

- How our product works

- How to communicate things to customers

- Organization behavior/politics

- etc

This enterprise context layer needs to be a truthful reflection of how our product/people/process works. It should contain the good, bad, and ugly.

The ECL is not built for readability. It’s built for traceability and verifiability.

Every claim, every statement, soft or hard, has to have an inline citation for the source(s) that it directly draws from.

plus some instructions about how to use tools and a simple file-based task system inspired by Anthropic’s C Compiler

That’s it.
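As a toy illustration of that traceability rule, here is a minimal checker that flags claim lines missing an inline citation. The file layout and the `[source: ...]` citation syntax are my own invention for this sketch, not the format the agents actually converged on:

```python
import re

# Hypothetical convention: every claim line in an ECL markdown file must
# carry one or more inline citations like [source: jira/SEC-1042].
CITATION = re.compile(r"\[source:\s*[^\]]+\]")

def uncited_claims(markdown_text: str) -> list[str]:
    """Return claim lines that lack an inline citation."""
    offenders = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        # Skip headings, blank lines, and structural markup.
        if not stripped or stripped.startswith(("#", "---", "```")):
            continue
        if not CITATION.search(stripped):
            offenders.append(stripped)
    return offenders

doc = """# Data retention
Customer data is deleted after offboarding. [source: jira/SEC-1042]
Reps should not quote specific day counts.
"""
print(uncited_claims(doc))  # the uncited second claim
```

A check like this could run in CI on the repo, so an agent’s commit fails fast whenever it writes a claim it cannot trace.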

The result after running 20 parallel agents on this for ~2 days:

6,000 commits. 1,020 files. 11 domains. Every product, every process, every team, every compliance framework, every competitive dynamic, mapped, cross-referenced, and verified against every internal source.

The agents generalized scarily well. Watching the ECL evolve was both fascinating and terrifying.

It produced artifacts that would otherwise have been impossible for our team to map out.

Without showing the exact details, here are some examples:

  • The entire customer journey: from first sales contact through deployment, onboarding, renewal, and churn, mapped end-to-end with handoff points between teams, common failure modes at each stage, and the exact playbooks for each scenario. Cross-referenced against real support cases and Gong calls.

  • A detection model lifecycle document that maps all distinct causes of detection behavior change to customer-visible impact, with triage frameworks, source code references, Databricks dashboard links, and real incident case studies. It bridges engineering, support, and customer success into one coherent mental model that previously only lived in a few engineers’ heads.

  • Battle cards for every conceivable competitor where claims are backed by a closed evidence loop: a specific Gong recording where the claim surfaced, cross-correlated with our actual product capabilities, linked to the Salesforce case showing how the deal ended, and tied to field team discussion threads documenting what messaging actually worked.

  • A complete inventory of all feature flags across our proto files, each cited back to specific line numbers in our codebase, with GovCloud overrides and deprecation status mapped. No human has ever maintained something like this — it would be out of date the moment you finished writing it.
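An inventory like the last item can be bootstrapped with surprisingly little code. A sketch along these lines, with an invented flag-naming convention (real proto parsing would use the actual schema, not a regex):

```python
import re
from pathlib import Path

# Assumed convention: feature flags are identifiers containing "flag" that
# get assigned a field number, e.g. "enable_foo_flag = 12;".
FLAG = re.compile(r"^\s*(\w*flag\w*)\s*=", re.IGNORECASE)

def inventory_flags(root: str) -> list[dict]:
    """Walk .proto files under root and record each flag with a file:line
    citation, so every inventory entry is traceable back to source."""
    entries = []
    for proto in sorted(Path(root).rglob("*.proto")):
        for lineno, line in enumerate(proto.read_text().splitlines(), start=1):
            if m := FLAG.match(line):
                entries.append({"flag": m.group(1), "cite": f"{proto}:{lineno}"})
    return entries
```

The point is less the regex than the citation: because each entry carries a `file:line` reference, a maintenance agent can re-verify the inventory mechanically instead of trusting a stale document.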

And it answers infamous questions like:

If my customer churns, how long do we keep their data?

“Great question — the ECL has clear guidance on this: reps should NOT answer data retention/deletion timeline questions themselves. They should route to the security/privacy team. Why? Because the ECL has documented a history of reps giving incorrect or oversimplified timelines to customers — confusing different internal processes with each other, or giving specific day counts that don’t reflect the actual deletion pipeline.

… plus some other internal details”

The agents learned, through much trial and error, how to cite and even prioritize sources.

This problem of knowledge reconciliation and management has never had a viable approach until now.

Together, these files map how information actually flows through the company. At a high level, the conclusions look something like this:

1. Architecture claims are durable; status claims are ephemeral. “We use API-based integration” is true for years. “Feature X is coming soon” can be unreliable.

2. No single source is universally trustworthy. Code is the best source for how something works, but not entirely accurate for whether it works in customer environments. A PM’s Slack message is unreliable for technical details, but it’s the best signal for what’s shipping next. Support cases are ground truth for customer experience but may describe bugs that have since been fixed.

3. Process documentation describes the ideal. Reality is messier, more informal, and more team-dependent than the docs suggest.

4. Three independent sources agreeing is the threshold for high confidence. But five Slack messages from the same channel are one data point, not five.

5. When in doubt, document the conflict itself. Two sources disagreeing is more useful information than picking a winner and hiding the disagreement.
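Heuristic 4 is simple enough to encode directly. A toy scorer, with an invented source schema, that collapses same-channel Slack messages into one data point before counting independent evidence:

```python
def confidence(sources: list[dict]) -> str:
    """Count independent evidence points; three or more means high confidence."""
    independent = set()
    for s in sources:
        # Five messages from the same channel are one data point, not five.
        independent.add((s["kind"], s.get("channel") or s.get("id")))
    n = len(independent)
    return "high" if n >= 3 else ("medium" if n == 2 else "low")

evidence = [
    {"kind": "slack", "channel": "#product"},
    {"kind": "slack", "channel": "#product"},  # same channel: no new signal
    {"kind": "code", "id": "flags.proto"},
    {"kind": "jira", "id": "ENG-123"},
]
print(confidence(evidence))  # three independent points
```

In practice the agents express this kind of judgment in prose inside the ECL files rather than in code, but the counting logic is the same.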

I’ve been working on this problem for 6 months now, and this is 100% accurate to my mental model of how our sources work. And the agents fleshed out so many more details than I was even aware of.

What does this mean? You can keep feeding the agents a mix of good and bad data, and they will simply generalize and learn what is good and what is bad.

The mental models themselves are nothing more than the folder structures and the backlinks in these files.

If an LLM discovers that “data retention type questions are super sensitive,” it will backlink the GTM-related article to the privacy-related article and explain why it made that link.

Nothing complicated, just text that LLMs can understand and trace through.

And as you run these agents more and more, the context layer only gets richer: more backlinks, more cross-references, more inter-departmental mental models.

Side note: they even built a guide to using their own tools, and found several bugs along the way.

You get the point.

There are many many more examples which I cannot share because they are internal details.

Other than cases where it lacked access to certain sensitive sources (Google Drive, Salesforce objects, etc.), it has beaten our current document retrieval systems on every question I’ve tested it on.

And all it took was ~1,000 lines of Python + a GitHub repo.

The conclusion? The enterprise context layer will be democratized very soon.

As LLMs get better and have longer contexts, we’ll probably see people dumping more and more stuff into this enterprise context layer.

Abnormal is a startup (~1000 people), so it’s quite reasonable for us to put our entire company’s knowledge into a single repo.

But what if you are Microsoft? Probably need to wait for LLMs to get a bit better before this fully works across the entire company.

You can really use whatever harness you want for this. For search, I used a mix of our in-house retrieval system + the Glean search API to pull data from Slack/Jira/Gong/etc.

For building the enterprise context, I personally just gave the agents plain bash access to the GitHub repo in a Modal sandbox, which has worked well enough so far.

The main thing that makes the enterprise context layer work isn’t really the harness, but rather the ability for the newest models to intelligently reason through scattered context and come up with a holistic mental model.

But obviously, better search tools and a better harness can save you quite a bit of tokens + time.

The self-maintenance architecture:

A maintenance agent continuously scans the ECL for files that haven’t been verified recently, areas that are missing, cross-references that have drifted. It creates tasks as markdown files in the tasks/ directory.
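A minimal version of that scanner might look like the following; the 14-day staleness window is an assumption, and file mtime stands in crudely for “last verified” (the real agent reasons over content, not timestamps):

```python
import time
from pathlib import Path

STALE_AFTER = 14 * 24 * 3600  # assumed: re-verify files untouched for 14 days

def create_stale_tasks(ecl_root: str, tasks_dir: str) -> int:
    """Scan the ECL for markdown files that have not been touched recently
    and emit one task file per stale document into the tasks/ directory."""
    tasks = Path(tasks_dir)
    tasks.mkdir(exist_ok=True)
    now = time.time()
    created = 0
    for doc in Path(ecl_root).rglob("*.md"):
        if now - doc.stat().st_mtime > STALE_AFTER:
            task = tasks / f"verify-{doc.stem}.md"
            task.write_text(f"Re-verify every claim in {doc} against primary sources.\n")
            created += 1
    return created
```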

20 worker agents run in parallel. Each one:

  1. Pulls the latest ECL from git

  2. Claims a task by renaming it (appending -LOCKED) and pushing to main

  3. Executes the task using every tool available: reads source code, searches Slack, queries Jira, searches Salesforce cases, reads Gong call transcripts

  4. Updates or creates the relevant ECL files

  5. Deletes the task file

  6. Commits and pushes to main
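Steps 1, 2, and 6 use git itself as the lock: renaming a task file with a `-LOCKED` suffix and pushing to main means a rejected push tells a worker that someone else won the race. A sketch of the claim step (the repo layout and exact commands are my reconstruction, not the actual harness):

```python
import subprocess
from pathlib import Path

def run(cmd: str, cwd: str) -> bool:
    """Run a shell command in the repo; False on non-zero exit instead of raising."""
    return subprocess.run(cmd, shell=True, cwd=cwd, capture_output=True).returncode == 0

def claim_task(repo: str, task: str) -> bool:
    """Claim tasks/<task> by renaming it with a -LOCKED suffix and pushing to
    main. If another worker pushed the same rename first, our push is rejected
    and we roll back: git arbitrates the race, no lock server needed."""
    src = Path(repo) / "tasks" / task
    if not src.exists():
        return False  # already claimed or deleted by another worker
    src.rename(src.with_name(src.stem + "-LOCKED" + src.suffix))
    claimed = (run(f'git add -A && git commit -m "claim {task}"', repo)
               and run("git push origin main", repo))
    if not claimed:
        run("git reset --hard origin/main", repo)  # lost the race; undo locally
    return claimed
```

A worker that wins the claim then executes the task with its search tools, updates the relevant ECL files, deletes the task file, and commits and pushes again.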

I actually lied: the agents didn’t learn to cite and prioritize sources, or to document how to use their tools, entirely on their own. This is a very meta-level skill, and they didn’t know they needed to learn it.

But all I did was make a directory called meta/ and seed it with a single plain markdown file called how-to-get-accurate-information.md

with just this:

“put in here synthesis of how to use tools to cite right sources, what sources or things tend to be out of date”

and the agents filled in everything else.

No changes in the prompt or agent architecture, just a single new directory and seed file.

  1. I’ll probably build better harnesses for actually retrieving answers from the enterprise context layer. Right now the READMEs and navigation are optimized for contribution, not necessarily retrieval.

    1. I currently ask questions simply by running Claude Code over the file system, and its tool calls are pretty reliable at finding the needed content. But it’s probably better to have the agents build a more definitive retrieval path.

  2. Human expert in the loop: some systematic way for humans to broadly guide the AI & for it to learn from the feedback.

    1. I’m pretty confident that the product-related content has super-human accuracy. But what about sales & marketing? The comprehensive battle cards are impressive and they all look right from my point of view. But I’m not an AE, and with enterprise sales, even small nuances can matter a lot.

  3. Making the maintenance architecture cheaper and more scalable. Now that most of the context layer has been bootstrapped, we probably need a system to auto-ingest new Jira tickets, Slack threads, etc., instead of having the agents search over them.

  4. RBAC’d contexts for team/personal use (i.e., access to all your meetings, private DMs).

There might be things to tweak here and there, like adding a Postgres DB instead of raw markdown, or distributing the file store.

But the core answer to: “how do we accurately map the behavior of the entire company into an LLM-understandable form” is now clear.

What if people change roles? What if Jira itself becomes different? What if strategic decisions are reversed?

The answer? More enterprise context.

Build a self-maintaining org chart with the history of every person and role.

Build a mental model of how every software system currently works and has worked in the past.

Build a history of every strategic decision ever made.

Stop building custom agents with hard coded rules :P

Most AI-native enterprises will probably start by filling up their Github repo and then build something more sophisticated later on, with access control, compliance, and multi-tenancy.

I think companies will gradually stop building custom agents that encode hard business logic. Instead, all organizational knowledge will live in context layers, maintained by machines, verified against primary sources, confidence-tagged, and queryable by any agent for any purpose.

  • Enterprise context layer: company-wide knowledge

  • Team/org context layer: function-specific playbooks

  • Personal context layer: individual preferences and style

This also feels more like a practice than a product. The ECL pattern is closer to DevOps than to Salesforce, i.e., something that most companies will do in-house.

Every AI agent in the company reads from these layers. The agent itself is simple. The context is what makes it smart. We’re only at the very beginning of this.
