(comments)

Original link: https://news.ycombinator.com/item?id=43976895

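A Hacker News discussion centers on the practical value of knowledge graphs, particularly those built by using large language models (LLMs) to interpret documents. Commenters are skeptical that they work beyond simple, well-defined relationships such as company structures. They question whether the generated "triples" (subject/predicate/object) are meaningful and whether the resulting graph is worth more than simpler approaches. One commenter suggests a simpler alternative for information retrieval: a Markdown file combined with an LLM (such as Claude) and the Telegram API. The limitations of open-world knowledge graphs, including spam-like problems, are also highlighted. Others find knowledge graphs very useful in specific domains, such as security, for managing complex many-to-many relationships (for example, access control). One commenter suggests using Gemini as an alternative to building a knowledge graph. Finally, one commenter asserts that graph-based information retrieval is obsolete because LLMs are increasingly able to build such graphs internally. This is challenged by a commenter working with legal documents in a niche area, where LLMs lack the necessary information.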


  • Original text
    Build real-time knowledge graph for documents with LLM (cocoindex.io)
    76 points by badmonster 6 hours ago | 14 comments










    I feel like I should understand the purpose of knowledge graphs, but I just... don't.

    Like the example "CocoIndex supports Incremental Processing" becomes the subject/predicate/object triple (CocoIndex, supports, Incremental Processing)... so what? Are you going to look up "Incremental Processing" and get a list of related entities? That's not a term that is well enough defined to be meaningful across a variety of subjects. I can incrementally process my sandwich by taking small bites.

    I guess you could actually expand "Incremental Processing" to some full definition. But then it's not really a knowledge graph, because the only entity ever associated with that new definition will be CocoIndex, and you are back to a single sentence that contains the information; you've just pretended it's structured. ("Supports" is hardly a well-defined term either!)

    I can _kind of_ see how knowledge graphs can be used for limited relationships. If you want to map companies to board members, and board members to family members, etc. Very clearly and formally defined entities (like a person or company), with clearly defined relationships (board member, brother, etc). I still don't know how _useful_ the result is, but at least I can understand the validity of the model. But for everything else... am I missing something?
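To make the triple model discussed above concrete, here is a minimal sketch in Python. It is not CocoIndex's implementation; the `TripleStore` class and the people and company names are hypothetical, chosen only to show what looking up an entity returns.

```python
from collections import defaultdict


class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.by_subject = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        # Index the triple from both ends so either entity can be looked up.
        self.by_subject[subject].add(triple)
        self.by_object[obj].add(triple)

    def about(self, entity):
        """Every triple in which the entity appears as subject or object."""
        found = self.by_subject.get(entity, set()) | self.by_object.get(entity, set())
        return sorted(found)


store = TripleStore()
# The example from the article under discussion.
store.add("CocoIndex", "supports", "Incremental Processing")
# Hypothetical, formally defined entities and relationships.
store.add("Alice", "board_member_of", "Acme Corp")
store.add("Bob", "board_member_of", "Acme Corp")
store.add("Alice", "sibling_of", "Carol")

# Who sits on the board of Acme Corp?
acme_board = [s for (s, p, o) in store.about("Acme Corp") if p == "board_member_of"]
print(acme_board)
print(store.about("Incremental Processing"))
```

Looking up "Incremental Processing" returns only the single CocoIndex triple, which is the commenter's point: the lookup is only as useful as the entity is well defined and shared across sources.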



    IMO knowledge graphs are a must-have for security use cases because of how well they handle many-to-many relationships. Who has access to read each storage bucket? Via which IAM policies? Who owns each bucket? What is the shortest possible role-assumption path available from internet-exposed compute instances to read this bucket? What is the effective blast radius from a vulnerability that allows remote code execution on an internet-exposed compute instance?

    Or, I have a docker container image that is built from multiple base images owned by different teams in my organization. Who is responsible for fixing security vulnerabilities introduced by each layer?

    We really could model these as tables but getting into all those joins makes things so cumbersome. Plus visualizing these things in a graph map is very compelling for presentation and persuading stakeholders to make security decisions.
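The two graph questions above (shortest role-assumption path, and blast radius) can be sketched as breadth-first searches over an edge list. This is only an illustration: the resource names, the `can_assume` and `can_read` relationship labels, and the helper functions are all hypothetical and not taken from any real IAM tooling.

```python
from collections import deque

# Hypothetical edges: (source, relationship, target).
EDGES = [
    ("web-vm", "can_assume", "role-app"),
    ("role-app", "can_assume", "role-data"),
    ("role-data", "can_read", "bucket-logs"),
    ("role-app", "can_read", "bucket-public"),
    ("batch-vm", "can_assume", "role-data"),
]


def build_adjacency(edges):
    """Map each node to the list of nodes it can reach in one hop."""
    adjacency = {}
    for source, _relationship, target in edges:
        adjacency.setdefault(source, []).append(target)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the shortest hop path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


def blast_radius(adjacency, start):
    """Everything transitively reachable from a compromised node."""
    reachable = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in reachable and neighbor != start:
                reachable.add(neighbor)
                queue.append(neighbor)
    return reachable


ADJACENCY = build_adjacency(EDGES)
path_to_logs = shortest_path(ADJACENCY, "web-vm", "bucket-logs")
exposed = blast_radius(ADJACENCY, "web-vm")
print(path_to_logs)
print(sorted(exposed))
```

The same questions in SQL would need a recursive join per hop, which is the cumbersome part the comment refers to.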



    I feel like you can do the same using a single markdown file and an LLM (e.g. Claude Code).

    I do it that way and then I hooked it up with the Telegram API. I’m able to ask things like “What’s my passport number?” and it just works.

    Combine it with git and you have a Datomic-esque way of seeing facts getting added and retracted simply by traversing the commits.

    I arrived at this solution after trying a more complex triplet-based approach and seeing that plain text files + HTTP calls work just as well and are human (and AI) friendly.

    The main disadvantage is having unstructured data, but for content that fits inside the LLM context window, it doesn’t matter practically speaking. And even then, when context starts being the limiting factor, you can start segmenting by categories or start using embeddings.
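As a rough sketch of the "segmenting by categories" idea above: split the markdown file on its `## ` headings and pass only the matching sections to the model. This is not the commenter's actual setup; the note contents, the function names, and the naive keyword match are assumptions made for illustration, and the LLM call itself is left out.

```python
def split_by_heading(markdown_text):
    """Group non-empty lines under the most recent '## ' heading."""
    sections = {}
    current = "preamble"
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
        elif line.strip():
            sections.setdefault(current, []).append(line.strip())
    return sections


def select_sections(sections, question):
    """Keep only sections whose heading appears as a word in the question."""
    words = {word.strip("?.,!").lower() for word in question.split()}
    return {name: lines for name, lines in sections.items() if name.lower() in words}


# Hypothetical notes file; the values are placeholders.
NOTES = """\
## Travel
- Passport number: X0000000 (placeholder)
- Frequent flyer: none

## Home
- Wifi router is in the hallway
"""

sections = split_by_heading(NOTES)
context = select_sections(sections, "What do my travel notes say?")
print(context)
```

Only the selected sections would then be placed in the prompt, so the file can grow past the context window without changing how it is written.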



    People probably don't discuss the problems of open-world knowledge graphs enough. They are essentially the same class of problems that spam filters face. Using an open language model to produce a graph doesn't, by definition, create a closed-world graph. This confusion, along with a general avoidance of measuring actual productivity outcomes, seems like an insurmountable problem in the knowledge world right now, and I feel language itself is failing at times to educate on these issues.


    They don't even do any entity disambiguation, so the resulting graph won't be very useful. I also saw people then use a different prompt to generate a Cypher query from user input for RAG; I can't imagine that actually works well. It would make a little more sense if they then used knowledge graph embeddings, but I'm not sure if Neo4j supports that.
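For readers unfamiliar with the term, entity disambiguation here means mapping different surface forms of the same thing onto one node before triples are stored. A minimal sketch, assuming a hand-written alias table (real systems typically use more than string matching, and the `ALIASES` entries below are hypothetical):

```python
# Hypothetical alias table mapping normalized surface forms to one canonical name.
ALIASES = {
    "cocoindex": "CocoIndex",
    "coco index": "CocoIndex",
    "cocoindex.io": "CocoIndex",
}


def canonical(name):
    """Lowercase and collapse whitespace, then look up a canonical name."""
    key = " ".join(name.lower().split())
    return ALIASES.get(key, name.strip())


def merge_triples(raw_triples):
    """Canonicalize subjects and objects, dropping duplicates that result."""
    merged = {(canonical(s), p, canonical(o)) for (s, p, o) in raw_triples}
    return sorted(merged)


raw = [
    ("CocoIndex", "supports", "Incremental Processing"),
    ("Coco  Index", "supports", "Incremental Processing"),
    ("cocoindex.io", "exports_to", "Neo4j"),
]
merged = merge_triples(raw)
print(merged)
```

Without this step, the three raw triples would hang off three separate nodes and the graph could not connect them.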


    idk if it’s precisely the same, but o3 recently offered to create one for me (in markdown, was it?), suggesting it was something it was willing to maintain for me.


    sorry, what is `o3`? I am not familiar with it... unless you're talking about the OpenAI ChatGPT model?

    If so, that's crazy, and I would love pointers on how to prompt it to suggest this.



    i think it offered a few formats, but I specifically remember it would do it in Obsidian, to use the concept map ability within.


    mermaid probably.


    Why not merely upload all relevant documents into Gemini? Split the knowledge into smaller knowledge domains and have agents (backed by Gemini) for each domain?


    Now imagine it with theorems as entities and Lean proofs as relationships.


    Building knowledge graphs (GraphRAGs) is obsolete from an academic and technical point of view. LLMs are getting better, with built-in graph-network-capable algorithms like SONAR and knowledge embeddings. Like someone said, just use NotebookLM instead. But they are useful in a corporate setup where the infrastructure, teams, and skills are lagging by years.


    My use case is for documents related to a legal issue, where a foundation model has no knowledge of any of the participants or particular issues. There are many, many such situations. Your statement is ignorant and overly broad.


    Could you provide some academic proof? From what I've read this isn't true, so I'd be interested to see what you're referring to.





