miniKanren中的Datalog
Datalog in miniKanren

原始链接: https://deosjr.github.io/dynamicland/datalog.html

本文件描述了一个使用miniKanren在Scheme中实现的朴素Datalog系统,目标是集成到类似Dynamicland的系统中。目的是嵌入一种逻辑编程语言,以简化某些任务。 该实现定义了一个Datalog实例,包含事实、索引和规则字段。事实表示为实体ID、属性名和值的元组,存储在哈希表中并建立索引以实现高效检索。规则使用`dl-rule!`宏定义,该宏使用`fresh-vars`和`conj`将它们转换为函数,利用miniKanren的统一能力。`dl-findo`是一个超逻辑谓词,通过考虑已知的实体或属性值来优化Datalog查询。不动点分析迭代地应用规则,将新推导出的事实添加到数据库中,直到不再生成新的事实为止。`dl-rule!`宏内的变量作用域和卫生由syntax-case处理。

这个Hacker News帖子讨论了deosjr用Guile Scheme实现的极简Datalog,并使用Hoot编译成WebAssembly。 upghost认为语法对于Datalog优雅地查询和自我转换至关重要,而他认为这个实现缺乏这一点。kragen同意语法很重要,但强调语义,指出其对二元关系的关注偏离了Datalog的N元关系数据库本质。kragen建议可以用Scheme实现更易接受的语法。jitl建议使用Lisp宏。 j-pb澄清说,在数据库文献中,Datalog通常定义为具有递归/不动点迭代的合取查询,这与具有任意否定(如SQL)的非递归片段形成对比,并将此定义追溯到20世纪80年代的数据库研究。
相关文章

原文

Having access to an embedded logical programming language makes some tasks really easy. One prerequisite for RealTalk is some form of Datalog, and I built one in Scheme using miniKanren so that I had access to all of the internals. This page explains the naive Datalog implementation I did before modifying some of it to fit my version of Dynamicland.

As a typical example we can look at a directed graph with five vertices labeled a through e. Using Datalog, we introduce these vertices as separate records, establish facts about directed edges between them, and then introduce rules about what it means for a vertex to be reachable from another vertex. In the end we manually trigger fixpoint analysis and run a query against the resulting database of initial and derived facts.

(define dl (make-new-datalog)) (define a (dl-record! dl 'vertex)) (define b (dl-record! dl 'vertex)) (define c (dl-record! dl 'vertex)) (define d (dl-record! dl 'vertex)) (define e (dl-record! dl 'vertex)) (define (dl-edge x y) (dl-assert! dl x 'edge y)) (dl-edge a c) (dl-edge b a) (dl-edge b d) (dl-edge c d) (dl-edge d a) (dl-edge d e) (dl-rule! dl (reachable ,?x ,?y) :- (edge ,?x ,?y)) (dl-rule! dl (reachable ,?x ,?y) :- (edge ,?x ,?z) (reachable ,?z ,?y)) (dl-fixpoint! dl) (define query (dl-find dl ,?id where (,?id reachable ,?id)))

This is roughly the syntax that I want to go for, minus some of the commas (or unquotes) that I did not get rid of. Now that we have a goal, let's jump into the implementation! We can define a Datalog instance as a record with the following fields:

(define-record-type (make-datalog edb idb rdb idx-entity idx-attr counter) datalog? (edb datalog-edb) (idb datalog-idb set-datalog-idb!) (rdb datalog-rdb set-datalog-rdb!) (idx-entity datalog-idx-entity) (idx-attr datalog-idx-attr) (counter datalog-counter set-datalog-counter!))

Most of these are internal indices, each a different hash-table. Facts are always a 3-tuple of entity-ID, attribute-name and value. Whenever we assert a fact we not only update the main entity table (edb), we also update two indices that index facts by entity (idx-entity) and attribute (idx-attr) respectively.

(define (dl-assert! dl entity attr value) (hashtable-set! (datalog-edb dl) (list entity attr value) #t) (dl-update-indices! dl (list entity attr value))) (define (dl-update-indices! dl tuple) (let ((entity (car tuple)) (attr (cadr tuple)) (idx-entity (datalog-idx-entity dl)) (idx-attr (datalog-idx-attr dl))) (let ((m (hashtable-ref idx-entity entity #f))) (if m (hashtable-set! m tuple #t) (let ((new (make-hashtable))) (hashtable-set! idx-entity entity new) (hashtable-set! new tuple #t)))) (let ((m (hashtable-ref idx-attr attr #f))) (if m (hashtable-set! m tuple #t) (let ((new (make-hashtable))) (hashtable-set! idx-attr attr new) (hashtable-set! new tuple #t))))))

The dl-record! macro lets us introduce one or more facts in an easy way:

(define-syntax dl-record! (syntax-rules () ((_ dl type (attr value) ...) (let ((id (dl-next-id! dl))) (dl-assert! dl id (list type attr) value) ... id))))

So far so good. Adding facts to hash-tables is maybe not the most exciting part. Next we are going to look at rules, and this will get a lot more interesting! Applying a rule is the same thing a running a query against the database. In our naive Datalog, running fixpoint amounts to applying each rule, introducing each newly found fact into our set of derived facts, and repeat that process until no more new facts are being introduced. We'll start by looking at fixpoint, and come back to rules later as they are the most complicated part of this implementation.

(define (dl-assert-rule! dl rule) (hashtable-set! (datalog-rdb dl) rule #t)) (define (dl-apply-rule dl rule) (dl-find rule)) (define (dl-find goal) (runf* goal)) (define (dl-fixpoint! dl) (set-datalog-idb! dl (make-hashtable)) (dl-fixpoint-iterate dl)) (define (dl-fixpoint-iterate dl) (let* ((facts (map (lambda (rule) (dl-apply-rule dl rule)) (hashtable-keys (datalog-rdb dl)))) (factset (foldl (lambda (x y) (set-extend! y x)) facts (make-hashtable))) (new (hashtable-keys (set-difference factset (datalog-idb dl))))) (set-extend! (datalog-idb dl) new) (for-each (lambda (fact) (dl-update-indices! dl fact)) new) (if (not (null? new)) (dl-fixpoint-iterate dl))))

As you may have guessed, we will use miniKanren to run queries. That dl-find function calls runf*, which is the same as the run* macro from miniKanren but written as a normal function. This will make it a lot easier to use it within another macro. Similarly, the fresh macro is rewritten as a (slightly less ergonomic) function that now needs the amount of fresh vars as an argument.

(define (fresh-vars count kont) (lambda (s/c) (let loop ((n count) (vars '()) (st s/c)) (if (zero? n) ((apply kont (reverse vars)) st) (let ((c (cdr st))) (loop (- n 1) (cons (var c) vars) `(,(car s/c) . ,(+ c 1))) ))))) (define (runf* goal) (mK-reify (take-all (call/empty-state goal))))

So now we can run miniKanren as follows, not using any macros:

> (runf* (fresh-vars 1 (lambda (x) (disj (equalo x 5) (equalo x 6))))) > (5 6)

Now that that is done we can rewrite a rule somewhat like this. There is a lot of work needed to actually make that work, but lets start with the rough idea:

(dl_rule! (reachable ,?x ,?y) :- (edge ,?x ,?z) (reachable ,?z ,?y)) (dl-assert-rule! dl (fresh-vars 2 (lambda (q x y) (conj (equalo q `(reachable ,?x ,?y)) (dl-findo dl ((edge ,?x ,?z) (reachable ,?z ,?y))) ))))))))) ; once more, with problem areas highlighted: (dl-assert-rule! dl (fresh-vars [numvars] (lambda (q [vars]) (conj (equalo q [head]) (dl-findo dl [body]) )))))))))

The dl-rule! macro turns the rule into a function that when ran will try to match all of the body clauses against the database and returns the head with bound variables if it succeeds. The dl-findo function implements an extra-logical predicate that follows a pretty typical Datalog optimisation. If we try to match a tuple (entity attr value) against the database, we can limit the amount of potential matches if the entity or the attribute are already known.

(define-syntax dl-findo (syntax-rules () ((_ dl (m ...)) (conj+ (dl-findo_ dl `m) ... )))) (define (dl-findo_ dl m) (fresh (x y entity attr db) (conso entity x m) (conso attr y x) (conde [(boundo entity) (lookupo (datalog-idx-entity dl) entity db) (membero m db)] [(unboundo entity) (boundo attr) (lookupo (datalog-idx-attr dl) attr db) (membero m db)] ))) (define (boundo v) (lambda (s/c) (if (var? v) (let ((x (walk v (car s/c)))) (if (var? x) mzero (unit s/c))) (unit s/c)))) (define (unboundo v) (lambda (s/c) (if (var? v) (let ((x (walk v (car s/c)))) (if (var? x) (unit s/c) mzero)) mzero))) (define (lookupo m key value) (lambda (s/c) (let* ((k (if (var? key) (walk key (car s/c)) key)) (v (hashtable-ref m k #f))) (if v ((equalo value (hashtable-keys v)) s/c) mzero))))

The most problematic part of the dl-rule! macro is figuring out all of the variable scoping and hygiene. Using syntax-case ended up being the closest I could get to what I wanted. Note that right now the database (dl) is captured in a closure, so we do not have to specify it again when running a rule.

(define-syntax dl-rule! (lambda (stx) (define (symbol-with-question-mark? s) (and (symbol? s) (let ((str (symbol->string s))) (and (positive? (string-length str)) (char=? (string-ref str 0) #\?))))) (define (collect-vars datum) (cond [(symbol? datum) (if (symbol-with-question-mark? datum) (list datum) '())] [(pair? datum) (append (collect-vars (car datum)) (collect-vars (cdr datum)))] [else '()])) (define (remove-duplicates syms) (define seen '()) (define (unique s) (let ((d (syntax->datum s))) (if (member d seen) #f (begin (set! seen (cons d seen)) #t)))) (filter unique syms)) (define (replace-symbols datum sym->gen) (cond [(symbol? datum) (let ((mapped (assoc datum sym->gen))) (if mapped (cdr mapped) (datum->syntax stx datum)))] [(pair? datum) (cons (replace-symbols (car datum) sym->gen) (replace-symbols (cdr datum) sym->gen))] [else (datum->syntax stx datum)])) (syntax-case stx (:-) ((_ dl (head hx hy) :- (body bx by) ...) (let* ((datums (syntax->datum #'((hx head hy) (bx body by) ...))) (head-datum (car datums)) (body-datums (cdr datums)) (vars (remove-duplicates (collect-vars datums))) (numvars (+ 1 (length vars))) (gens (generate-temporaries vars)) (sym->gen (map cons vars gens)) (replaced-head (replace-symbols head-datum sym->gen)) (replaced-body (replace-symbols body-datums sym->gen))) #`(dl-assert-rule! dl (fresh-vars #,numvars (lambda (q #,@gens) (conj (equalo q `#,replaced-head) (dl-findo dl #,replaced-body) )))))))))

Now we are ready to run our example! The only thing I have changed is that I have not implemented the dl-find macro yet, so in order to query I have to write things out a bit more. It would have the same problems as dl-rule! and the same solution, I imagine.

> (define query (dl-find (fresh-vars 1 (lambda (?id) (dl-findo dl ( (,?id reachable ,?id) )))))) > (3 1 4)

This should print (3 1 4) or a permutation thereof, order depending on the hash-table implementation. As a bonus, this page implements and runs all of the above using Guile Scheme and WebAssembly! The result of running all of this should be printed below:

联系我们 contact @ memedata.com