编程语言应该有一个树遍历的原生指令。
Programming languages should have a tree traversal primitive

原始链接: https://blog.tylerglaiel.com/p/programming-languages-should-have

这篇文章提出了一种新的控制流结构 `for_tree`,旨在优雅地遍历树形结构,类似于 `for` 和 `foreach` 循环处理线性迭代的方式。`for_tree` 的语法模仿标准的 `for` 循环:`for_tree(init; condition; branch){ //body }`。`branch` 部分定义了如何从一个节点移动到它的子节点,编译后将转换为递归函数调用。 这比手动递归具有优势:代码更简洁、逻辑更清晰,并且可以使用 `break`、`continue` 和 `return` 进行流程控制。还引入了一个 `prune` 关键字来阻止遍历到节点的子节点。与基于范围的 for 循环不同,`for_tree` 既适用于内存中的树(不需要迭代器),也适用于命令式的树形遍历。讨论还指出了潜在的一些问题,例如叶节点过度分支,并提出了相应的解决方案。一个使用模板和宏的 C++ 实现演示了这个概念,突出了在原生语言支持下实现更简洁语法的潜力。虽然优先考虑深度优先搜索,但也考虑了广度优先的替代方案。

在Hacker News的一次讨论中,编程语言内置“树遍历原语”的想法引发了辩论。许多人同意需要更好的工具来操作复杂的数据结构,并重点介绍了现有的解决方案,例如Optics(Haskell的`lens`,Scala的Monocle)和Zippers(Huet最初的Zipper,Clojure内置的zippers),它们用于专注的访问和修改。 一些人赞成语言级对诸如Lenses之类的功能的支持,而另一些人则主张使用标准库接口,例如“BreadthFirstTraverse”,以提高可用性。还建议使用现有的语言特性,如迭代器和生成器作为替代方案,函数式语言如Haskell则利用Foldable/Traversable类型类。 人们担心实现迭代器的复杂性以及递归方法可能导致堆栈溢出。讨论还涉及代码重用的重要性以及用于图和树数据结构的经过实战检验的库的可用性。

原文

There should be a control flow construct in programming languages that can handle tree-like traversal in a nice way, similar to how for/foreach loops can handle linear traversal. It's a bit of a missing gap in the current set of control flow constructs most languages these days have settled on. Its a thing I end up having to do *all the time* and it seems like there should be some shortcuts for it.

I posted a thought about this recently and was thinking about how I would want something like that to look like. I can't find anything similar in any other languages. I'm most familiar with C++, so my sample code here is going to be C++-ish, but its general enough that you could fit it into basically anything else. I'm not a language design expert, so this is more of a loose idea than a formal proposal.

for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}){
	print(N->value);
}

with the specifics removed, this is *almost* the same syntax as an existing for loop

for_tree(init; condition; branch){
	//body
}

It's basically a for loop that branches instead of continuing linearly.

"init" here runs when you enter the for_tree loop, exactly the same as a normal for loop
"condition" must be met to enter the body of the loop
"branch" evolves the initial value into multiple branches (it would have to compile down into recursive function calls here)

Well, this is significantly easier and less error prone than implementing recursive functions for every operation you'd want to do on every type of tree, plus all the relevant code ends up in-place and readable. This would likely compile down to the same code that a recursive function would anyway. It's just a nice shortcut, the same way for is a nice shortcut for gotos and labels.

for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}){
	print(N->value);
}

would compile down into the equivalent of

void for_tree_internal(Node* N){
	print(N->value);
	for(Node* n2 : {N->left, N->right}){
		if(n2 != NULL){
			for_tree_internal(n2);
		}
	}
}
for_tree_internal(mytreeroot);

Doesn't for_tree(...) look a lot nicer and simpler and less error prone than needing to implement a recursive function for each operation you would want to do on a tree?

Additionally, for_tree would offer a few more benefits that you couldn't easily do with a recursive function, for instance being able to use break, continue, and return from within the function body. Those would behave similarly to how a for loop does.

for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}){
	if(N->value == 10) {
		found = true;
		break; //would break out of the entire for_tree block,
                       //unwinding the stack in the process
	}
}

There is an additional flow construct that could be added here, "prune", which would prevent the loop from traversing into the branches off of the current node.

for_tree(Node* N = mytreeroot; N != NULL; N : {N->left, N->right}){
	if(N->value == 10) {
		prune; //dont need to check any children of this node
	}
}

Well a range based for loop requires that your tree exist in memory AND that you have an iterator defined for your tree. With for_tree you could operate on an entirely imperative tree, without needing to define any iterators or generator functions. Here's an example where I'm checking every single string composed of "a", "b", and "c" of length 8 or less.

for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
	print(x);
}

This is an operation that requires "tree-like traversal" but is not iterating over a data structure that exists in memory. You can do that entire thing inside for_tree. It's entirely agnostic to any underlying data structures you are or aren't using. The "prune" flow described earlier is also something you can't get from a range based for loop.

There is a bit of a footgun here, which is that in that above string example its actually generating and rejecting all of the strings of length 9. A normal for loop *also* reaches the state after its last valid state before exiting, its just kind of not that much of an issue with a linear for loop where that will usually be like, one extra addition and conditional check. On the other hand, "one level deeper" in a tree traversal means checking the validity of multiple times as many leaf nodes as you need to.

You could resolve this manually in multiple ways, generate an empty branch list if you are at a leaf already

for_tree(string x = ""; x.size() <= 8; 
x : x.size() < 8 ? {x+"a", x+"b", x+"c"} : {}){
	print(x);
}

or just prune in the function body

for_tree(string x = ""; x.size() <= 8; x : {x+"a", x+"b", x+"c"}){
	print(x);
	if(x.size() == 8) prune;
}

There may be a better way to resolve that, but would have to pull the syntax further away from a for loop I think. Anyway.

So, a depth first search is pretty simple, uses minimal extra memory, and works great with the stack that we are already using. BFS requires a lot of extra memory (enough that it would likely need to use the heap) and is more complex. If you really wanted to you could do an alternate function like for_tree_breadth_first(...), but I think the extra complexity needed for a BFS might be too much for a “primitive” construct.

Anyway, as a bonus here's a test C++ implementation that tries to get as close to this as possible with a bunch of templates and macros. The usage is uglier, and a few constructs do not work (ex, you cant return from within a for_tree here), but its kind of a neat proof of concept I think.

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters. Learn more about bidirectional Unicode characters
#include <vector>
#include <string>
#include <iostream>
enum class _tree_return {
Continue = 0,
Prune,
Break
};
template<typename T, typename F1, typename F2, typename F3>
_tree_return _for_tree(T initial, F1 condition, F2 branch, F3 visit) {
_tree_return result = visit(initial);
if(result == _tree_return::Break) return _tree_return::Break;
if(result != _tree_return::Prune) {
for(T subnode : branch(initial)) {
if(condition(subnode)) {
_tree_return result = _for_tree(subnode, condition, branch, visit);
if(result == _tree_return::Break) return _tree_return::Break;
}
}
}
return _tree_return::Continue;
}
#define tree_break return _tree_return::Break
#define tree_prune return _tree_return::Prune
#define tree_continue return _tree_return::Continue
//v-- semicolon to not allow you to get the return value here
#define for_tree(XName, Xinitial, Condition, Branch, Visit) ;_for_tree(Xinitial, \
[&](decltype(Xinitial) XName){ return Condition; }, \
[&](decltype(Xinitial) XName){ return std::vector<decltype(Xinitial)>Branch; }, \
[&](decltype(Xinitial) XName){ Visit; return _tree_return::Continue; })
//excuse the use of a std::vector in there, I guess you cant return an initialize_list from a lambda
//that wouldn't really be an issue if this was implemented at the language level instead of hacked together from lambdas and macros
struct Node {
Node* left = NULL;
Node* right = NULL;
std::string value;
Node(std::string value):value(value){}
};
int main() {
//syntax is a little uglier than it could be if it was native
//imperative tree sample
for_tree(x, std::string(""), x.size()<=8, ({x+"a", x+"b", x+"c"}), {
std::cout << x << std::endl;
});
//tree structure sample
Node mytree("root");
mytree.left = new Node("left");
mytree.right = new Node("right");
mytree.left->left = new Node("leftleft");
for_tree(x, &mytree, x != NULL, ({x->left, x->right}), {
std::cout << x->value << std::endl;
});
return 0;
}
联系我们 contact @ memedata.com