(comments)

Original link: https://news.ycombinator.com/item?id=43604640

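CodeScientist is a new automated scientific discovery (ASD) system designed to overcome two shortcomings of existing systems: they are confined to narrow design spaces, and they evaluate the code they produce only to a limited extent. It frames ideation and experiment construction as a genetic search over combinations of research papers and codeblocks, focusing on domains such as agents and virtual environments. Because the system automates experiment construction, it can make discoveries that are broader and more diverse than simple benchmark optimization. The researchers used CodeScientist to run hundreds of experiments, which produced 19 discoveries. Each was evaluated through external (conference-style) review, code review, and replication attempts, and six were judged to be both at least minimally sound and incrementally novel. The discoveries span new tasks, agents, metrics, and data, marking a step toward broader scientific exploration. The project's code and further information are available on GitHub.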


Original text
CodeScientist: Automated scientific discovery system for code-based experiments (github.com/allenai)
4 points by liamdgray 2 hours ago | 1 comment


Abstract: "Despite the surge of interest in autonomous scientific discovery (ASD) of software artifacts (e.g., improved ML algorithms), current ASD systems face two key limitations: (1) they largely explore variants of existing codebases or similarly constrained design spaces, and (2) they produce large volumes of research artifacts (such as automatically generated papers and code) that are typically evaluated using conference-style paper review with limited evaluation of code. In this work we introduce CodeScientist, a novel ASD system that frames ideation and experiment construction as a form of genetic search jointly over combinations of research articles and codeblocks defining common actions in a domain (like prompting a language model). We use this paradigm to conduct hundreds of automated experiments on machine-generated ideas broadly in the domain of agents and virtual environments, with the system returning 19 discoveries, 6 of which were judged as being both at least minimally sound and incrementally novel after a multi-faceted evaluation beyond that typically conducted in prior work, including external (conference-style) review, code review, and replication attempts. Moreover, the discoveries span new tasks, agents, metrics, and data, suggesting a qualitative shift from benchmark optimization to broader discoveries."
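The abstract frames ideation as a genetic search "jointly over combinations of research articles and codeblocks." The sketch below is a minimal, generic illustration of that idea, not CodeScientist's implementation. Everything in it is a hypothetical stand-in: the paper and codeblock names, the fixed set size of two, the crossover and mutation operators, and `toy_score`. The real system scores ideas by building and running experiments, and a toy function cannot capture that.

```python
import random
from typing import Callable, FrozenSet, Tuple

# Hypothetical search space: placeholder IDs, not real papers or CodeScientist codeblocks.
PAPERS = ["paper_A", "paper_B", "paper_C", "paper_D"]
CODEBLOCKS = ["prompt_llm", "text_env", "logger", "bootstrap_stats"]

# A candidate "idea" is a pair: (set of papers, set of codeblocks).
Candidate = Tuple[FrozenSet[str], FrozenSet[str]]


def random_candidate(rng: random.Random) -> Candidate:
    # Assumption: every candidate combines exactly two papers and two codeblocks.
    return frozenset(rng.sample(PAPERS, 2)), frozenset(rng.sample(CODEBLOCKS, 2))


def crossover(a: Candidate, b: Candidate) -> Candidate:
    # Take the papers from one parent and the codeblocks from the other.
    return a[0], b[1]


def mutate(c: Candidate, rng: random.Random) -> Candidate:
    papers, blocks = set(c[0]), set(c[1])
    # Pick either the paper set or the codeblock set, then swap one member
    # for an item from the same pool that is not already chosen.
    pool, chosen = (PAPERS, papers) if rng.random() < 0.5 else (CODEBLOCKS, blocks)
    outside = [x for x in pool if x not in chosen]
    if outside:
        chosen.remove(rng.choice(sorted(chosen)))
        chosen.add(rng.choice(outside))
    return frozenset(papers), frozenset(blocks)


def genetic_search(score: Callable[[Candidate], int], generations: int = 20,
                   pop_size: int = 8, seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    population = [random_candidate(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: keep the top half unchanged, refill the rest with offspring.
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b), rng))
        population = parents + children
    return max(population, key=score)


def toy_score(c: Candidate) -> int:
    # Hypothetical fitness: overlap with an arbitrary "good" combination.
    return len(c[0] & {"paper_B", "paper_D"}) + len(c[1] & {"prompt_llm", "text_env"})


best = genetic_search(toy_score)
print(sorted(best[0]), sorted(best[1]), toy_score(best))
```

Keeping the top half of each generation unchanged means the best score can never decrease between generations. In a real ASD system, the expensive step is the scoring call, which is why the abstract stresses evaluating each generated artifact with code review and replication rather than paper review alone.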

Contact us: contact @ memedata.com