Alice's Adventures in a Differentiable Wonderland

Original link: https://arxiv.org/abs/2404.17625

Simone Scardapane's "Alice's Adventures in a Differentiable Wonderland -- Volume I, A Tour of the Land" is an introductory primer on differentiable programming and neural networks. Written for newcomers like "Alice," it focuses on building an intuitive understanding of the core concepts behind optimizing functions via automatic differentiation. The primer surveys the neural network designs most commonly used for handling sequences, graphs, text, and audio, and highlights the key design techniques behind them, including convolutional, attentional, and recurrent blocks. By bridging the gap between theory and practical code (in PyTorch and JAX), it aims to leave readers capable of understanding complex models such as large language models (LLMs) and multimodal architectures. In short, it is a self-contained guide to differentiable programming and neural network design that equips readers to understand and build state-of-the-art AI models.
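To make the central idea concrete, here is a minimal sketch of optimizing a function by gradient descent with JAX's automatic differentiation; the quadratic objective, step size, and iteration count are illustrative assumptions, not code from the book:

```python
import jax
import jax.numpy as jnp

# Illustrative objective: f(x) = (x - 3)^2, minimized at x = 3.
def f(x):
    return (x - 3.0) ** 2

grad_f = jax.grad(f)  # df/dx, computed by automatic differentiation

x = jnp.array(0.0)
for _ in range(100):
    x = x - 0.1 * grad_f(x)  # plain gradient-descent update

print(x)  # converges toward the minimizer x = 3
```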

The Hacker News discussion revolves around "Alice's Adventures in a Differentiable Wonderland," a book on deep learning available on arXiv. Commenters praise its clarity, accessibility for beginners, and blend of code and theory. Some compare it favorably to more theoretical texts and appreciate the inclusion of JAX alongside PyTorch. A recurring point of debate is the level of mathematical formality: some find the writing approachable, while others lament the occasional "sloppiness" common in mathematical writing, arguing for more explicit notation for didactic purposes. Counterarguments suggest that excessive verbosity can obscure the core concepts. Other points include a request for e-reader-friendly formats on arXiv, appreciation for the author's additional resources, and a discussion of the trustworthiness of non-peer-reviewed arXiv submissions versus traditional book publishing. One commenter highlights the presence of an errata list, suggesting ongoing refinement of the book.

Original Text

Abstract: Neural networks surround us, in the form of large language models, speech transcription systems, molecular discovery algorithms, robotics, and much more. Stripped of anything else, neural networks are compositions of differentiable primitives, and studying them means learning how to program and how to interact with these models, a particular example of what is called differentiable programming.
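As a minimal sketch of what "compositions of differentiable primitives" can look like in code, the following JAX snippet chains two affine maps and a tanh nonlinearity into a tiny network and differentiates a loss through the whole composition; the architecture, shapes, and squared-error loss are illustrative assumptions:

```python
import jax
import jax.numpy as jnp

def network(params, x):
    # Composition of differentiable primitives: affine -> tanh -> affine.
    W1, b1, W2, b2 = params
    h = jnp.tanh(W1 @ x + b1)
    return W2 @ h + b2

def loss(params, x, y):
    return jnp.sum((network(params, x) - y) ** 2)  # squared error (illustrative)

key1, key2 = jax.random.split(jax.random.PRNGKey(0))
params = (jax.random.normal(key1, (4, 3)), jnp.zeros(4),
          jax.random.normal(key2, (2, 4)), jnp.zeros(2))

x, y = jnp.ones(3), jnp.zeros(2)
grads = jax.grad(loss)(params, x, y)  # gradients flow through the whole composition
```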
This primer is an introduction to this fascinating field imagined for someone, like Alice, who has just ventured into this strange differentiable wonderland. I overview the basics of optimizing a function via automatic differentiation, and a selection of the most common designs for handling sequences, graphs, text, and audio. The focus is on an intuitive, self-contained introduction to the most important design techniques, including convolutional, attentional, and recurrent blocks, hoping to bridge the gap between theory and code (PyTorch and JAX) and leaving the reader capable of understanding some of the most advanced models out there, such as large language models (LLMs) and multimodal architectures.
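As one hedged illustration of an attentional block of the kind the abstract mentions, here is single-head scaled dot-product attention in JAX; the function name, shapes, and example inputs are illustrative assumptions rather than code from the book:

```python
import jax.numpy as jnp
from jax.nn import softmax

def attention(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    d = q.shape[-1]
    scores = q @ k.T / jnp.sqrt(d)      # pairwise query-key similarities
    weights = softmax(scores, axis=-1)  # each row is a distribution over keys
    return weights @ v                  # attention-weighted average of values

# Example usage with illustrative shapes: 5 tokens, dimension 8.
q = k = v = jnp.ones((5, 8))
out = attention(q, k, v)  # shape (5, 8)
```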
From: Simone Scardapane
[v1] Fri, 26 Apr 2024 15:19:58 UTC (13,152 KB)
[v2] Thu, 4 Jul 2024 14:52:11 UTC (29,301 KB)