The code and open-source tools I used to produce a science fiction anthology

Original link: https://compellingsciencefiction.com/posts/the-code-and-open-source-tools-i-used-to-produce-a-science-fiction-anthology.html

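## Think Weirder: A One-Person Publishing Success

Last month, Joe Stech published the science fiction short story anthology Think Weirder, which was briefly the #1 New Release among short story anthologies on Amazon, ahead of books from traditional publishers, even though it was a solo project completed alongside a full-time job and raising young children. The success came from an efficient, automated publishing pipeline built with programming skills. Stech used Python, YAML files, and LaTeX to manage the whole process. He tracked 391 stories in YAML files for easy organization and version control, then developed a Python command-line tool ("se.py") to browse and analyze the data and to monitor the anthology's composition and word count. Typesetting was done in LaTeX, which provided professional typography and reproducible builds. The ebook was created by converting the LaTeX source with Pandoc, with a custom script to enhance the table of contents. The key takeaways are organization, automation, and simple, transparent file formats. Stech emphasizes that building tools to automate repetitive tasks, and learning incrementally, made this ambitious project possible. He encourages others considering self-publishing to explore a similar approach, offers his email ([email protected]) for questions, and promotes his anthology Think Weirder to fans of innovative science fiction.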

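## Compelling Science Fiction Anthology: Summary

Mojoe recently released a science fiction anthology ([compellingsciencefiction.com](https://compellingsciencefiction.com)) and described on Hacker News the tools and process used to create it. While formatting and typesetting with custom Python scripts and ConTeXt (a document typesetting system) was relatively straightforward, the biggest challenge was obtaining reprint rights from authors. Many authors were hard to reach, and securing permission proved complicated even after an initial fee was paid. The anthology includes stories by authors such as Peter Watts and Alan Dean Foster; the hardest to obtain was the story translated by Qi Hui (奇辉), which required communicating through Google Translate with limited Mandarin ability. The discussion prompted a conversation about copyright challenges, including ideas for a registry of copyright holders and orphan-works legislation. Contributors shared their own experiences with publishing and with modern typesetting tools such as Typst. Mojoe plans to keep publishing the anthology annually, focusing on concept-driven stories, and hopes to become profitable following the initial success brought by the Hacker News exposure. The project highlights the power of programming skills in independent publishing and the dedication needed to navigate copyright complexities.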

Original text

Last month I published Think Weirder: The Year's Best Science Fiction Ideas, a 16-story anthology featuring Greg Egan, Isabel J. Kim, Ray Nayler, Caroline M. Yoachim, and twelve other wonderful authors. The book ended up being the #1 New Release in the Short Stories Anthologies category for a short time on Amazon, outselling many other newly released short story anthologies published by the big NYC publishers with large marketing departments.

I'm not a professional publisher. I have a full-time job and two small kids, so all of this work happened after my kids went to sleep. I had to use my time judiciously, which meant creating an efficient process. Fortunately I'm a programmer, and it turns out that programming skills translate surprisingly well to book publishing. This post is about how I built a complete publishing pipeline using Python, YAML files, and LaTeX — and why you might want to do something similar if you're considering publishing a book. I know that by writing this I'll have my choices questioned by professional designers, but hopefully the software concepts will be helpful.

My initial thought: can I really do ALL of this?

When I started this project, I had some worries. Professional publishers have entire departments of specialists. How could I possibly handle all of that myself?

The answer turned out to be: build tools that automate the repetitive parts, and use simple file formats that make everything transparent and debuggable.

Step 1: Tracking stories with plain text files

The first challenge was tracking hundreds of candidate stories from different magazines. I read 391 stories published in 2024 before selecting the final 16. That's a lot of stories to keep organized.

I could have used a spreadsheet, but I went with plain YAML files instead. Here's why this worked well for me:

Git-friendly: Every decision I made was tracked in version control
Human-readable: I could open any file in a text editor and understand what I was looking at
Easy to build scripts around: I wrote several Python functions to do different kinds of metadata introspection that I'll go through

The structure looks like this:

data/
  story-progress.yaml       # Central tracking file
  markets.yaml              # Magazine metadata
  themes.yaml               # Theme occurrence tracking
  subgenres.yaml            # Subgenre tallies
stories/
  clarkesworld-magazine/
    nelson_11_24.yaml       # Individual story files
    pak_06_24.yaml
  reactor-magazine/
    larson_breathing.yaml
  ...
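
Given that layout, grouping story files by market takes only a directory walk. The following is a minimal sketch using `pathlib`, assuming the `stories/<market>/<slug>.yaml` convention shown above; the function name `group_story_files` is hypothetical and not taken from the author's code.

```python
from collections import defaultdict
from pathlib import Path


def group_story_files(stories_root):
    """Map each market directory name to the sorted slugs of its story files.

    Hypothetical helper: assumes one subdirectory per market and one
    .yaml file per story, as in the layout above.
    """
    grouped = defaultdict(list)
    # Sorting the paths keeps the output stable across runs and platforms.
    for path in sorted(Path(stories_root).glob("*/*.yaml")):
        grouped[path.parent.name].append(path.stem)
    return dict(grouped)
```

Because the file name without its extension doubles as the slug, no separate index is needed to find a story on disk.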

Each story file is pure YAML containing the full story text plus metadata:

title: "Twenty-Four Hours"
author: H.H. Pak
market: clarkesworld-magazine
url: https://clarkesworldmagazine.com/pak_06_24/
word_count: 4540
year: 2024
slug: pak_06_24
summary: ...

Not all stories have public URLs available, but that's OK because all of the fields are optional. The central story-progress.yaml tracks editorial state:

clarkesworld-magazine-nelson_11_24:
  title: "LuvHome™"
  author: Resa Nelson
  market: clarkesworld-magazine
  status: accepted  # or: not_started/relevant/rejected
  date_added: '2024-09-08T08:22:47.033192'
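
The tracking key in this example looks like the market name joined to the story slug with a hyphen. Here is a small sketch of how such a file could be queried once parsed, assuming the data has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`). The helper names, the second entry, and the `not_started` default for a missing status are illustrative assumptions, not the author's code.

```python
from collections import Counter


def progress_key(market, slug):
    """Build a tracking key such as 'clarkesworld-magazine-nelson_11_24'."""
    return f"{market}-{slug}"


def count_by_status(progress):
    """Tally entries by editorial status.

    Assumption: an entry with no status is treated as 'not_started'.
    """
    return Counter(entry.get("status", "not_started") for entry in progress.values())


# Mirrors the shape a YAML parser would return for story-progress.yaml.
# The second entry's status is made up for illustration.
progress = {
    progress_key("clarkesworld-magazine", "nelson_11_24"): {
        "title": "LuvHome™",
        "author": "Resa Nelson",
        "market": "clarkesworld-magazine",
        "status": "accepted",
    },
    progress_key("clarkesworld-magazine", "pak_06_24"): {
        "title": "Twenty-Four Hours",
        "author": "H.H. Pak",
        "market": "clarkesworld-magazine",
        "status": "relevant",
    },
}

status_counts = count_by_status(progress)
```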

Step 2: A simple command-line tool

I built a small Python CLI tool (se.py) to help me navigate all this data. Since I do all this work at night after my kids go to sleep, I wanted something fast that mirrored a lot of the other work I do on the command line. The tool is simple:

python se.py --help
usage: se.py [-h] {markets,stories,relevant,decide,accepted,compile} ...

Story Evaluator CLI

positional arguments:
  {markets,stories,relevant,decide,accepted,compile}
                        Available commands
    markets             List markets
    stories             Manage stories
    relevant            List URLs for stories marked as relevant
    decide              Make accept/reject decisions on relevant stories
    accepted            Manage accepted stories
    compile             Show anthology compilation statistics

optional arguments:
  -h, --help            show this help message and exit
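
A command layout like this maps naturally onto `argparse` subparsers from the Python standard library. The sketch below reproduces only the command names and help strings from the output above; the function name `build_parser` is an assumption, and the real se.py presumably attaches handlers and extra arguments to each command.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per se.py command (sketch only)."""
    parser = argparse.ArgumentParser(prog="se.py", description="Story Evaluator CLI")
    subparsers = parser.add_subparsers(dest="command", help="Available commands")
    commands = {
        "markets": "List markets",
        "stories": "Manage stories",
        "relevant": "List URLs for stories marked as relevant",
        "decide": "Make accept/reject decisions on relevant stories",
        "accepted": "Manage accepted stories",
        "compile": "Show anthology compilation statistics",
    }
    for name, help_text in commands.items():
        subparsers.add_parser(name, help=help_text)
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["compile"])
print(args.command)  # prints: compile
```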

The compile command ended up being really useful — it gave me instant feedback on anthology size and composition:

ANTHOLOGY COMPILATION STATISTICS
============================================================
Total Stories: 16
Total Word Count: 115,093 words
Average Word Count: 7,193 words
Unique Authors: 16
Markets Represented: 4

STORIES BY MARKET:
  analog-magazine: 2 stories (12.5%)
  asimovs-magazine: 2 stories (12.5%)
  clarkesworld-magazine: 10 stories (62.5%)
  reactor-magazine: 2 stories (12.5%)

This was really helpful during the selection process. I could quickly check how far along I was toward my ~120k word goal, and make sure I hadn't accidentally included multiple stories by the same author.
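
The figures in that report are simple aggregates over the accepted stories. For instance, 115,093 words across 16 stories averages to about 7,193 words, and 10 of 16 stories is 62.5%. Below is a minimal sketch of the computation, assuming each story is a dict with `author`, `market`, and `word_count` keys; the function name, return structure, and the `repeated_authors` check are illustrative rather than the author's implementation.

```python
from collections import Counter


def compile_stats(stories):
    """Summarize accepted stories: totals, average length, and share per market."""
    total = len(stories)
    words = sum(story["word_count"] for story in stories)
    by_market = Counter(story["market"] for story in stories)
    by_author = Counter(story["author"] for story in stories)
    return {
        "total_stories": total,
        "total_words": words,
        "average_words": round(words / total) if total else 0,
        "unique_authors": len(by_author),
        # Authors appearing more than once, to catch accidental duplicates.
        "repeated_authors": sorted(a for a, n in by_author.items() if n > 1),
        # Percentage of the anthology drawn from each market.
        "market_share": {m: 100 * n / total for m, n in sorted(by_market.items())},
    }
```

A non-empty `repeated_authors` list is the signal that two selected stories share an author.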

Step 3: Typesetting the print book

This part surprised me the most. I initially thought I'd have to learn Adobe InDesign or pay someone to do the typesetting. But I decided to use LaTeX instead, since I had some previous experience with it (another publishing friend sent me some of his example files, and I had some academic experience). The process worked out better than expected.

I used XeLaTeX with the memoir document class. Here's what I liked about this approach:

Reproducible: I can rebuild the entire book from source in a few seconds, and I can use the same templates next year
Professional typography: LaTeX handles ligatures, kerning, and line breaking better than I could manually
Custom fonts: I used Crimson Pro for body text and Rajdhani for titles
Again, version control that I'm used to: The entire book is just text files in Git

The main parts of the master file for the book are really simple:

\documentclass[final,11pt,twoside]{memoir}
\usepackage{compelling}

\begin{document}
\begin{frontmatter}
  \include{title}
  \tableofcontents
\end{frontmatter}

\begin{mainmatter}
  \include{introduction}
  \include{death-and-the-gorgon}
  \include{the-best-version-of-yourself}
  % ... 14 more stories
  \include{acknowledgements}
\end{mainmatter}
\end{document}

All the formatting rules live in compelling.sty, a custom style package. Here's a link to the full, messy file. Some highlights:

% 6x9 inch trade paperback size
\setstocksize{9in}{6in}
\settrimmedsize{9in}{6in}{*}

% Margins
\setlrmarginsandblock{1.00in}{0.75in}{*}
\setulmarginsandblock{0.75in}{0.75in}{*}

% Typography nerding
\usepackage[final,protrusion=true,factor=1125,
            stretch=70,shrink=70]{microtype}

% Custom fonts loaded from local files
\setromanfont[
  Ligatures=TeX,
  Path=./Crimson_Pro/static/,
  UprightFont=CrimsonPro-Regular,
  BoldFont=CrimsonPro-Bold,
  ItalicFont=CrimsonPro-Italic,
  BoldItalicFont=CrimsonPro-BoldItalic
]{Crimson Pro}


\setsansfont[
  Path=./Rajdhani/,
  UprightFont=Rajdhani-Bold,
  BoldFont=Rajdhani-Bold,
  ItalicFont=Rajdhani-Bold,
  BoldItalicFont=Rajdhani-Bold
]{Rajdhani}

% Chinese font family for CJK characters
\newfontfamily\chinesefont{PingFang SC}

The microtype package does a lot of subtle work with character spacing and line breaking that makes the text look professionally typeset.

I wanted story titles in bold sans-serif with author names underneath in a lighter gray. Here's how I set that up:

\renewcommand{\chapter}[2]{
    \pagestyle{DefaultStyle}
    \stdchapter*{
        \sffamily
        \LARGE 
        \textbf{\MakeUppercase{#1}}
        \\ 
        \large 
        \color{dark-gray} 
        {\MakeUppercase{#2}}
    }
    \addcontentsline{toc}{chapter}{
        \protect\parbox[t]{\dimexpr\textwidth-3em}{
            \sffamily#1
            \\ 
            \protect\small
            \protect\color{gray}
            \protect\textit{#2}
        }
    }
    \def\leftmark{#1}
    \def\rightmark{#2}
}

This redefines the chapter command to take two arguments, the title and the byline. It sets up the chapter formatting and the TOC formatting, and makes sure that the title and byline are printed in the headers on alternating pages.

Now every story file just says:

\chapter{Death and the Gorgon}{by Greg Egan}
[story content]

Most authors send me stories as HTML, PDF, or Word files, so I needed a way to convert them to LaTeX. I wrote a simple Python script to do this, which saved me a huge amount of manual formatting work.
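
The conversion script itself isn't shown, so the following is only a sketch of what the HTML case might look like using the standard library's `html.parser`. It handles paragraphs and basic emphasis, escapes LaTeX special characters, and emits the two-argument `\chapter` line described above. All names here (`escape_latex`, `HtmlToLatex`, `html_to_latex`, `chapter_line`) are hypothetical, and PDF or Word input would need different tooling.

```python
from html.parser import HTMLParser

# Characters with special meaning in LaTeX prose and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape one character at a time so replacements are never re-escaped."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


class HtmlToLatex(HTMLParser):
    """Convert paragraphs and basic emphasis; other tags are dropped."""

    OPEN = {"em": r"\emph{", "i": r"\emph{", "strong": r"\textbf{", "b": r"\textbf{"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.OPEN:
            self.parts.append(self.OPEN[tag])

    def handle_endtag(self, tag):
        if tag in self.OPEN:
            self.parts.append("}")
        elif tag == "p":
            self.parts.append("\n\n")  # a blank line ends a LaTeX paragraph

    def handle_data(self, data):
        self.parts.append(escape_latex(data))


def html_to_latex(html):
    """Return LaTeX source for a fragment of story HTML."""
    parser = HtmlToLatex()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def chapter_line(title, author):
    """Emit the two-argument chapter header used by each story file."""
    return rf"\chapter{{{escape_latex(title)}}}{{by {escape_latex(author)}}}"
```

Escaping character by character, rather than chaining string replacements, avoids mangling the braces that the escapes themselves introduce.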

Step 4: Creating the ebook

Print was one thing, but I also needed an ebook. This turned out to be easier than I expected because I could reuse all the LaTeX source I'd already created.

I used Pandoc to convert from LaTeX to EPUB:

# Convert LaTeX to EPUB
pandoc 2025.tex -o Think_Weirder_2025.epub \
  --toc \
  --epub-cover-image=cover_optimized.jpg \
  --css=epub-style.css \
  --metadata title="Think Weirder" \
  --metadata author="Edited by Joe Stech"
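
To keep the ebook build as repeatable as the print build, the same invocation can be wrapped in a small Python function. This is a sketch under the assumption that `pandoc` is on the PATH; the function names are hypothetical, and the flags are the ones from the command above.

```python
import subprocess


def pandoc_epub_command(source, output, cover, css, title, author):
    """Return the pandoc argument list for the LaTeX-to-EPUB conversion."""
    return [
        "pandoc", source, "-o", output,
        "--toc",
        f"--epub-cover-image={cover}",
        f"--css={css}",
        "--metadata", f"title={title}",
        "--metadata", f"author={author}",
    ]


def build_epub(**options):
    """Run pandoc and raise CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_epub_command(**options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with values that contain spaces, such as the author string.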

Pandoc's default table of contents only showed story titles. But I wanted author names too, like you see in print anthologies. EPUBs are just zipped collections of XHTML files, so I wrote a small post-processing script:

import re


def modify_toc(nav_content, authors):
    """Add author bylines to TOC entries."""
    # Matches simple links of the form <a href="...">Title</a>.
    pattern = r'<a href="([^"]+)">([^<]+)</a>'

    def add_author(match):
        href, title = match.group(1), match.group(2)
        # extract_id_from_href is a helper that is not shown here.
        chapter_id = extract_id_from_href(href)

        if chapter_id in authors:
            author = authors[chapter_id]
            return f'<a href="{href}">{title}<br />\n' \
                   f'<em>{author}</em></a>'
        # Leave links without a known author unchanged.
        return match.group(0)

    return re.sub(pattern, add_author, nav_content)

The script unzips the EPUB, finds the navigation file, adds author bylines, and rezips everything. Now the ebook table of contents matches the print version.
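
The unzip-and-rezip step can be sketched with the standard library's `zipfile` module. This is an illustration rather than the author's script: `rewrite_epub` is a hypothetical name, and the navigation file's path inside the archive is passed in as a parameter because it depends on how the EPUB was generated (a path like `EPUB/nav.xhtml` is an assumption to verify against your own file).

```python
import zipfile


def rewrite_epub(src_path, dst_path, nav_name, transform):
    """Copy an EPUB, passing the navigation document's text through `transform`."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        # infolist() preserves archive order, so `mimetype` stays first
        # if it was first in the source file.
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == nav_name:
                data = transform(data.decode("utf-8")).encode("utf-8")
            # The EPUB container format expects `mimetype` to be stored
            # uncompressed; every other entry can be deflated.
            method = (
                zipfile.ZIP_STORED
                if info.filename == "mimetype"
                else zipfile.ZIP_DEFLATED
            )
            dst.writestr(info.filename, data, compress_type=method)
```

A function such as `modify_toc`, with the author mapping bound in advance, could be supplied as `transform`.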

What I learned

The whole process took longer than I expected — many months of night work. The simple software I wrote really made it a feasible one-person project though, and motivates me to go through the whole process again next year.

Staying organized is crucial. When hundreds of stories are involved, it's easy to forget details, so it was important to use se.py to save metadata in the moment, where it could be sliced and diced later.

Reproducible builds were a lifesaver. I made changes to the book layout right up until the week before publication. Because I could rebuild the entire book in seconds, and everything was backed up in git, I could experiment freely without worrying about breaking things.

Simple file formats made me comfortable. When something went wrong, I could always open a YAML file or look at the LaTeX source and understand what was happening. I never hit a point where the tools were a black box.

I didn't need to understand everything up front. I learned LaTeX details as I went (arguably I still don't really understand LaTeX). Same with Pandoc. I got something basic working first, then incrementally improved it.

Can you do this too?

If you're thinking about publishing a book — whether it's an anthology, a novel, or a collection of technical writing — I think this approach is worth considering. There's something motivating about having a detailed understanding of every step in the production process. If you have questions feel free to reach out, I love talking about this hobby! You can email me at [email protected].

And if you enjoy concept-driven science fiction that is heavy on novel ideas, check out Think Weirder!
