通过形式化模型(IR 和 Alloy 以及黄金标准)将 COBOL 转换为 Kotlin
COBOL to Kotlin via Formal Models (IR and Alloy and Golden Master)

原始链接: https://marcoeg.medium.com/from-cobol-to-kotlin-795920b1f371

## 使用中间表示 (IR) 现代化遗留系统 大规模系统现代化受益于创建详细的中间表示 (IR)——系统逻辑的概念地图,而不仅仅是编译器步骤。该 IR 分层构建,从基本结构(记录布局、文件 I/O)到语义含义(如计算新余额的变换代数)逐步深入。 每一层都增加细节,同时不丢失先前的上下文,即使不完整,也能提供有价值的文档、依赖关系图和现代化路线图。提供的示例展示了将 COBOL 解析为基于 JSON 的 IR,捕获数据结构和业务逻辑(如存款/取款规则)。 然后,该 IR 有助于确定性、可测试地重写为 Kotlin 等现代语言,确保算术安全性(处理遗留 `COMP-3` 数据类型)。此外,诸如 Alloy、TLA+ 和 Z3 等形式化模型用于验证系统的静态结构、时间行为和算术一致性,从而保证现代化后的系统行为与原始系统完全相同。

## COBOL 到 Kotlin 通过形式化方法 - 摘要 一个 Hacker News 讨论围绕着一篇 Medium 文章([marcoeg.medium.com](https://marcoeg.medium.com))中描述的项目,该项目旨在通过形式化验证来现代化 COBOL 代码。作者 marcoeg 通过中间表示 (IR) 和 Alloy 形式化模型将一个小的 COBOL 批处理程序翻译成 Kotlin,并通过比较与原始 COBOL 程序输出的 SHA-256 校验和来证明等价性。 讨论强调了 COBOL 持续存在的必要性,因为它可靠且替换成本/风险高昂,尽管它已经过时。评论员争论 COBOL 是否本质上比其他语言更可靠,指出它历史上缺乏动态内存分配是一个促成因素,但也承认可靠性源于生产系统中数十年的错误修复。 一些用户质疑简单语法翻译的价值,因为它没有解决围绕传统 COBOL 系统(硬件、数据库、作业调度器)的更广泛生态系统问题。另一些人指出拥有与生产系统匹配的准确 COBOL 源代码的重要性。作者欢迎反馈,特别是来自那些具有大规模传统改造经验的人,并分享了项目 GitHub 仓库的链接 ([https://github.com/marcoeg/cobol-modernization-playbook](https://github.com/marcoeg/cobol-modernization-playbook))。
相关文章

原文

The Strategy: From Structure to Semantics

Large-scale modernization begins with a structured intermediate representation (IR). This is not a compiler artifact but rather a conceptual map of the system’s meaning.

The Layered IR

Layer Description Syntactic IR Parses COBOL into paragraphs, sections, data declarations, and copybooks Structural IR Captures record layouts, field types, and file I/O contracts Control-Flow IR (CFG) Builds a flow graph of PERFORM, EVALUATE, and IF branches Effects IR Tracks reads/writes to files and variables Semantic IR Expresses the algebra of transformations — e.g. “balance’ = balance – amount”

Each layer adds fidelity without losing context. Even if we stop halfway, the resulting model already serves as documentation, dependency graph, and modernization map.

The IR snippet below illustrates how the parser captures records, encodings, and posting rules.

{
"program": "DAILYPOST",
"files": [
{
"name": "ACCT-FILE",
"record": "AccountRec",
"layout": [
{ "name": "ACCT_ID", "kind": "char", "len": 12, "offset": 0 },
{ "name": "CUST_ID", "kind": "char", "len": 12, "offset": 12 },
{ "name": "PRODUCT_CODE", "kind": "char", "len": 4, "offset": 24 },
{ "name": "ACCT_STATUS", "kind": "char", "len": 1, "offset": 28 },
{ "name": "CURR_BAL", "kind": "comp3", "digits": 13, "scale": 2, "offset": 29 },
{ "name": "OVERDRAFT_LIMIT", "kind": "comp3", "digits": 11, "scale": 2, "offset": 36 }
]
},
{
"name": "TXN-FILE",
"record": "TxnRec",
"layout": [
{ "name": "ACCT_ID", "kind": "char", "len": 12, "offset": 0 },
{ "name": "TXN_ID", "kind": "char", "len": 16, "offset": 12 },
{ "name": "TXN_CODE", "kind": "char", "len": 4, "offset": 28 },
{ "name": "TXN_AMT", "kind": "comp3","digits": 11, "scale": 2, "offset": 32 },
{ "name": "TXN_TS", "kind": "num", "len": 14, "offset": 37 }
]
}
],
"logic": [
{
"when": "yyyymmdd(TXN_TS) == TODAY",
"switch": "TXN_CODE",
"cases": {
"DEPO": [{ "op": "Add", "dst": "CURR_BAL", "src": "TXN_AMT" }],
"WDRW": [
{ "op": "Compute", "dst": "NEW_BAL", "expr": "CURR_BAL - TXN_AMT" },
{ "op": "Guard", "cond": "NEW_BAL >= -OVERDRAFT_LIMIT",
"then": [{ "op": "Move", "dst": "CURR_BAL", "src": "NEW_BAL" }],
"else": [{ "op": "Emit", "stream": "EXC-FILE", "rec": "TxnRec" }]
}
]
}
}
]
}

The full IR is in parsing/out/ir.json. Note that COMP-3 is captured explicitly (digits, scale) so arithmetic is integer-safe in codegen.

The same semantics compile into a deterministic, testable modern implementation in Kotlin.

// rewrite/kotlin/src/main/kotlin/banking/Posting.kt
package banking

data class AccountRec(
val acctId: String,
val custId: String,
val productCode: String,
val acctStatus: String,
var currBalCents: Long, // integer cents to mirror COMP-3
val overdraftLimitCents: Long
)

data class TxnRec(
val acctId: String,
val txnId: String,
val code: String, // "DEPO", "WDRW", ...
val amountCents: Long, // integer cents (no floats)
val tsYyyyMmDdHhMmSs: Long
)

fun runPosting(
accounts: MutableList<AccountRec>,
txns: List<TxnRec>,
todayYyyyMmDd: Long
): Pair<List<AccountRec>, List<TxnRec>> {

val (todayTxns, otherTxns) = txns.partition { it.tsYyyyMmDdHhMmSs / 1_000_000L == todayYyyyMmDd }
require(otherTxns.isEmpty()) { "Runner expects only today's txns (for golden-master parity)" }

val exceptions = mutableListOf<TxnRec>()
val index = accounts.associateBy { it.acctId }.toMutableMap()

// Stable order to guarantee deterministic outputs
for (t in todayTxns.sortedWith(compareBy<TxnRec> { it.acctId }.thenBy { it.txnId })) {
val a = index[t.acctId] ?: run { exceptions += t; continue }
when (t.code) {
"DEPO" -> a.currBalCents += t.amountCents
"WDRW" -> {
val newBal = a.currBalCents - t.amountCents
if (newBal >= -a.overdraftLimitCents) a.currBalCents = newBal else exceptions += t
}
else -> exceptions += t
}
}
return accounts.sortedBy { it.acctId } to exceptions
}

The Role of Formal Models

⚙️ Alloy

Alloy models static structure — relationships among records, constraints, and invariants.

Example: “Every transaction belongs to exactly one account.”
This is captured in formal/alloy/banking.als and verified using the Alloy Analyzer.

⏱ TLA+

TLA+ expresses temporal behavior — perfect for COBOL’s batch workflows.
It allows us to specify conditions like: “After all transactions are processed, every account balance must be updated exactly once.”

🧮 SMT / Z3

Z3 (Satisfiability Modulo Theories) checks arithmetic and logical consistency. It’s ideal for proving that rewritten arithmetic (for instance, interest or overdraft rules) is equivalent at a symbolic level, not just with sample data.

联系我们 contact @ memedata.com