The Strategy: From Structure to Semantics
Large-scale modernization begins with a structured intermediate representation (IR). This is not a compiler artifact but rather a conceptual map of the system’s meaning.
The Layered IR
Layer Description Syntactic IR Parses COBOL into paragraphs, sections, data declarations, and copybooks Structural IR Captures record layouts, field types, and file I/O contracts Control-Flow IR (CFG) Builds a flow graph of PERFORM, EVALUATE, and IF branches Effects IR Tracks reads/writes to files and variables Semantic IR Expresses the algebra of transformations — e.g. “balance’ = balance – amount”
Each layer adds fidelity without losing context. Even if we stop halfway, the resulting model already serves as documentation, dependency graph, and modernization map.
The IR snippet below illustrates how the parser captures records, encodings, and posting rules.
{
"program": "DAILYPOST",
"files": [
{
"name": "ACCT-FILE",
"record": "AccountRec",
"layout": [
{ "name": "ACCT_ID", "kind": "char", "len": 12, "offset": 0 },
{ "name": "CUST_ID", "kind": "char", "len": 12, "offset": 12 },
{ "name": "PRODUCT_CODE", "kind": "char", "len": 4, "offset": 24 },
{ "name": "ACCT_STATUS", "kind": "char", "len": 1, "offset": 28 },
{ "name": "CURR_BAL", "kind": "comp3", "digits": 13, "scale": 2, "offset": 29 },
{ "name": "OVERDRAFT_LIMIT", "kind": "comp3", "digits": 11, "scale": 2, "offset": 36 }
]
},
{
"name": "TXN-FILE",
"record": "TxnRec",
"layout": [
{ "name": "ACCT_ID", "kind": "char", "len": 12, "offset": 0 },
{ "name": "TXN_ID", "kind": "char", "len": 16, "offset": 12 },
{ "name": "TXN_CODE", "kind": "char", "len": 4, "offset": 28 },
{ "name": "TXN_AMT", "kind": "comp3","digits": 11, "scale": 2, "offset": 32 },
{ "name": "TXN_TS", "kind": "num", "len": 14, "offset": 37 }
]
}
],
"logic": [
{
"when": "yyyymmdd(TXN_TS) == TODAY",
"switch": "TXN_CODE",
"cases": {
"DEPO": [{ "op": "Add", "dst": "CURR_BAL", "src": "TXN_AMT" }],
"WDRW": [
{ "op": "Compute", "dst": "NEW_BAL", "expr": "CURR_BAL - TXN_AMT" },
{ "op": "Guard", "cond": "NEW_BAL >= -OVERDRAFT_LIMIT",
"then": [{ "op": "Move", "dst": "CURR_BAL", "src": "NEW_BAL" }],
"else": [{ "op": "Emit", "stream": "EXC-FILE", "rec": "TxnRec" }]
}
]
}
}
]
}The full IR is in parsing/out/ir.json. Note that COMP-3 is captured explicitly (digits, scale) so arithmetic is integer-safe in codegen.
The same semantics compile into a deterministic, testable modern implementation in Kotlin.
// rewrite/kotlin/src/main/kotlin/banking/Posting.kt
package bankingdata class AccountRec(
val acctId: String,
val custId: String,
val productCode: String,
val acctStatus: String,
var currBalCents: Long, // integer cents to mirror COMP-3
val overdraftLimitCents: Long
)
data class TxnRec(
val acctId: String,
val txnId: String,
val code: String, // "DEPO", "WDRW", ...
val amountCents: Long, // integer cents (no floats)
val tsYyyyMmDdHhMmSs: Long
)
fun runPosting(
accounts: MutableList<AccountRec>,
txns: List<TxnRec>,
todayYyyyMmDd: Long
): Pair<List<AccountRec>, List<TxnRec>> {
val (todayTxns, otherTxns) = txns.partition { it.tsYyyyMmDdHhMmSs / 1_000_000L == todayYyyyMmDd }
require(otherTxns.isEmpty()) { "Runner expects only today's txns (for golden-master parity)" }
val exceptions = mutableListOf<TxnRec>()
val index = accounts.associateBy { it.acctId }.toMutableMap()
// Stable order to guarantee deterministic outputs
for (t in todayTxns.sortedWith(compareBy<TxnRec> { it.acctId }.thenBy { it.txnId })) {
val a = index[t.acctId] ?: run { exceptions += t; continue }
when (t.code) {
"DEPO" -> a.currBalCents += t.amountCents
"WDRW" -> {
val newBal = a.currBalCents - t.amountCents
if (newBal >= -a.overdraftLimitCents) a.currBalCents = newBal else exceptions += t
}
else -> exceptions += t
}
}
return accounts.sortedBy { it.acctId } to exceptions
}
The Role of Formal Models
⚙️ Alloy
Alloy models static structure — relationships among records, constraints, and invariants.
Example: “Every transaction belongs to exactly one account.”
This is captured in formal/alloy/banking.als and verified using the Alloy Analyzer.
⏱ TLA+
TLA+ expresses temporal behavior — perfect for COBOL’s batch workflows.
It allows us to specify conditions like: “After all transactions are processed, every account balance must be updated exactly once.”
🧮 SMT / Z3
Z3 (Satisfiability Modulo Theories) checks arithmetic and logical consistency. It’s ideal for proving that rewritten arithmetic (for instance, interest or overdraft rules) is equivalent at a symbolic level, not just with sample data.