Tutorial — 30 min

Formal Methods

Understanding Software Through Mathematical Eyes
Ferhat Erata
CFG DFG PDG Slicing Alloy Hypothesis KLEE
code → graph → analysis → understanding

The Problem

We write code. But do we understand it?

"Take one step back. What is your mental image of software?"

Roadmap

Six lenses in 30 minutes

TimeTopicKey Question
5 minControl Flow Graph (CFG)What paths can execution take?
5 minData Flow Graph (DFG)Where do values come from and go to?
5 minProgram Dependence Graph (PDG)What depends on what? → Slicing
5 minProperty-Based Testing (Hypothesis)Does it hold for many random inputs?
5 minConcolic Testing (KLEE)Can we explore every path?
5 minFormal Specification (Alloy)What should the system do?

The Big Picture

A stack of mathematical lenses

graph TD
    SRC["Source Code<br>what humans write  "] --> CFG
    CFG["CFG<br>what paths exist?  "] --> DFG
    DFG["DFG<br>where do values flow?  "] --> PDG
    PDG["PDG<br>what depends on what?  "] --> SLC["Slicing<br>minimal relevant subset  "]

    SRC --> SPEC["Formal Spec — Alloy<br>is the design consistent?  "]
    SRC --> PROP["Property Testing — Hypothesis<br>do invariants hold?  "]
    SRC --> CONC["Concolic Testing — KLEE<br>what does every path do?  "]

    style SRC fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style CFG fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style DFG fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style PDG fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style SLC fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style SPEC fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style PROP fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style CONC fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
          
All Graph Analysis Alloy Hypothesis KLEE
Graph-Based Analysis
CFGDFGPDGSlicing

Build the graph. Traverse the graph. Answer the question.
Each layer adds a new kind of edge.
Specification & Testing
Alloy — check designs before code exists
Hypothesis — check properties across many random inputs
KLEE — systematically explore every path
We'll go through each of these one by one — then return to this picture at the end.

The Compiler Pipeline

Where do these representations live?

graph LR
    SRC["Source Code<br>Python, C, Java, ...  "] --> LEX["Lexer /<br>Parser  "]
    LEX --> AST["AST  "]
    AST --> SEM["Semantic<br>Analysis  "]
    SEM --> IR["IR / SSA<br>LLVM IR, GCC GIMPLE  "]
    IR --> OPT["Optimizer  "]
    OPT --> CG["Code Gen  "]
    CG --> BIN["Machine Code<br>x86, ARM, WASM  "]

    style SRC fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style LEX fill:#1a1a2e,stroke:#2a2a3a,color:#8888a0
    style AST fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style SEM fill:#1a1a2e,stroke:#2a2a3a,color:#8888a0
    style IR fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style OPT fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style CG fill:#1a1a2e,stroke:#2a2a3a,color:#8888a0
    style BIN fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
      
Front-End
Source → AST
(Abstract Syntax Tree)

Lexing, parsing, type checking.
The AST is the first structured representation of your code.

Tools: tree-sitter, clang, javac
Middle-End (IR)
AST → CFG + SSA

The IR is where analysis happens:
CFG = control flow structure
DFG = data flow (def-use chains)
PDG = dependence → slicing

This is what we'll study today.
SSA Form
Static Single Assignment

Every variable is assigned exactly once.
Multiple assignments → distinct names.

φ nodes merge values at join points:
%x = φ(v₁, v₂)

Makes def-use chains explicit — the DFG is built into the IR.
Back-End
IR → Machine Code

Register allocation, instruction selection, scheduling.
The optimizer transforms the IR using the same graphs we'll see.

Tools: LLVM, GCC, V8 TurboFan

Control Flow Graph

What paths can execution take?

def fee(age, dist): if age < 25: rate = 3 else: rate = 2 if dist > 100: surcharge = 20 else: surcharge = 0 return dist * rate + surcharge
Key Concepts
Nodes = basic blocks (straight-line code)
Edges = possible transfers of control
Path = one possible execution
4 paths = 2 binary decisions = 2² leaves
graph TD
    A["entry: age, dist  "] --> B{"age < 25 ?"}
    B -- T --> C["rate = 3  "]
    B -- F --> D["rate = 2  "]
    C --> E{"dist > 100 ?"}
    D --> E
    E -- T --> F["surcharge = 20  "]
    E -- F --> G["surcharge = 0  "]
    F --> H["return dist*rate<br>+ surcharge  "]
    G --> H

    style A fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style B fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style C fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style D fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style E fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style F fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style G fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style H fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
        

CFG with Nested Loops

Back-edges reveal loop structure

def matrix_search(matrix, target): for i in range(len(matrix)): for j in range(len(matrix[i])): if matrix[i][j] == target: return (i, j) return (-1, -1)
All Outer Loop Inner Loop
Nested Loop Structure
Outer back-edge: G → B (outer loop)
Inner back-edge: F → C (inner loop)
Outer header: B — dominates everything inside
Inner header: C — dominates D, E, F
Early exit: E breaks out of both loops
Why It Matters
Loop nesting depth drives complexity analysis
Dominance tree: B dom C dom D dom E
Loop-invariant code motion: hoist from inner
to outer, or out of both loops entirely
graph TD
    A["A: entry  "] --> B{"B: i < rows ?"}
    B -- F --> H["H: return (-1,-1)  "]
    B -- T --> C{"C: j < cols ?"}
    C -- F --> G["G: i += 1  "]
    C -- T --> D{"D: matrix[i][j]<br>== target ?"}
    D -- T --> E["E: return (i,j)  "]
    D -- F --> F["F: j += 1  "]
    F -->|"inner back-edge"| C
    G -->|"outer back-edge"| B

    style A fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style B fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style C fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style D fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style E fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style F fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style G fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style H fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
        

LLVM IR

The CFG in the compiler's representation

Source (Python)
def fee(age, dist): if age < 25: rate = 3 else: rate = 2 if dist > 100: surcharge = 20 else: surcharge = 0 return dist * rate + surcharge
Source (C)
int fee(int age, int dist) { int rate; if (age < 25) rate = 3; else rate = 2; int surcharge; if (dist > 100) surcharge = 20; else surcharge = 0; return dist * rate + surcharge; }
Different frontends, same IR → same analyses
LLVM IR
define i32 @fee(i32 %age, i32 %dist) { entry: %cmp1 = icmp slt i32 %age, 25 br i1 %cmp1, label %young, label %older young: br label %check_dist older: br label %check_dist check_dist: %rate = phi i32 [3, %young], [2, %older] %cmp2 = icmp sgt i32 %dist, 100 br i1 %cmp2, label %far, label %near far: br label %compute near: br label %compute compute: %sur = phi i32 [20, %far], [0, %near] %base = mul i32 %dist, %rate %total = add i32 %base, %sur ret i32 %total }
What to Notice
Basic Blocks = CFG Nodes
Each label: is a basic block.
Straight-line instructions, ending
with a br (branch) or ret.
SSA = Explicit Data Flow
Every %var is assigned exactly once.
phi nodes merge values from different incoming edges — the DFG is built into the IR.
Two Independent phi Nodes
%rate merges from young/older
%sur merges from far/near
Two independent data flows
that converge in %compute.
The CFG and DFG aren't abstractions you draw on a whiteboard — they're the actual compiler IR.

Data Flow Graph

Where do values come from and go to?

def fee(age, dist): if age < 25: rate = 3 else: rate = 2 if dist > 100: surcharge = 20 else: surcharge = 0 return dist * rate + surcharge
Key Concepts
DEF = variable is assigned a value
USE = variable is read
Def-use chain = which DEFs reach a USE?
Reaching defs = at return, rate could be 3 OR 2
Why it matters:
Constant propagation — replace var with known value
Dead code elimination — DEF with no USE
Uninitialized variable — USE with no DEF
Two independent def-use chains visible here!
graph LR
    subgraph age_rate ["age → rate channel"]
        DA["DEF(age)<br>@ param  "] -.->|def-use| UA["USE(age)<br>@ age < 25  "]
        DR1["DEF(rate)<br>@ rate = 3  "] -.->|def-use| UR["USE(rate)<br>@ return  "]
        DR2["DEF(rate)<br>@ rate = 2  "] -.->|def-use| UR
    end
    subgraph dist_surcharge ["dist → surcharge channel"]
        DD["DEF(dist)<br>@ param  "] -.->|def-use| UD1["USE(dist)<br>@ dist > 100  "]
        DD -.->|def-use| UD2["USE(dist)<br>@ return  "]
        DS1["DEF(surcharge)<br>@ surcharge = 20  "] -.->|def-use| US["USE(surcharge)<br>@ return  "]
        DS2["DEF(surcharge)<br>@ surcharge = 0  "] -.->|def-use| US
    end

    style DA fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style DR1 fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style DR2 fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style DD fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style DS1 fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style DS2 fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style UA fill:#13131a,stroke:#2a2a3a,color:#8888a0
    style UR fill:#13131a,stroke:#ffaa44,color:#e8e8f0
    style UD1 fill:#13131a,stroke:#2a2a3a,color:#8888a0
    style UD2 fill:#13131a,stroke:#ffaa44,color:#e8e8f0
    style US fill:#13131a,stroke:#ffaa44,color:#e8e8f0
      
All age → rate dist → surcharge return (all USEs)

Program Dependence Graph & Slicing

What depends on what?

graph TD
    Entry["entry  "] ==>|ctrl| P1{"age < 25 ?"}
    Entry ==>|ctrl| P2{"dist > 100 ?"}
    Entry ==>|ctrl| RET["return<br>dist*rate+surcharge  "]
    P1 ==>|ctrl| S1["rate = 3  "]
    P1 ==>|ctrl| S2["rate = 2  "]
    P2 ==>|ctrl| S3["surcharge = 20  "]
    P2 ==>|ctrl| S4["surcharge = 0  "]
    Entry -.->|data| P1
    Entry -.->|data| P2
    Entry -.->|data| RET
    S1 -.->|data| RET
    S2 -.->|data| RET
    S3 -.->|data| RET
    S4 -.->|data| RET

    style Entry fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style P1 fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style P2 fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style S1 fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style S2 fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style S3 fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style S4 fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style RET fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
          
All Backward: rate = 3 Backward: surcharge = 20 Forward: age < 25? Forward: dist > 100?
PDG = Control deps + Data deps
Solid ⇒ = control dependence
Dashed ‐‐> = data dependence
Program Slicing
"What is the minimal subset of the program that could affect variable v at line n?"

Backward slice: trace deps backwards — what contributed to this value?
Forward slice: trace deps forward — what does this statement affect?

Click the buttons below the graph to visualize slices.
Extensions
SDG (System Dependence Graph) — PDG + interprocedural edges (call/return) for whole-program slicing

Property-Based Testing

Hypothesis: Does it hold for many inputs?

from hypothesis import given, strategies as st # Property: fee is always non-negative @given(st.integers(0, 120), st.integers(0, 1000)) def test_fee_non_negative(age, dist): assert fee(age, dist) >= 0 # Property: young drivers always pay more @given(st.integers(0, 1000)) def test_young_pay_more(dist): young = fee(20, dist) older = fee(30, dist) assert young >= older # Property: fee is linear in distance @given(st.integers(0, 120), st.integers(1, 998)) def test_fee_linear(age, dist): step = fee(age, dist + 1) - fee(age, dist) prev = fee(age, dist) - fee(age, dist - 1) assert step == prev # Hypothesis generates 100s of inputs # and SHRINKS failures to minimal cases # # Found: fee(20,101)−fee(20,100) = 23 # but fee(20,100)−fee(20, 99) = 3 # surcharge jump breaks linearity!
Key Idea
You write properties (invariants), not examples.
The framework generates inputs — including edge cases you'd never think of.
Failing inputs are shrunk to the simplest counterexample.
Live Generation ● running
The gap it bridges:

Example tests "works on my cases"

Property tests "works for many random inputs"

Formal proof "works for all inputs, proven"

Concolic Testing

KLEE: Concrete + Symbolic = every path

def fee(age, dist): if age < 25: rate = 3 else: rate = 2 if dist > 100: surcharge = 20 else: surcharge = 0 return dist*rate+surcharge
graph TD
    R["fee(age, dist)  "] --> A{"age < 25 ?"}
    A -- "T: age<25" --> B1{"dist > 100 ?"}
    A -- "F: age≥25" --> B2{"dist > 100 ?"}
    B1 -- T --> P1["path 1:<br>3×dist + 20  "]
    B1 -- F --> P2["path 2:<br>3×dist  "]
    B2 -- T --> P3["path 3:<br>2×dist + 20  "]
    B2 -- F --> P4["path 4:<br>2×dist  "]

    style R fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style A fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style B1 fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style B2 fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style P1 fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style P2 fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style P3 fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style P4 fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
          
Overview 1. Concrete Run 2. Collect π 3. Negate 4. New Path 5. All Explored
How Concolic Works
1. Run concretely: $age{=}30,\; dist{=}50$
   → takes path F, F

2. Collect path constraint:
   $(age \ge 25) \;\land\; (dist \le 100)$

3. Negate last conjunct:
   $(age \ge 25) \;\land\; \color{#ff6c8c}{(dist > 100)}$

4. Solve → $age{=}30,\; dist{=}150$

5. Run with new input → new path

6. Repeat until all 4 paths explored
KLEE found bugs in all UNIX coreutils
(cat, echo, ls, ...) that were there for decades.

Formal Specification (Bounded Model Checking)

Alloy: What should the system do?

-- A simple access control model sig User {} sig Resource {} sig Permission { user: one User, resource: one Resource, level: one Level } enum Level { Read, Write, Admin } -- No user can have Write without Read fact escalation { all p: Permission | p.level = Write implies some q: Permission | q.user = p.user and q.resource = p.resource and q.level = Read } -- Can a user get Admin on everything? run find_omnipotent { some u: User | all r: Resource | some p: Permission | p.user = u and p.resource = r and p.level = Admin } for 4
Overview 1. Signatures 2. Constraints 3. Analysis 4. Counterexample
Key Idea
Declarative: say what you want, not how

The Alloy Analyzer exhaustively checks all instances up to a bound.

Finds counterexamples automatically — "here's a state your spec allows but you didn't intend."
Used for: Protocol design, API modeling, architecture validation, security policy checking

The Big Picture

A stack of mathematical lenses

graph TD
    SRC["Source Code<br>what humans write  "] --> CFG
    CFG["CFG<br>what paths exist?  "] --> DFG
    DFG["DFG<br>where do values flow?  "] --> PDG
    PDG["PDG<br>what depends on what?  "] --> SLC["Slicing<br>minimal relevant subset  "]

    SRC --> SPEC["Formal Spec — Alloy<br>is the design consistent?  "]
    SRC --> PROP["Property Testing — Hypothesis<br>do invariants hold?  "]
    SRC --> CONC["Concolic Testing — KLEE<br>what does every path do?  "]

    style SRC fill:#1a1a2e,stroke:#6c8cff,color:#e8e8f0
    style CFG fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style DFG fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style PDG fill:#1a1a2e,stroke:#4cde80,color:#e8e8f0
    style SLC fill:#1a1a2e,stroke:#ffaa44,color:#e8e8f0
    style SPEC fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style PROP fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
    style CONC fill:#1a1a2e,stroke:#ff6c8c,color:#e8e8f0
          
All Graph Analysis Alloy Hypothesis KLEE
Graph-Based Analysis
CFGDFGPDGSlicing

Build the graph. Traverse the graph. Answer the question.
Each layer adds a new kind of edge.
Specification & Testing
Alloy — check designs before code exists
Hypothesis — check properties across many random inputs
KLEE — systematically explore every path
Mathematical models make the invisible visible.
Tools automate what humans can't do by hand.

Why Now?

Formal Methods × Large Language Models

LLM Inference — Context Window
LLMs have a finite context window. You can't feed an entire codebase into a prompt.

Program slicing gives the principled answer: the backward slice is exactly the minimal code the model needs to reason about a bug or feature.

Slicing turns a whole-program problem into a focused, context-window-sized problem.
LLM Inference — Structured Generation
CFGs and formal specs can constrain LLM output at decode time — ensuring generated code follows valid control flow or satisfies type/API contracts.
Training — Data Quality
Property-based testing can validate training data at scale — checking that code examples satisfy invariants before they enter the training set.

Garbage in, garbage out — formal properties are a filter.
Training — Reward & Verification
Formal specifications can serve as automatic reward signals for RLHF — did the generated code satisfy the spec?

KLEE and Alloy can verify LLM-generated code without running user tests.
Formal methods aren't replaced by LLMs — they're the missing verification layer that makes LLM-generated code trustworthy.

Takeaway

We don't just write code.

We build mathematical models of code —

and let tools reason for us.

"The mathematical models are already under the hood. Now you know what's there."
← Writing