Matching Payments to Bank Statements: Where It Breaks
A technical look at payment matching: connecting to a bank API, normalization, matching rules, fuzzy matching without a variable symbol, idempotence, and reconciliation. Real edge cases.
Matching incoming payments to invoices sounds like a trivial task. A payment comes in, you find the invoice by its variable symbol, you match them. As long as payers pay with a variable symbol and send exact amounts, it really is simple. Off-the-shelf accounting SaaS handles this, and there is no reason to build anything of your own.
The problem is that real payments do not stick to the variable symbol. Payers enter the wrong number, merge three invoices into one payment, send a partial payment, pay in another currency, or write only the company name in the message. At that point, deterministic matching by variable symbol is no longer enough. This article looks at exactly where it breaks and what to do about it technically.
Connecting to a Bank API: a Statement Is Not an Event
The first decision is where the data comes from. A bank API typically gives you statements — a batch of transactions for a period, not a real-time stream of events. That has two consequences.
First, you will see the same transaction more than once. If you download today's statement in the morning and again in the afternoon, the morning transactions will be there again. The matching pipeline has to account for this from the start — otherwise you match one payment twice.
Second, transactions do not have the same shape across banks. One bank puts the variable symbol in a dedicated field, another hides it in the message text, a third does not have it at all and you only get remittanceInformation. The payer's IBAN is sometimes there, sometimes not. The booking date and the value date differ. That is why the first layer of the pipeline is not matching but normalization — converting a statement from a specific bank into a single internal transaction shape: amount, currency, date, counterparty, IBAN, structured symbol (if present), and raw text. Only on top of the normalized shape can you match reasonably.
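A minimal sketch of that normalization layer, assuming two hypothetical banks: one that sends the variable symbol in a dedicated field and one that only sends remittanceInformation. Every field name in the input rows (id, variableSymbol, debtorName and so on) and the "VS:" text convention are assumptions made for illustration, not any real bank's API.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
import re
from typing import Optional


@dataclass(frozen=True)
class Transaction:
    """Internal shape every bank-specific statement line is converted into."""
    bank_tx_id: str
    amount: Decimal
    currency: str
    booking_date: date
    counterparty: str
    iban: Optional[str]
    variable_symbol: Optional[str]
    raw_text: str


# Assumed convention: the symbol appears in free text as "VS:123" or "VS 123".
VS_IN_TEXT = re.compile(r"\bVS[:\s]*(\d{1,10})\b", re.IGNORECASE)


def normalize_bank_a(row: dict) -> Transaction:
    """Hypothetical bank A: the variable symbol arrives in a dedicated field."""
    return Transaction(
        bank_tx_id=row["id"],
        amount=Decimal(row["amount"]),
        currency=row["currency"],
        booking_date=date.fromisoformat(row["bookingDate"]),
        counterparty=row.get("counterpartyName", "").strip(),
        iban=row.get("counterpartyIban") or None,
        variable_symbol=row.get("variableSymbol") or None,
        raw_text=row.get("message", ""),
    )


def normalize_bank_b(row: dict) -> Transaction:
    """Hypothetical bank B: only remittanceInformation, symbol hidden in the text."""
    text = row.get("remittanceInformation", "")
    found = VS_IN_TEXT.search(text)
    return Transaction(
        bank_tx_id=row["transactionId"],
        amount=Decimal(row["amount"]["value"]),
        currency=row["amount"]["currency"],
        booking_date=date.fromisoformat(row["bookingDate"]),
        counterparty=row.get("debtorName", "").strip(),
        iban=row.get("debtorIban") or None,
        variable_symbol=found.group(1) if found else None,
        raw_text=text,
    )
```

Amounts are kept as Decimal rather than float so that later equality checks on money are exact.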
Matching Rules: From Certain to Uncertain
Matching is not one algorithm but a cascade of rules ordered from the most certain to the least certain. Each rule either produces a match or passes the transaction on to the next rule.
- Variable symbol + matching amount. The strongest signal. When the variable symbol corresponds to an invoice and the amount matches to the koruna, it is matched automatically and done. This is where most ordinary payments end.
- Variable symbol, but the amount does not match. The symbol points to an invoice, but less or more came in. That is not a matching error — it is a partial payment, an overpayment, or a deducted fee. You can match it, but the invoice status stays "partially paid", not "paid".
- No variable symbol. Here the deterministic rule fails and fuzzy matching takes over.
The key is not to fall back to the uncertain rules until the certain ones are exhausted. Once you start matching by payer name, you risk a false positive. The order of the cascade ensures that a loose match never overrides a match you could have made with certainty.
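The cascade can be sketched as an ordered list of rule functions, each returning a match or None. This is an illustrative sketch only: the Payment, Invoice and Match types, the rule names and the status strings are assumptions, and the fuzzy step is represented by the final None.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Payment:
    amount: Decimal
    variable_symbol: Optional[str]


@dataclass(frozen=True)
class Invoice:
    number: str
    variable_symbol: str
    amount_due: Decimal


@dataclass(frozen=True)
class Match:
    invoice: Invoice
    rule: str
    status: str  # "paid", "partially_paid" or "overpaid"


def _invoice_by_symbol(p: Payment, invoices: List[Invoice]) -> Optional[Invoice]:
    if p.variable_symbol is None:
        return None
    return next((i for i in invoices if i.variable_symbol == p.variable_symbol), None)


def rule_symbol_and_amount(p: Payment, invoices: List[Invoice]) -> Optional[Match]:
    """Strongest rule: the symbol points to an invoice and the amount is exact."""
    inv = _invoice_by_symbol(p, invoices)
    if inv is not None and inv.amount_due == p.amount:
        return Match(inv, "vs+amount", "paid")
    return None


def rule_symbol_only(p: Payment, invoices: List[Invoice]) -> Optional[Match]:
    """The symbol points to an invoice, but the amount may differ."""
    inv = _invoice_by_symbol(p, invoices)
    if inv is None:
        return None
    if p.amount == inv.amount_due:
        status = "paid"
    elif p.amount < inv.amount_due:
        status = "partially_paid"
    else:
        status = "overpaid"
    return Match(inv, "vs-only", status)


Rule = Callable[[Payment, List[Invoice]], Optional[Match]]
CASCADE: List[Rule] = [rule_symbol_and_amount, rule_symbol_only]


def match_payment(p: Payment, invoices: List[Invoice]) -> Optional[Match]:
    """Try rules from most to least certain; None means 'hand over to fuzzy matching'."""
    for rule in CASCADE:
        result = rule(p, invoices)
        if result is not None:
            return result
    return None
```

Because the list is ordered, a weaker rule only ever sees payments that every stronger rule has already declined.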
Fuzzy Matching: a Score, Not a Guess
When the variable symbol is missing, you match by a combination of weaker signals. None of them is enough on its own, but together they give reasonable confidence:
- Amount — an exact match, or a match within tolerance (a fee, an exchange-rate difference).
- Payer name — a fuzzy comparison against the customer on the invoice, because "ACME s.r.o." and "ACME sro" and "Acme" are still the same company.
- Payer IBAN — if we have matched a payment from this counterparty before, that is a strong signal.
- Date — a payment near the due date is more likely than a payment half a year old.
- Message text — sometimes it has the invoice number, sometimes a name, sometimes nothing.
From these signals a score is computed for each candidate. Above the upper threshold it is matched automatically. Below the lower threshold it is not matched at all. And between the thresholds — the band where the trust in the whole system is decided — the transaction goes into a manual queue with a suggestion. The accountant does not get an empty field but "this payment probably belongs to this invoice, confirm or correct". That is the difference between a tool that saves time and a black box no one trusts.
Here is the honest boundary: this is not ML research. It is engineering scoring over data the company actually has. Its strength does not come from a model but from the signals being well normalized and the thresholds being set conservatively.
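The scoring described above might look roughly like the following. The weights, both thresholds, the 90-day date window and the list of legal-form tokens are illustrative assumptions, not recommended values; in practice they would be tuned conservatively on the company's own data. Name similarity uses difflib.SequenceMatcher from the Python standard library as a stand-in for whatever fuzzy comparison you prefer.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from difflib import SequenceMatcher
import re
from typing import FrozenSet, Optional

# Assumed weights and thresholds, for illustration only.
WEIGHTS = {"amount": 0.35, "name": 0.25, "iban": 0.20, "date": 0.10, "text": 0.10}
AUTO_MATCH = 0.80    # at or above: match automatically
MANUAL_QUEUE = 0.50  # at or above (but below AUTO_MATCH): queue with a suggestion
LEGAL_FORM_TOKENS = {"sro", "as"}  # assumed list of legal forms to ignore


@dataclass(frozen=True)
class Payment:
    amount: Decimal
    payer_name: str
    payer_iban: Optional[str]
    booked_on: date
    text: str


@dataclass(frozen=True)
class Candidate:
    invoice_number: str
    amount_due: Decimal
    customer_name: str
    known_ibans: FrozenSet[str]  # IBANs previously matched to this customer
    due_date: date


def normalize_name(name: str) -> str:
    """Lowercase, drop dots and legal-form tokens: 'ACME s.r.o.' becomes 'acme'."""
    tokens = re.split(r"\W+", name.lower().replace(".", ""))
    return " ".join(t for t in tokens if t and t not in LEGAL_FORM_TOKENS)


def name_similarity(a: str, b: str) -> float:
    a, b = normalize_name(a), normalize_name(b)
    if not a or not b:
        return 0.0  # an empty name is no evidence at all
    return SequenceMatcher(None, a, b).ratio()


def score(p: Payment, c: Candidate, tolerance: Decimal = Decimal("0")) -> float:
    """Weighted sum of the five signals, each scaled to the range 0..1."""
    days_off = abs((p.booked_on - c.due_date).days)
    signals = {
        "amount": 1.0 if abs(p.amount - c.amount_due) <= tolerance else 0.0,
        "name": name_similarity(p.payer_name, c.customer_name),
        "iban": 1.0 if p.payer_iban and p.payer_iban in c.known_ibans else 0.0,
        "date": max(0.0, 1.0 - days_off / 90),
        "text": 1.0 if c.invoice_number in p.text else 0.0,
    }
    return sum(WEIGHTS[k] * v for k, v in signals.items())


def decide(s: float) -> str:
    if s >= AUTO_MATCH:
        return "auto_match"
    if s >= MANUAL_QUEUE:
        return "manual_queue"
    return "unmatched"
```

Keeping the individual signal values alongside the total is useful in practice, because the manual queue can then show the accountant why a suggestion was made.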
Edge Cases That Break Naive Matching
A few concrete situations where an "it works" pipeline fails:
- Merged payment. A payer pays three invoices with one amount. You have to match one transaction to several invoices, and only the sum of their amounts produces a match. Naive 1:1 matching never finds it.
- Partial payment. Half came in. The invoice must not jump to "paid", but it also must not fall into the unmatched bucket. The remaining balance has to stay visible.
- Overpayment. More came in than the invoice. The difference is either an advance for next time or a refund. That is no longer a purely matching decision, but matching has to be able to flag it.
- Multiple currencies. Invoice in EUR, payment in CZK. Without conversion at the exchange rate on the payment date, the amounts never match.
- Returned payment / reversal. A negative transaction that cancels an earlier incoming payment. It has to undo the earlier match, not add another payment.
- Duplicate statement download. As noted above, without idempotence the same payment is matched twice.
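As one illustration, the merged-payment case can be sketched as a small subset search over a single payer's open invoices. This is a brute-force sketch under stated assumptions: the function name, the max_invoices cap and the "ambiguous means no automatic match" policy are all illustrative choices, and the candidate set is assumed to be already narrowed to one customer so the search stays small.

```python
from decimal import Decimal
from itertools import combinations
from typing import Dict, Optional, Tuple


def find_merged_match(
    payment_amount: Decimal,
    open_invoices: Dict[str, Decimal],  # invoice number -> amount due
    max_invoices: int = 4,              # assumed cap to keep the search bounded
) -> Optional[Tuple[str, ...]]:
    """Return the one set of 2..max_invoices invoices whose amounts sum to the payment.

    Returns None when nothing fits or when more than one combination fits,
    because an ambiguous result should go to a human rather than be guessed.
    """
    hits = []
    numbers = sorted(open_invoices)
    for size in range(2, max_invoices + 1):
        for combo in combinations(numbers, size):
            total = sum((open_invoices[n] for n in combo), Decimal("0"))
            if total == payment_amount:
                hits.append(combo)
    return hits[0] if len(hits) == 1 else None
```

For example, with open invoices of 100, 250.50, 400 and 75, a payment of 750.50 resolves to the first three, while a payment that fits two different combinations returns None and would be routed to the manual queue.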
Idempotence: Don't Match Twice
This is the part that tends to be underestimated, and yet it decides whether you can trust the system. The matching job runs repeatedly, statements overlap, and sooner or later the job restarts mid-batch. If matching is not idempotent, you get duplicate payments and a broken invoice balance.
The solution is the same as for other durable integrations: every transaction from a statement gets a deterministic key (typically derived from the bank transaction ID, not from its position in the downloaded batch), and the matching operation is idempotent. When a transaction that has already been processed comes in again, the step recognizes it and adds nothing. It is the same principle as idempotent submission when connecting to regulated government systems: without it, every retry becomes a risk.
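A minimal sketch of this idea, using Python's built-in sqlite3 module as a stand-in for whatever database the system really uses. The table name, the columns and the composition of the key (bank code, account IBAN, bank transaction ID) are assumptions for illustration. The primary key on the deterministic key is what makes a repeated apply a no-op.

```python
import hashlib
import sqlite3


def transaction_key(bank_code: str, account_iban: str, bank_tx_id: str) -> str:
    """Deterministic key built from stable bank identifiers, never from batch position."""
    raw = f"{bank_code}|{account_iban}|{bank_tx_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS applied_payments (
               tx_key TEXT PRIMARY KEY,      -- uniqueness enforces idempotence
               invoice_number TEXT NOT NULL,
               amount TEXT NOT NULL          -- decimal amount stored as text
           )"""
    )
    return conn


def apply_match(conn: sqlite3.Connection, tx_key: str, invoice_number: str, amount: str) -> bool:
    """Record the match once. Returns True if applied, False if it was already there."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO applied_payments (tx_key, invoice_number, amount) "
            "VALUES (?, ?, ?)",
            (tx_key, invoice_number, amount),
        )
    return cur.rowcount == 1
```

Downloading the same statement twice, or restarting a job mid-batch, then simply produces False for the transactions that were already applied, and the invoice balance is untouched.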
Reconciliation and Audit: Does Reality Match?
Automatic matching handles most of it. The question is how you know the rest did not silently get stuck somewhere. That is what reconciliation is for — a periodic check that asks: does the sum of matched payments match the booked invoices? Is there a transaction sitting in the queue for a week that no one has looked at? Is there an invoice marked as paid with no payment attached to it? Without reconciliation you do not know whether matching works — you only hope.
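Those questions translate fairly directly into a periodic check. The sketch below is one possible shape, with hypothetical types (InvoiceState, QueueItem), a hypothetical mapping of invoice number to the sum of matched payments, and an assumed seven-day limit for the manual queue. It returns human-readable findings instead of fixing anything itself.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import Dict, List


@dataclass(frozen=True)
class InvoiceState:
    number: str
    status: str      # e.g. "paid", "partially_paid", "open"
    amount: Decimal


@dataclass(frozen=True)
class QueueItem:
    tx_key: str
    queued_on: date


def reconcile(
    invoices: List[InvoiceState],
    matched: Dict[str, Decimal],  # invoice number -> sum of matched payments
    queue: List[QueueItem],
    today: date,
    max_queue_age: timedelta = timedelta(days=7),  # assumed limit
) -> List[str]:
    """Return a list of discrepancies; an empty list means the books agree."""
    findings = []
    for inv in invoices:
        paid = matched.get(inv.number, Decimal("0"))
        if inv.status == "paid" and paid == 0:
            findings.append(f"{inv.number}: marked paid but no payment attached")
        elif inv.status == "paid" and paid != inv.amount:
            findings.append(f"{inv.number}: marked paid, matched {paid} of {inv.amount}")
    for item in queue:
        if today - item.queued_on > max_queue_age:
            findings.append(f"{item.tx_key}: in manual queue since {item.queued_on}")
    return findings
```

Run on a schedule, a non-empty result is the signal that something is stuck and needs a person to look at it.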
And on top of all of it, an audit log. Every matching step — automatic and manual — leaves a record: which transaction, to which invoice, by which rule, with which score, who confirmed it manually if anyone did. When an auditor or accountant asks in half a year "why is this payment on this invoice", the answer is in the log, not in someone's memory. That is why we treat matching as a durable reconciliation pattern, not a one-off script: idempotence, retry, an audit of every step, and a periodic check that reality matches.
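One simple way to record such a step is an append-only log of JSON lines, one per matching decision. The field names below mirror the list above (transaction, invoice, rule, score, who confirmed) but are otherwise assumptions, as is the choice of JSON lines as the storage format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    tx_key: str
    invoice_number: str
    rule: str                    # e.g. "vs+amount", "fuzzy", "manual"
    score: Optional[float]       # None for deterministic rules
    confirmed_by: Optional[str]  # None for fully automatic matches
    recorded_at: str             # ISO 8601 timestamp in UTC


def audit_line(
    tx_key: str,
    invoice_number: str,
    rule: str,
    score: Optional[float] = None,
    confirmed_by: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one matching step as a single JSON line for an append-only log."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = AuditRecord(tx_key, invoice_number, rule, score, confirmed_by, timestamp)
    return json.dumps(asdict(record), sort_keys=True)
```

Each returned line would be appended to the log and never updated in place, so a later correction shows up as a new record rather than a rewritten one.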
When Custom Is Worth It
Let's be honest, so you don't make work for yourself. When most of your payments arrive with a variable symbol and an exact amount, deterministic matching in off-the-shelf SaaS is enough for you and there is no point building anything more. A custom solution makes sense where the box hits its limits:
- you have a custom ERP or Dynamics 365 Business Central that an off-the-shelf tool cannot reach,
- you run enterprise volume, where even a few percent of manual work is expensive,
- a large share of payments arrive without a variable symbol and deterministic matching is useless,
- you need audit-grade reconciliation, not just "it matched somehow".
This is exactly the area we work in. We built the same durable machinery (idempotence, retry, reconciliation, audit) for connecting to a regulated government system, where every one of more than 40,000 documents had to be delivered. In document automation in a banking environment, we cut contract generation from 2 hours to 3 minutes. We handle invoice extraction as AI extraction from PDFs with human-in-the-loop and an audit log, not as a black box. The same approach applies to payment matching: connecting to a bank API, fuzzy matching where the variable symbol is missing, and everything wired into the ERP.
If payment matching costs you manual work every month and an off-the-shelf tool is not enough, get in touch. We will tell you where your matching is weak and what can be automated.
FAQ
When is off-the-shelf SaaS enough and when do you need custom matching?
Standard cases with a variable symbol can be matched deterministically even by off-the-shelf SaaS. A custom solution makes sense where the box hits its limits: a custom ERP, enterprise volume, payments without a variable symbol, multiple currencies, and audit-grade reconciliation.
How do you match payments without a variable symbol?
Fuzzy matching combines several signals: amount, payer name, IBAN, date, and message text. It scores each candidate and matches automatically only above the upper confidence threshold. A score between the lower and upper thresholds sends the transaction to a manual queue with a suggestion, and a score below the lower threshold produces no match at all.
How do you prevent matching the same payment twice?
With idempotence. Every transaction from a statement gets a deterministic key, and the matching step is an idempotent operation. When a statement arrives a second time or a job restarts mid-batch, the system recognizes it has already processed the transaction and does not match it again.