Here at Apryse, we occasionally have some free time at the end of our iText development sprints where we're encouraged to use our initiative to "work on whatever" we fancy.
During one of these periods, I developed a fascination with the details of how table rendering works in iText Core; specifically why large cell counts seemed to slow it down by a lot. So, I decided to spend some time on improving my understanding in that area, and to see if I could at least find what was causing it.
TL;DR: I optimized table rendering in iText with minimal code changes, by avoiding repeated border collapse calculations and unnecessary tagging overhead. This significantly improved rendering performance, with a 50k cell table going from 5 minutes to just 7 seconds. Here's how.
A Quick Overview of Tables in iText
Tables are one of the most useful/common layout tools for documents, but are also quite difficult to implement in PDF because the specification doesn't really provide for tables; you only have very basic drawing instructions.
Without going too much into the nitty-gritty of PDF syntax, you have instructions to:
- Move to an x,y coordinate on the page
- Draw a line from x1,y1 to x2,y2 in a specific color and line thickness
- Display the text "Hello world" at coordinates x1,y1
With a little imagination you can see how these simple instructions can be combined into a complex graphic that can visually represent a table.
Thankfully, iText Core's layout engine constructs high-level abstractions around these operations, so you don't have the pain of directly dealing with the low-level PDF syntax. This means you are probably more familiar with something more like the following representation:
JAVA
Document document = new Document(pdf);
Table table = new Table(2);
table.addCell("Cell1");
table.addCell("Cell2");
document.add(table);
document.close();
Which results in something looking like this:

What our simple table looks like when rendered
Conversely, using the previously mentioned low-level operations this table would look like the following in actual PDF syntax:
PDF syntax cheat sheet: https://pdfa.org/wp-content/uploads/2023/08/PDF-Operators-CheatSheet.pdf
TEXT
q % Start subsection (Drawing the text Cell 1)
BT % Begin text
/F1 12 Tf % Select the specified font
38.5 790.83 Td % Move text cursor x coordinate: 38.5, y coordinate 790.83
(Cell 1)Tj % Show text string
ET % End text
Q % End section
q % |
BT % |
/F1 12 Tf % |
88.5 790.83 Td % |--> Same as above
(Cell 2)Tj % |
ET % |
Q % |
q % Start subsection (drawing of the left border )
0 0 0 RG % Select color black
0.5 w % Set line width to 0.5
36.25 806 m % Move cursor to x coordinate 36.25 and y coordinate 806
36.25 783.02 l % Create line to x coordinate 36.25 and y coordinate 783.02
S % Stroke the line
Q % End section
q % |
0 0 0 RG % |
0.5 w % |
86.25 806 m % | -> Draw right border of cell 1
86.25 783.02 l % |
S % |
Q % |
% omitted but this continues for the other borders
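To make the composition concrete, here's a toy sketch that assembles the same operators shown above into a content-stream string. This is purely illustrative; it is emphatically not how iText generates PDF output internally:

```java
// Toy sketch, purely illustrative: iText does NOT build content streams this way.
// It just shows how the raw operators from the listing above compose as plain text.
public class ContentStreamSketch {
    public static String buildExample() {
        StringBuilder cs = new StringBuilder();
        // Stroke the left border: a line from (36.25, 806) down to (36.25, 783.02)
        cs.append("q\n")                 // save the graphics state
          .append("0 0 0 RG\n")          // stroke color: black
          .append("0.5 w\n")             // line width 0.5
          .append("36.25 806 m\n")       // move to the start point
          .append("36.25 783.02 l\n")    // path to the end point
          .append("S\n")                 // stroke the path
          .append("Q\n");                // restore the graphics state
        // Show the text "Cell 1" at (38.5, 790.83)
        cs.append("q\n")
          .append("BT\n")                // begin a text object
          .append("/F1 12 Tf\n")         // select font /F1 at size 12
          .append("38.5 790.83 Td\n")    // position the text cursor
          .append("(Cell 1) Tj\n")       // show the string
          .append("ET\n")                // end the text object
          .append("Q\n");
        return cs.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildExample());
    }
}
```

Every cell multiplies this boilerplate, which is exactly why you want a layout engine writing it for you.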
So, all this gives you a basic overview of what happens when you create a table using iText's layout engine. As you can probably guess, a lot of calculations need to happen to determine the exact coordinates of those instructions, but that's out of scope for this article. Although, if you want a blogpost going into more detail on the inner workings of iText's layout engine, let me know!
What I'll be doing in this article is walking through how I identified what was causing the table rendering slowdown, and the steps I took to resolve it. With that said, let's go back to January of this year when my investigation began.
Creating a Performance Baseline
As mentioned, I wanted to see where the most time was spent when generating tables. So, I created the most basic table use case, and decided to time the performance for generating 100 cells, 1,000 cells, 10,000, and finally 50,000 cells:
JAVA
public class App {
    public static void main(String[] args) throws IOException {
        // Do some warmup
        generatePdfWithTable("table_warmup.pdf", 1000);
        int[] amounts = {100, 1_000, 10_000, 50_000};
        for (int amount : amounts) {
            long start = System.currentTimeMillis();
            generatePdfWithTable("table_" + amount + ".pdf", amount);
            long end = System.currentTimeMillis();
            System.out.println("Time to generate " + amount + " cells: " + (end - start) + "ms");
        }
    }

    private static void generatePdfWithTable(String fileName, int amountOfCells) throws IOException {
        PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
        Document document = new Document(pdf);
        Table table = new Table(5);
        for (int i = 0; i < amountOfCells; i++) {
            table.addCell("Cell1");
        }
        document.add(table);
        document.close();
    }
}
Probably not the most optimal benchmarking code, but I thought it should still do the trick. Running this code produced these results:
| Number of Cells | Time to Generate |
|---|---|
| 100 | 7ms |
| 1,000 | 53ms |
| 10,000 | 767ms |
| 50,000 | 35,660ms |
Plotting this data in a graph produced this:

In this graph, “line go up mean world more badder”
This results in what looks suspiciously like an algorithm with quadratic time complexity. I'm not going to get into the details of exactly what that means, so don't worry! But it should be pretty clear that the generation time is not increasing proportionally: a low number of cells is OK(ish), while a high number of cells takes far longer than you'd expect.
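One way to sanity-check that suspicion: if t(n) ≈ c·n^k, then the slope between two measurements on a log-log scale estimates the exponent k. A quick sketch plugging in the numbers from the table above (a rough estimate, not rigorous benchmarking):

```java
// Quick sanity check on the benchmark numbers: for t(n) ≈ c·n^k,
// the slope between two points on a log-log scale estimates the exponent k.
public class GrowthExponent {
    public static double exponent(double n1, double t1, double n2, double t2) {
        return Math.log(t2 / t1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        long[] cells = {100, 1_000, 10_000, 50_000};
        long[] millis = {7, 53, 767, 35_660};
        for (int i = 1; i < cells.length; i++) {
            double k = exponent(cells[i - 1], millis[i - 1], cells[i], millis[i]);
            System.out.printf("%d -> %d cells: exponent ~ %.2f%n", cells[i - 1], cells[i], k);
        }
        // The exponent climbs from roughly 0.9 to well past 2:
        // near-linear for small tables, worse than quadratic at 50k cells.
    }
}
```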
Profiling
So, let's take a closer look by using flame graphs to track down potential bottlenecks. Here's a comparison of running the above sample with 1,000 cells and 50,000 cells.
1,000 cells

The results of rendering our table with 1,000 cells
50,000 cells

And now, with 50,000 cells
Interesting. We can see that almost all of the time is being spent in just two methods, and both of them originate from the call com.itextpdf.layout.renderer.CollapsedTableBorders#getVerticalBorder.
Notice the Collapsed in the name, indicating this happens when we use collapsed borders. What you might not realize is that collapsed borders are also the default behaviour, so let's take a look at what happens when we run the same code while avoiding border collapsing. Since iText defaults to BorderCollapsePropertyValue.COLLAPSE, we have to slightly modify our code sample:
JAVA
table.setBorderCollapse(BorderCollapsePropertyValue.SEPARATE);
Results:
| Number of Cells | Time to Generate BorderCollapsePropertyValue.SEPARATE | Time to Generate BorderCollapsePropertyValue.COLLAPSE |
|---|---|---|
| 100 | 9ms | 7ms |
| 1,000 | 59ms | 53ms |
| 10,000 | 558ms | 767ms |
| 50,000 | 1,601ms | 35,660ms |
Indeed, it seems using the collapsing border feature increases the runtime by a lot — I might even call it a performance bug if Marketing lets me... Anyway, it seems that improving the collapsing border feature is now our top priority. Let's investigate!
Remember that from the flame graph we saw that two functions are responsible for almost all of the runtime, and both of those functions are the result of calling com.itextpdf.layout.renderer.CollapsedTableBorders#getVerticalBorder.
So, let's have a look at this function:
JAVA
@Override
public List<Border> getVerticalBorder(int index) {
    if (index == 0) {
        List<Border> borderList = TableBorderUtil
                .createAndFillBorderList(null, tableBoundingBorders[3], verticalBorders.get(0).size());
        return getCollapsedList(verticalBorders.get(0), borderList);
    } else if (index == numberOfColumns) {
        List<Border> borderList = TableBorderUtil.createAndFillBorderList(null, tableBoundingBorders[1],
                verticalBorders.get(verticalBorders.size() - 1).size());
        return getCollapsedList(verticalBorders.get(verticalBorders.size() - 1), borderList);
    } else {
        return verticalBorders.get(index);
    }
}
Walking through this code, we can see that for the outermost borders we calculate a border list; this is done to determine which border color and width to use when collapsing. For an inner column, however, we can just take the border's width and color without needing to collapse anything.
Let's say we have a table in our code, but we know that after initializing verticalBorders and tableBoundingBorders the values will not change. This means it's pointless to recalculate the collapsed border list over and over again when processing each row, as the results will always be the same.
Instead, we should cache those results when they are calculated. So, let's implement the following to lazily calculate the results, and then store them in verticalBorderComputationResult until they're needed:
JAVA
private final Map<Integer, List<Border>> verticalBorderComputationResult = new HashMap<>();

@Override
public List<Border> getVerticalBorder(int index) {
    // If not outermost, we don't need to calculate collapsed borders
    if (index != 0 && index != numberOfColumns) {
        return verticalBorders.get(index);
    }
    if (verticalBorderComputationResult.containsKey(index)) {
        return verticalBorderComputationResult.get(index);
    }
    final int tableBoundingBordersIndex = index == 0 ? 3 : 1;
    final Border boundingBorder = tableBoundingBorders[tableBoundingBordersIndex];
    final List<Border> verticalBorder = verticalBorders.get(index);
    final List<Border> borderList = TableBorderUtil
            .createAndFillBorderList(null, boundingBorder, verticalBorder.size());
    final List<Border> result = getCollapsedList(verticalBorder, borderList);
    verticalBorderComputationResult.put(index, result);
    return result;
}
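(A quick side note: the containsKey/put pair can also be folded into Map.computeIfAbsent. Here's a standalone sketch of the same memoization pattern; expensiveCollapse is a hypothetical stand-in for the real border computation, not an iText method:)

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MemoSketch {
    public int computeCount = 0; // instrumentation for this sketch only

    private final Map<Integer, List<String>> cache = new HashMap<>();

    // Hypothetical stand-in for the expensive collapsed-border computation.
    private List<String> expensiveCollapse(int index) {
        computeCount++;
        return Collections.singletonList("border-" + index);
    }

    public List<String> getVerticalBorder(int index) {
        // computeIfAbsent only runs the mapping function on a cache miss.
        return cache.computeIfAbsent(index, this::expensiveCollapse);
    }
}
```

One caveat before swapping this into real code: computeIfAbsent must not be used if the mapping function itself mutates the map, which is worth double-checking in context.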
Now, we simply calculate once and return the cached results when we want them. Seems promising, but let's verify the results of this small refactoring of the function. Running our newly-optimized code gives us the following:
| Number of Cells | Time to Generate BorderCollapsePropertyValue.SEPARATE | Time to Generate BorderCollapsePropertyValue.COLLAPSE | With fix |
|---|---|---|---|
| 100 | 9ms | 7ms | 7ms |
| 1,000 | 59ms | 53ms | 49ms |
| 10,000 | 558ms | 767ms | 485ms |
| 50,000 | 1,601ms | 35,660ms | 1,310ms |
As you can see, we managed to reduce the runtime from over 35 seconds to 1.3 seconds, without adding much complexity. Pretty good for an hour of work. 😀
Tagging Tables
We're not done yet though. While I was working on general improvements to table performance, I also wanted to check how tagged tables compare to untagged tables.
Tagging is quite a complex subject, but the only thing you need to know for this blog is that it allows screen readers and other accessibility tools to better understand the semantic meaning of the contents of a PDF document: what is a heading, what is body text, and so on.
More importantly for this article, there are specific tags to identify content formatted as a table. You can find a lot more information in the Tagged PDF Q&A from the PDF Association's site.
If you are using iText Core's layout engine (or indeed the pdfHTML add-on), you can simply include this in your code to enable tagging:
JAVA
PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
pdf.setTagged();
This will ensure that your generated PDF file is tagged accordingly. I won't go into the exact details of how iText does this now, but let me know if you want more info or blogs on this topic.
So, let's return to our performance testing code. As you can see, the only addition we've made is to enable tagging.
JAVA
private static void generatePdfWithTable(String fileName, int amountOfCells) throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
    pdf.setTagged();
    Document document = new Document(pdf);
    Table table = new Table(5);
    for (int i = 0; i < amountOfCells; i++) {
        table.addCell("Cell1");
    }
    document.add(table);
    document.close();
}
When we run this example and compare against our now improved baseline results, we can see some major issues occurring.
| Number of Cells | Untagged tables | Tagged tables |
|---|---|---|
| 100 | 7ms | 16ms |
| 1,000 | 49ms | 206ms |
| 10,000 | 485ms | 15,637ms |
| 50,000 | 1,310ms | 300,018ms |
Again, there's something fishy going on so let's dive into it. Once more, we'll bring out the flame graphs to see where we could be losing all this time:

Using flame graphs again to dig deeper
After poking through the code a bit, I saw the following in the IntelliJ gutter:

Flushing out the problem
Hold on, this single safety check takes 50% of the total runtime! This seems like an ideal case for improvement.
For some context, flushing is the process where iText writes completed pages to disk to keep memory usage low. So the check is there to make sure you don't try to modify parts of the tag tree structure that have already been finalized. Since we can't just remove this check, let's instead see what's happening in the getKids() method.
JAVA
@Override
public List<IStructureNode> getKids() {
    PdfObject k = getK();
    List<IStructureNode> kids = new ArrayList<>();
    if (k != null) {
        if (k.isArray()) {
            PdfArray a = (PdfArray) k;
            for (int i = 0; i < a.size(); i++) {
                addKidObjectToStructElemList(a.get(i), kids);
            }
        } else {
            addKidObjectToStructElemList(k, kids);
        }
    }
    return kids;
}
We see that we perform a rather expensive parsing operation to convert low-level PdfObjects to high-level structure elements. But we don't really do anything with the parsed information; we just check whether a certain entry exists or not.
Let's try to rewrite this so it doesn't parse any of the PDF objects.
JAVA
public boolean isKidFlushed(int index) {
    PdfObject k = getK();
    if (k == null) {
        return false;
    }
    if (k.isArray()) {
        PdfArray array = (PdfArray) k;
        if (index >= array.size()) {
            return false;
        }
        return array.get(index).isFlushed();
    }
    return index == 0 && k.isFlushed();
}
What we've done here is completely eliminate the need for conversion, since we just use iText's low-level PdfObject API. Instead of looping over and parsing each element, it now becomes an array lookup plus a function invocation when the PdfObject is a PdfArray, and just the function invocation for other PdfObjects.
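The shape of this win generalizes: if you only need one boolean about the i-th element, don't materialize and convert the whole collection first. A standalone before/after sketch (hypothetical names, not the actual iText classes):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins (not the iText classes) showing the shape of the fix.
public class KidLookupSketch {
    static int conversions = 0; // counts the expensive parse steps

    static class RawObject {
        final boolean flushed;
        RawObject(boolean flushed) { this.flushed = flushed; }
    }

    // Stand-in for the expensive low-level-object -> structure-node conversion.
    static String parseToNode(RawObject raw) {
        conversions++;
        return raw.flushed ? "flushed-node" : "live-node";
    }

    // Before: convert every kid to a high-level node, then inspect one of them.
    static boolean isKidFlushedSlow(List<RawObject> kids, int index) {
        List<String> nodes = new ArrayList<>();
        for (RawObject raw : kids) {
            nodes.add(parseToNode(raw));
        }
        return "flushed-node".equals(nodes.get(index));
    }

    // After: index straight into the raw objects, no conversion at all.
    static boolean isKidFlushedFast(List<RawObject> kids, int index) {
        return index < kids.size() && kids.get(index).flushed;
    }
}
```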
For reference, you can see the full implementation on GitHub here: https://github.com/itext/itext-java/commit/71451319ebb9463d2c577bdaa89e4958e4ca2dd8
Let's see how much we've improved the speed.
| Number of Cells | Tagged tables | Optimized kid lookup fix |
|---|---|---|
| 100 | 16ms | 19ms |
| 1,000 | 206ms | 203ms |
| 10,000 | 15,637ms | 2,761ms |
| 50,000 | 300,018ms | 111,790ms |
Nice, so by avoiding unneeded computation we already made it about three times faster. But we are still pretty far away from the performance we get without tagging, so let's continue the investigation.
Further Table Tweaks
After some further profiling, I found I could add a few more optimizations. While the results were nice, the optimizations themselves aren't super interesting. So, I'll just provide brief descriptions here along with links to my commits if you want more info.
Cache the tagging hint key
Here we always had to call into an expensive function to get a value that didn't change, so simply caching the result once helped quite a bit.
Commit: b50c34e14f012993f81a02500542edaa54d3ea5c
Remove duplicate function invocation
An expensive calculation was executed twice in the same function, even though the result could not have changed in between. Storing the result in a variable and reusing it resulted in a drastic improvement.
Commit: b6b212971e285a4a800492f1d17651827b3de623
Do row adding in bulk
The previous implementation added new tags one by one. This caused a lot of unneeded checks for each tag. To avoid those duplicate checks we now gather all the tags, and then add them at once.
Commit: 0626cd422a275ac402aaa3aa34d92d17f934174f
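The third fix's pattern in miniature (hypothetical names, not the iText API): validate once per batch instead of once per element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the bulk-add optimization, not iText code.
public class BulkAddSketch {
    static int validations = 0; // counts the per-insertion checks

    // Stand-in for the consistency check run on the tag tree per insertion.
    static void validateTree(List<String> tree) {
        validations++;
    }

    // Before: one validation per tag added.
    static void addOneByOne(List<String> tree, List<String> newTags) {
        for (String tag : newTags) {
            validateTree(tree);
            tree.add(tag);
        }
    }

    // After: gather the tags, validate once, add the whole batch.
    static void addInBulk(List<String> tree, List<String> newTags) {
        validateTree(tree);
        tree.addAll(newTags);
    }
}
```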
After implementing these three minor improvements, let's check the results now:
| Number of Cells | Tagged tables | Optimized kid lookup fix | 3 minor fixes |
|---|---|---|---|
| 100 | 16ms | 19ms | 20ms |
| 1,000 | 206ms | 203ms | 117ms |
| 10,000 | 15,637ms | 2,761ms | 1,485ms |
| 50,000 | 300,018ms | 111,790ms | 18,508ms |
As you can see, even though the code changes are small and localized, we've managed to make it roughly another 6 times faster!
Next nearest sibling algorithm
The last commit of the patch set is quite interesting, and so it gets its own special section. It showcases how adding a small heuristic to an algorithm can drastically improve the runtime.
Let's say we have the following table:
| Header Cell 1 | Header Cell 2 | Header Cell 3 | Header Cell 4 |
|---|---|---|---|
| Cell 1 | Cell 2 | Cell 3 | Cell 4 |
| Cell 5 | Cell 6 | Cell 7 | Cell 8 |
If we opened the PDF and looked at the tag structure, it would look something like this:
TEXT
-Document
--Table
---Thead
-----th
-----span (Header Cell 1)
-----th
-----span (Header Cell 2)
....
---Tbody
-----td
------span (Cell 1)
-----td
------span (Cell 2)
(You might notice that there are no tr elements for table rows. This is expected, as we only generate those dummy elements when we finalize the table, because different PDF specifications require us to have different structures.)
When adding a new cell to the table, we need to create the correct tag and find the correct place to insert it in the tag tree. Using the profiler, I saw that these were the offending lines:
JAVA
// Omitted for clarity
List<TaggingHintKey> parentKidsHint = getAccessibleKidsHint(parentKey);
int kidIndInParentKidsHint = parentKidsHint.indexOf(kidKey);
int ind = getNearestNextSiblingTagIndex(waitingTagsManager, parentPointer, parentKidsHint, kidIndInParentKidsHint);
// insert tag at index.... Omitted for clarity
The implementations of these functions can be seen below:
JAVA
public List<TaggingHintKey> getAccessibleKidsHint(TaggingHintKey parent) {
    List<TaggingHintKey> kidsHint = kidsHints.get(parent);
    if (kidsHint == null) {
        return Collections.<TaggingHintKey>emptyList();
    }
    List<TaggingHintKey> accessibleKids = new ArrayList<>();
    for (TaggingHintKey kid : kidsHint) {
        if (isNonAccessibleHint(kid)) {
            accessibleKids.addAll(getAccessibleKidsHint(kid));
        } else {
            accessibleKids.add(kid);
        }
    }
    return accessibleKids;
}

private int getNearestNextSiblingTagIndex(WaitingTagsManager waitingTagsManager, TagTreePointer parentPointer,
        List<TaggingHintKey> siblingsHint, int start) {
    int ind = -1;
    TagTreePointer nextSiblingPointer = new TagTreePointer(document);
    while (++start < siblingsHint.size()) {
        if (waitingTagsManager.tryMovePointerToWaitingTag(nextSiblingPointer, siblingsHint.get(start))
                && parentPointer.isPointingToSameTag(new TagTreePointer(nextSiblingPointer).moveToParent())) {
            ind = nextSiblingPointer.getIndexInParentKidsList();
            break;
        }
    }
    return ind;
}
So, the first line again recursively flattens the tree to a list. This initial implementation is pretty decent, really: it's quick to write, and in most cases your flattened sub-structure will only contain very few tags. Since the operation isn't that expensive in the common case, it wasn't worth spending time optimizing it before now.
The current algorithm looks like this in pseudo-code:

1. Flatten the whole tree to a list
2. Search from the start of the list to find the desired TaggingHintKey
3. From the found index, look for the next TaggingHintKey which has the same parent
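A quick sketch of why step 1 becomes painful for big tables: the flattening runs again for every cell we add, so the total work grows with the square of the cell count (a hypothetical counter, not iText code):

```java
import java.util.ArrayList;
import java.util.List;

// Why re-flattening per added cell is quadratic: appending n cells, and
// walking the whole sibling list each time, touches 1 + 2 + ... + n elements.
public class FlattenCostSketch {
    public static long totalTouches(int cells) {
        long touched = 0;
        List<Integer> siblings = new ArrayList<>();
        for (int i = 0; i < cells; i++) {
            siblings.add(i);
            touched += siblings.size(); // cost of flattening after each insert
        }
        return touched; // n * (n + 1) / 2
    }
}
```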
The Trouble with Tagging
We know this works well in most scenarios. But in our table, the tbody tag will have as many siblings as it has cells, and since this code is executed for every newly-added cell, it has to flatten the tree again and again.
In the case of tables, we know we always add cells at the end of the list; in fact, we do this in most general tagging scenarios. So it makes no sense to start looking from the beginning of the list when we know the cell will always be at the other end of it.
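The benefit of searching from the end can be shown with a tiny standalone sketch (hypothetical code; the step counters are there only for illustration):

```java
import java.util.List;

// Hypothetical sketch: when new entries always land at the end of the list,
// a backward scan finds them immediately, while a forward scan pays full price.
public class SiblingSearchSketch {
    // Forward scan: steps grow with the list when the target is near the end.
    static int findForward(List<String> siblings, String target) {
        int steps = 0;
        for (int i = 0; i < siblings.size(); i++) {
            steps++;
            if (siblings.get(i).equals(target)) break;
        }
        return steps;
    }

    // Backward scan: a just-appended cell is found almost immediately.
    static int findBackward(List<String> siblings, String target) {
        int steps = 0;
        for (int i = siblings.size() - 1; i >= 0; i--) {
            steps++;
            if (siblings.get(i).equals(target)) break;
        }
        return steps;
    }
}
```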
Determining Some Improvements
Now that we have a better understanding of the problem space, we can optimize our algorithm to take into account the heuristics we have.
1. Process the tree recursively backwards until you find the desired TaggingHintKey to start from
2. When found, start looking ahead to find the next sibling
As you can see, if we do this we completely avoid the two pitfalls which currently slow down the table creation process.
So, the implementation of this new algorithm would look something like this. Note that the call site now passes the parent and kid hint keys directly, instead of a pre-flattened kids list and start index:
JAVA
// Omitted for clarity
int ind = getNearestNextSiblingIndex(waitingTagsManager, parentPointer, parentKey, kidKey);
// insert tag at index.... Omitted for clarity
JAVA
private int getNearestNextSiblingIndex(WaitingTagsManager waitingTagsManager, TagTreePointer parentPointer,
        TaggingHintKey parentKey, TaggingHintKey kidKey) {
    ScanContext scanContext = new ScanContext();
    scanContext.waitingTagsManager = waitingTagsManager;
    scanContext.startHintKey = kidKey;
    scanContext.parentPointer = parentPointer;
    scanContext.nextSiblingPointer = new TagTreePointer(document);
    return scanForNearestNextSiblingIndex(scanContext, null, parentKey);
}

private int scanForNearestNextSiblingIndex(ScanContext scanContext, TaggingHintKey toCheck, TaggingHintKey parent) {
    if (scanContext.startVerifying) {
        if (scanContext.waitingTagsManager.tryMovePointerToWaitingTag(scanContext.nextSiblingPointer, toCheck)
                && scanContext.parentPointer.isPointingToSameTag(
                        new TagTreePointer(scanContext.nextSiblingPointer).moveToParent())) {
            return scanContext.nextSiblingPointer.getIndexInParentKidsList();
        }
    }
    if (toCheck != null && !isNonAccessibleHint(toCheck)) {
        return -1;
    }
    List<TaggingHintKey> kidsHintList = kidsHints.get(parent);
    if (kidsHintList == null) {
        return -1;
    }
    int startIndex = -1;
    if (!scanContext.startVerifying) {
        for (int i = kidsHintList.size() - 1; i >= 0; i--) {
            if (scanContext.startHintKey == kidsHintList.get(i)) {
                scanContext.startVerifying = true;
                startIndex = i;
                break;
            }
        }
    }
    for (int j = startIndex + 1; j < kidsHintList.size(); j++) {
        final TaggingHintKey kid = kidsHintList.get(j);
        final int interMediateResult = scanForNearestNextSiblingIndex(scanContext, kid, kid);
        if (interMediateResult != -1) {
            return interMediateResult;
        }
    }
    return -1;
}

private static class ScanContext {
    WaitingTagsManager waitingTagsManager;
    TaggingHintKey startHintKey;
    boolean startVerifying;
    TagTreePointer parentPointer;
    TagTreePointer nextSiblingPointer;
}
OK, so let's plug this snazzy new tagging algorithm into the code and see what we get:
| Number of Cells | Tagged tables | Optimized kid lookup fix | 3 minor fixes | Optimized search algo |
|---|---|---|---|---|
| 100 | 16ms | 19ms | 20ms | 19ms |
| 1,000 | 206ms | 203ms | 117ms | 120ms |
| 10,000 | 15,637ms | 2,761ms | 1,485ms | 1,072ms |
| 50,000 | 300,018ms | 111,790ms | 18,508ms | 7,525ms |
Alright, that's tagging a 50k cell table almost 40x faster. Not bad for an afternoon's work, eh?
Conclusions
And so we come to the end of Guust's Adventures in Optimization Land! But in all seriousness, I wanted to highlight how these improvements were achieved because it shows how profiling and a little curiosity can lead to huge performance wins with minimal risk.
All the above table improvements were included in the iText Core 9.1.0 release in February. If you're generating PDFs at scale, or using collapsed borders or tagging, I highly recommend checking it out, as you should get some free performance improvements.
Also, if you're tackling similar bottlenecks feel free to reach out — I'm more than happy to nerd out on this stuff. 😁