Making iText's table rendering faster

原始链接: https://kb.itextpdf.com/itext/how-i-made-pdf-table-rendering-faster

In some free time at Apryse, I investigated performance problems in iText Core's table rendering, specifically for tables with large numbers of cells. Profiling showed that collapsed-border handling and tagging were responsible for severe slowdowns. By caching the results of the collapsed-border calculations, generation time for a 50,000-cell table dropped from 35 seconds to 1.3 seconds. Further investigation showed that tagging caused even bigger problems. The first fix avoided parsing PDF objects entirely, cutting the 50,000-cell generation time from 300 seconds to 111 seconds. Additional tweaks, such as caching tagging hint keys, removing duplicate function calls, and adding rows in bulk, brought that down to 18.5 seconds. The final optimization improved the "nearest next sibling" algorithm used to find the correct insertion point for new cells, specifically for tables, reducing the time to 7.5 seconds. These small code changes, focused on avoiding redundant computation and improving tag insertion logic, produced a significant performance improvement and are included in the iText Core 9.1.0 release.

This Hacker News thread discusses iText's improved table rendering speed. A commenter reminisces about using iText 20 years ago, noting its continued relevance. Another user points out that the title was changed to remove the claim of "95% faster" rendering, arguing that the original title was more informative about the scale of the improvement. A moderator explains that removing specific numbers from titles is standard practice to avoid baity titles and discussions focused on the accuracy of the numbers, particularly when the claimed improvements might be based on specific, unusual conditions. They argue that a more general title, like "faster," allows the article itself to demonstrate the improvement's magnitude. However, some believe the specific improvement percentage is important context for the nature of the update.

Original article

Here at Apryse, we occasionally have some free time at the end of our iText development sprints where we're encouraged to use our initiative to "work on whatever" we fancy.

During one of these periods, I developed a fascination with the details of how table rendering works in iText Core; specifically why large cell counts seemed to slow it down by a lot. So, I decided to spend some time on improving my understanding in that area, and to see if I could at least find what was causing it.

TL;DR: I optimized table rendering in iText with minimal code changes, by avoiding repeated border collapse calculations and unnecessary tagging overhead. This significantly improved rendering performance, with a 50,000-cell tagged table going from roughly 5 minutes to about 7.5 seconds. Here’s how.

A Quick Overview of Tables in iText

Tables are one of the most useful/common layout tools for documents, but are also quite difficult to implement in PDF because the specification doesn't really provide for tables; you only have very basic drawing instructions.

Without going too much into the nitty-gritty of PDF syntax, you have instructions to:

  • Move to an x,y coordinate on the page

  • Draw a line from x1,y1 to x2,y2 in a specific color and line thickness

  • Display the text "Hello world" at coordinates x1,y1

With a little imagination you can see how these simple instructions can be combined into a complex graphic that can visually represent a table.
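
If you're curious how these raw operators map onto code, iText also exposes them directly through its low-level PdfCanvas API. Here is a minimal, hand-written sketch of drawing one piece of text and one border line (an illustration only, not code from the article; normally the layout engine emits these operators for you):

JAVA

import com.itextpdf.io.font.constants.StandardFonts;
import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;

import java.io.IOException;

public class LowLevelDrawingExample {
    public static void main(String[] args) throws IOException {
        PdfDocument pdf = new PdfDocument(new PdfWriter("low_level.pdf"));
        PdfCanvas canvas = new PdfCanvas(pdf.addNewPage());

        // Display the text "Cell 1" at an x,y coordinate
        canvas.beginText()
                .setFontAndSize(PdfFontFactory.createFont(StandardFonts.HELVETICA), 12)
                .moveText(38.5, 790.83)
                .showText("Cell 1")
                .endText();

        // Draw a vertical border: move to x1,y1, draw a line to x2,y2, then stroke it
        canvas.setStrokeColor(ColorConstants.BLACK)
                .setLineWidth(0.5f)
                .moveTo(36.25, 806)
                .lineTo(36.25, 783.02)
                .stroke();

        pdf.close();
    }
}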

Thankfully, iText Core's layout engine constructs high-level abstractions around these operations, so you don't have the pain of directly dealing with the low-level PDF syntax. This means you are probably more familiar with something more like the following representation:

JAVA

        Document document = new Document(pdf);
        Table table = new Table(2);
        table.addCell("Cell1");
        table.addCell("Cell2");
        document.add(table);
        document.close();

Which results in something looking like this:

basic_table.png

What our simple table looks like when rendered

Conversely, using the previously mentioned low-level operations this table would look like the following in actual PDF syntax:

PDF syntax cheat sheet: https://pdfa.org/wp-content/uploads/2023/08/PDF-Operators-CheatSheet.pdf

TEXT

q                        % Start subsection (Drawing the text Cell 1)
BT                       % Begin text
/F1 12 Tf                % Select the specified font
38.5 790.83 Td           % Move text cursor x coordinate: 38.5, y coordinate 790.83
(Cell 1)Tj               % Show text string
ET                       % End text
Q                        % End section
q                        %  |
BT                       %  |
/F1 12 Tf                %  |
88.5 790.83 Td           %  |--> Same as above
(Cell 2)Tj               %  |
ET                       %  |
Q                        %  |
q                        % Start subsection (drawing of the left border )
0 0 0 RG                 % Select color black
0.5 w                    % Set line width to 0.5
36.25 806 m              % Move cursor to x coordinate 36.25 and y coordinate 806
36.25 783.02 l           % Create line to x coordinate 36.25 and y coordinate 783.02 
S                        % Stroke the line
Q                        % End section
q                        % |  
0 0 0 RG                 % | 
0.5 w                    % | 
86.25 806 m              % | -> Draw right border of cell 1
86.25 783.02 l           % | 
S                        % | 
Q                        % | 

% omitted but this continues for the other borders

So, all this gives you a basic overview of what happens when you create a table using iText's layout engine. As you can probably guess, a lot of calculations need to happen to determine the exact coordinates of those instructions, but that's out of scope for this article. Although, if you want a blogpost going into more detail on the inner workings of iText's layout engine, let me know!

What I'll be doing in this article is walking through how I identified what was causing the table rendering slowdown, and the steps I took to resolve it. With that said, let's go back to January of this year when my investigation began.

Creating a Performance Baseline

As mentioned, I wanted to see where the most time was spent when generating tables. So, I created the most basic table use case, and decided to time the performance for generating 100 cells, 1,000 cells, 10,000, and finally 50,000 cells:

JAVA

public class App {
    public static void main(String[] args) throws IOException {
        //Do some warmup
        generatePdfWithTable("table_warmup.pdf", 1000);
        int[] amounts = {100, 1_000, 10_000, 50_000};

        for (int amount : amounts) {
            long start = System.currentTimeMillis();
            generatePdfWithTable("table_" + amount + ".pdf", amount);
            long end = System.currentTimeMillis();
            System.out.println("Time to generate " + amount + " cells: " + (end - start) + "ms");
        }
    }

    private static void generatePdfWithTable(String fileName, int amountOfCells) throws IOException {
        PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));

        Document document = new Document(pdf);
        Table table = new Table(5);

        for (int i = 0; i < amountOfCells; i++) {
            table.addCell("Cell1");
        }
        document.add(table);
        document.close();
    }
}

Probably not the most optimal benchmarking code, but I thought it should still do the trick. Running this code produced these results:

Number of Cells | Time to Generate
100             | 7ms
1,000           | 53ms
10,000          | 767ms
50,000          | 35,660ms

Plotting this data in a graph produced this:

runtime_graph.png

In this graph, “line go up mean world more badder”

This looks suspiciously like an algorithm with quadratic time complexity. I'm not going to get into the details of exactly what that means, so don't worry! But it should be pretty clear that the generation time is not increasing proportionally: going from 10,000 to 50,000 cells multiplies the cell count by 5, but the runtime by roughly 46 (767ms to 35,660ms). A low number of cells is OK(ish), while a high number of cells takes far longer than you'd expect.

Profiling

So, let's take a closer look by using flame graphs to track down potential bottlenecks. Here's a comparison of running the above sample with 1,000 cells and with 50,000 cells.

1,000 cells

flamegraph_1000.png

The results of rendering our table with 1,000 cells

50,000 cells

flamegraph_50000.png

And now, with 50,000 cells

Interesting. We can see that almost all of the time is being spent in just two methods:

And both of those methods seem to originate from the function call com.itextpdf.layout.renderer.CollapsedTableBorders#getVerticalBorder. Notice the Collapsed in the name, indicating that this happens when we use collapsed borders.

What you might not notice is that this is also the default behaviour, so let's take a look at what happens when we run the same code without collapsing borders. By default, iText uses BorderCollapsePropertyValue.COLLAPSE, so we have to slightly modify our code sample:

JAVA

table.setBorderCollapse(BorderCollapsePropertyValue.SEPARATE);

Results:

Number of Cells | BorderCollapsePropertyValue.SEPARATE | BorderCollapsePropertyValue.COLLAPSE
100             | 9ms                                  | 7ms
1,000           | 59ms                                 | 53ms
10,000          | 558ms                                | 767ms
50,000          | 1,601ms                              | 35,660ms

Indeed, it seems using the collapsing border feature increases the runtime by a lot — I might even call it a performance bug if Marketing lets me... Anyway, it seems that improving the collapsing border feature is now our top priority. Let's investigate!

Remember that from the flame graph we saw that two functions are responsible for almost all of the runtime, and both of those functions seem to be the result of calling com.itextpdf.layout.renderer.CollapsedTableBorders#getVerticalBorder.

So, let's have a look at this function:

JAVA


@Override
public List<Border> getVerticalBorder(int index) {
    if (index == 0) {
        List<Border> borderList = TableBorderUtil
                .createAndFillBorderList(null, tableBoundingBorders[3], verticalBorders.get(0).size());
        return getCollapsedList(verticalBorders.get(0), borderList);
    } else if (index == numberOfColumns) {
        List<Border> borderList = TableBorderUtil.createAndFillBorderList(null, tableBoundingBorders[1],
                verticalBorders.get(verticalBorders.size() - 1).size());
        return getCollapsedList(verticalBorders.get(verticalBorders.size() - 1), borderList);
    } else {
        return verticalBorders.get(index);
    }
}

Walking through this code, we can see that if it's an outermost border, we calculate a border list; this is done to determine which border color and width to use when collapsing. If it's an inner column, we can simply return the border's width and color without needing to collapse anything.

For a given table, however, once verticalBorders and tableBoundingBorders have been initialized their values will not change. This means it's pointless to recalculate the collapsed border list over and over again while processing each row, as the result will always be the same.

Instead, we should cache those results when they are calculated. So, let's implement the following to lazily calculate the results, and then store it in verticalBorderComputationResult until they're needed:

JAVA

private final Map<Integer, List<Border>> verticalBorderComputationResult = new HashMap<>();

@Override
public List<Border> getVerticalBorder(int index) {
    //If not outermost we don't need to calculate collapsed borders
    if (index != 0 && index != numberOfColumns) {
        return verticalBorders.get(index);
    }
    if (verticalBorderComputationResult.containsKey(index)) {
        return verticalBorderComputationResult.get(index);
    }
    final int tableBoundingBordersIndex = index == 0 ? 3 : 1;
    final Border boundingBorder = tableBoundingBorders[tableBoundingBordersIndex];
    final List<Border> verticalBorder = verticalBorders.get(index);

    final List<Border> borderList = TableBorderUtil
            .createAndFillBorderList(null, boundingBorder, verticalBorder.size());
    final List<Border> result = getCollapsedList(verticalBorder, borderList);
    verticalBorderComputationResult.put(index, result);
    return result;
}

Now, we simply calculate once and return the cached results when we want them. Seems promising, but let's verify the results of this small refactoring of the function. Running our newly-optimized code gives us the following:

Number of Cells | BorderCollapsePropertyValue.SEPARATE | BorderCollapsePropertyValue.COLLAPSE | With fix
100             | 9ms                                  | 7ms                                  | 7ms
1,000           | 59ms                                 | 53ms                                 | 49ms
10,000          | 558ms                                | 767ms                                | 485ms
50,000          | 1,601ms                              | 35,660ms                             | 1,310ms

As you can see we managed to reduce the runtime from 35 seconds to 1.3 seconds, without adding a lot of complexity. Pretty good for an hour of work. 😀

Tagging Tables

We're not done yet though. While I was working on general improvements to table performance, I also wanted to check how tagged tables compare to untagged tables.

Tagging is quite a complex subject, but the only thing you need to know for this blog is that it allows screen readers and other accessibility tools to better understand the semantic meaning of the contents of a PDF document - what is a heading, what is body text, and so on.

More importantly for this article, there are specific tags to identify content formatted as a table. You can find a lot more information in the Tagged PDF Q&A from the PDF Association's site.

If you are using iText Core's layout engine (or indeed the pdfHTML add-on), you can simply include this in your code to enable tagging:

JAVA

    PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
    pdf.setTagged();

This will ensure that your generated PDF file is tagged accordingly. I won't go into the exact details of how iText does this now, but let me know if you want more info or blogs on this topic.

So, let's return to our performance testing code. As you can see, the only addition we've made is to enable tagging.

JAVA

private static void generatePdfWithTable(String fileName, int amountOfCells) throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
    pdf.setTagged();
    Document document = new Document(pdf);
    Table table = new Table(5);

    for (int i = 0; i < amountOfCells; i++) {
        table.addCell("Cell1");
    }
    document.add(table);
    document.close();
}

When we run this example and compare against our now improved baseline results, we can see some major issues occurring.

Number of Cells | Untagged tables | Tagged tables
100             | 7ms             | 16ms
1,000           | 49ms            | 206ms
10,000          | 485ms           | 15,637ms
50,000          | 1,310ms         | 300,018ms

Again, there's something fishy going on so let's dive into it. Once more, we'll bring out the flame graphs to see where we could be losing all this time:

flame_graph_tagged_is_flushed.png

Using flame graphs again to dig deeper

After poking through the code a bit, I saw the following in the IntelliJ gutter:

kids_hint_lot_of_time.png

Flushing out the problem

Hold on, this single safety check takes 50% of the total runtime! This seems like an ideal case for improvement.

For some context, flushing is the process where iText writes completed pages to disk to keep memory usage low. So the check is there to make sure that you don't try to modify parts of the tag tree structure that have already been finalized. Since we can't just remove this check, let's instead see what's happening in the getKids() method.

JAVA


@Override
public List<IStructureNode> getKids() {
    PdfObject k = getK();
    List<IStructureNode> kids = new ArrayList<>();
    if (k != null) {
        if (k.isArray()) {
            PdfArray a = (PdfArray) k;
            for (int i = 0; i < a.size(); i++) {
                addKidObjectToStructElemList(a.get(i), kids);
            }
        } else {
            addKidObjectToStructElemList(k, kids);
        }
    }
    return kids;
}

We see that we perform a rather expensive parsing operation to convert low-level PdfObjects into high-level structure elements. But we don't really do anything with the parsed information; we just check whether a certain entry exists or not.

Let's try to rewrite this to not parse any of the PDF objects.

JAVA


public boolean isKidFlushed(int index) {
    PdfObject k = getK();
    if (k == null) {
        return false;
    }
    if (k.isArray()) {
        PdfArray array = (PdfArray) k;
        if (index >= array.size()) {
            return false;
        }
        return array.get(index).isFlushed();
    }
    return index == 0 && k.isFlushed();
}

What we've done here is completely eliminate the need for conversion, since we just use iText's low-level PdfObject API. Instead of looping over and parsing every element, the check becomes an array lookup plus a single method call when the PdfObject is a PdfArray, and a single method call for any other PdfObject.

For reference, you can see the full implementation on GitHub here: https://github.com/itext/itext-java/commit/71451319ebb9463d2c577bdaa89e4958e4ca2dd8

Let's see how much we've improved the speed.

Number of Cells | Tagged tables | Optimized kid lookup fix
100             | 16ms          | 19ms
1,000           | 206ms         | 203ms
10,000          | 15,637ms      | 2,761ms
50,000          | 300,018ms     | 111,790ms

Nice, so by avoiding unneeded computation we already made it about three times faster. But we are still pretty far away from the performance we get without tagging, so let's continue the investigation.

Further Table Tweaks

After some further profiling, I found I could add a few more optimizations. While the results were nice, the optimizations themselves aren't super interesting. So, I'll just provide brief descriptions here along with links to my commits if you want more info.

Cache the tagging hint key

Here we always had to call into an expensive function to get a value that didn't change, so simply caching the result once helped quite a bit.

Commit: b50c34e14f012993f81a02500542edaa54d3ea5c
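
This is essentially the same memoization trick as the border cache earlier: compute the value once, on first use, and return the stored copy afterwards. A minimal sketch of the pattern (the names here are hypothetical and don't match iText's internals; the real change is in the commit above):

JAVA

public class HintKeyCacheExample {
    // Computed lazily on first use; the value never changes afterwards.
    private Object cachedHintKey;

    Object getHintKey() {
        if (cachedHintKey == null) {
            cachedHintKey = computeHintKeyExpensively();
        }
        return cachedHintKey;
    }

    private Object computeHintKeyExpensively() {
        // Stands in for the expensive call that previously ran every time.
        return new Object();
    }
}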

Remove duplicate function invocation

An expensive calculation was executed twice in the same function, even though its result could not have changed in between. Storing the result in a variable and reusing it resulted in a drastic improvement.

Commit: b6b212971e285a4a800492f1d17651827b3de623
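
To illustrate the shape of this fix, here is a before/after sketch with hypothetical names (the actual change is in the commit above):

JAVA

public class DuplicateCallExample {

    interface TagStructure {
        int findInsertionIndex(); // stands in for the expensive calculation
    }

    // Before: the expensive call runs twice, even though nothing can change in between.
    static int insertionIndexBefore(TagStructure structure) {
        if (structure.findInsertionIndex() >= 0) {
            return structure.findInsertionIndex(); // second, redundant call
        }
        return -1;
    }

    // After: call once, store the result in a local variable, and reuse it.
    static int insertionIndexAfter(TagStructure structure) {
        final int index = structure.findInsertionIndex();
        return index >= 0 ? index : -1;
    }
}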

Do row adding in bulk

The previous implementation added new tags one by one. This caused a lot of unneeded checks for each tag. To avoid those duplicate checks we now gather all the tags, and then add them at once.

Commit: 0626cd422a275ac402aaa3aa34d92d17f934174f
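
Again as an illustration only, the batching idea looks roughly like this (hypothetical names; the real change is in the commit above):

JAVA

import java.util.List;

public class BulkAddExample {

    // Before: each tag is added on its own, repeating the same structural checks every time.
    static void addOneByOne(List<String> tagTree, List<String> rowTags) {
        for (String tag : rowTags) {
            checkNotFlushed(tagTree); // per-tag overhead
            tagTree.add(tag);
        }
    }

    // After: perform the checks once, then add the whole row in a single bulk operation.
    static void addInBulk(List<String> tagTree, List<String> rowTags) {
        checkNotFlushed(tagTree);
        tagTree.addAll(rowTags);
    }

    private static void checkNotFlushed(List<String> tagTree) {
        // Stands in for the validation that previously ran for every individual tag.
    }
}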

After implementing these three minor improvements, let's check the results now:

Number of Cells | Tagged tables | Optimized kid lookup fix | 3 minor fixes
100             | 16ms          | 19ms                     | 20ms
1,000           | 206ms         | 203ms                    | 117ms
10,000          | 15,637ms      | 2,761ms                  | 1,485ms
50,000          | 300,018ms     | 111,790ms                | 18,508ms

As you can see, even though the code changes are small and localized we've managed to make it a further 5 times faster!

Next nearest sibling algorithm

The last commit of the patch set is quite interesting, and so it gets its own special section. It showcases how adding a few heuristics to an algorithm can drastically improve the runtime.

Let's say we have the following table:

Header Cell 1 | Header Cell 2 | Header Cell 3 | Header Cell 4
Cell 1        | Cell 2        | Cell 3        | Cell 4
Cell 5        | Cell 6        | Cell 7        | Cell 8

If we opened the PDF and looked at the tag structure, it would look something like this:

TEXT


-Document
--Table
---Thead
-----th
-----span (Header Cell 1)
-----th
-----span (Header Cell 2)
....
---Tbody
-----td
------span (Cell 1)
-----td
------span (Cell 2)

(You might notice that there are no tr elements for table rows. This is expected, as we only generate those dummy elements when we finalize the table. This is because different PDF specifications require us to have different structures.)

When adding a new cell to the table, we need to create the correct tag and find the correct place to insert it in the tag tree. Using the profiler, I saw that these were the offending lines:

JAVA


// Omitted for clarity
List<TaggingHintKey> parentKidsHint = getAccessibleKidsHint(parentKey);
int kidIndInParentKidsHint = parentKidsHint.indexOf(kidKey);
int ind = getNearestNextSiblingTagIndex(waitingTagsManager, parentPointer, parentKidsHint, kidIndInParentKidsHint);
// insert tag at index....  Omitted for clarity

The implementations then for these functions can be seen below:

JAVA


public List<TaggingHintKey> getAccessibleKidsHint(TaggingHintKey parent) {
    List<TaggingHintKey> kidsHint = kidsHints.get(parent);
    if (kidsHint == null) {
        return Collections.<TaggingHintKey>emptyList();
    }

    List<TaggingHintKey> accessibleKids = new ArrayList<>();

    for (TaggingHintKey kid : kidsHint) {
        if (isNonAccessibleHint(kid)) {
            accessibleKids.addAll(getAccessibleKidsHint(kid));
        } else {
            accessibleKids.add(kid);
        }
    }

    return accessibleKids;
}

private int getNearestNextSiblingTagIndex(WaitingTagsManager waitingTagsManager, TagTreePointer parentPointer,
                                          List<TaggingHintKey> siblingsHint, int start) {
    int ind = -1;
    TagTreePointer nextSiblingPointer = new TagTreePointer(document);
    while (++start < siblingsHint.size()) {
        if (waitingTagsManager.tryMovePointerToWaitingTag(nextSiblingPointer, siblingsHint.get(start))
                && parentPointer.isPointingToSameTag(new TagTreePointer(nextSiblingPointer).moveToParent())) {
            ind = nextSiblingPointer.getIndexInParentKidsList();
            break;
        }
    }
    return ind;
}

So, the first line again flattens the tree to a list recursively — this initial implementation is pretty decent really. It's quick to implement and in most cases your flattened sub-structure will only contain very few tags. Since this operation isn't that expensive, it's not worth spending the time to optimize it.

The current algorithm looks like this in pseudo-code:

  1. Flatten the whole tree to a list

  2. Start searching from the start to find the desired TaggingHintKey

  3. Start from the found index, and look for the next TaggingHintKey which has the same parent

The Trouble with Tagging

We know this works well in most scenarios. But in our table the tbody tag will have as many kids as the table has cells, and since this code is executed for every newly-added cell, the tree has to be flattened again and again.

In the case of tables we know we always add cells at the end of the list — in fact, we do this in most general tagging scenarios. So it does not make sense to start looking from the beginning of the list, since we know we will almost always be at the other end of it.

Determining Some Improvements

Now that we have a better understanding of the problem space, we can optimize our algorithm to take into account the heuristics we have.

  1. Start processing recursively backwards until you find the desired TaggingHintKey to start from

  2. When found, start looking ahead to find the next sibling

As you can see, if we do this we completely avoid the two pitfalls which currently slow down the table creation process.

So, the implementation of this new algorithm would look something like this.

JAVA


// Omitted for clarity
int ind = getNearestNextSiblingTagIndex(waitingTagsManager, parentPointer, parentKidsHint, kidIndInParentKidsHint);
// insert tag at index....  Omitted for clarity

JAVA

private int getNearestNextSiblingIndex(WaitingTagsManager waitingTagsManager, TagTreePointer parentPointer,
                                       TaggingHintKey parentKey, TaggingHintKey kidKey) {
    ScanContext scanContext = new ScanContext();
    scanContext.waitingTagsManager = waitingTagsManager;
    scanContext.startHintKey = kidKey;
    scanContext.parentPointer = parentPointer;
    scanContext.nextSiblingPointer = new TagTreePointer(document);
    return scanForNearestNextSiblingIndex(scanContext, null, parentKey);
}

private int scanForNearestNextSiblingIndex(ScanContext scanContext, TaggingHintKey toCheck, TaggingHintKey parent) {
    if (scanContext.startVerifying) {
        if (scanContext.waitingTagsManager.tryMovePointerToWaitingTag(scanContext.nextSiblingPointer, toCheck)
                && scanContext.parentPointer.isPointingToSameTag(
                new TagTreePointer(scanContext.nextSiblingPointer).moveToParent())) {
            return scanContext.nextSiblingPointer.getIndexInParentKidsList();
        }
    }
    if (toCheck != null && !isNonAccessibleHint(toCheck)) {
        return -1;
    }
    List<TaggingHintKey> kidsHintList = kidsHints.get(parent);
    if (kidsHintList == null) {
        return -1;
    }


    int startIndex = -1;
    if (!scanContext.startVerifying) {
        for (int i = kidsHintList.size() - 1; i >= 0; i--) {
            if (scanContext.startHintKey == kidsHintList.get(i)) {
                scanContext.startVerifying = true;
                startIndex = i;
                break;
            }
        }
    }


    for (int j = startIndex + 1; j < kidsHintList.size(); j++) {
        final TaggingHintKey kid = kidsHintList.get(j);
        final int interMediateResult = scanForNearestNextSiblingIndex(scanContext, kid, kid);
        if (interMediateResult != -1) {
            return interMediateResult;
        }
    }

    return -1;
}

private static class ScanContext {
    WaitingTagsManager waitingTagsManager;
    TaggingHintKey startHintKey;
    boolean startVerifying;
    TagTreePointer parentPointer;
    TagTreePointer nextSiblingPointer;
}

OK, so let's plug this snazzy new tagging algorithm into the code and see what we get:

Number of Cells | Tagged tables | Optimized kid lookup fix | 3 minor fixes | Optimized search algo
100             | 16ms          | 19ms                     | 20ms          | 19ms
1,000           | 206ms         | 203ms                    | 117ms         | 120ms
10,000          | 15,637ms      | 2,761ms                  | 1,485ms       | 1,072ms
50,000          | 300,018ms     | 111,790ms                | 18,508ms      | 7,525ms

Alright, that's tagging a 50k cell table almost 40x faster. Not bad for an afternoon's work, eh?

Conclusions

And so we come to the end of Guust's Adventures in Optimization Land! But in all seriousness, I wanted to highlight how these improvements were achieved since it shows how profiling and a little curiosity can lead to huge performance wins with minimal risk.

All the above table improvements were included in the iText Core 9.1.0 release in February. If you're generating PDFs at scale, or using collapsed borders or tagging, I highly recommend checking it out, as you should get some free performance improvements.

Also, if you're tackling similar bottlenecks feel free to reach out — I'm more than happy to nerd out on this stuff. 😁
