Making iText's table rendering faster

原始链接: https://kb.itextpdf.com/itext/how-i-made-pdf-table-rendering-faster

In some free time at Apryse, I investigated performance problems in iText Core's table rendering, specifically for tables with large numbers of cells. Profiling showed that collapsed-border handling and tagging were responsible for severe slowdowns. By caching the results of the collapsed-border calculations, generation time for a 50,000-cell table dropped from 35 seconds to 1.3 seconds. Further investigation showed that tagging caused even bigger problems. The first fix avoided parsing PDF objects entirely, cutting the 50,000-cell generation time from 300 seconds to 111 seconds. Additional tweaks, such as caching tagging hint keys, removing duplicate function calls, and adding rows in bulk, brought that down to 18.5 seconds. The final optimization improved the "nearest next sibling" algorithm used to find the correct insertion point for new cells, specifically for tables, reducing the time to 7.5 seconds. These small code changes, focused on avoiding redundant computation and improving tag insertion logic, produced a significant performance improvement and are included in the iText Core 9.1.0 release.

This Hacker News thread discusses iText's improved table rendering speed. A commenter reminisces about using iText 20 years ago, noting its continued relevance. Another user points out that the title was changed to remove the claim of "95% faster" rendering, arguing that the original title was more informative about the scale of the improvement. A moderator explains that removing specific numbers from titles is standard practice to avoid baity titles and discussions focused on the accuracy of the numbers, particularly when the claimed improvements might be based on specific, unusual conditions. They argue that a more general title, like "faster," allows the article itself to demonstrate the improvement's magnitude. However, some believe the specific improvement percentage is important context for the nature of the update.

Original article

Here at Apryse, we occasionally have some free time at the end of our iText development sprints where we're encouraged to use our initiative to "work on whatever" we fancy.

During one of these periods, I developed a fascination with the details of how table rendering works in iText Core; specifically why large cell counts seemed to slow it down by a lot. So, I decided to spend some time on improving my understanding in that area, and to see if I could at least find what was causing it.

TL;DR: I optimized table rendering in iText with minimal code changes, by avoiding repeated border collapse calculations and unnecessary tagging overhead. This significantly improved rendering performance, with a 50,000-cell tagged table going from roughly 5 minutes to about 7.5 seconds. Here’s how.

A Quick Overview of Tables in iText

Tables are one of the most useful/common layout tools for documents, but are also quite difficult to implement in PDF because the specification doesn't really provide for tables; you only have very basic drawing instructions.

Without going too much into the nitty-gritty of PDF syntax, you have instructions to:

  • Move to an x,y coordinate on the page

  • Draw a line from x1,y1 to x2,y2 in a specific color and line thickness

  • Display the text "Hello world" at coordinates x1,y1

With a little imagination you can see how these simple instructions can be combined into a complex graphic that can visually represent a table.
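
If you're curious how these raw operators map onto code, iText also exposes them directly through its low-level PdfCanvas API. Here is a minimal, hand-written sketch of drawing one piece of text and one border line (an illustration only, not code from the article; normally the layout engine emits these operators for you):

JAVA

import com.itextpdf.io.font.constants.StandardFonts;
import com.itextpdf.kernel.colors.ColorConstants;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;

import java.io.IOException;

public class LowLevelDrawingExample {
    public static void main(String[] args) throws IOException {
        PdfDocument pdf = new PdfDocument(new PdfWriter("low_level.pdf"));
        PdfCanvas canvas = new PdfCanvas(pdf.addNewPage());

        // Display the text "Cell 1" at an x,y coordinate
        canvas.beginText()
                .setFontAndSize(PdfFontFactory.createFont(StandardFonts.HELVETICA), 12)
                .moveText(38.5, 790.83)
                .showText("Cell 1")
                .endText();

        // Draw a vertical border: move to x1,y1, draw a line to x2,y2, then stroke it
        canvas.setStrokeColor(ColorConstants.BLACK)
                .setLineWidth(0.5f)
                .moveTo(36.25, 806)
                .lineTo(36.25, 783.02)
                .stroke();

        pdf.close();
    }
}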

Thankfully, iText Core's layout engine constructs high-level abstractions around these operations, so you don't have the pain of directly dealing with the low-level PDF syntax. This means you are probably more familiar with something more like the following representation:

JAVA

        Document document = new Document(pdf);
        Table table = new Table(2);
        table.addCell("Cell1");
        table.addCell("Cell2");
        document.add(table);
        document.close();

Which results in something looking like this:

basic_table.png

What our simple table looks like when rendered

Conversely, using the previously mentioned low-level operations this table would look like the following in actual PDF syntax:

PDF syntax cheat sheet: https://pdfa.org/wp-content/uploads/2023/08/PDF-Operators-CheatSheet.pdf

TEXT

q                        % Start subsection (Drawing the text Cell 1)
BT                       % Begin text
/F1 12 Tf                % Select the specified font
38.5 790.83 Td           % Move text cursor x coordinate: 38.5, y coordinate 790.83
(Cell 1)Tj               % Show text string
ET                       % End text
Q                        % End section
q                        %  |
BT                       %  |
/F1 12 Tf                %  |
88.5 790.83 Td           %  |--> Same as above
(Cell 2)Tj               %  |
ET                       %  |
Q                        %  |
q                        % Start subsection (drawing of the left border )
0 0 0 RG                 % Select color black
0.5 w                    % Set line width to 0.5
36.25 806 m              % Move cursor to x coordinate 36.25 and y coordinate 806
36.25 783.02 l           % Create line to x coordinate 36.25 and y coordinate 783.02 
S                        % Stroke the line
Q                        % End section
q                        % |  
0 0 0 RG                 % | 
0.5 w                    % | 
86.25 806 m              % | -> Draw right border of cell 1
86.25 783.02 l           % | 
S                        % | 
Q                        % | 

% omitted but this continues for the other borders

So, all this gives you a basic overview of what happens when you create a table using iText's layout engine. As you can probably guess, a lot of calculations need to happen to determine the exact coordinates of those instructions, but that's out of scope for this article. Although, if you want a blogpost going into more detail on the inner workings of iText's layout engine, let me know!

What I'll be doing in this article is walking through how I identified what was causing the table rendering slowdown, and the steps I took to resolve it. With that said, let's go back to January of this year when my investigation began.

Creating a Performance Baseline

As mentioned, I wanted to see where the most time was spent when generating tables. So, I created the most basic table use case, and decided to time the performance for generating 100 cells, 1,000 cells, 10,000, and finally 50,000 cells:

JAVA

public class App {
    public static void main(String[] args) throws IOException {
        //Do some warmup
        generatePdfWithTable("table_warmup.pdf", 1000);
        int[] amounts = {100, 1_000, 10_000, 50_000};

        for (int amount : amounts) {
            long start = System.currentTimeMillis();
            generatePdfWithTable("table_" + amount + ".pdf", amount);
            long end = System.currentTimeMillis();
            System.out.println("Time to generate " + amount + " cells: " + (end - start) + "ms");
        }
    }

    private static void generatePdfWithTable(String fileName, int amountOfCells) throws IOException {
        PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));

        Document document = new Document(pdf);
        Table table = new Table(5);

        for (int i = 0; i < amountOfCells; i++) {
            table.addCell("Cell1");
        }
        document.add(table);
        document.close();
    }
}

Probably not the most optimal benchmarking code, but I thought it should still do the trick. Running this code produced these results:

Number of Cells | Time to Generate
100             | 7ms
1,000           | 53ms
10,000          | 767ms
50,000          | 35,660ms

Plotting this data in a graph produced this:

runtime_graph.png

In this graph, “line go up mean world more badder”

This looks suspiciously like an algorithm with quadratic time complexity. I'm not going to get into the details of exactly what that means, so don't worry! But it should be pretty clear that the generation time is not increasing proportionally: going from 10,000 to 50,000 cells multiplies the cell count by 5, but the runtime by roughly 46 (767ms to 35,660ms). A low number of cells is OK(ish), while a high number of cells takes far longer than you'd expect.

Profiling

So, let's take a closer look by using flame graphs to track down potential bottlenecks. Here's a comparison of running the above sample with 1,000 cells and with 50,000 cells.

1,000 cells

flamegraph_1000.png

The results of rendering our table with 1,000 cells

50,000 cells

flamegraph_50000.png

And now, with 50,000 cells

Interesting. We can see that almost all of the time is being spent in just two methods:

And both of those methods seem to originate from the function call com.itextpdf.layout.renderer.CollapsedTableBorders#getVerticalBorder. Notice the Collapsed in the name, indicating that this happens when we use collapsed borders.

What you might not notice is that this is also the default behaviour, so let's take a look at what happens when we run the same code without collapsing borders. By default, iText uses BorderCollapsePropertyValue.COLLAPSE, so we have to slightly modify our code sample:

JAVA

table.setBorderCollapse(BorderCollapsePropertyValue.SEPARATE);

Results:

Number of Cells | BorderCollapsePropertyValue.SEPARATE | BorderCollapsePropertyValue.COLLAPSE
100             | 9ms                                  | 7ms
1,000           | 59ms                                 | 53ms
10,000          | 558ms                                | 767ms
50,000          | 1,601ms                              | 35,660ms

Indeed, it seems using the collapsing border feature increases the runtime by a lot — I might even call it a performance bug if Marketing lets me... Anyway, it seems that improving the collapsing border feature is now our top priority. Let's investigate!

Remember that from the flame graph we saw that two functions are responsible for almost all of the runtime, and both of those functions seem to be the result of calling com.itextpdf.layout.renderer.CollapsedTableBorders#getVerticalBorder.

So, let's have a look at this function:

JAVA


@Override
public List<Border> getVerticalBorder(int index) {
    if (index == 0) {
        List<Border> borderList = TableBorderUtil
                .createAndFillBorderList(null, tableBoundingBorders[3], verticalBorders.get(0).size());
        return getCollapsedList(verticalBorders.get(0), borderList);
    } else if (index == numberOfColumns) {
        List<Border> borderList = TableBorderUtil.createAndFillBorderList(null, tableBoundingBorders[1],
                verticalBorders.get(verticalBorders.size() - 1).size());
        return getCollapsedList(verticalBorders.get(verticalBorders.size() - 1), borderList);
    } else {
        return verticalBorders.get(index);
    }
}

Walking through this code, we can see that if it's an outermost border, we calculate a border list; this is done to determine which border color and width to use when collapsing. If it's an inner column, we can simply return the border's width and color without needing to collapse anything.

For a given table, however, once verticalBorders and tableBoundingBorders have been initialized their values will not change. This means it's pointless to recalculate the collapsed border list over and over again while processing each row, as the result will always be the same.

Instead, we should cache those results when they are calculated. So, let's implement the following to lazily calculate the results, and then store it in verticalBorderComputationResult until they're needed:

JAVA

private final Map<Integer, List<Border>> verticalBorderComputationResult = new HashMap<>();

@Override
public List<Border> getVerticalBorder(int index) {
    //If not outermost we don't need to calculate collapsed borders
    if (index != 0 && index != numberOfColumns) {
        return verticalBorders.get(index);
    }
    if (verticalBorderComputationResult.containsKey(index)) {
        return verticalBorderComputationResult.get(index);
    }
    final int tableBoundingBordersIndex = index == 0 ? 3 : 1;
    final Border boundingBorder = tableBoundingBorders[tableBoundingBordersIndex];
    final List<Border> verticalBorder = verticalBorders.get(index);

    final List<Border> borderList = TableBorderUtil
            .createAndFillBorderList(null, boundingBorder, verticalBorder.size());
    final List<Border> result = getCollapsedList(verticalBorder, borderList);
    verticalBorderComputationResult.put(index, result);
    return result;
}

Now, we simply calculate once and return the cached results when we want them. Seems promising, but let's verify the results of this small refactoring of the function. Running our newly-optimized code gives us the following:

Number of Cells | BorderCollapsePropertyValue.SEPARATE | BorderCollapsePropertyValue.COLLAPSE | With fix
100             | 9ms                                  | 7ms                                  | 7ms
1,000           | 59ms                                 | 53ms                                 | 49ms
10,000          | 558ms                                | 767ms                                | 485ms
50,000          | 1,601ms                              | 35,660ms                             | 1,310ms

As you can see we managed to reduce the runtime from 35 seconds to 1.3 seconds, without adding a lot of complexity. Pretty good for an hour of work. 😀

Tagging Tables

We're not done yet though. While I was working on general improvements to table performance, I also wanted to check how tagged tables compare to untagged tables.

Tagging is quite a complex subject, but the only thing you need to know for this blog is that it allows screen readers and other accessibility tools to better understand the semantic meaning of the contents of a PDF document - what is a heading, what is body text, and so on.

More importantly for this article, there are specific tags to identify content formatted as a table. You can find a lot more information in the Tagged PDF Q&A from the PDF Association's site.

If you are using iText Core's layout engine (or indeed the pdfHTML add-on), you can simply include this in your code to enable tagging:

JAVA

    PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
    pdf.setTagged();

This will ensure that your generated PDF file is tagged accordingly. I won't go into the exact details of how iText does this now, but let me know if you want more info or blogs on this topic.

So, let's return to our performance testing code. As you can see, the only addition we've made is to enable tagging.

JAVA

private static void generatePdfWithTable(String fileName, int amountOfCells) throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfWriter(fileName));
    pdf.setTagged();
    Document document = new Document(pdf);
    Table table = new Table(5);

    for (int i = 0; i < amountOfCells; i++) {
        table.addCell("Cell1");
    }
    document.add(table);
    document.close();
}

When we run this example and compare against our now improved baseline results, we can see some major issues occurring.

Number of Cells | Untagged tables | Tagged tables
100             | 7ms             | 16ms
1,000           | 49ms            | 206ms
10,000          | 485ms           | 15,637ms
50,000          | 1,310ms         | 300,018ms

Again, there's something fishy going on so let's dive into it. Once more, we'll bring out the flame graphs to see where we could be losing all this time:

flame_graph_tagged_is_flushed.png

Using flame graphs again to dig deeper

After poking through the code a bit, I saw the following in the IntelliJ gutter:

kids_hint_lot_of_time.png

Flushing out the problem

Hold on, this single safety check takes 50% of the total runtime! This seems like an ideal case for improvement.

For some context, flushing is the process where iText writes completed pages to disk to keep memory usage low. So the check is there to make sure that you don't try to modify parts of the tag tree structure that have already been finalized. Since we can't just remove this check, let's instead see what's happening in the getKids() method.

JAVA


@Override
public List<IStructureNode> getKids() {
    PdfObject k = getK();
    List<IStructureNode> kids = new ArrayList<>();
    if (k != null) {
        if (k.isArray()) {
            PdfArray a = (PdfArray) k;
            for (int i = 0; i < a.size(); i++) {
                addKidObjectToStructElemList(a.get(i), kids);
            }
        } else {
            addKidObjectToStructElemList(k, kids);
        }
    }
    return kids;
}

We see that we perform a rather expensive parsing operation to convert low-level PdfObjects into high-level structure elements. But we don't really do anything with the parsed information; we just check whether a certain entry exists or not.

Let's try to rewrite this to not parse any of the PDF objects.

JAVA


public boolean isKidFlushed(int index) {
    PdfObject k = getK();
    if (k == null) {
        return false;
    }
    if (k.isArray()) {
        PdfArray array = (PdfArray) k;
        if (index >= array.size()) {
            return false;
        }
        return array.get(index).isFlushed();
    }
    return index == 0 && k.isFlushed();
}

What we've done here is completely eliminate the need for conversion, since we just use iText's low-level PdfObject API. Instead of looping over and parsing every element, the check becomes an array lookup plus a single method call when the PdfObject is a PdfArray, and a single method call for any other PdfObject.

For reference, you can see the full implementation on GitHub here: https://github.com/itext/itext-java/commit/71451319ebb9463d2c577bdaa89e4958e4ca2dd8

Let's see how much we've improved the speed.

Number of Cells | Tagged tables | Optimized kid lookup fix
100             | 16ms          | 19ms
1,000           | 206ms         | 203ms
10,000          | 15,637ms      | 2,761ms
50,000          | 300,018ms     | 111,790ms

Nice, so by avoiding unneeded computation we already made it about three times faster. But we are still pretty far away from the performance we get without tagging, so let's continue the investigation.

Further Table Tweaks

After some further profiling, I found I could add a few more optimizations. While the results were nice, the optimizations themselves aren't super interesting. So, I'll just provide brief descriptions here along with links to my commits if you want more info.

Cache the tagging hint key

Here we always had to call into an expensive function to get a value that didn't change, so simply caching the result once helped quite a bit.

Commit: b50c34e14f012993f81a02500542edaa54d3ea5c
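
This is essentially the same memoization trick as the border cache earlier: compute the value once, on first use, and return the stored copy afterwards. A minimal sketch of the pattern (the names here are hypothetical and don't match iText's internals; the real change is in the commit above):

JAVA

public class HintKeyCacheExample {
    // Computed lazily on first use; the value never changes afterwards.
    private Object cachedHintKey;

    Object getHintKey() {
        if (cachedHintKey == null) {
            cachedHintKey = computeHintKeyExpensively();
        }
        return cachedHintKey;
    }

    private Object computeHintKeyExpensively() {
        // Stands in for the expensive call that previously ran every time.
        return new Object();
    }
}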

Remove duplicate function invocation

An expensive calculation was executed twice in the same function, even though its result could not have changed in between. Storing the result in a variable and reusing it resulted in a drastic improvement.

Commit: b6b212971e285a4a800492f1d17651827b3de623
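
To illustrate the shape of this fix, here is a before/after sketch with hypothetical names (the actual change is in the commit above):

JAVA

public class DuplicateCallExample {

    interface TagStructure {
        int findInsertionIndex(); // stands in for the expensive calculation
    }

    // Before: the expensive call runs twice, even though nothing can change in between.
    static int insertionIndexBefore(TagStructure structure) {
        if (structure.findInsertionIndex() >= 0) {
            return structure.findInsertionIndex(); // second, redundant call
        }
        return -1;
    }

    // After: call once, store the result in a local variable, and reuse it.
    static int insertionIndexAfter(TagStructure structure) {
        final int index = structure.findInsertionIndex();
        return index >= 0 ? index : -1;
    }
}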

Do row adding in bulk

The previous implementation added new tags one by one. This caused a lot of unneeded checks for each tag. To avoid those duplicate checks we now gather all the tags, and then add them at once.

Commit: 0626cd422a275ac402aaa3aa34d92d17f934174f
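
Again as an illustration only, the batching idea looks roughly like this (hypothetical names; the real change is in the commit above):

JAVA

import java.util.List;

public class BulkAddExample {

    // Before: each tag is added on its own, repeating the same structural checks every time.
    static void addOneByOne(List<String> tagTree, List<String> rowTags) {
        for (String tag : rowTags) {
            checkNotFlushed(tagTree); // per-tag overhead
            tagTree.add(tag);
        }
    }

    // After: perform the checks once, then add the whole row in a single bulk operation.
    static void addInBulk(List<String> tagTree, List<String> rowTags) {
        checkNotFlushed(tagTree);
        tagTree.addAll(rowTags);
    }

    private static void checkNotFlushed(List<String> tagTree) {
        // Stands in for the validation that previously ran for every individual tag.
    }
}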

After implementing these three minor improvements, let's check the results now:

Number of Cells | Tagged tables | Optimized kid lookup fix | 3 minor fixes
100             | 16ms          | 19ms                     | 20ms
1,000           | 206ms         | 203ms                    | 117ms
10,000          | 15,637ms      | 2,761ms                  | 1,485ms
50,000          | 300,018ms     | 111,790ms                | 18,508ms

As you can see, even though the code changes are small and localized we've managed to make it a further 5 times faster!

Next nearest sibling algorithm

The last commit of the patch set is quite interesting, and so it gets its own special section. It showcases how adding a few heuristics to an algorithm can drastically improve the runtime.

Let's say we have the following table:

Header Cell 1 | Header Cell 2 | Header Cell 3 | Header Cell 4
Cell 1        | Cell 2        | Cell 3        | Cell 4
Cell 5        | Cell 6        | Cell 7        | Cell 8

If we opened the PDF and looked at the tag structure, it would look something like this:

TEXT


-Document
--Table
---Thead
-----th
-----span (Header Cell 1)
-----th
-----span (Header Cell 2)
....
---Tbody
-----td
------span (Cell 1)
-----td
------span (Cell 2)

(You might notice that there are no tr elements for table rows. This is expected, as we only generate those dummy elements when we finalize the table. This is because different PDF specifications require us to have different structures.)

When adding a new cell to the table, we need to create the correct tag and find the correct place to insert it in the tag tree. Using the profiler, I saw that these were the offending lines:

JAVA


// Omitted for clarity
List<TaggingHintKey> parentKidsHint = getAccessibleKidsHint(parentKey);
int kidIndInParentKidsHint = parentKidsHint.indexOf(kidKey);
int ind = getNearestNextSiblingTagIndex(waitingTagsManager, parentPointer, parentKidsHint, kidIndInParentKidsHint);
// insert tag at index....  Omitted for clarity

The implementations then for these functions can be seen below:

JAVA


public List<TaggingHintKey> getAccessibleKidsHint(TaggingHintKey parent) {
    List<TaggingHintKey> kidsHint = kidsHints.get(parent);
    if (kidsHint == null) {
        return Collections.<TaggingHintKey>emptyList();
    }

    List<TaggingHintKey> accessibleKids = new ArrayList<>();

    for (TaggingHintKey kid : kidsHint) {
        if (isNonAccessibleHint(kid)) {
            accessibleKids.addAll(getAccessibleKidsHint(kid));
        } else {
            accessibleKids.add(kid);
        }
    }

    return accessibleKids;
}

private int getNearestNextSiblingTagIndex(WaitingTagsManager waitingTagsManager, TagTreePointer parentPointer,
                                          List<TaggingHintKey> siblingsHint, int start) {
    int ind = -1;
    TagTreePointer nextSiblingPointer = new TagTreePointer(document);
    while (++start < siblingsHint.size()) {
        if (waitingTagsManager.tryMovePointerToWaitingTag(nextSiblingPointer, siblingsHint.get(start))
                && parentPointer.isPointingToSameTag(new TagTreePointer(nextSiblingPointer).moveToParent())) {
            ind = nextSiblingPointer.getIndexInParentKidsList();
            break;
        }
    }
    return ind;
}

So, the first line again flattens the tree to a list recursively — this initial implementation is pretty decent really. It's quick to implement and in most cases your flattened sub-structure will only contain very few tags. Since this operation isn't that expensive, it's not worth spending the time to optimize it.

The current algorithm looks like this in pseudo-code:

  1. Flatten the whole tree to a list

  2. Start searching from the start to find the desired TaggingHintKey

  3. Start from the found index, and look for the next TaggingHintKey which has the same parent

The Trouble with Tagging

We know this works well in most scenarios. But in our table the tbody tag will have as many kids as the table has cells, and since this code is executed for every newly-added cell, the tree has to be flattened again and again.

In the case of tables we know we always add cells at the end of the list — in fact, we do this in most general tagging scenarios. So it does not make sense to start looking from the beginning of the list, since we know we will almost always be at the other end of it.

Determining Some Improvements

Now that we have a better understanding of the problem space, we can optimize our algorithm to take into account the heuristics we have.

  1. Start processing recursively backwards until you find the desired TaggingHintKey to start from

  2. When found, start looking ahead to find the next sibling

As you can see, if we do this we completely avoid the two pitfalls which currently slow down the table creation process.

So, the implementation of this new algorithm would look something like this.

JAVA


// Omitted for clarity
int ind = getNearestNextSiblingTagIndex(waitingTagsManager, parentPointer, parentKidsHint, kidIndInParentKidsHint);
// insert tag at index....  Omitted for clarity

JAVA

private int getNearestNextSiblingIndex(WaitingTagsManager waitingTagsManager, TagTreePointer parentPointer,
                                       TaggingHintKey parentKey, TaggingHintKey kidKey) {
    ScanContext scanContext = new ScanContext();
    scanContext.waitingTagsManager = waitingTagsManager;
    scanContext.startHintKey = kidKey;
    scanContext.parentPointer = parentPointer;
    scanContext.nextSiblingPointer = new TagTreePointer(document);
    return scanForNearestNextSiblingIndex(scanContext, null, parentKey);
}

private int scanForNearestNextSiblingIndex(ScanContext scanContext, TaggingHintKey toCheck, TaggingHintKey parent) {
    if (scanContext.startVerifying) {
        if (scanContext.waitingTagsManager.tryMovePointerToWaitingTag(scanContext.nextSiblingPointer, toCheck)
                && scanContext.parentPointer.isPointingToSameTag(
                new TagTreePointer(scanContext.nextSiblingPointer).moveToParent())) {
            return scanContext.nextSiblingPointer.getIndexInParentKidsList();
        }
    }
    if (toCheck != null && !isNonAccessibleHint(toCheck)) {
        return -1;
    }
    List<TaggingHintKey> kidsHintList = kidsHints.get(parent);
    if (kidsHintList == null) {
        return -1;
    }


    int startIndex = -1;
    if (!scanContext.startVerifying) {
        for (int i = kidsHintList.size() - 1; i >= 0; i--) {
            if (scanContext.startHintKey == kidsHintList.get(i)) {
                scanContext.startVerifying = true;
                startIndex = i;
                break;
            }
        }
    }


    for (int j = startIndex + 1; j < kidsHintList.size(); j++) {
        final TaggingHintKey kid = kidsHintList.get(j);
        final int interMediateResult = scanForNearestNextSiblingIndex(scanContext, kid, kid);
        if (interMediateResult != -1) {
            return interMediateResult;
        }
    }

    return -1;
}

private static class ScanContext {
    WaitingTagsManager waitingTagsManager;
    TaggingHintKey startHintKey;
    boolean startVerifying;
    TagTreePointer parentPointer;
    TagTreePointer nextSiblingPointer;
}

OK, so let's plug this snazzy new tagging algorithm into the code and see what we get:

Number of Cells | Tagged tables | Optimized kid lookup fix | 3 minor fixes | Optimized search algo
100             | 16ms          | 19ms                     | 20ms          | 19ms
1,000           | 206ms         | 203ms                    | 117ms         | 120ms
10,000          | 15,637ms      | 2,761ms                  | 1,485ms       | 1,072ms
50,000          | 300,018ms     | 111,790ms                | 18,508ms      | 7,525ms

Alright, that's tagging a 50k cell table almost 40x faster. Not bad for an afternoon's work, eh?

Conclusions

And so we come to the end of Guust's Adventures in Optimization Land! But in all seriousness, I wanted to highlight how these improvements were achieved since it shows how profiling and a little curiosity can lead to huge performance wins with minimal risk.

All the above table improvements were included in the iText Core 9.1.0 release in February. If you're generating PDFs at scale, or using collapsed borders or tagging, I highly recommend checking it out, as you should get some free performance improvements.

Also, if you're tackling similar bottlenecks feel free to reach out — I'm more than happy to nerd out on this stuff. 😁
