20
gblues
7y

Today's project was answering the question: "Can I update tables in a Microsoft Word document programmatically?"

(spoiler: YES)

My coworker got the ball rolling by showing that the docx file is just a zip archive with a bunch of XML in it.
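To make the "zip archive full of XML" point concrete, here is a minimal sketch that pulls a single entry out of a zip using only the JDK. The class and method names (`DocxPeek`, `readEntry`, `zipOf`) are made up for illustration, and `zipOf` builds a tiny in-memory archive that stands in for a real .docx. In a real document the main body normally lives in the entry `word/document.xml`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of the original project.
final class DocxPeek {

    /** Returns the UTF-8 text of one entry in a zip archive, or null if it is absent. */
    static String readEntry(byte[] zipBytes, String entryName) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // readAllBytes stops at the end of the current entry.
                    return new String(zin.readAllBytes(), StandardCharsets.UTF_8);
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a one-entry zip in memory, standing in for a real .docx file. */
    static byte[] zipOf(String entryName, String content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(out)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write(content.getBytes(StandardCharsets.UTF_8));
            zout.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

With a real file, the bytes would come from something like `Files.readAllBytes(Path.of("report.docx"))`, where the file name is just a placeholder.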

The things I needed to update were a pair of tables. Not knowing anything about Word's XML schema, I investigated things like:
- what tag is the table declared with?
- is pagination handled within the table markup?
- where is the cell background color specified?
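For anyone poking at the same questions: in WordprocessingML a table is a `<w:tbl>` element containing `<w:tr>` rows and `<w:tc>` cells, and a cell's background is usually the `w:fill` attribute of a `<w:shd>` element inside the cell's `<w:tcPr>`. The sketch below uses the JDK's DOM parser to count tables and list cell fill colors. The class name `TableProbe` and its methods are hypothetical and not taken from the original script.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for exploring document.xml, not the author's code.
final class TableProbe {
    // Namespace bound to the "w" prefix in WordprocessingML documents.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Parses an XML string with namespace awareness switched on. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Counts every w:tbl element in the document, nested ones included. */
    static int countTables(Document doc) {
        return doc.getElementsByTagNameNS(W_NS, "tbl").getLength();
    }

    /** Collects the w:fill value of each w:shd that sits directly inside a w:tcPr. */
    static List<String> cellFills(Document doc) {
        List<String> fills = new ArrayList<>();
        NodeList shades = doc.getElementsByTagNameNS(W_NS, "shd");
        for (int i = 0; i < shades.getLength(); i++) {
            Element shd = (Element) shades.item(i);
            // w:shd also appears in paragraph and run properties; keep cell shading only.
            if ("tcPr".equals(shd.getParentNode().getLocalName())) {
                fills.add(shd.getAttributeNS(W_NS, "fill"));
            }
        }
        return fills;
    }
}
```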

Fortunately this wasn't too cumbersome.

For the data, CSV was the obvious choice, and I quickly confirmed that OpenCSV is easy to use from within Gradle.

The Word XML segments were far too verbose to put into constants, so I made a series of templates with tokens to use for replacement.

In creating the templates, I had to analyze the Word XML to see what changed between cells (thankfully, very little). This then informed the design of the CSV parsing loops to make sure the dynamic stuff got injected properly.
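The template-plus-loop idea can be sketched roughly like this. Everything here is an assumption made for illustration: the token names (`{{FILL}}`, `{{TEXT}}`, `{{CELLS}}`), the heavily trimmed templates, the fill colors, and the choice to shade the first record as a header. The input is a `List<String[]>`, which is the shape OpenCSV's `CSVReader.readAll()` returns, so no CSV parsing is shown.

```java
import java.util.List;

// Hypothetical renderer; real Word cell XML carries more properties than this.
final class RowRenderer {
    static final String CELL_TEMPLATE =
        "<w:tc><w:tcPr><w:shd w:val=\"clear\" w:color=\"auto\" w:fill=\"{{FILL}}\"/></w:tcPr>"
        + "<w:p><w:r><w:t>{{TEXT}}</w:t></w:r></w:p></w:tc>";
    static final String ROW_TEMPLATE = "<w:tr>{{CELLS}}</w:tr>";

    /** Escapes the characters that would break XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders one CSV record as a w:tr, giving every cell the same fill color. */
    static String renderRow(String[] fields, String fill) {
        StringBuilder cells = new StringBuilder();
        for (String field : fields) {
            cells.append(CELL_TEMPLATE
                .replace("{{FILL}}", fill)
                .replace("{{TEXT}}", escapeXml(field)));
        }
        return ROW_TEMPLATE.replace("{{CELLS}}", cells);
    }

    /** Renders all records; the first is assumed to be a header and shaded grey. */
    static String renderRows(List<String[]> records) {
        StringBuilder rows = new StringBuilder();
        for (int i = 0; i < records.size(); i++) {
            String fill = (i == 0) ? "D9D9D9" : "FFFFFF"; // assumed colors
            rows.append(renderRow(records.get(i), fill));
        }
        return rows.toString();
    }
}
```

The rendered rows would then be spliced into the `<w:tbl>` element of the document XML before the archive is zipped back up. Escaping the cell text matters, since a stray `&` or `<` in the CSV would otherwise produce a document that Word refuses to open.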

I got my proof of concept working in less than a day. I have some more polishing to do, but I'm pretty happy with the initial results!

Comments
  • 1
    Congrats! So, no external libraries fit this use case?
  • 2
    @olezhka I didn't look for a full-on docx library because it'd be overkill for our needs (filling in a <w:tbl> element).

    I see docx4j and that might end up being useful for making a PDF out of the result.
  • 0
    Dude... Ever heard of Aspose.Words? It's what I'm using now to build a document generator with templates and partial templates. I recommend you take a look at it.
  • 0
    @MrCSharp I haven't. But if that's a .NET library, it probably wouldn't be of use, since this is a Gradle script.
  • 0
    Oh! And I have to give Microsoft some credit--Word actually provides really useful errors if you fuck up the XML! Made debugging pretty easy.
  • 0
    I was looking into something similar a few months back. Don't know about tables, but the XML gets royally complicated on formatting if you make a lot of tiny edits. Good job on your project!