3
heyheni
5y

Please, I need your advice :)

I need to reform a service that offers legal advice. It serves around 5,000 Microsoft Word legal advice documents to end users, and every year about 200 more documents are created, published, and changed manually.

So I had this idea to use a CMS, Git, and continuous integration for:

- automatic spell checking
- automatically assigning the copy text to translation bureaus and getting the translations back
- version control of the texts and translations
- document generation in multiple formats
- checking the text flow in the document (no overflowing text)
- checking accessibility for people with disabilities
- deploying it on the website

Do you think this is feasible? Can something that was made for code also be used to handle copy text documents? In my head this would save so much work, but I'm no expert in CI/CD.
Thank you for your advice!

Comments
  • 2
    It could work, but AFAIK Git and similar tools only track the file as a whole, so you will never be able to see "this line of a Word doc was replaced with this other one".

    Maybe it's just better to have versioning through Google Drive or something like that.
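A rough sketch of one possible workaround for the diffing problem, not something anyone in this thread has confirmed using: a .docx file is a ZIP archive whose main text lives in `word/document.xml`, so a script can extract the paragraph text and diff that instead of the binary file. The function names `docx_paragraphs` and `diff_docx` are made up for this example, and it ignores formatting, headers, footnotes, and tracked changes.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(source):
    """Return the plain text of each paragraph in a .docx (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is split into runs; join the text nodes of all its runs.
        text = "".join(node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def diff_docx(old, new):
    """Return a unified diff of the paragraph text of two .docx files (hypothetical helper)."""
    return list(
        difflib.unified_diff(
            docx_paragraphs(old), docx_paragraphs(new), "old", "new", lineterm=""
        )
    )
```

Something like `print("\n".join(diff_docx("v1.docx", "v2.docx")))` would then show changed paragraphs, at the cost of losing all layout information.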
  • 0
    @ZioCain thank you for your contribution.
    So couldn't you just write a script/test case for the output that is generated by CI/CD?
  • 3
    The enterprise solution I’ve seen for this is called Lawson http://mhcsoftwareinc.com/Integrati.... I would start your feature research there.
  • 1
    Everything seems feasible, EXCEPT using Word documents as the text file source. Word documents are not text files. They are proprietary blobs that are hard to use with tools meant for actual text files.

    Either you keep the Word documents and are stuck with Microsoft tools (or have to handle all of Word's subtleties), or you switch to actual text files.
  • 0
    @Fradow yeah, that seems to be a good idea anyway. I thought of using a www.pandoc.org server to generate multiple file formats.
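A minimal sketch of what the multi-format step could look like, assuming the `pandoc` command-line tool is installed on the build machine rather than run as a server. The function names, the `build` output directory, and the default format list are assumptions made for this example. Pandoc picks the output format from the file extension given to `-o`; PDF output is left out here because it needs an additional PDF engine.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_commands(source, formats=("html", "docx", "epub"), out_dir="build"):
    """Build one pandoc command line per target format (hypothetical helper)."""
    src = Path(source)
    return [
        ["pandoc", str(src), "-o", str(Path(out_dir) / f"{src.stem}.{fmt}")]
        for fmt in formats
    ]


def build_all(source, **kwargs):
    """Run every command; fails loudly if pandoc is missing or a conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc not found on PATH")
    for command in pandoc_commands(source, **kwargs):
        # Make sure the output directory exists before pandoc writes into it.
        Path(command[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(command, check=True)
```

A CI job could call `build_all("advice.md")` for each changed source file and publish the contents of the output directory.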
  • 1
    A year ago we had a requirement to print Word documents. At that time our printing pipeline only supported PDF files and native printer languages.

    Although the problem is different, our solution might help. Word has a COM interface on Windows which allows you to control Word programmatically. We used this interface to open Word documents and save them as PDF (I assumed Word itself knows best how to convert .docx to .pdf). In the end we used PHP to tie everything together. We have never had a problem with it since. (And if PHP can do it, most other languages should be able to do the same :P). The interface exposes a lot of Word's own functions. You might be able to achieve your goals with it.
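The comment above describes a PHP setup; the following is only a rough Python equivalent of the same idea, assuming a Windows machine with Word installed and the third-party `pywin32` package. The function names are made up for this example, and `17` is the numeric value of Word's `wdFormatPDF` constant.

```python
from pathlib import Path

WD_FORMAT_PDF = 17  # numeric value of Word's wdFormatPDF constant


def pdf_path_for(docx_path):
    """Return the output path: same location and name, with a .pdf suffix."""
    return str(Path(docx_path).with_suffix(".pdf"))


def convert_to_pdf(docx_path):
    """Open a document in Word via COM and save it as PDF (Windows only)."""
    # Imported here so the module still loads on machines without pywin32.
    import win32com.client

    source = Path(docx_path).resolve()  # Word expects absolute paths
    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        document = word.Documents.Open(str(source))
        try:
            target = pdf_path_for(source)
            document.SaveAs(target, FileFormat=WD_FORMAT_PDF)
        finally:
            document.Close(False)  # close without saving changes to the .docx
    finally:
        word.Quit()
    return target
```

Because this drives a real Word instance, it would need a Windows build agent in the CI setup, which is a notable constraint compared with the plain-text approach.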
  • 0
    @Hel8y cool, I'll look into it, thank you :)
  • 1
    For future reference, I still have (from ~2 years ago) a PHP microframework under active development. It's on the "planning table" again, and I'll try to invest heavily in file requests when I get around to continuing it.