Dirty Docs

I usually like Google Docs. It is not a full-blown editing tool and it will not replace Microsoft Office for me (at least not soon), but it offers quite a lot when it comes to editing documents from various locations. Your document goes where your browser is.

For reasons I will not get into, I had to do some automatic processing on one of my Google Docs text documents. I created it with care: all headings were properly defined and most of the text was in the "Normal" style. It was made with a clean HTML export in mind.

Unfortunately, the export result was quite far from clean code. The first thing is that everything is on one line. Yes, this saves a few bytes, but it is a pain in the ass if you need to edit the document manually. Fortunately, PSPad knows how to expand such code.
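
If PSPad is not at hand, a few lines of script can do a rough version of the same expansion. What follows is only a minimal sketch, not what PSPad does internally: the function name expand_lines and the list of tags in BLOCK_TAGS are my own illustrative choices, and it simply puts a line break after each block-level closing tag.

```python
import re

# Closing tags after which a line break is inserted.
# This list is an illustrative assumption; adjust it to the document at hand.
BLOCK_TAGS = "p|div|h[1-6]|li|ul|ol|table|tr|head|body|style"

# Match a block-level closing tag that is not already followed by a newline.
_BLOCK_CLOSE = re.compile(rf"(</(?:{BLOCK_TAGS})>)(?!\n)", re.IGNORECASE)


def expand_lines(html):
    """Return html with a newline inserted after each block-level closing tag."""
    return _BLOCK_CLOSE.sub(r"\1\n", html)


if __name__ == "__main__":
    print(expand_lines("<h1>Title</h1><p>First</p><p>Second</p>"))
```

This does not indent anything and it knows nothing about nesting, but it is enough to make a one-line export readable in a plain editor.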

The biggest issue I have here is that everything is set via style sheets. While I usually agree with that, Google overdid it this time. They added a bunch of span tags all over the place. Even when you have just "Normal" text, you can be sure that it will not stand without a <span> around it. Even bold and italic text will not get just <strong> and <em>; it gets a full-blown CSS definition instead.
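
To give an idea of the cleanup this forces on you, here is a minimal sketch using Python's standard html.parser module. It rests on an assumption the export may not satisfy: that the styling sits in an inline style attribute on each span (font-weight and font-style). If the export uses class names that point into a style block instead, those classes would have to be resolved first. The names SpanCleaner and clean_spans are hypothetical.

```python
import re
from html.parser import HTMLParser


class SpanCleaner(HTMLParser):
    """Drop <span> wrappers, turning bold/italic styling into <strong>/<em>."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.span_stack = []  # replacement tags opened for each open <span>

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            self.out.append(self.get_starttag_text())
            return
        # Assumption: formatting is given in an inline style attribute.
        style = (dict(attrs).get("style") or "").lower().replace(" ", "")
        opened = []
        if re.search(r"font-weight:(bold|[6-9]00)", style):
            opened.append("strong")
        if "font-style:italic" in style:
            opened.append("em")
        self.out.extend(f"<{t}>" for t in opened)
        self.span_stack.append(opened)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through; an empty <span/> is dropped.
        if tag != "span":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "span":
            self.out.append(f"</{tag}>")
        elif self.span_stack:
            # Close the replacement tags in reverse order of opening.
            self.out.extend(f"</{t}>" for t in reversed(self.span_stack.pop()))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def clean_spans(html):
    """Return html with every <span> removed or replaced by <strong>/<em>."""
    parser = SpanCleaner()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(clean_spans('<p><span style="font-weight:bold">Hi</span> <span>there</span></p>'))
```

Everything else a span might carry (fonts, colors, sizes) is thrown away on purpose, since for automatic processing only the structure matters.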

I agree that this is not an issue if you just want to view it. However, this makes any automatic processing of the text a real pain in the ass. It definitely brings back memories of Microsoft FrontPage and its HTML mess.

P.S. No, ODT export is not a solution - it is even bigger and dirtier.
