Microsoft Office vs LibreOffice for .NET applications
Let me start off by saying that I’m going to focus on LibreOffice instead of OpenOffice, because the latter one in my mind is pretty much abandoned. Although, the things I’m going to say about LibreOffice are also true for OpenOffice, since they are for the most part one and the same…
My application needs to show tables of data and for some tables graphs based on that data. While developing the application I had to choose between:
a) Writing everything myself from the ground up – windows for showing the tables, graph logic, making everything printable etc;
b) Automating LibreOffice;
c) Automating Microsoft Office;
I could write off option A immediately, because it simply wouldn’t have been feasible. There were no benefits going down that route, only costs.
Option B was intriguing. I very much liked the idea of using a free office suite and not having to pay for Microsoft Office licenses. I was disappointed to find out that automating LibreOffice is impossible, and here’s why:
1) 0 documentation - literally! I can’t find any documentation on the API, there are almost no getting started guides, the only source of information seems to be LibreOffice forums (which is no good for active development)…
2) The API is disgusting – seriously! The few code snippets I have managed to find, suggests that there are no properties, only methods. You can forget about IntelliSense, because most methods require a string parameter for the action to perform – eg. “getSheets”.
3) The logic between the API and the classes involved is completely warped.
In the end I had no other option, but to use C. Microsoft Office has its own share of problems, which I’ll come to in a bit, but at least you can program against it, you can understand the API, you do have documentation, you have hundreds of posts on the web that you can search for answers, etc.
Here are some caveats about Microsoft Office:
1) First off, if you need to target/support multiple versions of Office, you need to use the lowest Office version Primary Interop Assemblies. So if you want to support Office 2003 and up, you need to use PIAs for Office 2003.
2) I implore you to only support Office 2007 and up. You will save yourself a huge amount of headaches that way. The two primary reasons I dropped support for Office 2003 were:
a) Office 2003 only supports 3 formating conditions (you can use formating conditions for alternating row colors, having certain cells with certain values colored differently, etc), but they are a must if you want to get any sort of decent speed out of your automation. Coloring cells/rows one by one is painfully slow. For example I had a large table with more than a thousand rows. When coloring them one by one, the automation took close to a minute, using formating conditions dropped the automation time to ~20 seconds.
b) Office 2003 uses different color codes than Office 2007. For instance if I use the method System.Drawing.ColorTranslator.ToOle() I get the exact color in Office 2007 as I pass in for the parameter. But Office 2003 might not give the exact same color or might give a completely different color. So you have to specifically code for Office 2003 and provide an exact color code (you have to use Excel color codes) instead of just saying you want to color the cell System.Drawing.Color.LightPink.
3) You will never get a meaningful error message when automating Microsoft Office, you will only get HRESULT=(…). So you have to track down the offending line of code and figure out yourself what is wrong with it.
4) Every Office object you reference (Application, WorkSheet, Sheet, Range, etc) you must
NAR() (google it, but basically you have to do – System.Runtime.InteropServices.Marshal.ReleaseComObject(o);). I only NAR the “biggest” objects, the ones I mentioned in the last sentence, I don’t NAR single cell or range references that I use in my automation code, so far it seems to be working…
5) Office automation is painfully slow. So don’t over do it, use the minimum required work for acceptable output. When inserting data, always try to insert a whole range of data, don’t insert data cell by cell.
6) On rare occasions you might get strange exceptions. For instance some times I’ve got a “title not set” exception when creating graphs. But that’s impossible since the code always sets a title. This can be contributed to how buggy COM interop is. I’ve reduced the amount of exceptions by placing System.Threading.Thread.Sleep(500) right after the line where I set the title.