
LibreOffice vs OpenOffice: Anatomical study of a fork

Posted by Samuel Langlois

Last week, the first LibreOffice conference, gathering the whole "new" LibreOffice community, was held in Paris (co-located at "La cantine" and "IRILL"). It focused on the future and the needs of this community after the schism with OpenOffice.org, which is now hosted in the Apache Incubator.

As you may know, the LibreOffice community is a one-year-old project based on a "fork", and many things have changed since it was forked from OpenOffice.org. Many blog posts have reported on the progress of the two projects. If you missed them, you can start with "LibreOffice and One Year After the Schism" by Scott Merrill, one of TechCrunch's editors. Michael Meeks did a quick analysis based on a diff between Apache's codebase and LibreOffice's. He shows that about 2 million lines of code out of 7.7 million have been touched, and that 290k lines were "really" added when looking only at the files common to both codebases. The raw data are available (11 MB).

Here is a chronology of the project, with the main dates:

LibreOffice vs OpenOffice: History

At Antelink, we decided to do our bit, using some of the multi-source tracking tools we have. We love images, so we tried to give a global view in a single infographic showing exactly where the LibreOffice community has been working during the last year.

Basically, we track each piece (an exact file or a code snippet) of LibreOffice v3.4.3.2 and determine where it comes from. This approach is more robust than a "diff", since it is resilient to files being renamed or moved within the archive, and to automatic reformatting by an IDE.
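The post does not detail how Antelink's tracking tools work internally. As a minimal sketch of the general idea, assuming that a piece's "origin" is decided by whether identical normalized content exists anywhere in the OpenOffice.org baseline, one can fingerprint content independently of its path. The whitespace-stripping normalization, the SHA-1 hash and the helper names (normalize, fingerprint, build_index, origin) are illustrative assumptions, not Antelink's implementation.

```python
import hashlib
import re


def normalize(source: str) -> str:
    """Assumed normalization: drop all whitespace so that IDE reformatting
    (indentation, line breaks, spacing) does not change the fingerprint."""
    return re.sub(r"\s+", "", source)


def fingerprint(source: str) -> str:
    """Content hash of the normalized text; the file path is deliberately ignored."""
    return hashlib.sha1(normalize(source).encode("utf-8")).hexdigest()


def build_index(baseline: dict[str, str]) -> set[str]:
    """baseline maps path -> content for the upstream (OpenOffice.org) release."""
    return {fingerprint(content) for content in baseline.values()}


def origin(content: str, upstream_index: set[str]) -> str:
    """'upstream' if the same normalized content exists anywhere in the
    baseline, whatever its path; otherwise 'specific'."""
    return "upstream" if fingerprint(content) in upstream_index else "specific"


# Hypothetical baseline with a single file.
index = build_index({"sw/source/a.cxx": "int f(int x) {\n    return x + 1;\n}\n"})

# Same code, moved elsewhere and reformatted on one line: still matched.
print(origin("int f(int x) { return x+1; }", index))  # upstream
# Actually modified code: not matched.
print(origin("int f(int x) { return x+2; }", index))  # specific
```

Because the path never enters the hash, a renamed or moved file still matches; because whitespace is dropped, so does a re-indented one. Stripping every whitespace character is crude (it can merge distinct tokens), so a production tool would more likely normalize at the token level.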

We put all the data in a treemap, using GREEN for content coming from the LibreOffice community and BLUE for content coming from the OpenOffice.org community.
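The post does not say which treemap layout was used. The simplest one, slice-and-dice, splits a rectangle among the children of a directory in proportion to their weight and alternates the cut direction at each level; squarified layouts give less elongated rectangles but follow the same principle. In this sketch the Node and Rect types, the leaf weights and the file names are hypothetical, and each leaf is colored GREEN when it is LibreOffice-specific and BLUE otherwise.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    size: int = 0            # leaf weight, e.g. number of files or snippets
    specific: bool = False   # leaves only: True if LibreOffice-specific
    children: list["Node"] = field(default_factory=list)

    def weight(self) -> int:
        if not self.children:
            return self.size
        return sum(child.weight() for child in self.children)


@dataclass
class Rect:
    name: str
    x: float
    y: float
    w: float
    h: float
    color: str


def slice_and_dice(node: Node, x: float, y: float, w: float, h: float,
                   depth: int = 0, out: Optional[list[Rect]] = None) -> list[Rect]:
    """Split (x, y, w, h) among the children in proportion to their weight,
    cutting along x at even depths and along y at odd depths."""
    if out is None:
        out = []
    if not node.children:
        out.append(Rect(node.name, x, y, w, h, "GREEN" if node.specific else "BLUE"))
        return out
    total = node.weight()
    if total == 0:
        return out  # nothing to draw for an empty directory
    offset = 0.0
    for child in node.children:
        share = child.weight() / total
        if depth % 2 == 0:
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Hypothetical tree: two modules, weights are snippet counts.
root = Node("libreoffice", children=[
    Node("libreoffice-writer", children=[Node("new.cxx", 3, True),
                                         Node("old.cxx", 1)]),
    Node("libreoffice-testing", children=[Node("test.java", 4)]),
])
for rect in slice_and_dice(root, 0, 0, 100, 100):
    print(rect)
```

Here the writer module gets the left half of the canvas, three quarters of which is GREEN, while the testing module fills the right half in BLUE.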

There are 96,959 different files in the current version of LibreOffice (v3.4.3.2), and more than 77% of them look specific to LibreOffice.

Here is a large-scale view, showing that, except in a few areas (libreoffice-extras, libreoffice-testing), almost all files have been modified:

LibreOffice vs OpenOffice: Files View

However, counting whole files is not very meaningful: renaming a class, or even just changing a file header, is enough to flag the whole file as modified.

We then ran the same analysis on editable content only (mainly Java and C-based source code), at the granularity of the code snippet, which is robust to automatic reformatting and to reorganization of the release. We counted 24,346 different snippets in the current version of LibreOffice (v3.4.3.2), and more than 30% of them have been specifically modified for LibreOffice (or originate from outside the project).
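To see why the snippet view is less sensitive than the file view, here is a sketch comparing one file pair under a hypothetical snippet definition: every run of three consecutive non-blank lines, with whitespace removed. Antelink's actual snippet granularity is not described in the post; real tools often use larger windows or selection schemes such as winnowing, and would index every snippet of the baseline release rather than a single file. All helper names and the sample file contents are illustrative.

```python
import hashlib
import re


def normalized_lines(source: str) -> list[str]:
    """Strip all whitespace from each line and drop blank lines."""
    stripped = (re.sub(r"\s+", "", line) for line in source.splitlines())
    return [line for line in stripped if line]


def digest(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def snippet_digests(source: str, window: int = 3) -> list[str]:
    """Hypothetical snippets: every run of `window` consecutive normalized
    lines; a file shorter than that is a single snippet."""
    lines = normalized_lines(source)
    if len(lines) <= window:
        return [digest("\n".join(lines))] if lines else []
    return [digest("\n".join(lines[i:i + window]))
            for i in range(len(lines) - window + 1)]


def specific_share(upstream: str, derived: str, window: int = 3) -> float:
    """Fraction of the derived file's snippets that do not occur upstream."""
    known = set(snippet_digests(upstream, window))
    mine = snippet_digests(derived, window)
    return sum(d not in known for d in mine) / len(mine) if mine else 0.0


body = "int a = 1;\nint b = 2;\nint c = 3;\nint d = 4;\nint e = 5;\n"
upstream = "// (c) Sun Microsystems\n" + body
derived = "// (c) The Document Foundation\n" + body

# Whole-file view: the header change alone flags the file as modified.
whole_file_changed = digest(upstream) != digest(derived)
print(whole_file_changed)                  # True
# Snippet view: only the one window containing the header differs.
print(specific_share(upstream, derived))   # 0.25
```

With six normalized lines and a window of three there are four snippets, and only the first one contains the changed header.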

In the following treemap, you can see that the libreoffice-writer, libreoffice-calc and libreoffice-base modules were almost entirely updated. Some other modules are still mostly based on the original code, like libreoffice-testing, which is good news :-).

LibreOffice vs OpenOffice: Snippets View

Treemaps make zooming easy, so here is a more detailed view of libreoffice-lib-core:

LibreOffice vs OpenOffice: Snippets View, zoom on core

Of course, a more complete interpretation would require more knowledge of the software itself. We will make the data available to both the Apache and LibreOffice communities upon request!
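Zooming into a treemap region such as libreoffice-lib-core essentially means re-rooting the view at that directory and recomputing, for each of its immediate children, the share of LibreOffice-specific content. A minimal sketch, assuming the classification is already available as a mapping from item path to a boolean; the zoom helper and the sub-paths below are hypothetical.

```python
from collections import Counter


def zoom(classified: dict[str, bool], prefix: str) -> dict[str, float]:
    """Share of LibreOffice-specific items per immediate child of `prefix`.

    `classified` maps an item path (file or snippet) to True when the item
    is LibreOffice-specific, False when it comes from the upstream code."""
    prefix = prefix.rstrip("/") + "/"
    total: Counter = Counter()
    specific: Counter = Counter()
    for path, is_specific in classified.items():
        if not path.startswith(prefix):
            continue  # outside the zoomed area
        child = path[len(prefix):].split("/", 1)[0]
        total[child] += 1
        specific[child] += int(is_specific)
    return {child: specific[child] / total[child] for child in sorted(total)}


# Hypothetical classification results.
classified = {
    "libreoffice-lib-core/sal/a.cxx": True,
    "libreoffice-lib-core/sal/b.cxx": False,
    "libreoffice-lib-core/vcl/c.cxx": True,
    "libreoffice-writer/sw/d.cxx": True,
}
print(zoom(classified, "libreoffice-lib-core"))  # {'sal': 0.5, 'vcl': 1.0}
```

Each returned ratio can then drive the color of the corresponding rectangle in the zoomed view, from fully BLUE (0.0) to fully GREEN (1.0).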

Since the "hot" topic is how both communities will evolve, and whether source code, patches, etc. will migrate from Apache to LibreOffice, we will keep an eye on it.

See you for the second birthday!

Download this analysis with the full size images here.
