2011.10.10

Antepedia: the largest Knowledge Base of Open Source components

Posted by Samuel Langlois

Each time we talk about our famous Knowledge Base of Open Source components, we claim it is the largest in the world... But how can we say that?

This blog post will show you how this Knowledge Base grows and how we work to improve its content quality.

As we wrote in earlier blog posts, the content of the base comes from many different locations.

The content itself can be:

  • an artifact from a project's download area
  • a source code file dumped from a project's repository (CVS, Subversion, Git, ...)
  • or an extracted file from an archive (archives could be a zip, jar, tar, tar.gz, tar.bz2...)
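To give an idea of what enumerating the files inside an archive involves, here is a minimal Python sketch. It is an illustration only, not the Antepedia extraction code: the function name `list_archive_members` is hypothetical, and it relies solely on the standard `zipfile` and `tarfile` modules (a jar is read as a zip; tar.gz and tar.bz2 are detected automatically).

```python
import tarfile
import zipfile


def list_archive_members(path):
    """Return the sorted names of the regular files stored in an archive.

    Hypothetical helper for illustration: handles zip/jar and
    tar/tar.gz/tar.bz2, and skips directory entries.
    """
    if zipfile.is_zipfile(path):
        # A jar file uses the zip container format.
        with zipfile.ZipFile(path) as zf:
            return sorted(i.filename for i in zf.infolist() if not i.is_dir())
    if tarfile.is_tarfile(path):
        # "r:*" lets tarfile detect the compression (none, gzip, bzip2...).
        with tarfile.open(path, "r:*") as tf:
            return sorted(m.name for m in tf.getmembers() if m.isfile())
    raise ValueError(f"unsupported archive format: {path}")
```

Each name returned this way would correspond to one "extracted file" in the sense used above.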

Here is the evolution of the number of files we reference in the whole database:

Antepedia - referenced files

Each change in the slope of this graph corresponds to the addition of a new location and/or a new archive extraction campaign.

But the facts are here: today we index, qualify and store an average of 630,000 new files each day - almost 19 million new files every month - and we add an average of 1,000 new projects each day!
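The monthly figure follows directly from the daily average. As a quick check, assuming a 30-day month (the month length is an assumption, not something stated above):

```python
FILES_PER_DAY = 630_000      # daily average quoted above
PROJECTS_PER_DAY = 1_000     # daily average quoted above
DAYS_PER_MONTH = 30          # assumption: a 30-day month

files_per_month = FILES_PER_DAY * DAYS_PER_MONTH
projects_per_month = PROJECTS_PER_DAY * DAYS_PER_MONTH

print(f"{files_per_month:,} new files per month")        # 18,900,000
print(f"{projects_per_month:,} new projects per month")  # 30,000
```

That is 18.9 million files, hence "almost 19 million", and about 30,000 new projects a month at the same pace.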

Antepedia - projects evolution

Moreover, to stay in sync with the very dynamic world of Open Source, we have automated processes which regularly check and update projects and contents already in the database.

We're really proud to announce that our database now lets you find the source of your components among 537 million files and more than 1 million open source projects.

Here is the distribution of projects by origin:

Antepedia - projects repartition

Obviously, a large part of the projects comes from Google Code and SourceForge, but it is interesting to note that the number of projects is so high that Maven Central represents only 5% of our databases (and we have the whole content of that repository in our databases!).

How is it possible to do that?

Actually, we have to manage really large databases (with a crazy cumulative size of 476GB) and store an amazing volume of files (more than 18TB!), both of which grow every day!

The following two graphs show how our databases and the volume of stored files grow:

Antepedia - size of the databases

Antepedia - size of stored files

And remember, we're still able to find 99% of your open source components in less than 10 seconds thanks to the optimization of our databases.

Maybe you think you won't find your sources or projects on Antepedia.
Well, that's actually possible, but don't worry: we work daily to add more locations, projects and contents.

And if you have any suggestions about new sources we should integrate, feel free to tell us!

We definitely have the largest Knowledge Base of Open Source components in the world, and all of our products (Antepedia Notifier or Antepedia Reporter) are already plugged into it.
