In the previous article we saw how to parse a git log file. We ended up with an array of commits:
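As a reminder of what we are working with, here is a minimal sketch of such an array. The record shapes (`CommittedFile`, `Commit` and their fields) and the sample data are assumptions made up for illustration; the types built in the previous article may well carry more information (hash, date, message, lines added and deleted).

```fsharp
// Hypothetical, simplified shapes; the real types may differ.
type CommittedFile = { FileName: string }
type Commit = { Author: string; Files: CommittedFile[] }

// Made-up sample data, used only for illustration.
let commits =
    [| { Author = "alice"; Files = [| { FileName = "src/App.fs" }; { FileName = "README.md" } |] }
       { Author = "bob"; Files = [| { FileName = "src/App.fs" } |] }
       { Author = "alice.smith"; Files = [| { FileName = "src/Parser.fs" }; { FileName = "src/App.fs" } |] } |]
```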
Let's start extracting some useful statistics from it.
The first thing that comes to mind is how many commits have been made to the repository. That's pretty easy to do:
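A minimal sketch of that count could look like the following. The `Commit` and `CommittedFile` records and the sample data are hypothetical stand-ins for the parsed log; only `Array.length` matters here.

```fsharp
// Hypothetical, simplified shapes; the real types may differ.
type CommittedFile = { FileName: string }
type Commit = { Author: string; Files: CommittedFile[] }

// Made-up sample data, used only for illustration.
let commits =
    [| { Author = "alice"; Files = [| { FileName = "src/App.fs" }; { FileName = "README.md" } |] }
       { Author = "bob"; Files = [| { FileName = "src/App.fs" } |] }
       { Author = "alice.smith"; Files = [| { FileName = "src/Parser.fs" }; { FileName = "src/App.fs" } |] } |]

// One element per commit, so the array length is the number of commits.
let numberOfCommits = commits |> Array.length
```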
Now it's time to calculate the number of entities changed, that is, the number of times we committed a change to an entity:
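One way to write it, assuming each commit exposes its files in a `Files` array (the record shapes and the sample data below are hypothetical):

```fsharp
// Hypothetical, simplified shapes; the real types may differ.
type CommittedFile = { FileName: string }
type Commit = { Author: string; Files: CommittedFile[] }

// Made-up sample data, used only for illustration.
let commits =
    [| { Author = "alice"; Files = [| { FileName = "src/App.fs" }; { FileName = "README.md" } |] }
       { Author = "bob"; Files = [| { FileName = "src/App.fs" } |] }
       { Author = "alice.smith"; Files = [| { FileName = "src/Parser.fs" }; { FileName = "src/App.fs" } |] } |]

let numberOfEntitiesChanged =
    commits
    |> Array.collect (fun c -> c.Files) // flatten every commit's files into one array
    |> Array.length                     // a file counts once per commit that touched it
```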
As you can see, we are using Array.collect to concatenate the arrays of Files inside each Commit and count them.
Let's continue with the number of entities that exist in the repository:
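A possible sketch, again with hypothetical record shapes and made-up data. It uses `Array.groupBy`, although `Array.distinctBy` would give the same count.

```fsharp
// Hypothetical, simplified shapes; the real types may differ.
type CommittedFile = { FileName: string }
type Commit = { Author: string; Files: CommittedFile[] }

// Made-up sample data, used only for illustration.
let commits =
    [| { Author = "alice"; Files = [| { FileName = "src/App.fs" }; { FileName = "README.md" } |] }
       { Author = "bob"; Files = [| { FileName = "src/App.fs" } |] }
       { Author = "alice.smith"; Files = [| { FileName = "src/Parser.fs" }; { FileName = "src/App.fs" } |] } |]

let numberOfEntities =
    commits
    |> Array.collect (fun c -> c.Files)    // all files from all commits
    |> Array.groupBy (fun f -> f.FileName) // one group per distinct file name
    |> Array.length                        // the number of groups is the number of entities
```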
The code is very similar to the previous one, but before counting we are grouping the array by the file name.
It's time to calculate the number of authors. We can start with something like this:
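A first attempt might look like this sketch, assuming the author name sits in an `Author` field on each commit (the record shapes and the data are hypothetical):

```fsharp
// Hypothetical, simplified shapes; the real types may differ.
type CommittedFile = { FileName: string }
type Commit = { Author: string; Files: CommittedFile[] }

// Made-up sample data; "alice" and "alice.smith" stand for the same person.
let commits =
    [| { Author = "alice"; Files = [| { FileName = "src/App.fs" }; { FileName = "README.md" } |] }
       { Author = "bob"; Files = [| { FileName = "src/App.fs" } |] }
       { Author = "alice.smith"; Files = [| { FileName = "src/Parser.fs" }; { FileName = "src/App.fs" } |] } |]

let numberOfAuthors =
    commits
    |> Array.groupBy (fun c -> c.Author) // one group per author name
    |> Array.map fst                     // keep only the names
    |> Array.length                      // drop this line to inspect the names
```

With this sample data the result is 3, even though only two people are involved.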
But this information is not totally accurate. If we look at the contents of the array (remove the last line and run the code again), we'll see that some authors have been committing changes using two different accounts. Let's try to consolidate the names.
First of all we need a map between the name existing in the commit information and the real name:
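Such a mapping could be sketched with an F# `Map`. The name `realNames` and its entries are invented for illustration; the real mapping depends on the accounts found in your own log.

```fsharp
// Hypothetical mapping from the name found in a commit to the person's real name.
let realNames =
    Map.ofList
        [ "alice.smith", "alice"
          "bob-laptop", "bob" ]

// Map.tryFind returns Some realName for a known alias and None otherwise.
let known = realNames |> Map.tryFind "alice.smith"
let unknown = realNames |> Map.tryFind "carol"
```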
And now let's use this information to extract the real number of authors:
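Here is one way it could look. The `realNames` map, the record shapes and the sample data are assumptions made for this sketch; `consolidateNames` pattern matches on the result of `Map.tryFind`.

```fsharp
// Hypothetical, simplified shapes; the real types may differ.
type CommittedFile = { FileName: string }
type Commit = { Author: string; Files: CommittedFile[] }

// Made-up sample data; "alice" and "alice.smith" stand for the same person.
let commits =
    [| { Author = "alice"; Files = [| { FileName = "src/App.fs" }; { FileName = "README.md" } |] }
       { Author = "bob"; Files = [| { FileName = "src/App.fs" } |] }
       { Author = "alice.smith"; Files = [| { FileName = "src/Parser.fs" }; { FileName = "src/App.fs" } |] } |]

// Hypothetical mapping from commit name to real name.
let realNames = Map.ofList [ "alice.smith", "alice" ]

// Convert a known alias to the real name; leave any other name untouched.
let consolidateNames name =
    match Map.tryFind name realNames with
    | Some realName -> realName
    | None -> name

let numberOfAuthors =
    commits
    |> Array.groupBy (fun c -> c.Author)
    |> Array.map fst
    |> Array.map consolidateNames // aliases now collapse to the same name
    |> Array.distinct             // so each person is counted once
    |> Array.length
```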
First of all we are defining a function to consolidate the names. This function is using pattern matching to see if the user name of the commit is one of the names that we want to convert to a real name. If it's one of them, we make the conversion. If not, we just return the name.
Then we reuse the previous code with a couple of changes. The first is to apply the brand new consolidateNames function, and the second is to call Array.distinct so that no name is returned more than once.
And finally let's calculate the number of revisions of each file. We can do that very easily:
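A sketch of this calculation, with the same hypothetical record shapes and made-up data as before:

```fsharp
// Hypothetical, simplified shapes; the real types may differ.
type CommittedFile = { FileName: string }
type Commit = { Author: string; Files: CommittedFile[] }

// Made-up sample data, used only for illustration.
let commits =
    [| { Author = "alice"; Files = [| { FileName = "src/App.fs" }; { FileName = "README.md" } |] }
       { Author = "bob"; Files = [| { FileName = "src/App.fs" } |] }
       { Author = "alice.smith"; Files = [| { FileName = "src/Parser.fs" }; { FileName = "src/App.fs" } |] } |]

let revisionsByFile =
    commits
    |> Array.collect (fun c -> c.Files)                           // every file of every commit
    |> Array.groupBy (fun f -> f.FileName)                        // group by file name
    |> Array.map (fun (name, files) -> name, Array.length files)  // (file name, number of revisions)
    |> Array.sortByDescending snd                                 // most revised files first
```

With this data the first element is the file that was touched by all three commits.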
We are creating an array with all the files and grouping it by file name. Then we create a new array of tuples, each holding the name of a file and the number of times that file appears. Finally, we sort the array by that number to find the files that change the most.
In this article we've seen how to extract some basic statistics from a git log file, and that doing so is really easy. In future posts we'll see how to extract more complex information.
Author Vicenç García