Extracting information from your code repository using F# - Part 1: parsing the log file
Contents
I've recently read the excellent book Your Code as Crime Scene by Adam Tornhill. In this book, Adam explain several techniques to extract very useful information from the commits in the code repository to help you to understand your code, your dependencies and your organisation. If you haven't read the book, please do yourself a favor and get a copy as a Christmas present.
On the othe hand, this week I've attended the fantastic Progressive F# Tutorials at Skills Matter. There were 8 awesome workshops from people like Jamie Dixon, Tomas Petricek or Ian Russel explaining how you can use F# in your daily work. You can read a very good summary by the other Jamie Dixon here.
So, I've decided to improve my F# skills using it to do some of the analysis that Adam does in his book using his own tool code-maat.
Creating a useful log
The first thing we need is to create a log that we can parse easily and that has all the information we need. So please, use your favorite git command line tool to navigate to the base folder of the repository that you want to analyze and type the following command:
This command will write into gitLog.log something similar to this
The basic structure here is that we have each commit separated by a line break. In every commit, we have a line with the commit information (hash, author, date and message) and several lines with the files that have changed (additions, deletions, file path). If the commit is a merge, all this structure is preceeded by another commit information line with the merge information.
Parsing the log file
So first of all, let's translate this structure into F# tpyes:
As we can see we have the CommitInfo that is a Record type formed by three strings and a DateTime and a CommitedFile that is also a Record type formed by two optional integers and a string. The integers are optional because you can have some file in a binary format and git can't count the additions and deletions. In this case the log will display a "-" instead of a number. Finally, we have a Record type called Commit that has a CommitInfo field an array of CommitedFile. Prety straightforward.
Let's read the content of the file and split it in the different commits to be able to parse it.
As you can see we start defining a constant in F# using the [<Literal>] annotation. After that we read the file using .Net standard libraries. And finally we split the content of the file unsing a double line break as a separator. So far so good.
Now that we have an array with all the commits (still in text format), lets parse each of this chunks of data. First of all we need to know which of those lines are the commit info and which of them are the commit lines.
The first thing we do is to split the commit lines removing any empty line that we can possibly have. After that, we take the commit info line as the last line that is a commit info line (a line that starts with the hash information) removing all the merge info that we don't neeed. Finally, we take the file lines as all the lines that are not a commit info line.
It's time to convert our commit info line into a CommitInfo object, much more convenient for our purposes.
As you can see, we are using the magic of Type providers to parse the line and extract the information. In this case, using the CsvProvider, we are defining that the third column will be a date using the Schema parameter. We just need to fill a CommitInfo object with the information of the first row.
And finally we need to parse the information of the commit lines. We'll use a very similar process:
The idea is the same, but we just need to iterate over all the commit lines. In this case, the format of the csv is a bit different (tabs are used as separators) and we use the Schema parameter to indicate that the two integers are optional.
Finally, we just need to create the Commit object:
This is the whole function code:
The last bit, is to use this function in all the commits from the file:
Summary
In this post we've seen how easy is to parse a git log file using F# and type providers. In future posts we'll see how can we extract information from this data. You can see the code of this post in this gist. See you soon!
Author Vicenç García
LastMod 13-12-2015