In the previous article we saw how to parse a git log file. We ended up with an array of commits:

TODO: Raw content of a Gist file.

Let's start extracting some useful statistics from it.

The first thing that comes to mind is counting how many commits we have made to the repository. That's pretty easy to do:

As you can see, we are using the pipe forward operator (|>) and the Array.length function to extract this information.
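A minimal sketch of what this might look like (assuming, as in the previous article, that the parsed commits live in an array called `commits`):

```fsharp
// Count the commits by piping the array into Array.length.
let numberOfCommits =
    commits
    |> Array.length
```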

Now it's time to calculate the number of entities changed, that is, the number of times we committed a change to an entity:

As you can see, we are using Array.collect to concatenate the arrays of Files inside each Commit and count them.
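A possible sketch of this step, assuming each Commit value exposes its changed files in a `Files` field:

```fsharp
// Flatten the Files arrays of all commits into one array and count it.
let numberOfEntitiesChanged =
    commits
    |> Array.collect (fun commit -> commit.Files)
    |> Array.length
```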

Let's continue with the number of entities that exist in the repository:

The code is very similar to the previous one, but before counting we are grouping the array by the file name.
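A hedged sketch of that grouping step (`FileName` is an assumed field name):

```fsharp
// Group all changed files by name so every entity is counted only once.
let numberOfEntities =
    commits
    |> Array.collect (fun commit -> commit.Files)
    |> Array.groupBy (fun file -> file.FileName)
    |> Array.length
```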

It's time to calculate the number of authors. We can start doing something like this:

But this information is not totally accurate. If we take a look at the contents of the array (remove the last line and run the code again) we'll see that some authors have been committing changes using two different accounts. Let's try to consolidate the names.

First of all we need a map between the name existing in the commit information and the real name:

And now let's use this information to extract the real number of authors:

First of all we are defining a function to consolidate the names. This function is using pattern matching to see if the user name of the commit is one of the names that we want to convert to a real name. If it's one of them, we make the conversion. If not, we just return the name.

And then we use the previous code with a couple of changes. The first one is to use the brand new consolidateNames function (line 9) and the second one is to use the Array.distinct function (line 10) so that no name is returned more than once.
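The whole consolidation step might be sketched like this (the names in the map are made up for illustration, and `Info.Author` is an assumed field path):

```fsharp
// Hypothetical map from names found in the log to consolidated real names.
let nameMap =
    [ "jdoe", "John Doe"
      "john.doe", "John Doe" ]
    |> Map.ofList

// Return the consolidated name if we have one; otherwise keep the original.
let consolidateNames name =
    match Map.tryFind name nameMap with
    | Some realName -> realName
    | None -> name

let numberOfAuthors =
    commits
    |> Array.map (fun commit -> consolidateNames commit.Info.Author)
    |> Array.distinct
    |> Array.length
```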

And finally let's calculate the number of revisions of each file. We can do that very easily:

We are creating an array with all the files and grouping it by file name. Then we are creating a new array that contains a tuple with the name of the file and the number of times that the file appears. Finally, we sort the array by that number to find the files that change the most.
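A sketch of that pipeline, under the same field-name assumptions as before:

```fsharp
// For each file name, count how many commits touched it,
// then sort with the most changed files first.
let numberOfRevisions =
    commits
    |> Array.collect (fun commit -> commit.Files)
    |> Array.groupBy (fun file -> file.FileName)
    |> Array.map (fun (name, changes) -> name, Array.length changes)
    |> Array.sortByDescending snd
```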


In this article we've seen how to extract some basic statistics from a git log file, and that doing it is really easy. In future posts we'll see how to extract more complex information.

I've recently read the excellent book Your Code as a Crime Scene by Adam Tornhill. In this book, Adam explains several techniques to extract very useful information from the commits in a code repository to help you understand your code, your dependencies and your organisation. If you haven't read the book, please do yourself a favor and get a copy as a Christmas present.

On the other hand, this week I've attended the fantastic Progressive F# Tutorials at Skills Matter. There were 8 awesome workshops from people like Jamie Dixon, Tomas Petricek or Ian Russel explaining how you can use F# in your daily work. You can read a very good summary by the other Jamie Dixon here.

So, I've decided to improve my F# skills using it to do some of the analysis that Adam does in his book using his own tool code-maat.

Creating a useful log

The first thing we need is to create a log that we can parse easily and that has all the information we need. So please, use your favorite git command line tool to navigate to the base folder of the repository that you want to analyze and type the following command:
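One possible command producing this kind of log (the exact flags are an assumption on my part; a similar format is used by Adam Tornhill in Your Code as a Crime Scene):

```shell
# One line per commit with hash, author, date and message, followed by
# one numstat line (additions, deletions, file path) per changed file.
git log --pretty=format:'[%h] %an %ad %s' --date=short --numstat > gitLog.log
```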

This command will write into gitLog.log something similar to this:

The basic structure here is that we have each commit separated by a line break. In every commit, we have a line with the commit information (hash, author, date and message) and several lines with the files that have changed (additions, deletions, file path). If the commit is a merge, all this structure is preceded by another commit information line with the merge information.

Parsing the log file

So first of all, let's translate this structure into F# types:

As we can see, we have CommitInfo, a Record type formed by three strings and a DateTime, and CommitedFile, also a Record type, formed by two optional integers and a string. The integers are optional because some files can be in a binary format and git can't count the additions and deletions; in this case the log displays a "-" instead of a number. Finally, we have a Record type called Commit that has a CommitInfo field and an array of CommitedFile. Pretty straightforward.
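Those types might be declared like this (the field names are my assumption; the shape matches the description above):

```fsharp
open System

// Hash, author and message from the commit line, plus the commit date.
type CommitInfo =
    { Hash: string
      Author: string
      Date: DateTime
      Message: string }

// Additions and deletions are optional because git prints "-" for binary files.
type CommitedFile =
    { AddedLines: int option
      DeletedLines: int option
      FileName: string }

// A commit: its info line plus all the files it changed.
type Commit =
    { Info: CommitInfo
      Files: CommitedFile[] }
```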

Let's read the content of the file and split it in the different commits to be able to parse it.

As you can see, we start by defining a constant in F# using the [<Literal>] annotation. After that we read the file using .Net standard libraries. And finally we split the content of the file using a double line break as a separator. So far so good.
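A sketch of those three steps (the constant and binding names are assumptions, and the split may also need to handle \r\n line endings on Windows):

```fsharp
open System
open System.IO

[<Literal>]
let GitLogFile = "gitLog.log"

// Read the whole log into memory.
let fileContents = File.ReadAllText GitLogFile

// Split it into one chunk of text per commit, using the blank line
// between commits as the separator.
let commitsText =
    fileContents.Split([| "\n\n" |], StringSplitOptions.RemoveEmptyEntries)
```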

Now that we have an array with all the commits (still in text format), let's parse each of these chunks of data. First of all we need to know which of those lines are the commit info and which of them are the commit lines.

The first thing we do is split the commit lines, removing any empty lines that we may have. After that, we take the commit info line as the last line that is a commit info line (a line that starts with the hash information), removing all the merge info that we don't need. Finally, we take the file lines as all the lines that are not a commit info line.
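Under the assumption that an info line starts with the bracketed hash, the splitting could be sketched as:

```fsharp
open System

// An info line starts with the hash information, e.g. "[a1b2c3d] ...".
let isCommitInfoLine (line: string) = line.StartsWith "["

let splitCommitLines (commitText: string) =
    let lines =
        commitText.Split([| '\n' |], StringSplitOptions.RemoveEmptyEntries)
    // A merge commit has two info lines; the merge one comes first,
    // so taking the last info line discards it.
    let infoLine = lines |> Array.filter isCommitInfoLine |> Array.last
    let fileLines = lines |> Array.filter (not << isCommitInfoLine)
    infoLine, fileLines
```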

It's time to convert our commit info line into a CommitInfo object, much more convenient for our purposes.

As you can see, we are using the magic of Type providers to parse the line and extract the information. In this case, using the CsvProvider, we are defining that the third column will be a date using the Schema parameter. We just need to fill a CommitInfo object with the information of the first row.
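A hedged sketch of that step using the FSharp.Data CsvProvider (the sample row, the column names and the record fields are my assumptions):

```fsharp
open FSharp.Data

// The Schema tells the provider the third column is a date;
// HasHeaders = false because a log line has no header row.
type CommitInfoCsv =
    CsvProvider<"hash,author,2015-12-01,message",
                HasHeaders = false,
                Schema = "Hash (string), Author (string), Date (date), Message (string)">

// Fill a CommitInfo record with the information of the first row.
let parseCommitInfo (infoLine: string) =
    let row = CommitInfoCsv.Parse(infoLine).Rows |> Seq.head
    { Hash = row.Hash
      Author = row.Author
      Date = row.Date
      Message = row.Message }
```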

And finally we need to parse the information of the commit lines. We'll use a very similar process:

The idea is the same, but we just need to iterate over all the commit lines. In this case, the format of the csv is a bit different (tabs are used as separators) and we use the Schema parameter to indicate that the two integers are optional.
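A similar sketch for the file lines (again, the sample row and the names are assumptions):

```fsharp
open FSharp.Data

// Tab-separated numstat lines; the two counts are declared "int option"
// because git prints "-" instead of a number for binary files.
type CommitFileCsv =
    CsvProvider<"1\t2\tsrc/File.fs",
                HasHeaders = false,
                Separators = "\t",
                Schema = "Added (int option), Deleted (int option), FileName (string)">

let parseCommitFiles (fileLines: string[]) =
    fileLines
    |> Array.map (fun line ->
        let row = CommitFileCsv.Parse(line).Rows |> Seq.head
        { AddedLines = row.Added
          DeletedLines = row.Deleted
          FileName = row.FileName })
```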

Finally, we just need to create the Commit object:

This is the whole function code:
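Putting the pieces together, the function might look like this (assuming hypothetical helper names for the three steps described above: splitting the lines, parsing the info line and parsing the file lines):

```fsharp
// Parse one chunk of text into a Commit record.
let parseCommit (commitText: string) =
    let infoLine, fileLines = splitCommitLines commitText
    { Info = parseCommitInfo infoLine
      Files = parseCommitFiles fileLines }
```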

The last bit is to use this function on all the commits from the file:
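Which could be as simple as this (assuming the chunk array and the parse function carry the names used in the sketches, `commitsText` and `parseCommit`):

```fsharp
// Parse every chunk of the log into a Commit.
let commits =
    commitsText
    |> Array.map parseCommit
```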


In this post we've seen how easy it is to parse a git log file using F# and type providers. In future posts we'll see how we can extract information from this data. You can see the code of this post in this gist. See you soon!

In the era of JavaScript it's a good idea to support users that aren't getting JavaScript enhancements, especially if you work for the UK's government. If you are using simple forms you don't have any problems, but as soon as you start adding complexity to your page, supporting this scenario can be a bit tricky. Obviously your first thought should be: "Can I provide a similar experience using a less complex view?" The answer to this question could be splitting your view into different steps, but you should talk with your team's UX expert, Business Analyst and Product Owner to see if this solution is acceptable. In this article, we'll see a couple of techniques to deal with this situation if you are only allowed to use a single view.


Imagine that you're developing a recipes website, where the user can add her own recipes. One step of this process is to add the ingredients needed to cook the recipe, and you're asked to provide a search for the ingredients, the option to add one of the search results to the recipe, a way to specify the grams of that ingredient used in the recipe, and the option to remove one of the ingredients already added. You've developed the version that uses JavaScript with the latest framework on the market, and now you need to develop the non-JavaScript version of it. Let's start.

[caption id="attachment_286" align="alignnone" width="300"]Application[/caption]


Multiple buttons in a form

The first challenge you will face is to have multiple buttons in a form. As you know, you usually have a submit button in a form, but in this case we should have more than one. This is a problem well solved by ASP.Net MVC by using a derived type of ActionNameSelectorAttribute.

Here you can find a possible implementation of this attribute:

As you can see, basically what we are doing is checking that the value of the button is the same as the name of the method, in the case where the action was triggered by a button with the name specified in the SubmitButtonActionName property.
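A possible implementation of such an attribute (a sketch; the attribute and property names follow the text above, the rest is my assumption):

```csharp
using System;
using System.Reflection;
using System.Web.Mvc;

public class MultipleActionButtonAttribute : ActionNameSelectorAttribute
{
    // The name attribute of the submit buttons this attribute reacts to.
    public string SubmitButtonActionName { get; set; }

    public override bool IsValidName(ControllerContext controllerContext,
                                     string actionName, MethodInfo methodInfo)
    {
        // Only match if the request contains a button with the expected name...
        var button = controllerContext.Controller
            .ValueProvider.GetValue(SubmitButtonActionName);
        if (button == null) return false;

        // ...whose value is the name of this action method.
        return string.Equals(button.AttemptedValue, methodInfo.Name,
                             StringComparison.OrdinalIgnoreCase);
    }
}
```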

To use this attribute we need to make changes in both the controller and the view. Let's see both changes and I will explain them later:

As you can see, in the view we are setting the name and the value of an extra submit button. In the controller we decorate an action with the MultipleActionButton attribute, setting the SubmitButtonActionName property to the same value that we've specified in the name attribute of the button. The name of the action must be the same as the value of the value attribute of the button.
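For illustration, the pairing could look like this (the button name, action name and view model are made up):

```csharp
// View (Razor):
//   <button type="submit" name="action" value="SearchIngredients">Search</button>
//
// Controller: SubmitButtonActionName matches the button's name attribute,
// and the action's name matches the button's value attribute.
[MultipleActionButton(SubmitButtonActionName = "action")]
public ActionResult SearchIngredients(RecipeViewModel model)
{
    // ... run the search and redisplay the view ...
    return View(model);
}
```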

Passing parameters

In our application, we need to tell our backend which ingredient we want to add. Using the previous attribute is not enough, unless we want to end up with a lot of actions doing the same thing (which doesn't seem like a very good idea). So we need to somehow pass a parameter to the controller, telling it which ingredient we want to add from the search results or which one we want to remove from the list of ingredients. Here we have to be more imaginative and combine an ActionNameSelectorAttribute with an ActionFilterAttribute. Let's start by taking a look at the view:

What we are doing here is giving the button a value like UseIngredient-0 for the first item in the list, UseIngredient-1 for the second one, etc. If we do that, we can no longer use the implementation of the ActionNameSelectorAttribute just explained, because we'd need an action called UseIngredient-0, an action called UseIngredient-1, etc. So we need a new implementation of the attribute that extracts the real name of the action. Something like this:
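Something along these lines (a sketch, under the same naming assumptions as before):

```csharp
using System;
using System.Reflection;
using System.Web.Mvc;

public class MultipleActionButtonWithParameterAttribute : ActionNameSelectorAttribute
{
    public string SubmitButtonActionName { get; set; }

    public override bool IsValidName(ControllerContext controllerContext,
                                     string actionName, MethodInfo methodInfo)
    {
        var button = controllerContext.Controller
            .ValueProvider.GetValue(SubmitButtonActionName);
        if (button == null) return false;

        // Strip the parameter part: "UseIngredient-0" -> "UseIngredient".
        var realActionName = button.AttemptedValue.Split('-')[0];
        return string.Equals(realActionName, methodInfo.Name,
                             StringComparison.OrdinalIgnoreCase);
    }
}
```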

It's basically the same code as the previous implementation, but we remove the parameter information before doing the comparison.

What problem do we have now? Well, right now our actions don't know anything about the parameter, we need to do something to pass the parameter to the action.

ActionFilters to the rescue

An ActionFilter is a mechanism to add custom behavior before or after an action is executed. In our case what we want is to inject the value of the parameter in the action. To do that we can write an ActionFilter like this one:

In this piece of code we are retrieving the original value of the button which triggered the action (line 11), extracting the value of the parameter (line 12), converting the value (a string) to the appropriate type (line 14) and injecting the value into the action (line 15).
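The filter could be sketched like this (the line numbers in the text refer to the original gist; the property names here are my assumptions):

```csharp
using System;
using System.Web.Mvc;

public class ButtonParameterActionFilterAttribute : ActionFilterAttribute
{
    public string SubmitButtonActionName { get; set; }
    public string ParameterName { get; set; }
    public Type ParameterType { get; set; }

    public override void OnActionExecuting(ActionExecutingContext filterContext)
    {
        // Original value of the button that triggered the action.
        var buttonValue = filterContext.Controller
            .ValueProvider.GetValue(SubmitButtonActionName).AttemptedValue;

        // Extract the parameter part: "UseIngredient-0" -> "0".
        var rawParameter = buttonValue.Split('-')[1];

        // Convert the string to the appropriate type and inject it
        // into the action's parameters.
        filterContext.ActionParameters[ParameterName] =
            Convert.ChangeType(rawParameter, ParameterType);
    }
}
```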

Finally, we just have to apply this filter to an action using the appropriate values:

Wrap up

In this article we've seen how to use different buttons in the same form and pass parameters to the controller without using any kind of JavaScript. Remember that if you want to run some code only when JavaScript is not enabled you can use the noscript tag. You can find the source code in this github repo.


Yesterday I attended XPDay 2015, an event organized by the people of the eXtreme Tuesday club. It was the first time I was there and it was great to share a day with a bunch of talented people like Allan Kelly, Nat Pryce, Steve Freeman, Giovanni Asproni or Liz Keogh, among others.

In this time where a lot of people talk about agile hangover and say that agile doesn't work it's good to return to the origins and talk about eXtreme Programming.

The event started on Monday, when they began to configure the agenda (I couldn't be there). Luckily, they reserved some time on Tuesday to propose more talks. The format of the event was an Open Space.

I started the day attending a talk about UX and agile. We started by defining what UX is and the difference between UX, Design and Usability. Then people explained how they work with UX people, how they are integrated in their teams and how they decide the next feature to develop. Jasper (the host) ended by drawing an interesting funnel describing the different artifacts used in UX.

[caption id="attachment_280" align="alignnone" width="300"]UX funnel[/caption]

I continued with a session called "Strategies on identifying domain models on legacy databases", proposed by Sabrina. Sabrina explained the current situation in her company where, after merging with another company, they've inherited a legacy codebase which must be maintained and evolved, and they struggle to identify a domain model from a bunch of databases that are not very well designed. We ended up with a list of things they can do to soften this (probably painful) process, with items like: identify the seams, use test environments, define characterization tests, look at changes in the outputs when introducing an input (and invest in tooling to automate it), monitor usage, or make a complete rewrite.

I briefly explained my latest work at the Skills Funding Agency, where we are replacing an old system with a new one, how we keep the data on both systems synchronised and how both systems were live at the same time for some months.

This is the kind of session I love to see in an Open Space, where the host, instead of trying to explain a topic, asks for help. I hope Sabrina took some useful ideas from the session.

My third session was about thinking outside the box regarding how the product manager and the delivery team can work together. We started by defining what is and what isn't a product, and we talked about how they can collaborate to make a successful product. I didn't get too much from this session, but that was probably because it was just after lunch. :-)

After that I attended a session proposed by Monika Turska about unhealthy products and how we can get rid of them. It was an interesting session where we talked about what we think an unhealthy product is, things to consider when deciding whether we have to abandon a product, and how we can do that. It's an interesting topic because, as a company developing a portfolio of products, you can lose a lot of money extending the life of some of them.

And finally I attended a session proposed by Giovanni Asproni called "Where has design gone?". Giovanni explained that some years ago people talked a lot about architecture and design (upfront design, yes) and that now people are not talking much about design, so a lot of applications and codebases suffer from a lack of it. Some people mentioned that we are too focused on delivering features and going too fast to be able to think about design. Some people mentioned that we are skipping the refactoring part of TDD and that we've forgotten the meaning of the last two Ds. And some people mentioned that frameworks like Rails make those kinds of decisions for us, so we only have to fill in the gaps. Finally, a couple of talks were mentioned: Simple Made Easy, and Java the Unix Way.

I had a great time at the conference, I'll try to attend some of the eXtreme Tuesday meetups and I hope I can be part of XP Day 2016.

The #NoEstimates movement is an interesting thing. If you haven't heard about it, take a look at #NoEstimates hashtag on Twitter, read these articles from Ron Jeffries (article 1, article 2, article 3) or buy (and read) the excellent book written by Vasco Duarte.

Let me summarise it for you: we fail at making estimations. A lot. Take a look at the famous CHAOS report if you need more evidence. Agile methodologies improved the numbers a bit, but not too much (I bet the numbers are going down again because everybody is "agile").

Failing at making estimations is not bad per se. The problem is the use we make of estimates:

[caption id="attachment_274" align="alignnone" width="502"]Estimates as deadlines[/caption]

And we usually do that when we know the least about the project: before even starting it. Before starting a project we may have a good definition of the most important stories, but the others can be vaguely defined (which is good). So we should be able to estimate with some precision the first and most important stories of the backlog (which we already know we must implement), but we will produce very bad estimations for the vast majority of the backlog. Why are we estimating then?

There can be multiple answers to the previous question: we need to know if we will be able to do the job within the budget, we need a release date, etc. And all of them are fair. But the problem is that our answer is just a guess; it is not based on any empirical data. You can claim that you are an expert project manager with more than 15 years of experience, but you have to admit that this is the first time that you're developing this project, for this client, with this team.

Don't our clients deserve a better answer? Yes, of course. Why don't we use real data from the project to answer these questions? Why don't we talk with our client and ask to work on the project for three or four sprints and use that data to make a projection? Why don't we work on splitting stories in order to make them similar in size, so we can avoid estimating and just count them (and increase our knowledge about them)?

It sounds quite reasonable to me.

My main conclusions are the following ones:

  • Question all your practices. Estimations, retrospectives, testing, daily meetings... all of them. Are the practices giving value to you and your client?
  • Build trust with your client. If your client trusts you, probably you won't need estimations.
  • Work on slicing stories. Estimate them after doing it if you want, but you'll gain knowledge about them, and about your domain if you do it. You'll be able to deploy them sooner and shrink the feedback loop.
  • Use real data to make predictions. Real data beats guesses. Always.