Spain and Germany have dominated the last few Champions League editions with an iron fist. After a lot of investment, a Premier League team is ready to conquer the longed-for trophy. In this article we’ll demonstrate this fact.

The data

We’ve borrowed the data for this study from UEFA’s official page. If you go to http://www.uefa.com/uefachampionsleague/season=2011/matches/all/index.html# you’ll see all the matches played in the 2010/2011 season. Change the year in the URL to see another season’s results. With this information we’ve created a very simple CSV file which summarises the competition from the round of 16 onwards. We’ve taken the results starting from the 2003/2004 season (year 2004 in the file) because it is the first season with the current format (the knockout rounds start at the round of 16). The round column holds the number of ties in that round: 8 for the round of 16, 4 for the quarter-finals, 2 for the semi-finals and 1 for the final. The CSV file looks like this:

year,round,team1,team2,winner
2004,8,Lokomotiv Movska,Monaco,Monaco
2004,8,Celta,Arsenal,Arsenal
2004,8,Bayern,Real Madrid,Real Madrid
2004,8,Sparta Praha,Milan,Milan
2004,8,Stuttgart,Chelsea,Chelsea
2004,8,Porto,Man. United,Porto
2004,8,Real Sociedad,Lyon,Lyon
2004,8,Deportivo,Juventus,Deportivo
2004,4,Porto,Lyon,Porto
2004,4,Milan,Deportivo,Deportivo
2004,4,Real Madrid,Monaco,Monaco
2004,4,Arsenal,Chelsea,Chelsea
2004,2,Monaco,Chelsea,Monaco
2004,2,Porto,Deportivo,Porto
2004,1,Monaco,Porto,Porto

Loading the data

We have a CSV file, we’re going to use F#… CSV type provider to the rescue!!

We’re going to use Paket, so add these lines to your paket.dependencies file:

source https://nuget.org/api/v2

nuget FSharp.Data

And this line to the paket.references file of your project:

FSharp.Data

Run the install command of paket and you’ll have FSharp.Data referenced in your project. To use it from your script file, we have to reference it:

#r "../packages/FSharp.Data/lib/net40/FSharp.Data.dll"

and open it:

open FSharp.Data

Now we’re ready to load the data. To keep things simple we’re going to use just a couple of types to store the data

type Team = string
type RoundGame = {Year: int; Round: int; Team1:Team; Team2:Team; Winner:Team}

Let’s use the fantastic CSV type provider to load all the games:

type ChampionsLeague = CsvProvider<"year,phase,team1,team2,winner", Schema = "year(int),phase(int),team1,team2,winner">

// Build the path with Path.Combine to avoid backslash escapes inside the string
let file = System.IO.Path.Combine(__SOURCE_DIRECTORY__, "Data", "champions.csv")

let championsLeagues = ChampionsLeague.Load(file)

And finally, let’s parse the data into the recently defined types:

let champions =
    championsLeagues.Rows
    |> Seq.map(fun r -> {Year = r.Year; Round = r.Phase; Team1 = r.Team1; Team2 = r.Team2; Winner = r.Winner})

Glories from the past

First of all, let’s review how many times a Premier League team has won the Champions League in the last twelve years. Premier League teams are very powerful and play great football, so I’m sure we’ll find a lot.

let championsWonBy teams =
    champions
    |> Seq.filter(fun f -> f.Round = 1 && teams |> Array.contains f.Winner)

championsWonBy [|"Liverpool"; "Man. United"; "Chelsea"; "Man. City"; "Arsenal"|] 

Not too bad. Liverpool in 2005, Manchester United in 2008 and Chelsea in 2012 won the Champions League. So, every 3.5 years a Premier League team wins the Champions League. Maybe 2016 will be the next time?

Let’s compare that with other leagues. I’m sure the Premier League will be the strongest one!

Germany and Portugal have won 1 cup each, Italy 2 and Spain 5. That puts the Premier League in second position, not bad!
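The code for this comparison isn’t shown, so here is a minimal sketch of how it could be done. It assumes a hypothetical leagueOf helper and a hand-typed list of the twelve final winners (in the article they would come from champions filtered by Round = 1, and the team spellings in the CSV may differ):

```fsharp
// Illustrative only: the winners of the twelve finals, typed by hand.
let finalWinners =
    [ (2004, "Porto"); (2005, "Liverpool"); (2006, "Barcelona"); (2007, "Milan")
      (2008, "Man. United"); (2009, "Barcelona"); (2010, "Inter"); (2011, "Barcelona")
      (2012, "Chelsea"); (2013, "Bayern"); (2014, "Real Madrid"); (2015, "Barcelona") ]

// Hypothetical helper: maps a team name to the country of its domestic league.
let leagueOf team =
    match team with
    | "Liverpool" | "Man. United" | "Chelsea" -> "England"
    | "Barcelona" | "Real Madrid" -> "Spain"
    | "Milan" | "Inter" -> "Italy"
    | "Bayern" -> "Germany"
    | "Porto" -> "Portugal"
    | _ -> "Other"

// Count the titles per league and put the most successful league first.
let titlesByLeague =
    finalWinners
    |> List.countBy (fun (_, winner) -> leagueOf winner)
    |> List.sortByDescending snd
```

With this list, Spain comes first with 5 titles and England second with 3, which matches the numbers above.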

As I’m an F.C. Barcelona fan, let me see how many Champions Leagues we’ve won in the past twelve years… 4. One more than the whole Premier League… Well, we have Messi. It’s like cheating a bit… ;)

Round of 16

In the round of 16 there were two teams representing the Premier League: Arsenal and Manchester City. Arsenal played against Barcelona and lost. Let’s study their previous ties to see if that was an unexpected result:

let roundWith team1 team2 =
    champions
    |> Seq.filter(fun f-> (f.Team1 = team1 && f.Team2 = team2) || (f.Team1 = team2 && f.Team2 = team1))

roundWith "Arsenal" "Barcelona"

That gives us three results: the final of 2006, the quarter-finals of 2010 and the round of 16 of 2011. Barcelona won all of these ties, so it wasn’t a great surprise that they’ve won this year too…

Let’s study Manchester City a bit. We can start by analysing how many times they’ve played a quarter-final tie.

let timesInPhase phase team =
    champions
    |> Seq.filter(fun g -> (g.Team1 = team || g.Team2 = team ) && g.Round = phase)
    |> Seq.length
    
"Man. City" |> timesInPhase 4

Wow, they’ve never played a quarter-final! Let’s study their games in the round of 16 then. They are a very rich and powerful team, so I guess they have played a lot of games in that round.

"Man. City" |> timesInPhase 8

Mmmmm… only two. Let’s take a look at those games:

let gamesInRound round team =
    champions
    |> Seq.filter(fun g -> (g.Team1 = team || g.Team2 = team ) && g.Round = round)

"Man. City" |> gamesInRound 8

They played both times against Barcelona and they lost… So, Barcelona is definitely a rival to avoid in the next round.

Manchester City plays against Paris St Germain. Have they met before?

roundWith "Man. City" "Paris"

No, they’ve never met before. Let’s take a look at PSG’s games in the quarter-finals:

"Paris" |> gamesInRound 4

PSG has played three times in the quarter-finals: twice against Barcelona (2013 and 2015) and once against Chelsea (2014). They have lost all three, so it could be a good team to play against.

Semifinals

Let’s imagine Manchester City beats PSG in the quarter-finals. Which team would be the best rival to play against? Let’s see if any of those teams has never played a semi-final.

let rivals = ["Barcelona"; "Real Madrid"; "Atletico"; "Wolfsburg"; "Bayern"; "Benfica"]

rivals
|> Seq.filter(fun f -> timesInPhase 2 f = 0)

Wolfsburg and Benfica have never played a semi-final. Actually, they are the only ones that have never played a final either (remember, in the last twelve years). And yes, this is the first time they are playing a quarter-final. So let’s study their rivals a bit to see which of them has more chances to win.

Wolfsburg plays against Real Madrid. Let’s see how Real Madrid has done in the quarter-final round:

"Real Madrid" |> gamesInRound 4 |> Seq.length
"Real Madrid" |> gamesInRound 4 |> Seq.filter(fun f-> f.Winner <> "Real Madrid") |> Seq.length

They played six times and they won five. They don’t seem a good team to play against.

Let’s take a look at Bayern, Benfica’s rival. They’ve played the quarter-final round eight times, and they’ve been eliminated three times. In 2005 they lost against Chelsea (semi-finalist), in 2007 against Milan (winner) and in 2009 against Barcelona (winner). So, although they’ve lost against great teams, it looks like Benfica has more chances to beat them than Wolfsburg has to beat Real Madrid.

In case neither Benfica nor Wolfsburg can win their games, who would be a good rival? Well, we can say that the best rival is the one with the lowest percentage of wins in semi-finals. Let’s calculate it.

let winsInRound round team =
    let games = team |> gamesInRound round
    float (games 
            |> Seq.filter(fun f -> f.Winner = team)
            |> Seq.length ) / float (games |> Seq.length)

rivals
|> Seq.map(fun f -> f, f |> winsInRound 2)
|> Seq.sortBy snd 

Looks like Real Madrid is the worst team playing semi-finals. So, if Benfica or Wolfsburg can’t make it to the semi-finals, maybe Real Madrid could be a good rival.
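One caveat with winsInRound: for a team that has never played the round (Wolfsburg and Benfica in the semi-finals), the division is 0.0 / 0.0, which gives NaN in .NET floating point rather than a percentage. A minimal sketch of a variant that returns an option instead; the winRatio name and the tiny sample are hypothetical, only there to exercise the idea:

```fsharp
type Team = string
type RoundGame = { Year: int; Round: int; Team1: Team; Team2: Team; Winner: Team }

// Tiny hand-made sample, only to exercise the function.
let sample =
    [ { Year = 2014; Round = 2; Team1 = "Real Madrid"; Team2 = "Bayern"; Winner = "Real Madrid" }
      { Year = 2013; Round = 2; Team1 = "Bayern"; Team2 = "Barcelona"; Winner = "Bayern" } ]

// Hypothetical variant of winsInRound: None when the team never played that round.
let winRatio (games: RoundGame list) round team =
    let played =
        games
        |> List.filter (fun g -> (g.Team1 = team || g.Team2 = team) && g.Round = round)
    match played with
    | [] -> None
    | _ ->
        let wins = played |> List.filter (fun g -> g.Winner = team) |> List.length
        Some (float wins / float played.Length)

let bayern = winRatio sample 2 "Bayern"        // Some 0.5
let wolfsburg = winRatio sample 2 "Wolfsburg"  // None
```

Teams without games can then be filtered out or reported separately before sorting.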

Final

Let’s get the teams with the worst percentage of victories in a Champions League final.

rivals
|> Seq.map(fun f -> f, f |> winsInRound 1)
|> Seq.sortBy snd

Well, Barcelona is not a very good rival… They’ve played four finals and they’ve won all of them. Something similar applies to Real Madrid, but with just one final. Bayern only wins one of every three finals they play. And Atletico has only played one final and they lost it. So, if Manchester City gets to the final, Atletico could be a good rival.

Recap

Well, taking a look at the last twelve years’ results, it’s quite clear that no Premier League team will win this Champions League edition. But there’s a small chance if they play well and have luck in the next draw. Only one thing seems clear: don’t play against Barcelona!! :-)

A sequence is a list of potential values (all of them of the same type) computed on demand.
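To make “computed on demand” a bit more concrete, here is a small sketch. The evaluated counter is just a hypothetical helper to observe when the generator function actually runs:

```fsharp
// Counts how many times the generator function has been called.
let evaluated = ref 0

// An infinite sequence: defining it does not compute any element.
let lazySquares =
    Seq.initInfinite (fun i ->
        evaluated.Value <- evaluated.Value + 1
        i * i)

let countBefore = evaluated.Value   // 0: nothing has been computed yet

// Only the elements that are actually requested get computed.
let firstThree = lazySquares |> Seq.take 3 |> Seq.toList   // [0; 1; 4]
```

This is why a sequence can be infinite: elements that nobody asks for are never calculated.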

Sequence creation

As with arrays there are several ways to create a sequence.

Create from a range expression

You can create a new sequence from a range expression. In this case, instead of using [| and |] you should use { and }:

    let numbers = {1..20}

Create from a sequence expression

You can use an expression inside braces (after the seq keyword) to create a new sequence:

    let numbers = seq {for i in 1..20 do yield i}

We can write a more compact version using the arrow (->) followed by the value to yield:

    let numbers = seq {for i in 1..20 -> i}

Create using a function in the Seq module

As with arrays, there are some functions in the Seq module to create a sequence.

We can use Seq.init to initialise a sequence of n elements.

    let numbers = Seq.init 20 (fun i -> i * 2)

Or we can use Seq.initInfinite to create an infinite sequence.

    let numbers = Seq.initInfinite (fun i -> i * 2)

Finally, we can also create a sequence directly from an IEnumerable:

    let files =
        System.IO.Directory.EnumerateDirectories("C:\\Windows")
        |> Seq.map(fun f -> f.Length)

Operations in Seq module

The operations we saw in the previous article about arrays also apply here, replacing Array with Seq (e.g. Array.iter becomes Seq.iter), and the behavior is the same. Let’s see some other functions that can be valuable.

Seq.unfold

Unfold is a way to generate a sequence from a generator function. You can see it as an extension of Seq.initInfinite. The function takes two parameters. The first one is the generator function, which receives the current state and must return an option: Some with a tuple containing the next element of the sequence and the state to be passed to the next iteration, or None to end the sequence. The second parameter is the initial state.

For example, if we want to generate the squares of the natural numbers up to a certain threshold, we can do something like this:

    >let squareUpTo (top) =
        1
        |> Seq.unfold(fun i -> if ( i*i > top ) then None
                                else Some(i*i, i + 1))
                                
    1 4 9 16 25 36 49 64 81 100 121 144 169 196 225 256 289 324 361 400 441 484 529 576 625 676 729 784 841 900 961 

The 1 on the second line is the initial state, which is passed to the unfold function. The generator checks whether the square would be greater than the threshold and, if it is, returns None. If it isn’t, it returns a tuple with the next element of the sequence (in this case the square) and the next state to be passed to the generator (in this case the next number).
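The examples that follow use a value called squares that isn’t defined above. Here is a minimal sketch of how it could be obtained, assuming a threshold of 1000 (an assumption that is consistent with the output above, which ends at 961):

```fsharp
let squareUpTo top =
    1
    |> Seq.unfold (fun i ->
        if i * i > top then None
        else Some (i * i, i + 1))

// Assumption: a threshold of 1000; 31 * 31 = 961 is the last square below it.
let squares = squareUpTo 1000

// Prints the squares separated by spaces.
squares |> Seq.iter (printf "%d ")
```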

Seq.find

Find takes a function that returns a boolean (a predicate) and returns the first element of the sequence for which that function returns true. If no element matches, it throws an exception.

Following the previous example, if we do

    >let j = 
        squares
        |> Seq.find(fun i -> i < 0)

We get the following error: System.Collections.Generic.KeyNotFoundException: Exception of type ‘System.Collections.Generic.KeyNotFoundException’ was thrown.

On the other hand, if we do

    let j = 
        squares
        |> Seq.find(fun i -> i > 200 )

We get

    val j : int = 225

If we don’t want to get an exception, we can use the tryFind function, which returns an option type. Then, the following code

    let j = 
        squares
        |> Seq.tryFind(fun i -> i < 0)

returns

    val j : int option = None

Seq.pick

Given a sequence, applies the function provided as a parameter to each element and returns the first result that is not None (the function must return an option).

    let j = 
        squares
        |> Seq.pick(fun i -> if (i > 200) then Some(i) else None)
        
    val j : int = 225

If there isn’t any valid value, it throws an exception. If you want to avoid this, use tryPick.
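A short sketch of tryPick, using a stand-in for the squares from the previous examples. Unlike find, pick and tryPick can also transform the element they return:

```fsharp
// Stand-in for the squares used above: 1, 4, 9, ... 961
let squares = Seq.init 31 (fun i -> (i + 1) * (i + 1))

// First square above 200, halved: tryPick finds and transforms in one step.
let found = squares |> Seq.tryPick (fun i -> if i > 200 then Some (i / 2) else None)
// Some 112

// There are no negative squares, so we get None instead of an exception.
let missing = squares |> Seq.tryPick (fun i -> if i < 0 then Some i else None)
// None
```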

Seq.findIndex

Same as Seq.find but returns the index of the element, not the element itself. If you want to avoid the exception, use tryFindIndex.

    let j = 
        squares
        |> Seq.findIndex(fun i -> i > 200 )

    val j : int = 14

Seq.exists

Returns true if the function supplied returns true for any of the values of the sequence.

    let j =
        squares
        |> Seq.exists(fun i -> i = 100)

    val j : bool = true

Seq.groupBy

Groups the elements of a sequence by the result of the function supplied. Returns a sequence of tuples, each one with a key and the sequence of elements that produced that key.

    let howBigIsTheNumber i =
        if i < 100 then "Small"
        elif i < 500 then "Medium"
        else "Big"

    let groupedSquares =
        squares
        |> Seq.groupBy(fun i -> howBigIsTheNumber i)
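To see what groupBy gives back, here is a sketch that counts the elements of each group. It uses a stand-in for squares (the 31 squares up to 961) so that it can run on its own:

```fsharp
// Stand-in for the squares used above: 1, 4, 9, ... 961
let squares = Seq.init 31 (fun i -> (i + 1) * (i + 1))

let howBigIsTheNumber i =
    if i < 100 then "Small"
    elif i < 500 then "Medium"
    else "Big"

// Each group is a (key, elements) tuple; here we just count the elements per key.
let groupSizes =
    squares
    |> Seq.groupBy howBigIsTheNumber
    |> Seq.map (fun (key, items) -> key, Seq.length items)
    |> Seq.toList
// [("Small", 9); ("Medium", 13); ("Big", 9)]
```

The keys come out in the order in which they first appear in the original sequence.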

Seq.distinct

Given a sequence, returns only the unique elements.

    >let random = System.Random()
     let randomNumbers = Seq.init 50 (fun i -> random.Next(1, 5))
     let distinctNumbers =
        randomNumbers
        |> Seq.distinct
        |> Seq.iter (fun i -> printf "%d " i )

    3 2 1 4

If we need to get the distinct elements according to a function, we can use distinctBy.
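A small sketch of distinctBy with made-up data: the function supplied computes the key, and only the first element seen for each key is kept.

```fsharp
let words = [ "apple"; "avocado"; "banana"; "blueberry"; "cherry" ]

// Keep the first word seen for each initial letter.
let onePerInitial =
    words
    |> Seq.distinctBy (fun w -> w.[0])
    |> Seq.toList
// ["apple"; "banana"; "cherry"]
```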

Seq.pairwise

Given a sequence, creates a sequence of tuples. The first tuple will contain the first and second element of the original sequence, the second tuple will contain the second and third, and so on.

    >let numbers = {1..5}
     let tNumbers = 
        numbers
        |> Seq.pairwise
        |> Seq.iter (fun i -> printf "(%d, %d) " (fst i) (snd i) )
    
    (1, 2) (2, 3) (3, 4) (4, 5) 

Seq.windowed

Given a sequence, creates a sequence of arrays (windows) of the length supplied. The first array contains the first n elements of the original sequence (n being the length supplied), the second one contains n elements starting at the second element, and so on.

    >let numbers = {1..10}
     let tNumbers = 
        numbers
        |> Seq.windowed 5
        |> Seq.iter (fun i -> printf "%s\n" ( System.String.Join(",", i)) )
        
    1,2,3,4,5
    2,3,4,5,6
    3,4,5,6,7
    4,5,6,7,8
    5,6,7,8,9
    6,7,8,9,10
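A typical use of windowed is a moving average. A minimal sketch with made-up readings:

```fsharp
let readings = [ 10.0; 20.0; 30.0; 40.0; 50.0 ]

// Average of every run of 3 consecutive readings.
let movingAverage =
    readings
    |> Seq.windowed 3
    |> Seq.map Array.average
    |> Seq.toList
// [20.0; 30.0; 40.0]
```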

Seq.collect

Given a sequence, applies to each element a function that returns a sequence, and concatenates the results:

    >let numbers = {1..5}
    numbers
    |> Seq.collect (fun i -> {i*10..i*10+5} )
    |> Seq.iter (fun i -> printf "%d " i )
    
    10 11 12 13 14 15 20 21 22 23 24 25 30 31 32 33 34 35 40 41 42 43 44 45 50 51 52 53 54 55 

Summary

In this article, we’ve continued taking a look at data structures in F#, in this case sequences. We’ve seen some other functions that can also be applied to other data structures.

Arrays are one of the basic data structures in F#. In this article we’re going to see an introduction to what we can do with them.

Creation

There are several ways to create an array in F#

Create from a literal

We can create an array with a predefined set of values. To do that, we just need to specify the values separated by semicolons and wrapped between [| and |]

    >let numbers = [|1;2;3;4|]
    
    val numbers : int [] = [|1; 2; 3; 4|]

Create a range

We can create an array of predefined values using the range notation:

    >let numbers = [|100..120|]

    val numbers : int [] =
        [|100; 101; 102; 103; 104; 105; 106; 107; 108; 109; 110; 111; 112; 113; 114; 115; 116; 117; 118; 119; 120|]

In the previous code, we are creating an array of numbers between 100 and 120.

We can specify the step between those numbers:

    >let numbers = [|100..3..120|]

    val numbers : int [] = [|100; 103; 106; 109; 112; 115; 118|]

And we can use an expression between [| and |] too:

    >let numbers = [|for i in 100..120 do
                    yield i * 2|]
                    
    val numbers : int [] =
    [|200; 202; 204; 206; 208; 210; 212; 214; 216; 218; 220; 222; 224; 226; 228; 230; 232; 234; 236; 238; 240|]

Create from a function in the Array module

We can create an array using the Array.create function. This function takes two parameters, the number of positions you want to create and the value you want to use.

    >let numbers = Array.create 4 5

    val numbers : int [] = [|5; 5; 5; 5|]

Another function we can use is init, which is very similar to create but instead of taking the value it takes a function to create the different values

    >let numbers = Array.init 4 (fun i -> i * 2 )

    val numbers : int [] = [|0; 2; 4; 6|]

We can also use zeroCreate to create an array filled with zeros

    >let numbers : int[] = Array.zeroCreate 4

    val numbers : int [] = [|0; 0; 0; 0|]

Finally, we can create an array from another array or from an IEnumerable:

    >let files =
        System.IO.Directory.EnumerateDirectories("C:\\Windows")
        |> Array.ofSeq

Accessing elements in an Array

It’s easy to access an element in an Array. Just use the following notation:

    >let number = numbers.[0]

Take into account that arrays are zero-based, so the first element is at index 0.
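A few more ways of accessing elements, as a small sketch. Array.tryItem is handy when the index might be out of range:

```fsharp
let numbers = [| 10; 20; 30; 40 |]

let first = numbers.[0]                    // 10
let last = numbers.[numbers.Length - 1]    // 40
let middle = numbers.[1..2]                // a slice: [|20; 30|]

// numbers.[4] would throw System.IndexOutOfRangeException;
// Array.tryItem returns an option instead.
let safe = numbers |> Array.tryItem 4      // None
```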

Operations in Array module

There are more than 70 functions in the Array module. Let’s see some of the most used.

Array.map

Takes an array and returns another array of the same length with the result of applying a function to each element.

    >let numbers = [|1..5|]
    >let squares = 
        numbers 
        |> Array.map (fun i -> i * i)
    
    val squares : int [] = [|1; 4; 9; 16; 25|]

Array.mapi

It is very similar to Array.map, but it also provides the index of each element.

    >let letters = [|'a';'b';'c';'d'|]
    >letters |> Array.mapi (fun i l -> sprintf "The letter at index %i is %c" i l)
    val letters : char [] = [|'a'; 'b'; 'c'; 'd'|]
    val it : string [] =
    [|"The letter at index 0 is a"; "The letter at index 1 is b";
        "The letter at index 2 is c"; "The letter at index 3 is d"|]

Array.iter

Iterates over the array and calls a function with each element, but it doesn’t return anything (it only has side effects). We can use Array.iteri if we need the index.

    >let letters = [|'a';'b';'c';'d'|]
    >letters |> Array.iteri (fun i l -> printf "The letter at index %i is %c" i l)

Array.filter

Given an array, returns only those elements for which the function applied returns true.

    >let numbers = [|1..20|]
    >let evenNumbers = 
      numbers
      |> Array.filter (fun n -> n % 2 = 0)
  
    val evenNumbers : int [] = [|2; 4; 6; 8; 10; 12; 14; 16; 18; 20|]

Array.choose

Given an array, applies the function to each element and returns only the values of those results that are a ‘Some’. So, the function applied must return an option type.

    >let numbers = [|1..20|]
    >let evenNumbers = 
      numbers
      |> Array.choose (fun n -> if ( n % 2 = 0 ) then Some(n) else None)
  
    val evenNumbers : int [] = [|2; 4; 6; 8; 10; 12; 14; 16; 18; 20|]

Array.sum

Sums the values of the array. The type of the elements must support addition and must have a zero member.

    >let numbers = [|1..20|]
    >let sum = 
        numbers
        |> Array.sum
  
    val sum : int = 210

Array.sumBy

Same as sum, but takes a function that selects the value to sum for each element.

Let’s start the example by defining a function to get random strings

    >let random = System.Random()
    >let randomStr len = 
        let chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
        let charsLen = chars.Length

        // 0..len is inclusive, so the result has len + 1 characters
        let randomChars = [|for i in 0..len -> chars.[random.Next(charsLen)]|]
        new System.String(randomChars)
    
    val randomStr : len:int -> System.String  

Now, create some random strings

    >let strings = 
        [|10..15|]
        |> Array.map randomStr
        
    val strings : System.String [] =
        [|"ZEQNA1HUXS3"; "1C8K1Z5UO58A"; "FT9O8MDAVGFO4"; "G85O8P1NSLE6HX";
            "63XOR0DL4ANJKUS"; "JV6VQW09FPRHUUH4"|]

And finally, sum the length of those strings

    >let sum = 
        strings
        |> Array.sumBy (fun s -> s.Length)
        
    val sum : int = 81

Array.sort

Given an array, returns a new array with the elements sorted. If we use sortBy, we can specify a function that returns the key to sort by.

    >let sortedStrings =
        strings
        |> Array.sort
        
    val sortedStrings : System.String [] =
    [|"1C8K1Z5UO58A"; "63XOR0DL4ANJKUS"; "FT9O8MDAVGFO4"; "G85O8P1NSLE6HX";
        "JV6VQW09FPRHUUH4"; "ZEQNA1HUXS3"|]

Array.reduce

Given an array, applies the supplied function to the first two elements, then to that result and the third element, and so on, so each result is used as the accumulator for the next calculation. Throws an exception if the input array is empty.

    >let strings = [|"This"; "is"; "a"; "sentence"|]
     let sentence =
        strings
        |> Array.reduce (fun acc s -> acc + " " + s)
        
    val sentence : string = "This is a sentence"

Array.fold

Same as reduce, but takes the initial value of the accumulator as a parameter.

    >let strings = [|"This"; "is"; "a"; "sentence"|]
     let sentence =
        strings
        |> Array.fold  (fun acc s -> acc + " " + s) "Fold:"

    val sentence : string = "Fold: This is a sentence"
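Because fold receives the initial value, the accumulator doesn’t need to have the same type as the elements, which is not possible with reduce. A small sketch:

```fsharp
let strings = [| "This"; "is"; "a"; "sentence" |]

// The accumulator is an int while the elements are strings.
let totalLength = strings |> Array.fold (fun acc s -> acc + String.length s) 0
// 4 + 2 + 1 + 8 = 15

// Unlike reduce, fold is safe on an empty array: it returns the initial value.
let emptyLength = Array.empty<string> |> Array.fold (fun acc s -> acc + String.length s) 0
// 0
```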

Array.scan

Like fold, but returns each intermediate result

    >let strings = [|"This"; "is"; "a"; "sentence"|]
     let sentence =
        strings
        |> Array.scan  (fun acc s -> acc + " " + s) "Scan:"
        
    val sentence : string [] =
        [|"Scan:"; "Scan: This"; "Scan: This is"; "Scan: This is a";
            "Scan: This is a sentence"|]

Array.zip

Takes two arrays of the same size and produces another array of the same size with tuples of elements from each input array.

   >let colorNames = [|"red";"green";"blue"|]
    let colorCodes = [|"FF0000"; "00FF00"; "0000FF"|]
    let colors =
       Array.zip colorNames colorCodes
        
   val colors : (string * string) [] =
       [|("red", "FF0000"); ("green", "00FF00"); ("blue", "0000FF")|]

There’s a very similar function called zip3, which takes three arrays as inputs, and another one called unzip (and unzip3), which takes an array of tuples and decomposes it into two (or three) arrays of single values.
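A quick sketch of unzip, undoing the zip from the previous example:

```fsharp
let colors = [| ("red", "FF0000"); ("green", "00FF00"); ("blue", "0000FF") |]

// unzip returns a tuple of arrays, which we can destructure directly.
let names, codes = Array.unzip colors
// names = [|"red"; "green"; "blue"|]
// codes = [|"FF0000"; "00FF00"; "0000FF"|]
```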

Summary

We’ve seen the basics of the Array module. We’ve seen how to create arrays and some of the most used functions in the Array module.

In the previous article we saw how to parse a git log file. We ended up with an array of commits.

Let's start extracting some useful statistics from it.

The first thing that comes to mind is to find out how many commits have been made to the repository. That's pretty easy to do:

As you can see, we are using the pipe forward operator (|>) and the Array.length function to extract this information.

Now it's time to calculate the number of entities changed, that is, the number of times that we committed a change to an entity:

As you can see, we are using Array.collect to concatenate the arrays of Files inside each Commit and count them.

Let's continue with the number of entities that exist in the repository:

The code is very similar to the previous one, but before counting we are grouping the array by the file name.
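The original snippets aren't reproduced here, so this is a minimal sketch of the three counts described so far. The field names of the types and the sample commits are assumptions made for illustration; only Files is named in the text:

```fsharp
open System

// Hypothetical field names; only the shape of the types is described in the articles.
type CommitInfo = { Hash: string; Author: string; TimeStamp: DateTime; Message: string }
type CommitedFile = { LinesAdded: int option; LinesDeleted: int option; FileName: string }
type Commit = { CommitInfo: CommitInfo; Files: CommitedFile[] }

// Made-up commits, only to have something to count.
let file name = { LinesAdded = Some 1; LinesDeleted = Some 0; FileName = name }
let commit author files =
    { CommitInfo = { Hash = "0000000"; Author = author; TimeStamp = DateTime(2015, 12, 1); Message = "" }
      Files = files |> Array.map file }

let commits =
    [| commit "alice" [| "a.fs"; "b.fs" |]
       commit "bob" [| "a.fs" |]
       commit "alice" [| "c.fs"; "a.fs" |] |]

// Number of commits
let numberOfCommits = commits |> Array.length                     // 3

// Number of entities changed: every file of every commit counts
let numberOfEntitiesChanged =
    commits |> Array.collect (fun c -> c.Files) |> Array.length   // 5

// Number of entities: group by file name before counting
let numberOfEntities =
    commits
    |> Array.collect (fun c -> c.Files)
    |> Array.groupBy (fun f -> f.FileName)
    |> Array.length                                               // 3
```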

It's time to calculate the number of authors. We can start doing something like this:

But this information is not totally accurate. If we take a look at the contents of the array (remove the last line and execute the code again) we'll see that some authors have been committing changes using two different accounts. Let's try to consolidate the names.

First of all we need a map between the name existing in the commit information and the real name:

And now let's use this information to extract the real number of authors:

First of all we are defining a function to consolidate the names. This function is using pattern matching to see if the user name of the commit is one of the names that we want to convert to a real name. If it's one of them, we make the conversion. If not, we just return the name.

And then, we use the previous code with a couple of changes. The first one is to use the brand new consolidateNames function (line 9) and the second one is to use the Array.distinct function (line 10) to not return a name more than once.
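As an illustration of the idea, here is a sketch with made-up account names. It folds the mapping into the pattern match itself, which may differ from how the original code was organised:

```fsharp
// Hypothetical account names, only to illustrate the technique.
let consolidateNames name =
    match name with
    | "jsmith" | "John Smith (work)" -> "John Smith"
    | other -> other

let authors = [| "jsmith"; "John Smith (work)"; "Jane Doe"; "jsmith" |]

let numberOfAuthors =
    authors
    |> Array.map consolidateNames   // convert each account to the real name
    |> Array.distinct               // don't return a name more than once
    |> Array.length                 // 2
```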

And finally let's calculate the number of revisions of each file. We can do that very easily:

We are creating an array with all the files and grouping it by file name. Then, we are creating a new array that contains a tuple with the name of the file and the number of times that the file appears. Finally we sort the array by that number to find the files that change the most.
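A minimal sketch of those steps, using a plain array of made-up file names instead of the real commits, and assuming a descending sort so that the most changed files come first:

```fsharp
// Made-up list of changed files, one entry per change.
let changedFiles = [| "a.fs"; "b.fs"; "a.fs"; "c.fs"; "a.fs"; "b.fs" |]

let revisionsPerFile =
    changedFiles
    |> Array.groupBy id
    |> Array.map (fun (name, occurrences) -> name, occurrences.Length)
    |> Array.sortByDescending snd
// [|("a.fs", 3); ("b.fs", 2); ("c.fs", 1)|]
```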

Summary

In this article we've seen how to extract some basic statistics from a git log file, and that doing it is really easy. In future posts we'll see how to extract more complex information.

I've recently read the excellent book Your Code as a Crime Scene by Adam Tornhill. In this book, Adam explains several techniques to extract very useful information from the commits in the code repository to help you understand your code, your dependencies and your organisation. If you haven't read the book, please do yourself a favor and get a copy as a Christmas present.

On the other hand, this week I've attended the fantastic Progressive F# Tutorials at Skills Matter. There were 8 awesome workshops from people like Jamie Dixon, Tomas Petricek or Ian Russel explaining how you can use F# in your daily work. You can read a very good summary by the other Jamie Dixon here.

So, I've decided to improve my F# skills using it to do some of the analysis that Adam does in his book using his own tool code-maat.

Creating a useful log

The first thing we need is to create a log that we can parse easily and that has all the information we need. So please, use your favorite git command line tool to navigate to the base folder of the repository that you want to analyze and type the following command:

This command will write into gitLog.log something similar to this

The basic structure here is that each commit is separated from the next by a blank line. In every commit, we have a line with the commit information (hash, author, date and message) and several lines with the files that have changed (additions, deletions, file path). If the commit is a merge, all this structure is preceded by another commit information line with the merge information.

Parsing the log file

So first of all, let's translate this structure into F# types:

As we can see, we have CommitInfo, a Record type formed by three strings and a DateTime, and CommitedFile, also a Record type, formed by two optional integers and a string. The integers are optional because some files may be in a binary format, and git can't count their additions and deletions. In this case the log will display a "-" instead of a number. Finally, we have a Record type called Commit that has a CommitInfo field and an array of CommitedFile. Pretty straightforward.
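The type definitions themselves aren't reproduced here. A sketch that follows the description above; the field names are assumptions, only the shapes come from the text:

```fsharp
open System

// Three strings and a DateTime (hash, author, date and message).
type CommitInfo = { Hash: string; Author: string; TimeStamp: DateTime; Message: string }

// Two optional integers and a string (additions, deletions, file path).
type CommitedFile = { LinesAdded: int option; LinesDeleted: int option; FileName: string }

// The commit information plus the files changed in that commit.
type Commit = { CommitInfo: CommitInfo; Files: CommitedFile[] }

// A binary file: git prints "-" for both counters, which maps to None.
let binaryFile = { LinesAdded = None; LinesDeleted = None; FileName = "logo.png" }
```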

Let's read the content of the file and split it in the different commits to be able to parse it.

As you can see, we start by defining a constant in F# using the [<Literal>] attribute. After that we read the file using the standard .NET libraries. And finally we split the content of the file using a double line break as a separator. So far so good.
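A minimal sketch of those three steps. The relative path and the function names are assumptions, and the line endings are normalised first so that the split works with both Windows and Unix files:

```fsharp
open System
open System.IO

// Hypothetical path; [<Literal>] makes it a compile-time constant.
[<Literal>]
let LogFile = "gitLog.log"

// Splits the raw log text on double line breaks, one chunk per commit.
let splitCommits (text: string) =
    text.Replace("\r\n", "\n").Split([| "\n\n" |], StringSplitOptions.RemoveEmptyEntries)

// Reads the real file; not called here because the file may not exist.
let readCommits () = File.ReadAllText LogFile |> splitCommits

// A tiny in-memory example with two commits.
let chunks = splitCommits "c1 info\n1\t0\ta.fs\n\nc2 info\n2\t1\tb.fs"
// chunks.Length = 2
```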

Now that we have an array with all the commits (still in text format), let's parse each of these chunks of data. First of all we need to know which of those lines are the commit info and which of them are the file lines.

The first thing we do is to split the commit into lines, removing any empty lines that we may have. After that, we take as the commit info line the last line that looks like one (a line that starts with the hash information), discarding all the merge info that we don't need. Finally, we take the file lines as all the lines that are not a commit info line.

It's time to convert our commit info line into a CommitInfo object, much more convenient for our purposes.

As you can see, we are using the magic of Type providers to parse the line and extract the information. In this case, using the CsvProvider, we are defining that the third column will be a date using the Schema parameter. We just need to fill a CommitInfo object with the information of the first row.

And finally we need to parse the information of the commit lines. We'll use a very similar process:

The idea is the same, but we just need to iterate over all the commit lines. In this case, the format of the csv is a bit different (tabs are used as separators) and we use the Schema parameter to indicate that the two integers are optional.
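To show the shape of that parsing without a type provider, here is a sketch that uses plain String.Split and Int32.TryParse instead of the CsvProvider the post relies on. The field and function names are assumptions:

```fsharp
open System

type CommitedFile = { LinesAdded: int option; LinesDeleted: int option; FileName: string }

// "-" (binary files) or anything non-numeric becomes None.
let tryInt (s: string) =
    match Int32.TryParse s with
    | true, n -> Some n
    | _ -> None

// Parses one tab-separated line: additions, deletions, file path.
let parseFileLine (line: string) =
    match line.Split('\t') with
    | [| added; deleted; path |] ->
        Some { LinesAdded = tryInt added; LinesDeleted = tryInt deleted; FileName = path }
    | _ -> None

let textFile = parseFileLine "12\t3\tsrc/Program.fs"
let binaryFile = parseFileLine "-\t-\timg/logo.png"
```

Lines that don't have exactly three tab-separated fields give None, so they can be dropped with Array.choose.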

Finally, we just need to create the Commit object:

This is the whole function code:

The last bit is to apply this function to all the commits from the file:

Summary

In this post we've seen how easy it is to parse a git log file using F# and type providers. In future posts we'll see how we can extract information from this data. You can see the code of this post in this gist. See you soon!