Thinking Functionally: What is LINQ really?

LINQ (the grammar, not the fluent library) is an enormously powerful feature of C#. It allows very succinct processing of collections and other data sources. But it has a much more fundamental property: it is C#'s way of doing monads.

As a term, "monad" is almost as famous for its mystique as it is, in functional programming circles, for being a super-powered compositional tool.

Why should I care about monads?

Because a monad is a monoid in the category of endofunctors, so what's the problem? Only kidding. This is a bit of a joke about the way category theorists tend to describe monads. And it's true that monads originate in category theory, but then, polymorphism and type theory have deep connections to category theory too, as arguably does all of programming. So let's not get too caught up on that.

What are monads for? Succinctly, they're design patterns that let you stop writing the same error-prone boilerplate over and over again.

IEnumerable<A> is a monad. It removes the need to write:

    foreach(var a in listA)
    {
        foreach(var b in listB)
        {
            yield return Process(a, b);
        }
    }

Instead you can write:

    var result = from a in listA
                 from b in listB
                 select Process(a, b);

Now that might seem like a small win. But for us functional programmers it's quite large, because the second example is an expression, whereas the first one isn't.
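Here's a self-contained sketch of the two forms side by side, using the standard System.Linq operators. `Process` is a hypothetical stand-in for whatever work you do with each pair. Note that the query version is a single expression: the method body is nothing more than the value it returns.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NestedLoopsVsQuery
{
    // Hypothetical stand-in for the Process(a, b) call in the text.
    public static string Process(int a, string b) => $"{a}{b}";

    // Statement form: an iterator method built from nested foreach loops.
    public static IEnumerable<string> WithLoops(IEnumerable<int> listA, IEnumerable<string> listB)
    {
        foreach (var a in listA)
        {
            foreach (var b in listB)
            {
                yield return Process(a, b);
            }
        }
    }

    // Expression form: the same computation as a single LINQ query expression.
    public static IEnumerable<string> WithQuery(IEnumerable<int> listA, IEnumerable<string> listB) =>
        from a in listA
        from b in listB
        select Process(a, b);
}
```

With listA = { 1, 2 } and listB = { "x", "y" }, both methods yield "1x", "1y", "2x", "2y".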

OK, let's quickly look at another monad to see how it removes boilerplate:

    using LanguageExt;
    using static LanguageExt.Prelude;

    Option<string> optionA = Some("Hello, ");
    Option<string> optionB = Some("World");
    Option<string> optionNone = None;

    var result1 = from x in optionA
                  from y in optionB
                  select x + y;

    var result2 = from x in optionA
                  from y in optionB
                  from z in optionNone
                  select x + y + z;

The Option monad represents optional values. An Option can be in one of two states: Some(value) or None. In result1 the output is Some("Hello, World"); in result2 the output is None. Whenever a None appears in the LINQ expression, the final result will always be None.

This is a shortcut for if, and it's an expression, which is good. The imperative equivalent would be:

    string valueA = "Hello, ";
    string valueB = "World";
    string valueC = null;   // plays the role of optionNone

    string result1 = null;
    if(valueA != null)
    {
        if(valueB != null)
        {
            result1 = valueA + valueB;
        }
    }

    string result2 = null;
    if(valueA != null)
    {
        if(valueB != null)
        {
            if(valueC != null)
            {
                result2 = valueA + valueB + valueC;
            }
        }
    }

You may think, "So what? I could just change it to if(valueA != null && valueB != null && valueC != null) to make it more concise." And yes, in this trivial example you could. But imagine a real-world example where valueA, valueB, and valueC are calls to functions that each depend on the previous values, and you'll see that the checks become much harder to collapse: every extra step adds another branch. That complexity has a name, cyclomatic complexity, and it is what we're reducing with the Option monad.

Writing with expressions also removes silly mistakes. In fact, whilst I was writing this I had written result1 = valueA + valueB + valueC; in the second example. That mistake couldn't happen with the LINQ version.

And what about else? Should we have done something there? There is no compilation error, but a programmer reading your code can't tell whether leaving off the else part of the statement was deliberate or a bug. Missing else branches are a common source of bugs in C#. With the expression version there is nothing to forget: result2 is None whenever any of its inputs is None. So if you want to know why monads are useful, this is one reason right here.
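To make the dependent-functions case concrete, here's a sketch using a minimal hand-rolled Option<A> rather than LanguageExt's, so that it compiles on its own. The lookup data and the `Find` helper are hypothetical. Each step needs the value produced by the step before it, yet the query stays flat: no nested ifs, and no else to forget.

```csharp
using System;
using System.Collections.Generic;

// Minimal hand-rolled Option, for illustration only (not LanguageExt's type).
public readonly struct Option<A>
{
    private readonly bool isSome;
    private readonly A value;

    private Option(A value) { isSome = true; this.value = value; }

    public static Option<A> Some(A value) => new Option<A>(value);
    public static Option<A> None => default; // default has isSome == false

    public R Match<R>(Func<A, R> some, Func<R> none) => isSome ? some(value) : none();

    public override string ToString() => Match(v => $"Some({v})", () => "None");
}

public static class OptionExtensions
{
    // Only SelectMany is needed: the compiler turns the final select into its project argument.
    public static Option<C> SelectMany<A, B, C>(
        this Option<A> self,
        Func<A, Option<B>> bind,
        Func<A, B, C> project) =>
        self.Match(
            a => bind(a).Match(b => Option<C>.Some(project(a, b)), () => Option<C>.None),
            () => Option<C>.None);
}

public static class Lookups
{
    // Hypothetical data: user id -> name, name -> city, city -> postcode.
    static readonly Dictionary<int, string> Users = new Dictionary<int, string> { [1] = "alice", [2] = "bob" };
    static readonly Dictionary<string, string> Cities = new Dictionary<string, string> { ["alice"] = "Leeds" };
    static readonly Dictionary<string, string> Postcodes = new Dictionary<string, string> { ["Leeds"] = "LS1" };

    // Hypothetical helper: a dictionary lookup that returns None instead of null or an exception.
    static Option<V> Find<K, V>(Dictionary<K, V> dict, K key) =>
        dict.TryGetValue(key, out var v) ? Option<V>.Some(v) : Option<V>.None;

    // Each step depends on the value from the previous one; a None at any step stops the rest.
    public static Option<string> PostcodeForUser(int userId) =>
        from name in Find(Users, userId)
        from city in Find(Cities, name)
        from postcode in Find(Postcodes, city)
        select $"{name}: {postcode}";
}
```

PostcodeForUser(1) gives Some(alice: LS1). User 2 has no city on record and user 3 doesn't exist, so both give None, without a single null check in the query.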

So how does it work?

Like all good design patterns, monads capture common behaviours so you make fewer mistakes. The problem most people have is that the rules defining a monad are so abstract that it's hard to get a handle on them. LINQ is a syntax for monads, and it works like so:

    from a in ma

This is saying "get the value a out of the monad ma". For IEnumerable that means getting each value from the stream; for Option it means getting the value if the Option isn't in the None state.

    from a in ma
    from b in mb

As before, this is saying "get the value a out of monad ma, and then, if we have a value, get the value b out of monad mb". So for IEnumerable, if ma is an empty collection then the second from won't run, and if ma has several values it runs once for each of them. For Option, if ma is None then the second from won't run.

    from a in ma
    from b in mb
    select a + b;

select in monad land means "put this value back into a new monad". So for IEnumerable it means create a new stream of values with the result, and for Option it means create a new Option containing the result.
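The claim that the second from won't run when ma is empty is easy to check. This sketch uses the standard System.Linq operators and a hypothetical local function, `Inner`, that counts how many times the second from clause asks for mb. With an empty ma the count stays at zero; with two values in ma it is two, one per value.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ShortCircuitDemo
{
    // Runs `from a in ma from b in mb select a + b` and reports how many times
    // the second from clause was evaluated (once per value pulled from ma).
    public static (List<int> Results, int InnerEvaluations) Run(IEnumerable<int> ma, IEnumerable<int> mb)
    {
        var innerEvaluations = 0;

        // Hypothetical helper: hands back mb and counts each request for it.
        IEnumerable<int> Inner()
        {
            innerEvaluations++;
            return mb;
        }

        // ToList forces the query to run before we read the counter.
        var results = (from a in ma
                       from b in Inner()
                       select a + b).ToList();

        return (results, innerEvaluations);
    }
}
```

Run(new int[0], new[] { 10, 20 }) returns no results and a count of 0. Run(new[] { 1, 2 }, new[] { 10, 20 }) returns 11, 21, 12, 22 and a count of 2.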

Let's look at how this is implemented:

    using System;
    using System.Collections.Generic;

    public static class EnumerableExtensions
    {
        // Map: apply map to every item in the stream.
        public static IEnumerable<B> Select<A, B>(this IEnumerable<A> self, Func<A, B> map)
        {
            foreach(var item in self)
            {
                yield return map(item);
            }
        }

        // Bind: for every a, get a new stream from bind(a) and project each (a, b) pair.
        public static IEnumerable<C> SelectMany<A, B, C>(
            this IEnumerable<A> self, 
            Func<A, IEnumerable<B>> bind, 
            Func<A, B, C> project)
        {
            foreach(var a in self)
            {
                foreach(var b in bind(a))
                {
                    yield return project(a, b);
                }
            }
        }
    }

So that's the definition for IEnumerable. Select allows this to work:

    from a in ma
    select a;

SelectMany allows this to work:

    from a in ma
    from b in mb
    select a + b;
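Neither Select nor SelectMany is special to the compiler: query syntax is rewritten into calls to methods with those names, and ordinary method resolution picks up whichever ones are in scope. The sketch below leans on that. It repeats the two IEnumerable extension methods as compilable static methods, adds an illustrative `Calls` log (not part of LINQ), and deliberately leaves out `using System.Linq;`, so the queries bind to these hand-written versions.

```csharp
using System;
using System.Collections.Generic;
// Note: no `using System.Linq;` here, so query syntax must bind to the methods below.

public static class EnumerableExtensions
{
    // Illustrative only: records which method the compiler's rewrite called.
    public static readonly List<string> Calls = new List<string>();

    public static IEnumerable<B> Select<A, B>(this IEnumerable<A> self, Func<A, B> map)
    {
        Calls.Add("Select"); // runs when enumeration starts
        foreach (var item in self)
        {
            yield return map(item);
        }
    }

    public static IEnumerable<C> SelectMany<A, B, C>(
        this IEnumerable<A> self,
        Func<A, IEnumerable<B>> bind,
        Func<A, B, C> project)
    {
        Calls.Add("SelectMany"); // runs when enumeration starts
        foreach (var a in self)
        {
            foreach (var b in bind(a))
            {
                yield return project(a, b);
            }
        }
    }
}

public static class QueryDemo
{
    // Without System.Linq we materialise results with the List<T> constructor.
    static List<T> ToList<T>(IEnumerable<T> xs) => new List<T>(xs);

    public static List<int> Identity(int[] ma) => ToList(from a in ma select a);

    public static List<int> Sums(int[] ma, int[] mb) => ToList(from a in ma from b in mb select a + b);
}
```

Running Identity logs only "Select"; running Sums logs only "SelectMany", because the final select is folded into SelectMany's project argument.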

You can see from the implementations how they capture the foreach behaviour I mentioned before. This is good. If I were to expand the above expressions into the method calls the compiler generates:

    from a in ma
    select a;

    // Is the same as

    ma.Select(a => a);

And:

    from a in ma
    from b in mb
    select a + b;

    // Is the same as

    ma.SelectMany(a => mb, (a, b) => a + b);
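Both translations can be checked against the framework's own operators. This sketch writes each query both ways so the results can be compared; the sample inputs in the note that follows are arbitrary.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Desugaring
{
    public static IEnumerable<int> SelectQuery(IEnumerable<int> ma) =>
        from a in ma
        select a;

    // What the compiler turns SelectQuery's body into.
    public static IEnumerable<int> SelectMethod(IEnumerable<int> ma) =>
        ma.Select(a => a);

    public static IEnumerable<int> SelectManyQuery(IEnumerable<int> ma, IEnumerable<int> mb) =>
        from a in ma
        from b in mb
        select a + b;

    // The first lambda supplies mb for each a; the second combines a and b.
    public static IEnumerable<int> SelectManyMethod(IEnumerable<int> ma, IEnumerable<int> mb) =>
        ma.SelectMany(a => mb, (a, b) => a + b);
}
```

For ma = { 1, 2 } and mb = { 10, 20 }, both SelectMany forms yield 11, 21, 12, 22, and both Select forms yield ma unchanged.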

Let's take a look at the implementation for Option:

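    using System;
    using LanguageExt;
    using static LanguageExt.Prelude;

    // Illustrative: LanguageExt's Option already provides Select and SelectMany.
    public static class OptionExtensions
    {
        // Map the value if there is one; otherwise stay None.
        public static Option<B> Select<A, B>(this Option<A> self, Func<A, B> map) =>
            self.Match(
                Some: a  => Some(map(a)),
                None: () => None
            );

        // Bind: only run bind if self is Some, and only run project if bind(a) is Some too.
        public static Option<C> SelectMany<A, B, C>(
            this Option<A> self, 
            Func<A, Option<B>> bind, 
            Func<A, B, C> project) =>
            self.Match(
                Some: a  => bind(a).Match(
                                Some: b  => Some(project(a, b)),
                                None: () => None),
                None: () => None 
            );
    }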

What's happening here is that when the Option is in the None state, nothing happens. If you look at SelectMany, then if self is None, None is returned; bind isn't invoked, and neither is project. But if self is Some(a), then bind is invoked.

If bind(a) returns None then project isn't run; but if it returns Some(b) then project(a, b) is run and its result is wrapped in a new Some.

This captures the if behaviour from before. The process is known as binding.
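Here's a self-contained sketch of that binding behaviour, using a minimal hand-rolled Option<A> (not LanguageExt's type, so nothing needs installing) whose Select and SelectMany follow the same shape as the versions above. The hypothetical `Counted` helper tallies how often the second from clause runs, so you can watch None short-circuit it.

```csharp
using System;

// Minimal hand-rolled Option, for illustration only (not LanguageExt's type).
public readonly struct Option<A>
{
    private readonly bool isSome;
    private readonly A value;

    private Option(A value) { isSome = true; this.value = value; }

    public static Option<A> Some(A value) => new Option<A>(value);
    public static Option<A> None => default; // default has isSome == false

    public R Match<R>(Func<A, R> some, Func<R> none) => isSome ? some(value) : none();

    public override string ToString() => Match(v => $"Some({v})", () => "None");
}

public static class OptionExtensions
{
    public static Option<B> Select<A, B>(this Option<A> self, Func<A, B> map) =>
        self.Match(a => Option<B>.Some(map(a)), () => Option<B>.None);

    // bind runs only if self is Some; project runs only if bind(a) is also Some.
    public static Option<C> SelectMany<A, B, C>(
        this Option<A> self,
        Func<A, Option<B>> bind,
        Func<A, B, C> project) =>
        self.Match(
            a => bind(a).Match(b => Option<C>.Some(project(a, b)), () => Option<C>.None),
            () => Option<C>.None);
}

public static class BindDemo
{
    public static int BindCalls;

    // Hypothetical helper: hands back mb while counting how often it is asked for.
    static Option<string> Counted(Option<string> mb)
    {
        BindCalls++;
        return mb;
    }

    public static Option<string> Combine(Option<string> ma, Option<string> mb) =>
        from a in ma
        from b in Counted(mb)
        select a + b;
}
```

Combine(Some("Hello, "), Some("World")) gives Some(Hello, World) and calls Counted once. With None as the first argument the result is None and Counted is never called; with None as the second, Counted runs once but project doesn't, and the result is None.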

TBC
