Thinking Functionally: What is LINQ really?
LINQ (the grammar, not the fluent library) is an enormously powerful feature of C#. It allows for very succinct processing of collections and of data from many kinds of sources. But it has a much more fundamental property: it is C#'s way of doing monads.
Monad as a term is almost as famous for its mystique as it is, in functional programming circles, for being a super-powered compositional tool.
Because a monad is a monoid in the category of endofunctors, so what's the problem? Only kidding. This is a bit of a joke about the way that category theorists tend to describe monads. And it's true that their origins are in category theory, but then so are polymorphism, type theory, and actually all of programming. So let's not get too caught up on that.
What are monads for? Succinctly, they're design patterns that allow you to stop writing the same error-prone boilerplate over and over again.
IEnumerable<A> is a monad; it removes the need to write:
foreach(var a in listA)
{
    foreach(var b in listB)
    {
        yield return Process(a, b);
    }
}
Instead you can write:
var result = from a in listA
             from b in listB
             select Process(a, b);
Now that might seem like a small win. But for us functional programmers it's quite large, because the second example is an expression, whereas the first one isn't.
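To make the comparison concrete, here is a small sketch that runs both forms side by side. The sample arrays and the body of Process are assumptions chosen purely for illustration; only the shape of the two traversals comes from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical inputs and Process function, chosen only for illustration.
var listA = new[] { 1, 2 };
var listB = new[] { 10, 20 };
int Process(int a, int b) => a + b;

// Statement form: the nested loops have to live inside an iterator method
// (here a local function) before they can be used as a value.
IEnumerable<int> Nested()
{
    foreach (var a in listA)
    {
        foreach (var b in listB)
        {
            yield return Process(a, b);
        }
    }
}

// Expression form: the same traversal as a single LINQ expression.
var result = from a in listA
             from b in listB
             select Process(a, b);

Console.WriteLine(string.Join(", ", Nested()));  // 11, 21, 12, 22
Console.WriteLine(string.Join(", ", result));    // 11, 21, 12, 22
```

Because the LINQ form is an expression, it can be assigned, passed as an argument, or returned directly, without a wrapper method around it.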
OK, let's quickly go through some other monads to see how they remove boilerplate:
using static LanguageExt.Prelude;
Option<string> optionA = Some("Hello, ");
Option<string> optionB = Some("World");
Option<string> optionNone = None;
var result1 = from x in optionA
              from y in optionB
              select x + y;
var result2 = from x in optionA
              from y in optionB
              from z in optionNone
              select x + y + z;
The Option monad represents optional values, which can be in one of two states: Some(value) or None. In result1 the output is Some("Hello, World"); in result2 the output is None. Whenever a None appears in the LINQ expression the final result will always be None.
This is the shortcut for if. And it's an expression, which is good. The imperative equivalent would be:
string valueA = "Hello, ";
string valueB = "World";
string valueC = null;
string result1 = null;
if(valueA != null)
{
    if(valueB != null)
    {
        result1 = valueA + valueB;
    }
}
string result2 = null;
if(valueA != null)
{
    if(valueB != null)
    {
        if(valueC != null)
        {
            result2 = valueA + valueB + valueC;
        }
    }
}
You may think "So what? I could just change it to: if(valueA != null && valueB != null && valueC != null) to make it more concise." And yes, in this trivial example you could. But just imagine a real-world example where valueA, valueB, valueC are calls to functions that depend on the previous values and you'll see that there is a complexity to this. In fact it has a name: Cyclomatic Complexity; and this is what we're reducing with the Option monad.
Writing with expressions also removes silly mistakes. In fact whilst I was writing this I had written result1 = valueA + valueB + valueC; for the second example. That mistake couldn't happen with the LINQ version.
And what about else? Should we have done something there? There is no compilation error, but a programmer looking at your code might not immediately know that there's a bug because you left off the else part of the statement. This is a major source of bugs in C#. So if you want to know why monads are useful, this is one reason right here.
Like all good design patterns monads capture common behaviours so you make fewer mistakes. The problem most people have is that the rules that make monads what they are, are so abstract that it's hard to get a handle on them. LINQ is a syntax for monads and it works like so:
from a in ma
This is saying "Get the value a out of the monad ma". For IEnumerable that means get the values from the stream, for Option it means get the value if it's not in a None state.
from a in ma
from b in mb
As before this is saying "Get the value a out of monad ma, and then if we have a value get the value b out of monad mb". So for IEnumerable if ma is an empty collection then the second from won't run. For Option if ma is a None then the second from won't run.
from a in ma
from b in mb
select a + b;
select in monad land means put this value back into a new monad. So for IEnumerable it means create a new stream of values with the result, and for Option it means create a new option with the result in it.
Let's look at how this is implemented:
public static class EnumerableExtensions
{
    // Extension methods must be static
    public static IEnumerable<B> Select<A, B>(this IEnumerable<A> self, Func<A, B> map)
    {
        foreach(var item in self)
        {
            yield return map(item);
        }
    }
    public static IEnumerable<C> SelectMany<A, B, C>(
        this IEnumerable<A> self,
        Func<A, IEnumerable<B>> bind,
        Func<A, B, C> project)
    {
        foreach(var a in self)
        {
            foreach(var b in bind(a))
            {
                yield return project(a, b);
            }
        }
    }
}
So that's the definition for IEnumerable. Select allows this to work:
from a in ma
select a;
SelectMany allows this to work:
from a in ma
from b in mb
select a + b;
You can see from the implementations how they capture the foreach behaviour I mentioned before. This is good. If I were to expand out the above expressions:
from a in ma
select a;
// Is the same as
ma.Select(a => a);
And:
from a in ma
from b in mb
select a + b;
// Is the same as
ma.SelectMany(a => mb, (a, b) => a + b);
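As a quick check of that expansion, the sketch below runs the query form and a hand-written SelectMany call against the standard System.Linq implementation and compares the results. The contents of ma and mb are hypothetical sample data; the first argument is the bind function, which returns the second collection for each a, and the second argument is the projection.

```csharp
using System;
using System.Linq;

// Hypothetical sample data; ma and mb mirror the names used in the text.
var ma = new[] { 1, 2 };
var mb = new[] { 10, 20 };

// Query syntax with two from clauses.
var query = from a in ma
            from b in mb
            select a + b;

// Hand-written equivalent: bind returns mb for each a, project combines a and b.
var fluent = ma.SelectMany(a => mb, (a, b) => a + b);

Console.WriteLine(string.Join(", ", query));      // 11, 21, 12, 22
Console.WriteLine(query.SequenceEqual(fluent));   // True
```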
Let's take a look at the implementation for Option:
public static class OptionExtensions
{
    // Assumes `using static LanguageExt.Prelude;` for Some(...)
    public static Option<B> Select<A, B>(this Option<A> self, Func<A, B> map) =>
        self.Match(
            Some: a => Some(map(a)),
            None: () => Option<B>.None
        );
    public static Option<C> SelectMany<A, B, C>(
        this Option<A> self,
        Func<A, Option<B>> bind,
        Func<A, B, C> project) =>
        self.Match(
            Some: a => bind(a).Match(
                Some: b => Some(project(a, b)),
                None: () => Option<C>.None),
            None: () => Option<C>.None
        );
}
What's happening here is that when the Option is in a None state nothing happens. If you look at SelectMany, then if self is None then None is returned; bind isn't invoked, and neither is project. But if self is Some(a) then bind is invoked.
If the return of bind(a) is None then project isn't run; but if it is Some(b) then project(a, b) is run.
This is capturing the if behaviour from before. The process is known as binding.
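To watch binding happen without pulling in the library, here is a minimal self-contained sketch. The Option struct below is a hand-rolled stand-in and is not how language-ext implements its Option; its Some, None and Match members, the OptionExtensions class, and the Touch helper are all hypothetical names made up for this example. The bindCalls counter is only there to show whether the second from was ever evaluated.

```csharp
using System;

var both = from x in Option<string>.Some("Hello, ")
           from y in Option<string>.Some("World")
           select x + y;

var missing = from x in Option<string>.Some("Hello, ")
              from y in Option<string>.None
              select x + y;

// Counts how often the second 'from' is evaluated when the first is None.
var bindCalls = 0;
Option<string> Touch() { bindCalls++; return Option<string>.Some("World"); }

var skipped = from x in Option<string>.None
              from y in Touch()
              select x + y;

Console.WriteLine(both);       // Some(Hello, World)
Console.WriteLine(missing);    // None
Console.WriteLine(skipped);    // None
Console.WriteLine(bindCalls);  // 0

// A deliberately tiny Option type (hypothetical, for illustration only).
public readonly struct Option<A>
{
    readonly A value;
    readonly bool isSome;

    Option(A value) { this.value = value; isSome = true; }

    public static Option<A> Some(A value) => new Option<A>(value);
    public static readonly Option<A> None = default;

    // Runs exactly one of the two functions, depending on the state.
    public B Match<B>(Func<A, B> Some, Func<B> None) =>
        isSome ? Some(value) : None();

    public override string ToString() =>
        isSome ? $"Some({value})" : "None";
}

// Select and SelectMany are what the C# compiler looks for in query syntax.
public static class OptionExtensions
{
    public static Option<B> Select<A, B>(this Option<A> self, Func<A, B> map) =>
        self.Match(
            Some: a => Option<B>.Some(map(a)),
            None: () => Option<B>.None);

    public static Option<C> SelectMany<A, B, C>(
        this Option<A> self,
        Func<A, Option<B>> bind,
        Func<A, B, C> project) =>
        self.Match(
            Some: a => bind(a).Match(
                Some: b => Option<C>.Some(project(a, b)),
                None: () => Option<C>.None),
            None: () => Option<C>.None);
}
```

Since bindCalls stays at zero, the second from never runs once the first Option is None, which is the nested if behaviour described above.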
TBC