Microsofts LINQ presents a wonderful sql-like syntax for querying arbitrary collections of objects. Under the hood, however, that sql-like syntax is transformed into what is called a monoid comprehension, which, scary though it may sound, is basically one or more nested loops.
This nested loopiness means that if you do anything more complex than filtering or mapping on a collection, for example a join, you will rapidly end up in a world of performance pain and your query starts performing O(n^m), where m is the number of nested loops.
Consider the following LINQ query:
var q = from c in customers
join o in orders on c.Key equals o.Key
select new {c.Name, o.OrderNumber};
This code will get translated into something like this:
foreach (var c in customers)
foreach (var o in orders)
if (c.Key == o.Key)
yield return new { c.Name, o.OrderNumber };
For small collections, this approach is fine, but when they start to get larger, the number of times through the loop is going to get out of control.
Relational databases sometimes have to perform the same algorithm, but they also provide the possibility of indexing the collections, which in the case of the example above, would eliminate the necessity of scanning through all the orders checking each to see if its key matches a customer - instead, the index would be used to rapidly identify the small number of matching orders.
You can bet your bottom dollar that Microsoft has something in the works to address this issue - LINQ to objects is basically useless without indexes. In the mean time, however, there is i4o, a project to do exactly that:
Introducing i4o
Codeplex - i4o
Posted by Damien Morton in
.Net,
Database,
Open Source /
Digg this /