I had a discussion today with a software architect who disagreed with me that LINQ to Objects should be used instead of foreach loops. My claim is that LINQ is better. He says that I shouldn’t make such a blanket statement, because LINQ is inefficient. I stand by my assertion.
Declarative Code is Easier to Read
From the initial development point of view, writing what you’re doing rather than how you’re doing it is more succinct, easier to read, and easier for others to maintain. Even if you don’t know LINQ, which is easier to decipher?
var developerNames = employees.Where(e => e.Role == Role.Developer)
                              .OrderBy(e => e.LastName)
                              .Select(e => e.FullName)
                              .ToArray();
or the .NET 2.0 List<T> way:
public class EmployeeLastNameComparer : IComparer<Employee>
{
    public int Compare(Employee x, Employee y)
    {
        return x.LastName.CompareTo(y.LastName);
    }
}
…
var employeeList = new List<Employee>();
employeeList.AddRange(employees);
employeeList.Sort(new EmployeeLastNameComparer());
var names = new List<string>();
foreach (var employee in employeeList)
{
    if (employee.Role == Role.Developer)
    {
        names.Add(employee.FullName);
    }
}
var developerNames = names.ToArray();
I much prefer to read declarative code than iterative code. But the architect’s concern about performance is still valid. Let’s check the numbers. Using Rex, I will create an employees array with a thousand members.
string regexName = @"^[A-Z][a-z]+$";
RexSettings nameSettings = new RexSettings(regexName)
{
    k = 1000,
    encoding = CharacterEncoding.ASCII
};
var firstNames = RexEngine.GenerateMembers(nameSettings);
var lastNames = RexEngine.GenerateMembers(nameSettings);
Random randomRole = new Random();
var employees = firstNames.Zip(lastNames,
    (f, l) => new Employee
    {
        FirstName = f,
        LastName = l,
        Role = (Role)randomRole.Next(3)
    })
    .ToArray();
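The Employee and Role types are never shown in this post, so here is a minimal sketch of what they might look like. Only Role.Developer and the FirstName, LastName, Role, and FullName members come from the code above. The other two role names and the "First Last" format of FullName are assumptions made for illustration.

```csharp
using System;

// Hypothetical: only Developer appears in the post. The other two names are
// placeholders so that (Role)randomRole.Next(3) always maps to a defined value.
public enum Role { Developer, Tester, Manager }

public class Employee
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public Role Role { get; set; }

    // Assumed format; the post does not say how FullName is built.
    public string FullName
    {
        get { return FirstName + " " + LastName; }
    }
}

public static class EmployeeSketch
{
    public static void Main()
    {
        var sample = new Employee
        {
            FirstName = "Ada",
            LastName = "Lovelace",
            Role = Role.Developer
        };
        Console.WriteLine(sample.FullName); // prints "Ada Lovelace"
    }
}
```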
I then timed both versions of the code using a Stopwatch instance. Here are the results for the List<T> version.
00:00:00.0027104
00:00:00.0026925
00:00:00.0028171
00:00:00.0027148
00:00:00.0027858
And now the LINQ version.
00:00:00.0019929
00:00:00.0019156
00:00:00.0019871
00:00:00.0018066
00:00:00.0019116
Okay, clearly the LINQ version has an advantage because the filter occurs before the sort, so only the developers (roughly a third of the array, given three random roles) get sorted. It’s time to make the iterative code even uglier to squeeze some performance out of it (and remember, the LINQ version is functionally complete and rather clean).
var employeeList = new List<Employee>();
foreach (var employee in employees)
{
    if (employee.Role == Role.Developer)
    {
        employeeList.Add(employee);
    }
}
employeeList.Sort(new EmployeeLastNameComparer());
var names = new List<string>();
foreach (var employee in employeeList)
{
    names.Add(employee.FullName);
}
var developerNames = names.ToArray();
Now that I’ve optimized the code, the results are much better.
00:00:00.0014110
00:00:00.0013445
00:00:00.0013484
00:00:00.0016516
00:00:00.0013563
Is it really worth that mess to squeeze out a little bit of time? It really depends on your application. But as you’ll see, that’s still not an excuse. Besides, if you really cared that much about performance, you would use arrays and write your own optimized sort methods. If you must write those applications, you’re probably using C on an embedded device and this posting is moot. For business applications, readability trumps premature optimization.
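The timing harness itself isn’t shown above, so here is a rough sketch of how five timings per version could be collected with Stopwatch. The TimeRuns helper and the stand-in integer data are hypothetical; in the real measurement, the action would be one of the employee queries. Keep in mind that the first run of any action also pays for JIT compilation, so it tends to be slower than the rest.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TimingSketch
{
    // Hypothetical helper: runs the action several times and prints
    // the elapsed time of each run on its own line.
    public static void TimeRuns(string label, int runs, Action action)
    {
        Console.WriteLine(label);
        for (int i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            Console.WriteLine(stopwatch.Elapsed);
        }
    }

    public static void Main()
    {
        // Stand-in data (an assumption): a thousand random integers
        // instead of the Rex-generated employees.
        var random = new Random(42);
        int[] values = Enumerable.Range(0, 1000)
                                 .Select(_ => random.Next())
                                 .ToArray();

        // Filter first, then sort, mirroring the shape of the LINQ query.
        TimeRuns("LINQ version", 5,
            () => values.Where(v => v % 3 == 0).OrderBy(v => v).ToArray());
    }
}
```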
Take Advantage of the Hardware
It’s much easier to take advantage of the multiple cores found in today’s computers using LINQ. On .NET 4 (or using the Parallel Extensions with 3.5), it is as simple as adding an extension method or two.
var developerNames = employees.AsParallel().AsOrdered()
                              .Where(e => e.Role == Role.Developer)
                              .OrderBy(e => e.LastName)
                              .Select(e => e.FullName)
                              .ToArray();
Again, I must warn against premature optimization. The original statement is already so fast that the overhead of parallelization is not worth it; it actually makes the routine slower. However, if I add a Thread.Sleep(10) to the getter of Employee.FullName, the running time drops from 5 seconds without the AsParallel() call to 2.5 seconds with it. Needless to say, optimizing the foreach version to take advantage of the hardware isn’t as elegant. Maintaining order requires a specific overload of Parallel.ForEach, one that passes each element’s index to the loop body. This situation is easy since we can write into an array by index, but in many situations it requires much more work. Here is the piece of the code that needs to change.
string[] names = new string[employeeList.Count];
Parallel.ForEach(employeeList, (employee, loopState, elementIndex) =>
{
    names[elementIndex] = employee.FullName;
});
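To try the slow-getter experiment yourself, something like the following sketch should do. SlowEmployee and its generated names are hypothetical stand-ins for the Employee class with Thread.Sleep(10) added to the getter, and the collection is smaller than the one I measured. The actual speedup depends on how many cores the machine has, so no timings are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for Employee with an artificially slow getter.
public class SlowEmployee
{
    public string LastName { get; set; }

    public string FullName
    {
        get
        {
            Thread.Sleep(10); // simulate expensive work
            return "Employee " + LastName;
        }
    }
}

public static class ParallelSketch
{
    public static void Main()
    {
        // 200 employees with unique last names, created in reverse order.
        var employees = Enumerable.Range(0, 200)
            .Select(i => new SlowEmployee { LastName = "Name" + (199 - i).ToString("D3") })
            .ToArray();

        var stopwatch = Stopwatch.StartNew();
        string[] sequential = employees.OrderBy(e => e.LastName)
                                       .Select(e => e.FullName)
                                       .ToArray();
        Console.WriteLine("Sequential: " + stopwatch.Elapsed);

        stopwatch.Restart();
        string[] parallel = employees.AsParallel()
                                     .OrderBy(e => e.LastName)
                                     .Select(e => e.FullName)
                                     .ToArray();
        Console.WriteLine("Parallel:   " + stopwatch.Elapsed);

        // OrderBy imposes an ordering that PLINQ keeps for the rest of the
        // query, so both arrays should hold the same names in the same order.
        Console.WriteLine("Same result: " + sequential.SequenceEqual(parallel));
    }
}
```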
What If LINQ Really Doesn’t Work In My Situation?
The important thing isn’t whether or not you use LINQ; the important thing is to have readable code. Stating what you’re doing is more maintainable than stating how you’re doing it. Encapsulation is key. I feel it’s best to start with the LINQ statement, then optimize if necessary. Here are the steps to squeeze out the milliseconds by going from the LINQ version of the code above to the iterative version while hiding the complexity, thereby maintaining readability.
The first step is the reduce chain refactoring, which here means moving the chained calls into a named extension method.
public static class EnumerableEmployee
{
    public static IEnumerable<string> DeveloperNames(this IEnumerable<Employee> employees)
    {
        return employees.Where(e => e.Role == Role.Developer)
                        .OrderBy(e => e.LastName)
                        .Select(e => e.FullName);
    }
}
Then call the method with the following.
var developerNames = employees.DeveloperNames().ToArray();
Of course, with a name like that, why do you even need a variable? I love taking a piece of code and making it express the essence of what it is.
Since the implementation of DeveloperNames() is encapsulated in the extension method, it’s a simple matter to swap that implementation for the iterative version of the code we were using earlier.
public static class EnumerableEmployee
{
    public static IEnumerable<string> DeveloperNames(this IEnumerable<Employee> employees)
    {
        var employeeList = new List<Employee>();
        foreach (var employee in employees)
        {
            if (employee.Role == Role.Developer)
            {
                employeeList.Add(employee);
            }
        }
        employeeList.Sort(new EmployeeLastNameComparer());
        var names = new List<string>();
        foreach (var employee in employeeList)
        {
            names.Add(employee.FullName);
        }
        return names;
    }
}
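Before swapping implementations like this, it’s worth checking that the two versions agree. The sketch below is one way to do that, with the caveat that the Employee and Role definitions, the FullName format, and the sample data are all assumptions, and that a lambda comparison stands in for EmployeeLastNameComparer to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Role { Developer, Tester, Manager } // only Developer appears in the post

public class Employee
{
    public string LastName { get; set; }
    public Role Role { get; set; }
    public string FullName { get { return "Emp " + LastName; } } // assumed format
}

public static class EquivalenceSketch
{
    static IEnumerable<string> DeveloperNamesLinq(IEnumerable<Employee> employees)
    {
        return employees.Where(e => e.Role == Role.Developer)
                        .OrderBy(e => e.LastName)
                        .Select(e => e.FullName);
    }

    static IEnumerable<string> DeveloperNamesIterative(IEnumerable<Employee> employees)
    {
        var developers = new List<Employee>();
        foreach (var employee in employees)
        {
            if (employee.Role == Role.Developer)
            {
                developers.Add(employee);
            }
        }
        // Same comparison as EmployeeLastNameComparer, written as a lambda.
        developers.Sort((x, y) => x.LastName.CompareTo(y.LastName));
        var names = new List<string>();
        foreach (var employee in developers)
        {
            names.Add(employee.FullName);
        }
        return names;
    }

    public static void Main()
    {
        var employees = new[]
        {
            new Employee { LastName = "Smith", Role = Role.Developer },
            new Employee { LastName = "Jones", Role = Role.Tester },
            new Employee { LastName = "Adams", Role = Role.Developer },
        };

        bool same = DeveloperNamesLinq(employees)
            .SequenceEqual(DeveloperNamesIterative(employees));
        Console.WriteLine(same); // prints "True"
    }
}
```

Two behavioral differences hide behind the shared signature. OrderBy is a stable sort while List<T>.Sort is not, so employees who share a last name may come out in a different relative order. Also, the LINQ version is evaluated lazily when the result is enumerated, whereas the iterative version does all its work as soon as the method is called.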
Conclusion
The vast majority of code I come across that either can be written in LINQ or refactored to LINQ has no noticeable, negative performance impact, but it has a positive impact on maintainability. On top of that, LINQ statements can be made to scale with the hardware more easily, and in a more readable manner, than a collection of iterative statements. LINQ statements should still be refactored for even further readability, and once the implementation is encapsulated, it can be replaced with iterative code while hiding that code’s complexity.