Last night, I was asked why one should refactor if the test is passing. The test-driven development mantra of “red, green, refactor,” is ingrained in many practitioners' head, but why exactly should you bother with this ‘frivolous’ third step? When you write perfect code, why bother?
Your Code is Not Perfect
I have seen beautiful code. Code that is not repeated, is easy to maintain, and easy to read. However, to make the claim that the code is perfect is fallacious. Most developers write in a stream of consciousness style, which may not be clear even to the same developer a few months later. Even if you immediately create beautiful, clean code, it is still necessary to proofread your work once the functionality is verified.
Don’t Repeat Yourself
Many people following TDD are familiar with the DRY principle. You should write code once, and duplication should be removed. This modification is a form of refactoring. If you neglect this aspect of your development practice, it will eventually lead to further problems.
Although many people recognize duplicated business logic, far fewer recognize duplication with within other algorithms. For example, in Reduce Chain and Extract Projection Refactorings, I move a projection that reverses the order of a comma-delimited string to an extension method called DelimitedReverse(). One benefit of doing this is the functionality is reusable without duplication. Many developers wouldn’t think twice about copying and pasting the lambda expression inside a Select method, but this is often the first indication of a problem. Although this functionality is unlikely to change, if it were necessary it need only be modified in one place. Additionally, the minute piece of code is now testable.
var strings = new[] { "red,green,blue" };
var reversed = strings.DelimitedReverse().First();
Assert.AreEqual("blue,green,red", reversed);
Maintainability
The repetition of logic directly affects the maintainability of a solution. Changes to functionality are not propagated to duplicate pieces of code. After all, if you’ve actually done this, the assumption is that whereas the functionality is the same when it is created, changes should be independent to each other. This is bad for consistency and will lead to bugs in the future when a change causes one subsystem to act differently than others.
Readability
Martin Fowler and Bob Martin have both published books that encourage best practices for writing readable code. There are two rules I think are vitally important for methods: they should be named what they do (not how they do it), and they should be short.
My methods are extremely short as I strive for functional cohesion and refactor to fluent interfaces where I can. This leads to extremely readable code. When I open my codebase up after a long break, I understand exactly what’s going on. My code reads like English, from left to right. The typical codebase contains imperative code which can be difficult to decipher. Many people will claim that the code only needs more comments. However, this leads to an even bigger mess to decipher than before. Properly written code is self-documenting.
This is a simple example that will pass a standard, happy path unit test (with an IO dependency).
public void ReverseLines(string fileName)
{
var lines = File.ReadAllLines(fileName);
var reversedLines = new List<string>();
foreach (string line in lines)
{
var elements = line.Split(',');
Array.Reverse(elements);
var reversed = String.Join(",", elements);
reversedLines.Add(reversed);
}
File.WriteAllLines(fileName, reversedLines);
}
Here is a readable version of the method with the same functionality.
public void ReverseLines(string fileName)
{
File.ReadAllLines(fileName)
.DelimitedReverse()
.Write(fileName);
}
This is accomplished by favoring declarative code and assigning functionality where it belongs. IEnumerable<string> should be responsible for writing itself to a file. IEnumerable<string> should be responsible for delimiting itself, and in this case I refactored further by making a reusable piece of functionality for performing a delimited reverse.
Reusable
By taking the time to refactor your code base, you end up with reusable units of functionality.
public static class EnumerableStringExtensions
{
private const string Delimiter = ",";
public static string Delimit(this IEnumerable<string> strings)
{
return strings.Delimit(Delimiter);
}
public static string Delimit(this IEnumerable<string> strings, string delimiter)
{
return string.Join(delimiter, strings);
}
public static void Write(this IEnumerable<string> strings, string path)
{
File.WriteAllLines(path, strings);
}
public static IEnumerable<string> DelimitedReverse(this IEnumerable<string> source)
{
return source.Select(l => l.Split(',').Reverse().Delimit());
}
}
If I ever need to delimit an IEnumerable<string>, no problem! if I need write an IEnumerable<string> to a file, easy! If the functionality needs to change in some way, I only have to do it in one place.
The Third Step
Test-driven development is great for writing working code, but the third step should not be ignored. Many people don’t want to spend the time cleaning up their code once the functionality is written. However, if you spend the time upfront to complete the TDD process, it will be worth it. Clean code is beautiful code, and clean code requires polish.