Reduce Chain and Extract Projection Refactorings

by KodefuGuru 12. February 2010 17:58

One of the principles I introduce in my new talk is Strive for Functional Cohesion. My original article on the subject focuses on determining where the functionality belongs, then adding it to it’s appropriate place. In that scenario, I had complete access to the code. Sometimes, however, you don’t have access to the code to make the change, or the implementation belongs to an interface. If the functionality is assign appropriately, it makes the code much easier to read.

var lines = File.ReadAllLines("file.txt");
var reversed = lines.Select(s => String.Join(",", s.Split(',').Reverse().ToArray())).ToArray();
File.WriteAllLines("file.txt", reversed);

Although procedural, it is declarative. We’re reading lines, reversing them, then writing them back out. We’ve avoided the morass that is iterative programming here by using LINQ. Despite the benefit from being declarative, it isn’t very readable. In my presentation, I show how by assigning the functionality to the class or interface it belongs makes the code more readable. In this case, an IEnumberable<string> should be able to delimit itself, and it should be able to write itself out.

File.ReadAllLines("file.txt")
    .Select(s => s.Split(',').Reverse().Delimit())
    .Write("file.txt");

This cleanup was accomplished with two, very reusable, extension methods (there’s an overload for Delimit in case it isn’t comma-separated). These extensions are available in Fluent.NET.

public static class EnumerableStringExtensions
{
    private const string Delimiter = ",";

    public static string Delimit(this IEnumerable<string> strings)
    {
        return strings.Delimit(Delimiter);
    }

    public static string Delimit(this IEnumerable<string> strings, string delimiter)
    {
        return string.Join(delimiter, strings.ToArray());
    }

   public static void Write(this IEnumerable<string> strings, string path)
    {
        File.WriteAllLines(path, strings.ToArray());
    }
}

While the application of principles enabled me to write cleaner code, it struck me while I was giving this presentation that I still had to explain what how the reversal occurred. If I have to explain it to an audience, someone is probably having to spend too many brain cycles reading my code to figure out what’s going on. It is easier to figure out that s.Split(‘,’).Reverse().Delimit() reverses a set of delimited strings than String.Join(",", l.Split(',').Reverse().ToArray())).ToArray(), but there must be a way to make the code even easier to read.

Reduce Chain

The first way the statement can be simplified is by using the Reduce Chain refactoring. This takes common links from a method chain and reduces it to one link. In this case, we have 3 methods forming our chain: Split, Reverse, and Delimit. We are operating on the string class, so an extension method will be necessary. Since is a delimited string being reversed, I will call the method DelimitedReverse().

public static class StringExtensions
{
    public static string DelimitedReverse(this string source)
    {
        return source.Split(',').Reverse().Delimit();
    }
}

Any place this common method chain was being called can be reduced to this statement. This adheres to the DRY principle by cleaning up those repeated chains.

File.ReadAllLines("file.txt")
    .Select(s => s.DelimitedReverse())
    .Write("file.txt");

This refactoring is useful if the chain is being called from many different places, but I suspect that I will be making this call only in the context of a LINQ select. There’s another refactoring for that.

Extract Projection

Projection is the term used to describe transforming one sequence of items to another. The LINQ Select statement is the de facto way in C# to perform a projection. In the example, we’re doing a DelimitedReverse() projection on IEnumerable<string>. If we extract the projection and give it a name, the code will be easier to read. If that projection was used in more than one place, then the duplication has been removed as well.

There are two ways to do this. The first way merely passes through to a LINQ Select.

public static IEnumerable<string> DelimitedReverse(this IEnumerable<string> strings)
{
    return strings.Select(s => s.Split(',').Reverse().Delimit());
}

The second way builds an iterator using the yield keyword.

public static IEnumerable<string> DelimitedReverse(this IEnumerable<string> strings)
{
    foreach (var s in strings)
    {
        yield return s.Split(',').Reverse().Delimit();
    }
}

My opinion on which is better? Go with the declarative, pass through version unless the projection is nasty. If your code is easier to read in the iterative version, do it that way.

Here’s how the code now reads.

File.ReadAllLines("file.txt").DelimitedReverse().Write("file.txt");
It’s much better than the original. It’s even easier to read than the version refactored for functional cohesion.

Tags: , ,

Kodefu

A Tale of Two Principle Violations

by KodefuGuru 24. January 2010 23:57

When I decided to write the With statement in Fluent.NET, I never imagined that I would encounter a common bug and school myself in principles that I happen to present on. But that is what occurred, and I want to share it so others can avoid the same mistakes I made.

With is a Fluent version of Add, only it returns the object or interface it is acting upon rather than void. This allows you to chain methods together in a fluid manner. Instead of calling strings.Add(“Hello”); strings.Add(“World”);, you can call strings = strings.With(“Hello”).With(“World”);. Since you should write tests first, I will post my test that I expected to work.

[TestMethod]
public void With()
{
    var strings = new[] { "This", "is", "a" } .AsEnumerable();
    strings = strings.With("test");

    Assert.IsTrue(strings.Contains("test"));
}

With is cool because it even works with arrays! In fact, the point of it was that it would work with any IEnumerable<T>. Here was my initial attempt at an implementation.

public static IEnumerable<T> With<T>(this IEnumerable<T> source, T item)
{
    var collection = source as ICollection<T> ?? source.ToList();
    
    collection.Add(item);

    return collection;
}

I cast source to an ICollection<T> to get to that more specific interface. If it doesn’t implement ICollection<T> the null coalescing operator is used and source is convert to List<T>. In subsequent calls to With, source would be a List<T> so the casting would work. This violates the Liskov Substitution Principle because I’m operating on an ICollection or converting to one no matter what. I decided that I would knowingly do so because I wanted to extend IEnumerable and optimize for a type that it could convert to itself.

This code is buggy. It compiled, but my unit test failed.

Here’s another example of the same bug.

public static void AddFail(IList<string> strings)
{
    strings.Add("Fail");
}

If you haven’t figured it out, don’t feel bad. 40 people over 2 of my sessions haven’t answered it (out loud anyway) either. I’m certain I’ve seen this failure in many methods over the years as well.

The problem is that arrays implement ICollection<T> and IList<T>. However, arrays are a fixed size. Here’s the error I received when running my test: “System.NotSupportedException: Collection was of a fixed size.”

If you’re performing operations that would modify a collection, you must check the IsReadOnly property of that collection. This fixed the bug and is what I checked into CodePlex… but don’t use it, because it’s still wrong.

public static IEnumerable<T> With<T>(this IEnumerable<T> source, T item)
{
    var collection = source as ICollection<T> ?? source.ToList();

    if (collection.IsReadOnly)
    {
        collection = collection.ToList();
    }
    
    collection.Add(item);

    return collection;
}

It works like a charm, but some things were bugging me about the code. I was talking with some of the guys after the Alabama Code Camp, and I lamented that I was violating the Liskov Substition Principle. We came to the agreement that it’s okay to violate principles if you do so purposefully with full understanding of why you are doing so. But something kept nagging me after I left there and continued the 8 hour drive back home to Columbia. It was more than one principle that I was breaking.

After I spent the night at a motel and was leaving, it suddenly dawned on me that I was truly violating a functional principle from my own C# Ninjitsu talk: Avoid Side Effects. What happens if you already have a list. It’s possible to have unintentional consequences in your code.

[TestMethod]
public void WithShouldNotHaveSideEffects()
{
    var strings = new List<string> { "This", "is", "a" };
    var strings2 = strings.With("test");
    Assert.IsTrue(strings2.Contains("test"));
    Assert.IsFalse(strings.Contains("test"));
}

This test fails because it’s not expected that the original sequence will be modified. It certainly isn’t modified when an array is passed in, why should it work any different with a list?

I knew I was in violation of two different principles: one agile, one functional. The code had to be refactored, so I started thinking about where I went wrong. I decided it wasn’t being enough like LINQ. There is a design guideline, and I’m not sure if it’s written down anywhere: if you return an IEnumerable<T>, Consider Yield. I thought about how I would solve the same problem using yield, and it turn out to be quite simple.

public static IEnumerable<T> With<T>(this IEnumerable<T> source, T item)
{
    foreach (T t in source)
    {
        yield return t;
    }

    yield return item;
}
I’m amazed that I found one common bug and violated two design principles in what started as a three line method. It is rather embarrassing, but I hope by sharing this that you will avoid this bug and apply the Liskov Substitution Principle and Avoid Side Effects in your own code.

Tags: , , , ,

Kodefu

Refactor Switch to Dictionary

by KodefuGuru 20. January 2010 14:48

In my post on Z3, I created an extension method for proving an equation. This extension method used a switch statement to analyze a result and return a string. This is a bad practice: the mapping of one type to another is trapped in code. What if I wanted to move this mapping to a configuration file, a resource file, or a database? Even if I didn’t want to do that, I feel an imperative switch statement isn’t as readable as a declarative statement. Favor Declarative over Imperative. Below is the original code snippet.

public static string Prove(this Context context, Term term)
 {
     context.Push();
     Term not = context.MkNot(term);
     context.AssertCnstr(not);
     switch (context.Check())
     {
         case LBool.False:
             return "valid";
         case LBool.Undef:
             return "unknown";
         case LBool.True:
             return "invalid";
     }
     return null;
 }

When you see case statements simply return or an assign an item, you know you can perform this refactoring. You will create a generic dictionary of both types and use it instead. Here is the refactored example.

public static string Prove(this Context context, Term term)
{
    var checkResults = new Dictionary<LBool, string>
    {
        { LBool.False, "valid" },
        { LBool.Undef, "unknown" },
        { LBool.True, "invalid" }
    };

    context.Push();
    Term not = context.MkNot(term);
    context.AssertCnstr(not);
    
    return checkResults[context.Check()];            
}

The great thing about this is that you have the mappings side by side. Since it is in a dictionary format, it is loadable from another location; the mapping is no longer tied to the code.

Tags: , , , , ,

Kodefu

Find and Replace Text with Regex

by KodefuGuru 12. January 2010 17:18

A common scenario you may be faced with is displaying text to a user that contains identifying information such as social security numbers. This is a pretty simple task in .NET if you know the correct class and methods to use.

The first thing you’re going to need is a regular expression pattern. Here’s a simple one for a social security number: \d{3}-\d{2}-\d{4}.

The next thing you should do is add a reference to System.Text.RegularExpressions. Now, we can get to coding.

string text = "garbage text 123-12-1234 more garbage";
string pattern = @"\d{3}-\d{2}-\d{4}";
text = Regex.Replace(text, pattern, m => 
                    "***-**-" + m.Value.Substring(m.Value.Length - 4, 4));

Regex.Replace existed before the Func classes, so it requests MatchEvaluator rather than Func<Match, string>. No matter, we can still use a lambda here.

This was a pretty simple solution to a common problem. Enjoy.

Tags:

Kodefu

Solve SMT Problems with Z3

by KodefuGuru 11. January 2010 22:06

Microsoft Research has released Z3 2.4. Z3 is a Satisfiability Modulo Theories problem solver. This is the first time I’ve downloaded Z3 and the first time I’ve researched SMT problems, so I’m unfamiliar with them at this point. Being a little curious as to what implications this could have for the .NET framework, I opened up the dotnet example.

The it has a simple cs file and a build.cmd, so I built it. When I ran the resulting executable, I’m not sure if I received the expected output because parser_example5 through an exception. The other example wrote formulas to the screen so that was pretty neat. After I opened the cs file, it turns out the example5 was expected to receive an error: “/* the following string has a parsing error: missing parenthesis */.”

From my brief analysis of this library, it appears that you build an equation with it, then you ask Z3 to prove it. My gripe is the syntax of it. You can build yourself a z3 context easily enough,

using (Config config = new Config())
{
    config.SetParamValue("MODEL", "true");
    using (Context context = new Context(config))
    {

    }
}

but working with it afterwards is a pain to manage. Also, why can’t I take advantage of object initializers to set up the config class? I can understand why you may not want to the context to dispose of the config file in case there are other context’s using the config, but there is some fishy about that the setup.

Let’s prove something simple. If x==y, then it stands to reason that f(x) == f(y).

The first thing I’m going to do is write an extension method for the z3 Context class so I can prove a term. I will use the Prove method found in the examples, modified for my usage.

public static string Prove(this Context context, Term term)
{
    context.Push();
    Term not = context.MkNot(term);
    context.AssertCnstr(not);
    switch (context.Check())
    {
        case LBool.False:
            return "valid";
        case LBool.Undef:
            return "unknown";
        case LBool.True:
            return "invalid";
    }
    return null;
}

Next I’m going to write out the steps necessary to assert that x is equal to y, then ask if fx is equal to fy. This code is contained within the using blocks above.

FuncDecl f = context.MkFuncDecl("f", context.MkIntSort(), context.MkIntSort());
Term x = context.MkConst("x", context.MkIntSort());
Term y = context.MkConst("y", context.MkIntSort());
Term fx = context.MkApp(f, x);
Term fy = context.MkApp(f, y);
context.AssertCnstr(context.MkEq(x, y));
Term proof = context.MkEq(fx, fy);
Console.WriteLine("f(x) == f(y): {0}", context.Prove(proof));

That’s a lot of code, but it is valid! “f(x) == f(y): valid” was returned. But how can we be sure that it won’t always return that? Let’s assert the constraint that x is greater than y.

context.AssertCnstr(context.MkGt(x, y));

Now we received the text “f(x) == f(y): invalid.”

You probably don’t want to write all of that code out. Z3 comes with a few parsers, but I haven’t figured out the syntax for them yet. The methods to call on the context is ParseSmtlibFile, ParseSmtLibString, ParseZ3File, ParseZ3String, ParseSimplifyFile, ParseSimplifyString.

The code reads to me like C++. Looking at the .NET assembly in reflector, it is just a wrapper over unsafe assemblies. It is a shame, as I would play with it a lot more if it felt like I was using .NET. No worries though: the way things are nowadays it’s a matter of time before I find LinqToZ3 on Codeplex.

Tags: ,

Kodefu

Null Coalescing Assignment Operator for C# 5

by KodefuGuru 7. January 2010 16:53

If you read my article on the null coalescing operator (??), you’ve probably got a good idea on how useful it is. It makes your code more declarative, which makes it easier to read. But I find one thing disappointing about it, which I hope can be corrected in C# 5 with the introduction of a new operator.

One way to use the null coalescing operator is by checking a variable against itself, and if it’s null assign another value. Besides eliminating imperative code, this is great when combined with factory methods.

client.WorkOrder = client.WorkOrder ?? WorkOrder.Create();

My complaint is that you have to declare the same value on the left and right side. It’s not much, but it’s easily corrected. Due to C#’s C-based heritage, other types of assignment operators, called compound assignment operators, already exist: +=, –+, *=, /=, %=, &=, |=, ^=, <<=, and >>=. These work as if you had broken them apart. In other words, x += y is the same as x = x + y. There are two differences: the left operand isn’t repeated and the C# compiler evaluates the left operand once with the compound assignment operator.

I propose a new compound operator for C# 5: the null coalescing operator. The symbol for it would logically be ??=. It will simplify our code when working in a declarative manner.

client.WorkOrder ??= WorkOrder.Create();

It doesn’t require a framework change as the IL generated by compound assignment operators is the same as their analogues. It’s a compiler change, and I don’t believe it would be a difficult one at that.

Tags: , ,

Kodefu

What’s the deal with AsEnumerable?

by KodefuGuru 6. January 2010 17:11

There’s an odd LINQ method that appears to do nothing but is actually quite useful: AsEnumerable().

Declarations

When using LINQ in C#, we typically use the var keyword as it beats typing IEnumerable<Product>. Besides, do you really care where you end up? The important thing is you have a variable that tells you what your sequence is. If I have “var products” on the left side of the assignment, I know I have a sequence of products, and that’s what is important.

There is a small danger in using the var keyword. Some LINQ functions that return a sequence don’t return IEnumerable<T> but instead return a specialized version. OrderBy returns IOrderedEnumerable<T>, and this can cause you problems if you need to reassign the variable.

var products = context.Products.OrderBy(p => p.Category);

if (!productName.IsNullOrEmpty())
{
    products = products.Where(p => p.Name.Contains(productName));
}

The products variable cannot be assigned in the example above because Where returns IEnumerable<T>, but products has been declared as IOrderedEnumerable<T>. The principle, program to the interface, not the implementation applies, and the AsEnumerable will still allow you to use the var keyword.

var products = context.Products.OrderBy(p => p.Category).AsEnumerable();

if (!productName.IsNullOrEmpty())
{
    products = products.Where(p => p.Name.Contains(productName));
}

Adding .AsEnumerable() after the OrderBy fixes the problem.

Escape Custom LINQ Implementations

Sometimes people have implemented the standard extension methods on their on enumerations. Even worst, sometimes people’s own enumerable classes are inherited from List<T>.

public class ProductList : List<Product>
{
    public IEnumerable<Product> Where(Func<Product, bool> predicate)
    {
        Console.WriteLine("Got Here");
        foreach (var product in this)
        {
            if (predicate(product))
            {
                yield return product;
            }
        }
    }
}

Here’s the code to call this.

var products = productList
                    .Where(p => p.Name.Contains(productName))
                    .ToArray();

This writes “Got Here” to the Console.

var products = productList
                    .AsEnumerable()
                    .Where(p => p.Name.Contains(productName))
                    .ToArray();

Adding AsEnumerable() causes Enumerable.Where to be call, and nothing will be written to the Console.

AsEnumerable is also great if you want to break out of a Queryable and do the processing in memory rather than on a server. It’s a nice tool to have for any C# developer.

Tags:

Kodefu

Is Fluent.NET the same as Mono.Rocks

by KodefuGuru 5. January 2010 11:51

I came across a comment on dotnetguru.org about Fluent.NET that needs a response.

rien de nouveau...
Mono.rocks faisait déjà la même chose...
http://www.mono-project.com/Rocks

This translates to “nothing new… Mono.rocks was already doing the same thing.”

I was aware of Mono.Rocks before I released Fluent.NET, so why did I bother? The difference is the purpose in the libraries. Fluent.NET exists to make the .NET framework more fluent. Extension methods and classes are added only if they serve that goal. Mono.Rocks is a set of extension methods for the base class libraries.

Goal of Fluent.NET: Make the .NET Framework more fluent.

Goal of Mono.Rocks: Provide useful extensions for the .NET Framework.

There is overlap between these two goals as the primary technique to make the .NET Framework more fluent is to add useful extension methods. However, Fluent.NET contains the factory method Sequence.Create as well, which has no place in Mono.Rocks. In the future, Fluent.NET may even contain classes that can be instantiated if it serves its goal.

Mono.Rocks has very few extension methods. This differentiates itself from some of the bucket extension libraries out there, but it is far too selective for the purposes of Fluent.NET. Fluent.NET differentiates itself by what it intends to do. It is not Mono.Rocks, it is not a buck extension method collection; it is a class library that makes the .NET Framework more fluent.

The good news is you can use both.

Tags: , ,

Kodefu

Fluent.NET 1.0.0.0 Released

by KodefuGuru 3. January 2010 17:04

The first release of Fluent.NET is in the wild. It’s not a lot right now, but it is useful, and it will grow as I come across .NET framework methods that should be more fluent. So perhaps I should explain what Fluent.NET is.

As an English speaker (a fair assumption since you’re reading my blog), you probably read from left to right. Why then do we as C# developers read and write code from the outside in?

File.WriteAllLines("new.txt", File.ReadAllLines("old.txt"));

Fluent.NET adds extension methods to give the functionality where it truly belongs. This fits in with a design princple, Strive for Functional Cohesion. In the above example, the WriteAllLines method logically belongs to File, but functionally it belongs to is IEnumerable<String>. In the Fluent.NET framework, IEnumerable<String> was extended so it now has the method Write.

File.ReadAllLines("old.txt").Write("new.txt");

This has the added benefit of allowing us to read and write code like we do natural (albeit Western) languages, from left to right.

Fluency involves method chaining, and in this case it is no different. ReadAllLines returned an array of strings, which we can then call a Linq Select statement on to transform it into something else. Assume that the text file being read contains comma-separated values. The text file being written to should have those values in reverse. This can easily be written as the following snippet of code.

File.ReadAllLines("old.txt")
    .Select(s => s.Split(',').Reverse().Delimit(","))
    .Write("new.txt");

Delimit is another function introduced in Fluent.NET. It is a pass-through method to String.Join.

Here’s a small program that introduces two things within Fluent.NET.

1.To(5).Execute(Console.WriteLine);

The To extension method simply creates an IEnumerable<int> from 1 int to another (up or down). You can create an array, but who wants to write all those elements out? You could also have used Enumerable.Range, but I felt that was clunky and not exactly fluent as you can’t start from any integer (it’s probably more approprite to do start.To(end) in a real program).

Execute is what most people have created as ForEach in their own library of extension methods. I have a problem with the name ForEach. ForEach implies the keyword in C#, and many of the Linq extension methods do a foreach on the sequence anyway. The name means nothing. Execute was chosen because an action is being immediately executed on each element in the sequence, in this case Console.WriteLine.

Fluent.NET does contain an Each method as well. This method is similar to the Execute method except that it is deferred and it returns the element it acts upon.

Fluent.NET currently contains two assemblies: Fluent.dll and Fluent.Web.dll. One goal is to not tie users of Fluent to assemblies they don’t require, so each assembly will be related to the .NET assembly it is making fluent. This led to an odd design decision: I did not wish for duplicate namespaces everywhere. To solve this, I made every Fluent class belong to the Fluent namespace. To turn Fluent on in a file, add the appropriate assemblies, use the Fluent namespace, and you’re good to go.

Tags: , , ,

Kodefu

Pair Has Changed But Is Not Going Away

by KodefuGuru 30. December 2009 17:40

I wrote earlier about creating a factory method for KeyValuePair<K, V> similar to the factory method in the Tuple class. Like the KVP, a two element Tuple is known as a pair as well. But did you know there exists another Pair class in the .NET Framework?

System.Web.UI.Pair sits in System.Web.dll. It’s the dirtiest class of all as it isn’t even generically-typed. Here is the definition.

[Serializable]
[AspNetHostingPermission(SecurityAction.LinkDemand,
    Level = AspNetHostingPermissionLevel.Minimal)]
public sealed class Pair
{
    public object First;
    public object Second;

    public Pair()
    {
    }

    public Pair(object x, object y)
    {
        this.First = x;
        this.Second = y;
    }
}

I decided to take a look at it in .NET 4 to see if there were any differences. I didn’t expect much as I’m sure it would break many pieces of code, but it would see that Pair should at least inherit from Tuple<object, object> then have its First and Second properties mapped to Item1 and Item2. In an ideal world, Pair would be an alias for Tuple<T1, T2>.

[Serializable]
public sealed class Pair
{
    public object First;
    public object Second;

    [TargetedPatchingOptOut(
        "Performance critical to inline this type of method across NGen image boundaries")]
    public Pair()
    {
    }

    [TargetedPatchingOptOut(
        "Performance critical to inline this type of method across NGen image boundaries")]
    public Pair(object x, object y)
    {
        this.First = x;
        this.Second = y;
    }
}

No such luck on the inheritance hierarchy, but there are a few attribute changes.

First, of all, the AspNetHostingPermission attribute was removed. This is a code access security attribute that grants the minimum permissions required for execution to the caller of this class. I compared the usage of this attribute in System.Web 2.0 and System.Web 4 (there is no inbetween release for that assembly) and it appears that it is being removed from nearly everything. The attribute is still used in a few places, but it is almost completely removed from the System.Web.UI namespace (with the exception of PageParserFilter and Page.get_LastFocusedControl()). I wouldn’t be surprised to see it further deprecated by the time .NET 4 goes live. For reference, I am comparing it with Beta 2.

The next thing you will notice is the addition of the TargetedPatchingOptOut attribute. When you generate a native assembly, this attribute informs the native assembly generator to inline the method across image boundaries. This will improve performance for your native assemblies.

I am disappointed the antiquated Pair class is still in the Framework, but I understand that to change it would cause many people headaches. It is interesting to see how such a benign class has been changed in the process of upgrading System.Web.dll since its last release in 2005.

Tags: , , ,

Kodefu

Powered by BlogEngine.NET 1.6.0.0
Theme by Mads Kristensen

Whois KodefuGuru

Chris Eargle

Chris Eargle
.NET Community Champion

LinkedIn Twitter Technorati Facebook

MVP - Visual C#

 

INETA Community Champions
Friend of RedGate
Telerik .NET Ninja
Community blogs & blog posts

I am a #52er


World Map

RecentComments

Comment RSS

Tag cloud

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010