Refactoring Partition List

by KodefuGuru 20. December 2009 21:24

I came across an article on Visual C# Kicks that describes how to partition a list. The author’s goal is to split a list of elements into an array of smaller list, and the author succeeds at that. Unfortunately, the code appears to be designed for C# 2.0 instead of C# 3.0. I’m going to examine the second version of the method the author posted, explain the problems I have with it, then refactor it to be fluent. Please note that I made two modifications to the source: renamed the method from Partition2 to Partition and fixed a typo.

public static List<T>[] Partition<T>(List<T> list, int size)
{
    if (list == null)
        throw new ArgumentNullException("list");

    if (size < 1)
        throw new ArgumentOutOfRangeException("totalPartitions");

    int count = (int)Math.Ceiling(list.Count / (double)size);
    List<T>[] partitions = new List<T>[count];

    int k = 0;
    for (int i = 0; i < partitions.Length; i++)
    {
        partitions[i] = new List<T>(size);
        for (int j = k; j < k + size; j++)
        {
            if (j >= list.Count)
                break;
            partitions[i].Add(list[j]);
        }
        k += size;
    }

    return partitions;
}

The most glaring problem is that it returns a List<T>. Okay, actually, it’s returning an array of List<T>, but the problem is still the same. List<T> is a much too specific scenario. Unfortunately, we as developers became dependent on them between 2005 and 2008 because they were just so useful. If you’re designing a method for reusability, it is better to lean on the most general interface available. That interface is IEnumerable<T>.

The second problem is actually part of the first: it returns an array. We shouldn’t tie this to an array and instead make it IEnumerable<T> as well.

The third problem is that this method is not fluent at all. I assume this method sits in a helper class somewhere, but there is no extension made. So, we’re left calling a method on the helper class.

var partitions = ListHelper.Partition(list, 5);

If I want to partition some list, I want to call the partition method on the list. So, that’s how I’m going to refactor this: move it to IEnumerable<T> and make it an extension method.

The great news is that all that logic can be cleaned up by using LINQ. Here’s how I refactored the code.

public static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> source, int size)
{
    if (source == null)
        throw new ArgumentNullException("list");

    if (size < 1)
        throw new ArgumentOutOfRangeException("size");

    int index = 1;
    IEnumerable<T> partition = source.Take(size).AsEnumerable();

    while (partition.Any())
    {
        yield return partition;
        partition = source.Skip(index++ * size).Take(size).AsEnumerable();
    }
}

You can now access the Partition method directly from any IEnumerable<T>. In addition, it is no longer returning an array. If you really desire an array, you can use the ToArray() method like any other IEnumerable<T> with LINQ.

var partitions = list.Partition(5).ToArray();

That refactoring was fun. If you have anything you’d like me to take a look at, send me a message through the contact form.

Edit: This is a minor change, but after thinking about it I decided that starting the index at 1 and using index++ was more readable than starting the index at 0 and using ++index.

Tags: , , ,

Kodefu

Comments

12/21/2009 8:04:51 AM #

Chris

I think I should remove the AsEnumerable() from the statements. Since I needed to check for emptiness, I decided to go ahead and force execution. However, I'm not so certain that's necessary. Thoughts?

Chris United States

12/31/2009 12:34:47 PM #

Gavin

Note that your version is extremely efficient, due to Skip/Take having to repeatedly iterate over the source. On a list of 50,000 elements, it takes approx 1200ms, vs about 6ms vs this version:

        public static IEnumerable<IEnumerable<T>> Partition2<T>(
            this IEnumerable<T> ie, int size) {
            if (ie == null)
                throw new ArgumentNullException("ie");
            if (size < 1)
                throw new ArgumentOutOfRangeException("size");
            IEnumerator<T> e = ie.GetEnumerator();
            var a = new T[size];
            var i = 0;
            while (e.MoveNext()) {
                a[i] = e.Current;
                if (++i == size) {
                    yield return a;
                    a = new T[size];
                    i = 0;
                }
            }
            if (i > 0) {
                Array.Resize(ref a, size);
                yield return a;
            }
        }


Gavin United States

12/31/2009 1:59:55 PM #

chris

Gavin, I assume you meant inefficient. Indeed, I didn't run any performance tests. I'll test yours out.

chris United States

Add comment




  Country flag

biuquote
  • Comment
  • Preview
Loading



Powered by BlogEngine.NET 1.6.0.0
Theme by Mads Kristensen

Whois KodefuGuru

Chris Eargle

Chris Eargle
.NET Community Champion

LinkedIn Twitter Technorati Facebook

MVP - Visual C#

 

INETA Community Champions
Friend of RedGate
Telerik .NET Ninja
Community blogs & blog posts

I am a #52er


World Map

RecentComments

Comment RSS

Tag cloud

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010