Be careful when sorting lists of structs!!

Tags: ,
No Comments »

Working on a private project we stumbled over an interesting behavior in the .NET sorting routines. There is a crucial difference between sorting (generic) lists of structs and sorting lists of objects.Take a look at this code:

(I am avoiding properties to shorten the example)

internal struct fancything : IComparable<fancything>

{

    // Counters

    public static int ctor_counter = 0;

    public static int getsortkey_counter = 0;

 

    public string sortkeystring;

    public int sortkeyint;

 

    public fancything(int rndbase)

    {

        sortkeystring = null;

        sortkeyint = rndbase;

        ctor_counter++; // Count number of ctor-calls

    }

 

    internal string GetSortkeystring()

    {

        if (sortkeystring == null)

        {

            sortkeystring = sortkeyint.ToString();

            getsortkey_counter++; // Count inits of sortkey

        }

        return sortkeystring;

    }

 

    public int CompareTo(fancything other)

    {

        return this.GetSortkeystring().CompareTo(other.GetSortkeystring());

    }

}

 

It’s a simple class that implements IComparable; Items are compared by the return value of the method GetSortkeystring. GetSortkeystring returns the variable ‘string sortkeystring’ and – if sortkeystring is null – initializes it with the .toString()-value of a random number that was to the constructor of the calls when creating the object.

Don’t look at the code too critical; it’s just the shortest code to illustrate the idea. In our real case the initialization of the sortkeystring is a very costly method that you wouldn’t want to call more often than absolutely necessary.

Now lets just get 100 random items und have them sorted.

static void Main()

{

    // Get 100 random elements

    Random r = new Random();

    List<fancything> list = new List<fancything>();

    for (int i = 0; i < 100; i++)

        list.Add(new fancything(r.Next(0, 100)));

 

    Report("Before");

    list.Sort();

    Report("After");

 

    //Make sure it has been sorted

    foreach (fancything ft in list)

        Console.Write(ft.sortkeystring + " ");

 

    Console.ReadLine();

}

 

private static void Report(string moment)

{

    Console.WriteLine(

        string.Format(

            "{0} sorting. ctor_counter {1}, getsortkey_counter {2}",

            moment,

            fancything.ctor_counter,

            fancything.getsortkey_counter));

}

 

What would you expect as output?

Before sorting. ctor_counter 100, getsortkey_counter 0
After sorting. ctor_counter 100, getsortkey_counter 100

Right. Now lets make a little change and turn the class into a struct.

struct fancything : IComparable<fancything>

What would you except as output now? The same as above? Dude, you are dead wrong!

Before sorting. ctor_counter 100, getsortkey_counter 0
After sorting. ctor_counter 100, getsortkey_counter 770!!!!!

Can you believe it? Neither couldn’t we and it took Benjamin a while to track it down. What is going on here? I didn’t know, so here is my first theory: The sortkeystring must be null again and again, since there is no other way to increase the getsortkeycounter. Doing the .Sort, .NET seems to create new instances of fancyitem all the time, which makes perfect sense: sortings objects is all about pushing around pointers, but structs are not "pointered at". So their real values have to be taken out of the list und put in at an other place in that list. But why does the value of sortkeystring get lost somewhere. A bug?

May I confuse you more?

Let’s try something else. In the example we initialized the sortkeystring as late as possible (lazy loading – remember. it’s a costly, costly method). But, come on, for the sorting we need each sortkeystring  anyway, so why no initializing it in the constructor?

internal struct fancything : IComparable<fancything>

{

    // Counters

    public static int ctor_counter = 0;

 

    public string sortkeystring;

    public int sortkeyint;

    public fancything(int rndbase)

    {

        sortkeyint = rndbase;

        sortkeystring = sortkeyint.ToString();

        ctor_counter++; // Count number of ctor-calls

    }

 

    internal string GetSortkeystring()

    {

        if (sortkeystring == null)

            throw new Exception("Help! My value is lost.");

 

        return sortkeystring;

    }

 

    public int CompareTo(fancything other)

    {

        return (this.GetSortkeystring().CompareTo(other.GetSortkeystring()));

    }

}

The getsortkey_counter is gone, sortkeystring gets initialized in the constructor and GetSortkeystring() throws an exception if the value is lost.

The result was unexpected to me again. I was prepared for an exception, because we saw that the value gets lost. But no exception occurred.

Before sorting. ctor_counter 100.
After sorting. ctor_counter 100.
(and the list is still sorted properly)

What the hack is that supposed to mean? Let’s take a moment to sit down, have a drink and sum up the facts:

  1. the constructor of the struct is officially called 100 times in all cases. Fine.
  2. the GetSortkeystring()-method is called a lot. For each comparison. Fine.
  3. in the first example the value of sortkeystring gets lost and has to be re-Initialized. Not fine but we have to take it.
  4. in the second example the value of sortkeystring is initialized in the constructor and does not get lost. Fine, that’s what one would expect.

I really didn’t have no idea why 3 works so much different from 4. I played more and can add two more facts:

  1. value types never get lost
  2. reference types get lost like sortkeystring in the first example – as long as they are not initialized in the constructor.

Is there a kind of protection for variables, that are initialized in the constructor? I can’t believe that.

Last night in the bedroom…

…my girl-friend was already asleep, so I could mull this all over again… and was struck by the truth. Doh, it’s so simple:

As you know objects are called by reference (meaning "a pointer is passed") while structs are passed by value. Let’s say a passed struct lives as an independent copy of the original in the called method. Changes to this copy do not effect the original at all. Now List.Sort roughly works like this:

  1. Find two elements to compare
  2. Compare elements.
  3. Switch their places if necessary

Whatever step 2 does, sooner or later it has to call

public int CompareTo(fancything other).

Ups… call by value. Now ‘other’ is of course just a copy and sortkeystring gets initialized in this copy – which does not affect the original in the list at all. And next time an original is compared… Do I have to say more?

This also explains, why variables, that have been initialized in the constructor, don’t get lost. Their values already live in the original – a pretty safe place to be in a struct world 🙂

Very satisfactory, I even don’t have to wake up my girl-friend! Good night, guys!

WP Theme & Icons by N.Design Studio
Entries RSS Comments RSS Log in