Tuesday, March 24, 2009

LINQ vs FOREACH vs FOR Loop Performance

I made a blatantly stupid logic mistake the other day in a business logic layer method that I was writing and got the following exception: “Collection was modified; enumeration operation may not execute.” The problem what I was trying to remove an object from a collection while I was enumerating through it using a FOREACH statement. After I realized what I had done I solved the problem by populating a new collection with a little bit of LINQ.

   1: // Get the original list of values


   2: List<bool> booleanValues = GetBooleanList(numberOfTests);


   3: // Now create a new list containing the objects we want to remove 


   4: // from the original list


   5: List<bool> valuesToDelete_Linq = new List<bool>(from b in booleanValues


   6:                                                 where !b


   7:                                                 select b);


   8: // Now remove the values safely from the original list


   9: foreach (bool b in valuesToDelete_Linq)


  10:     booleanValues.Remove(b);




There were several solutions I could have used to solve the problem; specifically I could have used a FOR loop, multiple FOREACH loops or the LINQ solution that I posted above (and there are probably more solutions that I haven’t thought of.) This led me to wonder about the performance for each solution, which in turn led me to this post: http://frater.wordpress.com/2008/02/20/collection-was-modified-enumeration-operation-may-not-execute/



FOR – FOREACH – LINQ Performance Tests


My system specs:
- Intel Core 2 Duo 2.5 GHz
- 4 GB memory
- Windows Vista Enterprise 64bit
- .NET Framework 3.5 SP1

After reading through the post above and the comments that it was followed by, I decided to test all three solutions to see what the speed was like. The results surprised me by quite a lot. I knew LINQ was somewhat slow, but I didn’t realize exactly how slow until I ran these tests.




For Loop results: Bool Count = 2002, Time = 00:00:00.0060000


Double For Each results: Bool Count = 2002, Time = 00:00:00.0760000


LINQ results: Bool Count = 2002, Time = 00:00:00.0760000


 


 


For Loop results: Bool Count = 20002, Time = 00:00:00.1380000


Double For Each results: Bool Count = 20002, Time = 00:00:00.9780000


LINQ results: Bool Count = 20002, Time = 00:00:00.9820000


 


 


For Loop results: Bool Count = 200002, Time = 00:00:06.4100000


Double For Each results: Bool Count = 200002, Time = 00:01:39.0640000


LINQ results: Bool Count = 200002, Time = 00:01:39.4220000



So yeah, LINQ is considerably slower than using a FOR loop in this instance. However, I would personally still use LINQ in instances where I am working with a small amount of data and the performance difference is negligible. Why? Because in my opinion LINQ is much easier to read, understand & lends itself to fewer bugs.


Performance Test Application Download

You can download and run the performance test that includes implementations of each solution I mentioned above from the link below.



Aaron Schnieder

http://www.churchofficeonline.com

9 comments:

Don Demsak said...

Someone else did a similar experiment (I can't find the link) but I did a similar thing and using 3.5 SP1, a simple LINQ select instead of a ForEach is 20% slower than the ForEach. But, YMMV. If I remember correctly, there was some changes put in SP1 that increased perf in certain situations.

Steve Sheldon said...

It's hard to outperform the for() loop.

BTW, I like to start the for loop at the end of the list and work backwards. so like: for(i = list.Count - 1; i < list.count; i--)

Then you don't have to do the i-- if you remove something, as you're already doing it.

Paulo Aboim Pinto said...

It's true. For now the LINQ is slower then the usual foreach method. The good thing is, the LINQ is on the framework and that can be improved.

I remember how was the concatenate of string. In the begin we use the + to concatenate but now we use StringBuilder or string.Format because is better and faster and more readable.

I'm sute the LINQ will be better in the future. For now, it's your choice to use it or not.


Regards
Paulo Aboim Pinto
Odivelas - Portugal

Aaron Schnieder said...

Don:

I updated the post with my system specs. I am currently running .NET Framework 3.5 SP1.

Aaron Schnieder said...

Paulo:
That is a great comment and exactly what I was getting at in my post. LINQ is AWESOME and in my opinion one of the best new releases in the .NET Framework. However, because there are performance considerations all developers that choose to use LINQ should consider the performance implications in their decision.

ZByszek said...

I am afraid, you just used Linq improperly. Your statement generates in fact cartesian product as it does not have an idea that each element is only once in the collection where you delete it from.
it should be:
booleanValues = booleanValues.Where(x=>x).ToList();
i.e. just get elements which do NOT satisfy criteria for delete.
In this scenarion speed is BETTER than for loop.

drokliss said...

Just now getting around to looking into linq, came across this when looking at performance vs loops. Nice writeup Aaron, thanks ;)

Damien said...

Hi, I do some tests and I found two very fast solutions :

1) First method :

booleanValues = GetBooleanList(numberOfTests);
startTime = DateTime.Now;

bool[] array = booleanValues.ToArray();
booleanValues = new List();

for (int i = 0; i < array.Length ; i++)
if (array[i])
booleanValues.Add(array[i]);

Console.WriteLine(string.Format("Very Fast For Loop results: Bool Count = {0}, Time = {1}", booleanValues.Count, DateTime.Now.Subtract(startTime)));

2) Second method :

booleanValues = GetBooleanList(numberOfTests);
startTime = DateTime.Now;

booleanValues.RemoveAll(k => !k);

Console.WriteLine(string.Format("Very Fast LINQ results: Bool Count = {0}, Time = {1}", booleanValues.Count, DateTime.Now.Subtract(startTime)));


Each of these method returned an incredible time of 00:00:00 for 500000 entries in the initial booleanValues, with 250000 == true and 250000 == false.

Kai Thoma said...

Damien, the reason why you reached "an incredible time of 00:00:00" is because your imprecise way to meassure time.
You'd better use System.Diagnostics.Stopwatch like this:

Stopwatch timer = new Stopwatch();
timer.Start();
//do your work
timer.Stop();
Console.WriteLine( timer.ElapsedMilliseconds );