Parallel (PRESTO) LINQ

April 19, 2011 at 3:17 pm | Posted in Microsoft, Technical Tips

Unless you’ve been living in a cave for the last decade, you’ve probably heard about parallelization. It used to be that only a Fortune 500 company or government-funded research laboratory could afford multiple CPUs to take advantage of parallel processing.  Since chip manufacturers started releasing low-cost multi-core CPUs, parallelism has reached the masses. Why not have your .NET applications up their game as well?

Enter Parallel Language-Integrated Query (PLINQ).  Yes, it’s still LINQ, but with the super powers of parallelism! Although there are quite a few disclaimers attached to PLINQ performance,  I decided to dive right in and bask in all of the parallel goodness.

First off, I’m not much of a theoretical guy. I take a hands-on approach to learning anything and everything. Second, I prefer real-world scenarios to pristine testing environments. So I decided to test PLINQ by iterating over files on my local machine, which has a dual-core Intel Core 2 Duo CPU.

My PLINQ query is as follows. Apart from the AsParallel call (and the WithExecutionMode call that forces parallel execution), it is identical to the average-joe sequential LINQ query. I added the where clause to slow down the query a bit.

    // files is a collection of FileInfo objects
    var pQuery = from file in files.AsParallel()
                     .WithExecutionMode( ParallelExecutionMode.ForceParallelism )
                 where file.LastAccessTime > DateTime.Now.AddYears( -1 )
                 select file.Name;

    // Enumerating the query is what actually runs it
    foreach ( string fileName in pQuery ) {
        Console.Write( fileName + "\t" );
    }

When I compared the execution times of the sequential and parallel queries using the Stopwatch class, though, the sequential query executed 50 milliseconds quicker on average. Not exactly what I’d intended.

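What gives? Oh, yeah, I should probably have read that disclaimer stuff…
I’ll save you the trouble. Here’s what went wrong: my query was too simple. Each item needed so little work that the overhead of splitting the work across threads and merging the results outweighed any gain. Left in its default execution mode, PLINQ may fall back to sequential processing when it judges from the shape of a query that parallelism won’t pay off, but I had forced parallelization, so it ran in parallel regardless.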
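For reference, here is a minimal sketch of the kind of Stopwatch comparison described above. It is not the exact harness from my test: the TimingSketch class and TimeMs helper are names made up for illustration, the current directory stands in for the folder I scanned, and the queries are materialized with ToList instead of being written to the console.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

static class TimingSketch
{
    // Hypothetical helper: runs a query to completion and returns elapsed milliseconds.
    public static long TimeMs(Func<int> runQuery)
    {
        Stopwatch watch = Stopwatch.StartNew();
        runQuery();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Assumption: any folder with a reasonable number of files will do.
        FileInfo[] files = new DirectoryInfo(".").GetFiles();
        DateTime cutoff = DateTime.Now.AddYears(-1);

        // Plain sequential LINQ query.
        long sequentialMs = TimeMs(() =>
            (from file in files
             where file.LastAccessTime > cutoff
             select file.Name).ToList().Count);

        // Same query, forced to run in parallel.
        long parallelMs = TimeMs(() =>
            (from file in files.AsParallel()
                 .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
             where file.LastAccessTime > cutoff
             select file.Name).ToList().Count);

        Console.WriteLine("Sequential: {0} ms", sequentialMs);
        Console.WriteLine("Parallel:   {0} ms", parallelMs);
    }
}
```

A single run like this is noisy, so averaging several runs (as I did) gives a fairer picture. The first PLINQ run also pays for spinning up worker threads.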

I decided it was time to make a more complicated operation. The Math class is always good for some expensive operations, so I mixed in a logarithm, a square root, and some rounding and concatenation for good measure:

    static string ComplicatedFunction( FileInfo file ) {
        // Deliberately wasteful math on the file size
        double logLength =
            Math.Round( Math.Sqrt( Math.Log10( file.Length ) ) );
        return file.Name + "(" + logLength + ")";
    }

That didn’t help very much at all. The PLINQ query was now executing faster than it had with my first query, but the sequential LINQ query was still out-performing it by about 30 milliseconds. Okay, I decided, I was done playing the stacking game. It was time to cheat.

I added a Thread.Sleep invocation to the ComplicatedFunction method. And once I did that, the magic happened. Not only did PLINQ out-perform the sequential LINQ query by over 30%, but the output visually demonstrated the difference! Whereas the sequential LINQ query displayed only one filename (with extraneous calculation) at a time, the PLINQ query displayed the filenames in chunks of five!
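A rough sketch of that cheat, under a few assumptions: SleepSketch and SlowFunction are illustrative names, the 10 ms delay is an arbitrary value I picked for the example, and the function takes a name and a length instead of a FileInfo so the sketch runs without touching the disk.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;

static class SleepSketch
{
    // Stand-in for ComplicatedFunction with an artificial delay added.
    public static string SlowFunction(string name, long length)
    {
        Thread.Sleep(10); // assumed delay; simulates expensive per-item work
        double logLength = Math.Round(Math.Sqrt(Math.Log10(length)));
        return name + "(" + logLength + ")";
    }

    static void Main()
    {
        // Made-up "files": 40 names with increasing lengths.
        var items = Enumerable.Range(1, 40)
            .Select(i => new { Name = "file" + i + ".txt", Length = 1000L * i })
            .ToArray();

        // Sequential: each item waits for the previous one to finish.
        Stopwatch watch = Stopwatch.StartNew();
        var sequential = items
            .Select(f => SlowFunction(f.Name, f.Length))
            .ToList();
        long sequentialMs = watch.ElapsedMilliseconds;

        // Parallel: several items sleep (and compute) at the same time.
        watch.Restart();
        var parallel = items.AsParallel()
            .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
            .Select(f => SlowFunction(f.Name, f.Length))
            .ToList();
        long parallelMs = watch.ElapsedMilliseconds;

        Console.WriteLine("Sequential: {0} items in {1} ms", sequential.Count, sequentialMs);
        Console.WriteLine("Parallel:   {0} items in {1} ms", parallel.Count, parallelMs);
    }
}
```

Because no ordering is requested, the parallel results may come back in a different order than the input. How much faster the parallel run is depends on the number of cores and the delay chosen.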

[Screenshot: PLINQ test output]

It may take longer for PLINQ to determine the best parallelization scheme, but the work is then executed at the same time across multiple cores, so the query is faster overall once each item is expensive enough to process. The moral of the story: the more work your LINQ query does per item, the better PLINQ performs.

Try it yourself. Source code available here.
