[C#] Multithreading in C# 4 [2] – Data Parallelism

Data Parallelism is one part of the “Task Parallel Library (TPL)”. You can iterate over a collection of data and perform some tasks for each item in a parallel fashion.

1. Parallel Class

The “System.Threading.Tasks.Parallel” class is the primary class for data parallelism. It provides the “For()”, “ForEach()”, and “Invoke()” methods, each with many overloaded versions.

public static class Parallel
{
  public static ParallelLoopResult For(...);
  public static ParallelLoopResult ForEach(...);
  public static void Invoke(...);
}

Note that the “Parallel” class is a static class and provides only static methods.
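Since “Invoke()” is listed above but not shown elsewhere in this post, here is a minimal sketch of it. The class and method names (“InvokeSketch”, “RunThree”) are made up for illustration, and “Thread.Sleep” stands in for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class InvokeSketch
{
    // Runs three independent actions, possibly in parallel, and returns
    // how many of them finished. Parallel.Invoke blocks until all are done.
    public static int RunThree()
    {
        int finished = 0;

        Parallel.Invoke(
            () => { Thread.Sleep(10); Interlocked.Increment(ref finished); },
            () => { Thread.Sleep(10); Interlocked.Increment(ref finished); },
            () => { Thread.Sleep(10); Interlocked.Increment(ref finished); });

        return finished;
    }
}
```

Calling “InvokeSketch.RunThree()” returns only after all three actions have run. The counter is updated with “Interlocked.Increment” because the actions may run on different threads.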

2. What For?

Data Parallelism is intended for situations where all of the following are true:

  • You have a collection of data
  • You need to perform some time-consuming tasks for each item of data
  • The order of items to be processed is not important

For example, you have 100 huge files to be compressed every day. It does not matter which one is zipped first. By using Data Parallelism, you can ask the system to run the tasks in parallel whenever possible.
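The compression scenario could look roughly like the sketch below. To keep it self-contained it compresses in-memory byte arrays instead of real files, and the names “CompressSketch”, “Compress”, and “CompressAll” are hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

public static class CompressSketch
{
    // Compresses one buffer with GZip and returns the compressed bytes.
    public static byte[] Compress(byte[] input)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            }
            // ToArray still works after GZipStream has closed the MemoryStream.
            return output.ToArray();
        }
    }

    // Compresses every buffer; the order in which they are processed is not fixed.
    public static byte[][] CompressAll(byte[][] inputs)
    {
        var results = new byte[inputs.Length][];
        Parallel.For(0, inputs.Length, i =>
        {
            // Each iteration writes only to its own slot, so no locking is needed.
            results[i] = Compress(inputs[i]);
        });
        return results;
    }
}
```

With real files, the body of the loop would open and write streams instead, but the shape of the call stays the same.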

At this point, you might wonder: is parallel processing really faster than sequential processing? It depends on your system. If it is equipped with multiple processors or a multi-core processor, parallel processing will really shine. On a single core, the extra scheduling work can even make it slower.
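One way to check this on your own machine is to time the same work both ways. The following is only a rough sketch: “TimingSketch”, “Work”, and “Measure” are hypothetical names, and the square-root loop is an arbitrary CPU-bound stand-in for a real task.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TimingSketch
{
    // CPU-bound stand-in for a time-consuming task (hypothetical workload).
    private static double Work()
    {
        double sum = 0;
        for (int k = 1; k <= 2000000; k++)
        {
            sum += Math.Sqrt(k);
        }
        return sum;
    }

    // Runs Work() 'count' times, sequentially or in parallel,
    // and returns the elapsed time in milliseconds.
    public static long Measure(int count, bool parallel)
    {
        var results = new double[count];
        var watch = Stopwatch.StartNew();

        if (parallel)
        {
            Parallel.For(0, count, i => { results[i] = Work(); });
        }
        else
        {
            for (int i = 0; i < count; i++) { results[i] = Work(); }
        }

        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Comparing “TimingSketch.Measure(16, false)” with “TimingSketch.Measure(16, true)” and looking at “Environment.ProcessorCount” should give a feel for how much your hardware helps. The numbers vary from run to run.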

3. ParallelLoopResult Structure

“System.Threading.Tasks.ParallelLoopResult” provides completion status on the execution of a Parallel loop.

public struct ParallelLoopResult
{
  public bool IsCompleted { get; }
  public Nullable<long> LowestBreakIteration { get; } // index of the lowest iteration from which Break was called
}
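To see both properties in action, the sketch below stops a loop early. It assumes the “Parallel.For()” overload whose body also receives a “ParallelLoopState”. The names “BreakSketch” and “RunWithBreak” are hypothetical, and the array passed in must have more than 10 elements.

```csharp
using System;
using System.Threading.Tasks;

public static class BreakSketch
{
    // Marks each visited index and calls Break() at iteration 10.
    // Iterations below 10 are still run; higher ones may or may not run.
    public static ParallelLoopResult RunWithBreak(bool[] visited)
    {
        return Parallel.For(0, visited.Length, (i, state) =>
        {
            visited[i] = true;
            if (i == 10)
            {
                state.Break();
            }
        });
    }
}
```

After “RunWithBreak(new bool[100])”, “IsCompleted” is false because the loop ended early, and “LowestBreakIteration” holds 10, the only iteration that called “Break()”.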

4. Parallel.For

The “Parallel.For()” method is for the “for” loop. You need to specify the start index (inclusive) and the end index (exclusive) as well as a task to be performed. There are many overloaded versions but the simplest one is enough to get the idea.

public static class Parallel
{
  public static ParallelLoopResult For(int fromInclusive, int toExclusive, Action<int> body);
}

When you call the “Parallel.For()” method, the calling thread will wait until all parallel tasks are completed.

That is a good thing in most cases. If you do not want to wait, you need to call “Parallel.For()” from another thread or task.
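If you do not want the caller to wait, one option is to wrap the loop in a task. This is a minimal sketch using “Task.Factory.StartNew()”. The names “BackgroundLoopSketch” and “StartLoop” are hypothetical, and “Thread.Sleep” stands in for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundLoopSketch
{
    // Starts the parallel loop inside a task and returns immediately.
    // The loop result becomes available through the returned task.
    public static Task<ParallelLoopResult> StartLoop(int count)
    {
        return Task.Factory.StartNew<ParallelLoopResult>(() =>
            Parallel.For(0, count, i =>
            {
                Thread.Sleep(10); // stand-in for real work
            }));
    }
}
```

The caller can keep working after “StartLoop(20)” returns and later read “Result.IsCompleted” on the task. Reading “Result” waits only if the loop has not finished yet.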

public static void Test1()
{
  ParallelLoopResult result =
    Parallel.For(0, 20, i =>
      {
        // do some tasks
        Thread.Sleep(10);
        Console.WriteLine("Loop {0} in the thread {1}", i, Thread.CurrentThread.ManagedThreadId);
      });

  Console.WriteLine("IsCompleted : {0}", result.IsCompleted);
}

Note that the order of iteration is not guaranteed. The output is not like “0, 1, 2, 3 …”, and it can differ from run to run.
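Because the order is not fixed, a common approach is to have each iteration write to its own slot so that the final result does not depend on which iteration ran first. A small sketch, with the hypothetical names “OrderSketch” and “Squares”:

```csharp
using System;
using System.Threading.Tasks;

public static class OrderSketch
{
    // Computes squares in parallel; each iteration writes only to its own slot,
    // so the result is the same no matter which iteration runs first.
    public static int[] Squares(int count)
    {
        var squares = new int[count];
        Parallel.For(0, count, i =>
        {
            squares[i] = i * i;
        });
        return squares;
    }
}
```

The iterations still run in an unpredictable order, but the returned array is always indexed from 0 upward.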

5. Parallel.ForEach

The “Parallel.ForEach()” method is for the “foreach” loop. Rather than specifying the start and end indexes, you pass an “IEnumerable<T>” collection directly.

public static ParallelLoopResult ForEach<TSource>(IEnumerable<TSource> source, Action<TSource> body)

The delegate accepts a single parameter of the type of a collection item.

public static void Test2()
{
  IEnumerable<int> intCollection = Enumerable.Range(1, 100);
  ParallelLoopResult result =
    Parallel.ForEach(intCollection, (i) =>
      {
        // do some tasks
        Thread.Sleep(50);
        Console.WriteLine("Loop {0} in the thread {1}", i, Thread.CurrentThread.ManagedThreadId);
      });

  Console.WriteLine("IsCompleted : {0}", result.IsCompleted);
}
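The item type does not have to be a number. The sketch below runs “Parallel.ForEach()” over strings and collects results in a “ConcurrentDictionary”, which is safe to write to from several threads. The names “ForEachSketch” and “WordLengths” are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ForEachSketch
{
    // Maps each word to its length; the delegate parameter is a string
    // because the source collection is IEnumerable<string>.
    public static IDictionary<string, int> WordLengths(IEnumerable<string> words)
    {
        var lengths = new ConcurrentDictionary<string, int>();
        Parallel.ForEach(words, word =>
        {
            lengths[word] = word.Length;
        });
        return lengths;
    }
}
```

For example, “ForEachSketch.WordLengths(new[] { "alpha", "be", "gamma" })” returns a dictionary with three entries, whatever order the words were processed in.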
