How to remove duplicate Whitespace Characters from a string?
Posted: 24-Mar-2008
Updated: 21-Jun-2009

Duplicate whitespace characters in text can cause problems when displaying summaries and reports.

Note: Whitespace is not only the space character (" ").
The term also covers other non-printing characters such as tab, carriage return, line feed, form feed and vertical tab.
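To see which characters .NET actually treats as whitespace, here is a small sketch built on char.IsWhiteSpace. The class and method names are hypothetical. Note that char.IsWhiteSpace also returns true for Unicode characters such as the no-break space (U+00A0), which is not in the explicit separator list used by the Split version later in this FAQ.

```csharp
using System.Collections.Generic;

// Hypothetical helper class, for illustration only.
public static class WhitespaceInfo
{
    // Returns the character codes of every character in 'text'
    // that char.IsWhiteSpace classifies as whitespace, in order.
    public static List<int> WhitespaceCodes(string text)
    {
        var codes = new List<int>();
        foreach (char c in text)
        {
            if (char.IsWhiteSpace(c))
                codes.Add(c); // char converts implicitly to its int code
        }
        return codes;
    }
}
```

For example, WhitespaceInfo.WhitespaceCodes("a\tb\r\nc\u00A0d") returns the codes 9 (tab), 13 (carriage return), 10 (line feed) and 160 (no-break space).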

Luckily, .NET has some very handy string-handling methods we can use for this task.
Regular expressions are another option.

Here we cover both implementations and compare them.

With regular expressions it is much easier to collapse duplicate whitespace in a string, because the '\s' character class
matches any whitespace character (space, tab, carriage return, line feed and so on), so a single pattern finds every unwanted run:

Here is a C# function that uses a regular expression to replace each run of whitespace characters with a single space:
     using System.Text.RegularExpressions;
 
     public static string RemoveDuplicateWhiteSpace(string input)
     {
         // \s+ matches a run of one or more whitespace characters
         // (space, tab, CR, LF, ...); each run is replaced by one space.
         return Regex.Replace(input, @"\s+", " ");
     }
It is an elegant one-liner that does the trick, but there is a drawback: it is very slow compared with the alternative!

We will look at the statistics shortly; first, let us implement the same method without regular expressions:

    using System;

    public static string RemoveWhitespaceWithSplit(string inputString)
    {
        // Split on the common whitespace characters; RemoveEmptyEntries drops
        // the empty strings produced by consecutive separators.
        string[] parts = inputString.Split(new char[] { ' ', '\n', '\t', '\r', '\f', '\v' },
                                           StringSplitOptions.RemoveEmptyEntries);

        // Re-join the words with single spaces (no trailing space).
        // Leading and trailing whitespace in the input is dropped entirely.
        return string.Join(" ", parts);
    }
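The two methods do not produce exactly the same output: the regex version turns a leading or trailing run of whitespace into a single space, while the Split version drops it. The following sketch wraps copies of both (the class and method names are hypothetical) so the difference is easy to check.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper class holding both implementations side by side.
public static class WhitespaceComparison
{
    // Regex version: every whitespace run becomes one space, so a
    // leading or trailing run survives as a single space.
    public static string WithRegex(string input) =>
        Regex.Replace(input, @"\s+", " ");

    // Split version: empty entries are removed, so leading and
    // trailing whitespace disappears completely.
    public static string WithSplit(string input) =>
        string.Join(" ", input.Split(new[] { ' ', '\n', '\t', '\r', '\f', '\v' },
                                     StringSplitOptions.RemoveEmptyEntries));
}
```

For the input "  Hello \t\r\n  World  ", WithRegex returns " Hello World " while WithSplit returns "Hello World". Calling Trim() on the regex result makes the two agree for input like this.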


OK, now the statistics:

On short strings (30 characters containing multiple whitespace runs), over 1 million iterations, the implementation without regular expressions was 4 times faster.

On a much larger sample (an 8 MB ASCII e-book), the difference in speed is smaller
but still significant: the Split method without regular expressions is still 2 times faster.
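The test harness behind these numbers is not shown. As a rough sketch only, a comparison like this could be timed with System.Diagnostics.Stopwatch; the class name, the warm-up call and the iteration count you pass in are assumptions, and this sketch makes no claim about the timings you will get on your runtime and hardware.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical benchmark helper, not the author's original harness.
public static class WhitespaceBenchmark
{
    static readonly char[] Separators = { ' ', '\n', '\t', '\r', '\f', '\v' };

    static string WithRegex(string s) => Regex.Replace(s, @"\s+", " ");

    static string WithSplit(string s) =>
        string.Join(" ", s.Split(Separators, StringSplitOptions.RemoveEmptyEntries));

    // Runs 'method' on 'input' 'iterations' times and returns elapsed milliseconds.
    public static long Time(Func<string, string> method, string input, int iterations)
    {
        method(input); // warm-up call so JIT compilation is not part of the timing
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            method(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Times both implementations on the same input.
    public static (long RegexMs, long SplitMs) Compare(string input, int iterations) =>
        (Time(WithRegex, input, iterations), Time(WithSplit, input, iterations));
}
```

For example, WhitespaceBenchmark.Compare(sample, 1000000) with a 30-character sample resembles the short-string test described above; for the large-file case you would pass the file contents with a much smaller iteration count.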

So it is up to you to decide which implementation to use, but my recommendation is the one without regular expressions.
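If you still prefer the regex version, one common adjustment is to build the Regex object once and reuse it, optionally with RegexOptions.Compiled, which trades some start-up cost for faster matching. This is a sketch with a hypothetical class name; it has not been benchmarked here, so measure it before assuming it closes the gap with the Split version.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class showing a reusable, compiled regex instance.
public static class CachedRegexWhitespace
{
    // Created once per application domain and shared by all calls.
    private static readonly Regex WhitespaceRun =
        new Regex(@"\s+", RegexOptions.Compiled);

    // Replaces each run of whitespace with a single space.
    public static string RemoveDuplicateWhiteSpace(string input) =>
        WhitespaceRun.Replace(input, " ");
}
```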

Tip:
Use regular expressions only when searching for complex patterns that you cannot find by using
the standard methods of the .NET classes.
If both approaches can do the job, prefer the String.Split method, because
it is more efficient in cases like this one.
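In the spirit of this tip, the same job can also be done in a single pass with StringBuilder and char.IsWhiteSpace, without the intermediate string array that Split creates. This is an alternative sketch with hypothetical names, not one of the two implementations measured above. Like the Split version it drops leading and trailing whitespace, and because it uses char.IsWhiteSpace it also collapses characters such as the no-break space (U+00A0).

```csharp
using System.Text;

// Hypothetical single-pass alternative, not part of the original comparison.
public static class SinglePassWhitespace
{
    public static string Collapse(string input)
    {
        var sb = new StringBuilder(input.Length);
        bool pendingSpace = false;

        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                // Remember the gap only if a word has already been written,
                // so leading whitespace is skipped.
                pendingSpace = sb.Length > 0;
            }
            else
            {
                // Emit one space for the whole preceding run, then the character.
                if (pendingSpace)
                {
                    sb.Append(' ');
                    pendingSpace = false;
                }
                sb.Append(c);
            }
        }
        // A pending space at the end is never written, so trailing whitespace is dropped.
        return sb.ToString();
    }
}
```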

Thanks to Dusan Pavkov for his useful suggestions for this FAQ.