<?xml version="1.0" encoding="UTF-8"?> <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" ><channel><title>Quickduck &#187; RegEx</title> <atom:link href="http://quickduck.com/blog/category/development/regex/feed/" rel="self" type="application/rss+xml" /><link>http://quickduck.com/blog</link> <description>Straight from the mind of geniuseseses....</description> <lastBuildDate>Mon, 09 Jan 2012 02:29:30 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <item><title>Parsing comma separated values with regular expressions</title><link>http://quickduck.com/blog/2010/05/15/parsing-comma-separated-values-with-regular-expressions/</link> <comments>http://quickduck.com/blog/2010/05/15/parsing-comma-separated-values-with-regular-expressions/#comments</comments> <pubDate>Fri, 14 May 2010 21:22:01 +0000</pubDate> <dc:creator>Gerrod</dc:creator> <category><![CDATA[C#]]></category> <category><![CDATA[RegEx]]></category><guid isPermaLink="false">http://quickduck.com/blog/?p=282</guid> <description><![CDATA[We&#8217;ve all had to write code to parse comma separated values before; it sounds simple, but it can actually be quite tricky! Sure, if our lists were always nicely defined like this: &#8220;one&#8221;,&#8221;two&#8221;,&#8221;three&#8221;,&#8221;four&#8221; five,six,seven,eight Then we could simply use String.Split. But life is never that kind! 
When your input strings may be a bit more [...]]]></description> <content:encoded><![CDATA[<p>We&#8217;ve all had to write code to parse comma-separated values before; it sounds simple, but it can actually be quite tricky! Sure, if our lists were always nicely defined like this:</p><ul><li>&#8220;one&#8221;,&#8221;two&#8221;,&#8221;three&#8221;,&#8221;four&#8221;</li><li>five,six,seven,eight</li></ul><p>Then we could simply use String.Split. But life is never that kind! When your input strings may be a bit more loosely defined, like this:</p><ul><li>one,&#8221;two,three&#8221;,four,,six,&#8221;seven&#8221;</li><li>,two,&#8221;three,four,five&#8221;,,</li></ul><p>It gets a little tougher.</p><p>So can you do it using a single regular expression? Yes, you most certainly can! It&#8217;s simply a matter of breaking down the possibilities, then catering for the <em>best-case scenario</em> (quoted values), down to the <em>worst-case scenario</em> (zero-length values), and finally, catering for the delimiters (either a comma, or the end of the string). Let&#8217;s look at them one step at a time.</p><p>Firstly, quoted values. This is by far the easiest of all the conditions &#8211; find any length of text between two quotes. We&#8217;ll use a non-greedy expression (the question mark after the star) to ensure we don&#8217;t over-extend the length of text that we match:</p><pre class="brush: csharp; title: ; notranslate">
private const string
    Template_QuotedValues = @&quot;&quot;&quot;(?&lt;content&gt;.*?)&quot;&quot;&quot;;
</pre><p>The next easiest type of match to capture is a non-quoted, non-zero-length value. To do this, we&#8217;ll simply look for one or more characters which are <em>not</em> a comma. Again, we&#8217;re using a non-greedy match:</p><pre class="brush: csharp; title: ; notranslate">
private const string
    Template_UnquotedValues = @&quot;(?&lt;content&gt;[^,]+?)&quot;;
</pre><p>Notice also that for both templates, we&#8217;re creating a named group called &#8220;content&#8221; &#8211; this allows us to easily extract the contents of the match, no matter which condition it was matched under.</p><p>The last type of match we need to cater for is the non-quoted, zero-length value. This is the trickiest of the three situations, since there&#8217;s &#8220;nothing&#8221; to actually match on! So instead, we look for <em>zero repetitions</em> of any character, immediately after a delimiter. We check for that delimiter with a <em>lookbehind</em>, so it is tested without becoming part of the match. Since the <em>first</em> value in the list may be empty, the possible values for our delimiter are either the start of string (specified by the hat &#8211; ^), or a comma:</p><pre class="brush: csharp; title: ; notranslate">
private const string
    Template_EmptyValues = @&quot;(?&lt;=(?:,|^))(?&lt;content&gt;.{0})&quot;;
</pre><p>The final piece of the puzzle is the delimiters. Since we&#8217;re matching from left to right, we can assume that every match will be followed either by a comma, or the end of the string. We&#8217;ll wrap the delimiter in a <em>lookahead</em>, so that it is checked but not consumed as part of the match; the <em>non-capturing group</em> inside it means the delimiter isn&#8217;t explicitly captured in a group either.</p><pre class="brush: csharp; title: ; notranslate">
private const string
    Template_Delimiter = @&quot;(?=(?:,|$))&quot;;
</pre><p>Now, to put it all together. We have our three types of matches that we&#8217;re expecting, and our delimiter, so all we need to do is create a single RegEx for it all. Here goes:</p><pre class="brush: csharp; title: ; notranslate">
private const string
    // Any length value within quotes...
    Template_QuotedValues = @&quot;&quot;&quot;(?&lt;content&gt;.*?)&quot;&quot;&quot;,

    // ... or values with at least 1 character, not in quotes...
    Template_UnquotedValues = @&quot;(?&lt;content&gt;[^,]+?)&quot;,

    // ... or zero-length matches, not in quotes...
    Template_EmptyValues = @&quot;(?&lt;=(?:,|^))(?&lt;content&gt;.{0})&quot;,

    // ... followed by either a comma or the end of the string
    Template_Delimiter = @&quot;(?=(?:,|$))&quot;;

// Now join as one Template - notice the OR condition (pipe)
// between the three match types
readonly static private string
    Template = String.Format(&quot;({0}|{1}|{2}){3}&quot;,
        Template_QuotedValues,
        Template_UnquotedValues,
        Template_EmptyValues,
        Template_Delimiter);

// Finally, our RegEx!
readonly static private Regex CsvSplitterRegex
    = new Regex(Template, RegexOptions.Compiled);
</pre><p>Was that so bad? ;-) Iterating through the list of values in our comma separated list is now a piece of cake.</p><pre class="brush: csharp; title: ; notranslate">
// Assume CSV is in &quot;record&quot; field
foreach (Match match in CsvSplitterRegex.Matches(record))
{
    Console.WriteLine(&quot;Match value: {0}&quot;,
        match.Groups[&quot;content&quot;].Value);
}
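
// A worked example, offered as a sketch rather than part of the
// original listing. The &quot;sample&quot; variable is a hypothetical name,
// and its value is the first loosely defined list from the top of
// this post, written with plain double quotes.
// Tracing the expression by hand gives six values, with the fourth
// one empty: one | two,three | four | (empty) | six | seven
string sample = @&quot;one,&quot;&quot;two,three&quot;&quot;,four,,six,&quot;&quot;seven&quot;&quot;&quot;;
foreach (Match match in CsvSplitterRegex.Matches(sample))
{
    // Brackets make the zero-length value visible as []
    Console.WriteLine(&quot;[{0}]&quot;,
        match.Groups[&quot;content&quot;].Value);
}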
</pre><p>Simple, eh?</p> ]]></content:encoded> <wfw:commentRss>http://quickduck.com/blog/2010/05/15/parsing-comma-separated-values-with-regular-expressions/feed/</wfw:commentRss> <slash:comments>4</slash:comments> </item> </channel> </rss>
