
Generate Matches for Regular Expressions Using Rex

by KodefuGuru 3. May 2010 18:59

Regular expressions are one of the more cryptic tools developers use in everyday work. Sure, just about anyone understands that ^\d\d$ matches a line containing exactly two digits, but you may need a cheat sheet to figure out a regex like \b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b.

If you’re like me, you typically know that you need a regex to match something, and then you go about writing one that fits the pattern. Next, you either build a unit test or use a website to try the regex against some test data. This is great for positive verification, but how can you be sure that bad data isn’t matched as well?
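To make the "bad data" worry concrete, here is a minimal sketch that runs the two-digit pattern from the opening paragraph against a few hand-picked inputs using the standard .NET Regex class. The class name NegativeCheck and the input list are made up for illustration; they are not part of Rex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegativeCheck
{
    // The two-digit pattern from the opening paragraph.
    public const string TwoDigits = @"^\d\d$";

    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, TwoDigits);
    }

    public static void Main()
    {
        // Hypothetical inputs chosen for illustration.
        // "42"           : the obvious positive case.
        // "4", "123"     : wrong length, rejected.
        // "42\n"         : accepted, because $ also matches before a final newline.
        // "\u0664\u0662" : accepted, because \d matches any Unicode decimal digit
        //                  unless RegexOptions.ECMAScript is used.
        string[] inputs = { "42", "4", "123", "42\n", "\u0664\u0662" };

        foreach (string input in inputs)
        {
            // Regex.Escape is used here only to make the newline visible when printed.
            Console.WriteLine("{0} -> {1}", Regex.Escape(input), Matches(input));
        }
    }
}
```

The last two inputs are the kind of surprise a purely positive test would probably never reveal.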

Microsoft Research has produced a tool that explores a regex to generate matching values, much like Pex explores a program to generate relevant test inputs. This tool is called Rex (Regular Expression Exploration), and it solves the problem mentioned above. With Rex, you can explore valid values for a given regex and analyze them for potential problems.

I downloaded the 1.0 release of this command-line tool and gave it a shot. It was pretty easy to get started: just type rex.exe <regex> and it will give you a value. If you want more values, add /k:<number> (I have no idea what k stands for; it’s the only option without a full name). You can pass in more than one regex by using a file, and with the /intersect option you can instruct Rex that every generated value must match all of the regexes. Use /? to see more options.

After trying out the regex from the beginning of this article, I discovered that Rex does have a few limitations. Here are the regex constructs not supported: anchors \G, \b, \B, named groups, lookahead, lookbehind, as-few-times-as-possible quantifiers, backreferences, conditional alternation, substitution.

I’m not satisfied with a command-line tool. Luckily, Rex is a .NET application. I created a new Console application and added a reference to Rex.exe. I then added a using directive for the Rex namespace and began typing out some code. Here’s what I came up with.

string regex = @"^(0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])[- /.](19|20)\d\d$";
RexSettings settings = new RexSettings(regex) { k = 5, encoding = CharacterEncoding.ASCII };
var results = RexEngine.GenerateMembers(settings);
foreach (var result in results) { Console.WriteLine(result); }

It turned out that you can create an instance of RexEngine and call GenerateMembers on it, but since Rex was meant to be a command-line tool, I decided to use the RexSettings class and pass it to the static version of the GenerateMembers method. I obtained that particular regex from a site claiming that it matched dates. Well, it does, but not consistently. As you can see from Rex’s output, it would be better to come up with a different regex, enforce consistency on the separator, or make it match one of several regexes.

12/09-2085
12-31-2089
09 31.2098
12/15-1989
03 31 1989

I vote for consistency on the separator.
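One way to enforce a consistent separator, sketched here as an assumption rather than as the fix the original site intended, is to capture the first separator and require the second one to repeat it with a backreference. The class name DateSeparatorCheck is hypothetical. Backreferences are on the list of constructs Rex does not support, so this version is checked with the standard .NET Regex class instead of being explored with Rex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateSeparatorCheck
{
    // Same date pattern as above, except the first separator is captured as
    // group 2 and the second separator must repeat it via the backreference \2.
    public const string ConsistentDate =
        @"^(0[1-9]|1[012])([- /.])(0[1-9]|[12][0-9]|3[01])\2(19|20)\d\d$";

    public static bool IsConsistent(string input)
    {
        return Regex.IsMatch(input, ConsistentDate);
    }

    public static void Main()
    {
        // The five values Rex generated for the original pattern.
        string[] samples =
        {
            "12/09-2085", "12-31-2089", "09 31.2098", "12/15-1989", "03 31 1989"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0} -> {1}", sample, IsConsistent(sample));
        }
    }
}
```

Of the five generated values, only 12-31-2089 and 03 31 1989 use the same separator twice, so only those two still match. Note that the pattern still does not check the day against the month, so a value such as 09 31 1989 would pass even though September has 30 days.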

This is a very useful tool Microsoft Research has invented. My only concern is that unscrupulous people will be using it as well. Then again, they’ve probably had these kinds of generators all along.


Find and Replace Text with Regex

by KodefuGuru 12. January 2010 17:18

A common scenario you may face is displaying text to a user that contains identifying information, such as social security numbers, that needs to be masked. This is a pretty simple task in .NET if you know the correct class and methods to use.

The first thing you’re going to need is a regular expression pattern. Here’s a simple one for a social security number: \d{3}-\d{2}-\d{4}.

The next thing you should do is add a using directive for System.Text.RegularExpressions. Now, we can get to coding.

string text = "garbage text 123-12-1234 more garbage";
string pattern = @"\d{3}-\d{2}-\d{4}";
text = Regex.Replace(text, pattern, m => 
                    "***-**-" + m.Value.Substring(m.Value.Length - 4, 4));

Regex.Replace existed before the Func delegates, so it takes a MatchEvaluator rather than a Func<Match, string>. No matter, we can still use a lambda here.
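For comparison, here is a sketch of two alternatives to the lambda: an explicit MatchEvaluator built from a named method, and a plain replacement string that uses the $1 substitution. Both rely on an extra capture group around the last four digits, which is a small change to the pattern above. The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SsnMasker
{
    // Capture the last four digits so they can be reused in the replacement.
    private const string Pattern = @"\d{3}-\d{2}-(\d{4})";

    // A named method with the MatchEvaluator signature: Match in, string out.
    private static string MaskMatch(Match m)
    {
        return "***-**-" + m.Groups[1].Value;
    }

    public static string MaskWithEvaluator(string text)
    {
        return Regex.Replace(text, Pattern, new MatchEvaluator(MaskMatch));
    }

    public static string MaskWithSubstitution(string text)
    {
        // $1 in the replacement string refers to capture group 1.
        return Regex.Replace(text, Pattern, "***-**-$1");
    }

    public static void Main()
    {
        string text = "garbage text 123-12-1234 more garbage";

        // Both lines print: garbage text ***-**-1234 more garbage
        Console.WriteLine(MaskWithEvaluator(text));
        Console.WriteLine(MaskWithSubstitution(text));
    }
}
```

The substitution form is shorter when the replacement is just a rearrangement of captured text. The evaluator or lambda form earns its keep when the replacement needs real logic.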

This was a pretty simple solution to a common problem. Enjoy.

Chris Eargle
C# MVP, INETA Community Champion

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010