RegEx and complex pattern
Message
From
23/01/2019 08:56:01
 
General information
Forum: C#
Category: Coding, syntax and commands
Miscellaneous
Thread ID: 01665645
Message ID: 01665672
>>>Hi everybody,
>>>
>>>We have the following code:
>>>
>>>
>>>public static Regex XMLRegex = new Regex(@"<(?<field>[^/>]+)>(?<data>.*)</\k<field>>", RegexOptions.Compiled | RegexOptions.Singleline);
>>>
>>>and then
>>>
>>>
>>>MatchCollection matches = XMLRegex.Matches(tcSQML);
>>>         foreach (Match m in matches)
>>>         {
>>>            if (!tDictionary.ContainsKey(m.Groups["field"].ToString()))
>>>               tDictionary.Add(m.Groups["field"].ToString(), m.Groups["data"].ToString().Trim());
>>>         }
>>>
>>>Unfortunately, we found a case where this pattern doesn't work, e.g.
>>>
>>>
>>>
>>><func>appendrecs</func><tcoperator>ADMIN</tcoperator>
>>><tcrecorddata>CHARGEDATE0001902/28/2019 00:00:00SALE_TEXT 00429<func>PaymentPlanCharge</func><orig_amount>125</orig_amount>SALESPOINT00006DANAIITRANSTYPE 000010</tcrecorddata>
>>><tcsalespoint>DANAII032001</tcsalespoint><tctablename>WW_SALES</tctablename>
>>>
>>>
>>>
>>>In other words, the tcRecordData contains SALE_TEXT which in turn contains some XML like text. We need to parse the above string properly into a few parameters, in particular, the FUNC parameter is supposed to be "AppendRecs" but with the code above it becomes a complex text ending with the last /func
>>>
>>>(I removed some of the non relevant info from the input string).
>>>
>>>Is there a way to solve this problem?
>>>
>>>Thanks a lot in advance.
>>
>>You need to reduce the greed of the data group
>>
>>public static Regex XMLRegex = new Regex(@"<(?<field>[^/>]+)>(?<data>.*?)</\k<field>>", RegexOptions.Compiled | RegexOptions.Singleline);
>>
>>
>>the result then becomes
>>
>>
>><func>appendrecs</func>
>>
>><tcoperator>ADMIN</tcoperator>
>>
>><tcrecorddata>CHARGEDATE0001902/28/2019 00:00:00SALE_TEXT 00429<func>PaymentPlanCharge</func><orig_amount>125</orig_amount>SALESPOINT00006DANAIITRANSTYPE 000010</tcrecorddata>
>>
>><tcsalespoint>DANAII032001</tcsalespoint>
>>
>><tctablename>WW_SALES</tctablename>
>>
>
>I will not pretend I understand, but it does indeed work (for that test, at least).

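Well, regex quantifiers are greedy by default: .* grabs as much text as it can and only backtracks far enough for the rest of the pattern to match. That is why the data group ran all the way to the second </func>.

I added a ? after the * in the data group (.*? instead of .*). That makes the quantifier lazy, so it stops at the first closing tag that matches the field name.

Read about the lazy quantifiers *?, +? and ?? here: https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference#quantifiers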
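
To see the difference side by side, here is a minimal sketch that runs both patterns from this thread over a shortened version of the sample input. The class name GreedyVsLazyDemo, the Parse helper and the trimmed input string are only for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Original pattern: .* is greedy, so it backtracks only to the LAST matching closing tag.
    public static readonly Regex Greedy = new Regex(
        @"<(?<field>[^/>]+)>(?<data>.*)</\k<field>>",
        RegexOptions.Compiled | RegexOptions.Singleline);

    // Fixed pattern: .*? is lazy, so it stops at the FIRST matching closing tag.
    public static readonly Regex Lazy = new Regex(
        @"<(?<field>[^/>]+)>(?<data>.*?)</\k<field>>",
        RegexOptions.Compiled | RegexOptions.Singleline);

    // Same loop as in the question: the first occurrence of a field name wins.
    public static Dictionary<string, string> Parse(Regex regex, string input)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in regex.Matches(input))
        {
            string field = m.Groups["field"].Value;
            if (!result.ContainsKey(field))
                result.Add(field, m.Groups["data"].Value.Trim());
        }
        return result;
    }

    public static void Main()
    {
        // Shortened, hypothetical version of the input from the question.
        string input =
            "<func>appendrecs</func><tcoperator>ADMIN</tcoperator>" +
            "<tcrecorddata>SALE_TEXT <func>PaymentPlanCharge</func></tcrecorddata>";

        // Greedy: "func" swallows everything up to the inner </func>.
        Console.WriteLine(Parse(Greedy, input)["func"]);

        // Lazy: prints "appendrecs".
        Console.WriteLine(Parse(Lazy, input)["func"]);
    }
}
```

One caveat: the lazy version still matches the first closing tag with the same name, so a field that contains a nested tag with its own name would be cut short. For the input shown in this thread that does not happen.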
Gregory