Displaying elements of strings in simple way howto
15/02/2005 13:29:07
Forum: Visual FoxPro
Category: Coding, syntax & commands
Environment versions: Visual FoxPro: VFP 8
Thread ID: 00986136
Message ID: 00987827
Thanks Randy, I appreciate your contribution very much.

>I was playing around with your specific need, in order to get better insight about when lookbehind assertions are needed. FWIW, here's some example code that seems to address what you need. It combines some RegEx with some VFP, much as Lauren Clarke does in our FoxTalk articles.
>
>In doing this, the original pattern required some tweaking, which was much easier with RegEx than if you use VFP code to search for the pattern. The first issue occurred if there already were periods in the strings. Say your string already was "Z. al Azhar"--the original pattern would find a word boundary between the "Z" and the "." and so match the "Z", causing a second "." to be added. The first fix I tried was to change the trailing "\b" to "\s" (white space); however, that failed to capture the case where an acronym appears at the very end of the string, as there would be no white space. What appears to work is the more complex lookahead "(?=\s|$)", which says look for white space or the end of the string, but do not include this part in the match.
>
>
>CLEAR
>LOCAL lcTests, lcTestInput, lcTestOutput, ii
>lcTests = "J Randy Pearson,Sponsored by JBG Industries,Already A. Period,At OPS"
>FOR ii = 1 TO GETWORDCOUNT(m.lcTests, ',')
>  lcTestInput = GETWORDNUM(m.lcTests, m.ii, ',')
>  ? "   In:", lcTestInput
>  lcTestOutput = InsertPeriods(m.lcTestInput)
>  ? "  Out:", lcTestOutput
>  ?
>ENDFOR
>RETURN
>
>FUNCTION InsertPeriods(lcIn)
>  LOCAL loRE1 AS "VBScript.RegExp", loMatches, loMatch, lcOut, lcMatch, lnPtr
>  loRE1 = CREATEOBJECT("VBScript.RegExp")
>  loRE1.IgnoreCase = .F. && case-sensitive
>  loRE1.Global = .T. && return every match, not just the first
>  loRE1.Pattern = '\b[A-Z]+(?=\s|$)'
>  loMatches = loRE1.Execute(m.lcIn)
>  IF loMatches.Count = 0
>    lcOut = m.lcIn
>  ELSE
>    lnPtr = 0 && number of input characters consumed so far
>    lcOut = ""
>    FOR EACH loMatch IN loMatches
>      * FirstIndex is 0-based, SUBSTR() is 1-based.
>      IF loMatch.FirstIndex > m.lnPtr
>        * Copy residual before match:
>        lcOut = m.lcOut + SUBSTR(m.lcIn, m.lnPtr + 1, loMatch.FirstIndex - m.lnPtr)
>      ENDIF
>      lcMatch = SUBSTR(m.lcIn, loMatch.FirstIndex + 1, loMatch.Length)
>      ? "Match:", m.lcMatch
>      lcOut = m.lcOut + InterlacePeriods(m.lcMatch)
>      lnPtr = loMatch.FirstIndex + loMatch.Length
>    ENDFOR
>    IF m.lnPtr < LEN(m.lcIn)
>      * Copy residual after last match:
>      lcOut = m.lcOut + SUBSTR(m.lcIn, m.lnPtr + 1)
>    ENDIF
>  ENDIF
>  RETURN m.lcOut
>ENDFUNC
>
>FUNCTION InterlacePeriods(lcIn)
>  LOCAL lcOut, ii
>  lcOut = ""
>  FOR ii = 1 TO LEN(m.lcIn)
>    lcOut = m.lcOut + SUBSTR(m.lcIn, m.ii, 1) + "."
>  ENDFOR
>  RETURN m.lcOut
>ENDFUNC
>
>
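The difference between the trailing "\b" and the lookahead "(?=\s|$)" described above can be checked with the Test method of VBScript.RegExp. This is only a minimal sketch; the variable name loRE is an assumption for illustration, and the input strings come from the examples in this thread.

```foxpro
* Sketch: compare the two trailing anchors on strings from this thread.
LOCAL loRE
loRE = CREATEOBJECT("VBScript.RegExp")
loRE.IgnoreCase = .F.               && case-sensitive, so [A-Z] means capitals only

loRE.Pattern = "\b[A-Z]+\b"
? loRE.Test("Z. al Azhar")          && .T.  "Z" matches, so a second period would be added

loRE.Pattern = "\b[A-Z]+(?=\s|$)"
? loRE.Test("Z. al Azhar")          && .F.  "Z" is followed by ".", not white space or end
? loRE.Test("At OPS")               && .T.  "OPS" sits at the very end of the string
```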
>>Hi Randy,
>>
>>Thanks for the answer. I have read your co-authored article in FoxTalk about parsing and regular expressions, which I found very interesting. In the near future I will look at the article in more depth and will implement it in some way.
>>
>>Thanks for the suggestions.
>>
>>
>>
>>>From those examples, it looks like you're planning some character-by-character analysis coding to look for and react to patterns. This can get complex in a hurry, not to mention being hard to maintain.
>>>
>>>This is an area that can often be served by regular expression (RegEx) technology. For example, to find any occurrence of a single capital letter by itself, you simply specify a pattern of '\b[A-Z]\b'. To clarify a bit, '\b' is a special escape sequence meaning "word boundary". This includes spaces, beginnings and ends of sentences and paragraphs, etc.--you don't have to consider all the possibilities.
>>>
>>>I wasn't certain if your second example was the same rule, or different. If you're looking for any sequence with a series of capital letters by itself, the above pattern can be changed to '\b[A-Z]+\b', which will find occurrences of one or more capital letters between word boundaries.
>>>
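As a rough illustration of the '\b[A-Z]+\b' pattern, the sketch below lists each run of capitals it finds. The variable names and the combined test string are assumptions for illustration. Note that the Global property has to be set to .T., since Execute otherwise returns only the first match.

```foxpro
* Sketch: list every run of capitals found between word boundaries.
LOCAL loRE, loMatches, loMatch
loRE = CREATEOBJECT("VBScript.RegExp")
loRE.IgnoreCase = .F.               && case-sensitive, so [A-Z] means capitals only
loRE.Global = .T.                   && return all matches, not just the first
loRE.Pattern = "\b[A-Z]+\b"
loMatches = loRE.Execute("JAM Janssen and Z al Azhar")
FOR EACH loMatch IN loMatches
  * FirstIndex is 0-based; Value holds the matched text.
  ? loMatch.FirstIndex, loMatch.Length, loMatch.Value
ENDFOR
```

With this input the loop should report "JAM" and "Z" only; "Janssen" and "Azhar" are skipped because the capital is followed by a lower-case letter rather than a word boundary.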
>>>Sometimes, once patterns are found, you can use native RegEx "replace" functionality to accomplish transformation goals. Unfortunately, VFP doesn't support RegEx natively, and without 3rd party products, we're limited to the VBScript.RegExp that comes with MSIE. This RegEx lacks the native abilities (both nesting and "positive lookbehind assertions") to let you break down the patterns into the single characters that would let you handle this in one pass.
>>>
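For the single-letter case, the native Replace method can do the whole job. The sketch below assumes a capturing group around the letter and the "$1" backreference in the replacement text (available in the VBScript 5.5 RegExp); the variable name is illustrative only.

```foxpro
* Sketch: add a period after a lone capital letter using RegExp.Replace.
LOCAL loRE
loRE = CREATEOBJECT("VBScript.RegExp")
loRE.IgnoreCase = .F.
loRE.Global = .T.                   && replace every occurrence
loRE.Pattern = "\b([A-Z])\b"        && a single capital letter standing alone
? loRE.Replace("Z al Azhar", "$1.")     && Z. al Azhar
? loRE.Replace("JAM Janssen", "$1.")    && JAM Janssen (unchanged: no lone capital)
```

The second call shows the limitation: a multi-letter run such as "JAM" is not broken into its letters, so it needs additional handling.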
>>>In other words, although you can fetch the occurrences "Z" and "JAM" easily enough, you're still forced to do something more to insert a period after each letter (in the multi-letter case anyway). So, I'm left with the conclusion that for this use case, without a better RegEx implementation available, you're probably equally well off with pure VFP coding. Close call.
>>>
>>>-- Randy
>>>
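A pure VFP version might look something like the sketch below. DotInitials is a hypothetical helper name, not something from the thread, and it assumes words are separated by single spaces (runs of spaces are collapsed to one). Any word made up solely of capital letters gets a period after each letter.

```foxpro
? DotInitials("Z al Azhar")         && Z. al Azhar
? DotInitials("JAM Janssen")        && J.A.M. Janssen
? DotInitials("Z. al Azhar")        && Z. al Azhar (already dotted, left alone)
RETURN

FUNCTION DotInitials(tcName)
  * Hypothetical helper: dot every word that consists only of capitals.
  LOCAL lcOut, lcWord, lcDotted, ii, jj
  lcOut = ""
  FOR ii = 1 TO GETWORDCOUNT(m.tcName, " ")
    lcWord = GETWORDNUM(m.tcName, m.ii, " ")
    * CHRTRAN() strips the capitals; nothing left means all capitals.
    IF EMPTY(CHRTRAN(m.lcWord, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", ""))
      lcDotted = ""
      FOR jj = 1 TO LEN(m.lcWord)
        lcDotted = m.lcDotted + SUBSTR(m.lcWord, m.jj, 1) + "."
      ENDFOR
      lcWord = m.lcDotted
    ENDIF
    lcOut = m.lcOut + IIF(m.ii > 1, " ", "") + m.lcWord
  ENDFOR
  RETURN m.lcOut
ENDFUNC
```

If the initials are stored in their own field, the outer loop is unnecessary and the inner loop alone is enough.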
>>>
>>>>>What is your actual use case? That is, what do you need to do with each character (print, store, operate in some other way)?
>>>>
>>>>I have some names with initials. I want to delimit the initials with a dot,
>>>>for example Z al Azhar has to become Z. al Azhar; JAM Janssen becomes J.A.M. Janssen. I have the initials in a distinct field.
>>>>
>>>>> Is performance an issue? What characters can appear in the string (any control characters, etc.)?
>>>>
>>>>No, performance is not an issue; only characters in the alphabet can occur. I was just thinking that there was a handy function in FoxPro to do this.
Zakaria al Azhar
My blog on Actuaris.net