Displaying elements of strings in simple way howto

Message

15/02/2005 13:29:07
Randy Pearson
Pennsylvania, United States

15/02/2005 11:29:17
Zakaria Al Azhar
Actuaris.Net
Tétouan, Morocco

General information
Forum: Visual FoxPro
Category: Coding, syntax and commands
Title: Re: Displaying elements of strings in simple way howto

Environment versions
Visual FoxPro: VFP 8

Miscellaneous
Thread ID: 00986136
Message ID: 00987145
I was playing around with your specific need to get better insight into when lookbehind assertions are needed. FWIW, here's some example code that seems to address what you need. It combines some RegEx with some VFP, much as Lauren Clarke does in our FoxTalk articles.

In doing this, the original pattern required some tweaking, which was much easier with RegEx than it would have been with VFP code searching for the pattern. The first issue occurred if the strings already contained periods. Say your string already was "Z. al Azhar": the original pattern would consider the "." a word boundary and thus match the "Z", causing a second "." to be added. The first fix I tried was to change the trailing "\b" to "\s" (white space); however, that failed to capture the case where an acronym appears at the very end of the string, as there would be no white space. What appears to work is the more complex "(?=\s|$)", a lookahead that says look for white space or the end of the string, but do not include this part in the match.
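
To see the three trailing anchors side by side, here is a minimal sketch (not part of the original test code) that runs each pattern through the Test method of VBScript.RegExp against the two problem strings. The array names and the choice of test strings are assumptions made for illustration.

```foxpro
* Sketch: compare the three trailing anchors on the two problem cases.
LOCAL loRE, ii, jj
LOCAL ARRAY laPatterns[3], laInputs[2]
laPatterns[1] = '\b[A-Z]+\b'         && original: "." counts as a word boundary
laPatterns[2] = '\b[A-Z]+\s'         && white space required after the letters
laPatterns[3] = '\b[A-Z]+(?=\s|$)'   && white space or end of string, not consumed
laInputs[1] = "Z. al Azhar"          && period already present
laInputs[2] = "At OPS"               && acronym at the very end of the string

loRE = CREATEOBJECT("VBScript.RegExp")
loRE.IgnoreCase = .F.                && case-sensitive
FOR ii = 1 TO ALEN(laPatterns)
  loRE.Pattern = laPatterns[m.ii]
  FOR jj = 1 TO ALEN(laInputs)
    * Test() returns .T. when the pattern matches anywhere in the input.
    ? PADR(laPatterns[m.ii], 20), PADR(laInputs[m.jj], 14), loRE.Test(laInputs[m.jj])
  ENDFOR
ENDFOR
```

If VBScript.RegExp behaves as described here, the "\b" version should report a match for "Z. al Azhar" (the unwanted one), the "\s" version should miss "At OPS", and only the lookahead version should reject the first string and accept the second.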

CLEAR
LOCAL lcTests, lcTestInput, lcTestOutput, ii
lcTests = "J Randy Pearson,Sponsored by JBG Industries,Already A. Period,At OPS"
FOR ii = 1 TO GETWORDCOUNT(m.lcTests, ',')
  lcTestInput = GETWORDNUM(m.lcTests, m.ii, ',')
  ? "   In:", lcTestInput
  lcTestOutput = InsertPeriods(m.lcTestInput)
  ? "  Out:", lcTestOutput
  ?
ENDFOR
RETURN

FUNCTION InsertPeriods(lcIn)
  LOCAL loRE1 AS "VBScript.RegExp", loMatches, loMatch, lcOut, lcMatch, lnPtr
  loRE1 = CREATEOBJECT("VBScript.RegExp")
  loRE1.IgnoreCase = .F. && case-sensitive
  loRE1.Global = .T. && return every match, not just the first
  loRE1.Pattern = '\b[A-Z]+(?=\s|$)'
  loMatches = loRE1.Execute(m.lcIn)
  IF loMatches.Count = 0
    lcOut = m.lcIn
  ELSE
    lnPtr = 0 && number of input characters already copied
    lcOut = ""
    FOR EACH loMatch IN loMatches
      IF loMatch.FirstIndex > m.lnPtr && FirstIndex is 0-based
        * Copy residual before match:
        lcOut = m.lcOut + SUBSTR(m.lcIn, m.lnPtr + 1, loMatch.FirstIndex - m.lnPtr)
      ENDIF
      lcMatch = SUBSTR(m.lcIn, loMatch.FirstIndex + 1, loMatch.Length)
      ? "Match:", m.lcMatch
      lcOut = m.lcOut + InterlacePeriods(m.lcMatch)
      lnPtr = loMatch.FirstIndex + loMatch.Length
    ENDFOR
    IF m.lnPtr < LEN(m.lcIn)
      * Copy residual after last match:
      lcOut = m.lcOut + SUBSTR(m.lcIn, m.lnPtr + 1)
    ENDIF
  ENDIF
  RETURN m.lcOut
ENDFUNC

FUNCTION InterlacePeriods(lcIn)
  LOCAL lcOut, ii
  lcOut = ""
  FOR ii = 1 TO LEN(m.lcIn)
    lcOut = m.lcOut + SUBSTR(m.lcIn, m.ii, 1) + "."
  ENDFOR
  RETURN m.lcOut
ENDFUNC
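
One property worth keeping in mind when an input can hold more than one acronym is Global. In VBScript.RegExp it defaults to .F., and Execute then returns only the first match. A small sketch to confirm this, using a hypothetical input string built from the names mentioned in this thread:

```foxpro
* Sketch: effect of the Global property on Execute().
LOCAL loRE, loMatches, lcInput
lcInput = "JAM Janssen and Z al Azhar"   && hypothetical input with two acronyms
loRE = CREATEOBJECT("VBScript.RegExp")
loRE.IgnoreCase = .F.                    && case-sensitive
loRE.Pattern = '\b[A-Z]+(?=\s|$)'

loRE.Global = .F.                        && the default: stop after the first match
loMatches = loRE.Execute(m.lcInput)
? "Global = .F., matches:", loMatches.Count

loRE.Global = .T.                        && scan the whole string
loMatches = loRE.Execute(m.lcInput)
? "Global = .T., matches:", loMatches.Count
```

The first count should be 1 ("JAM" only) and the second 2 ("JAM" and "Z"), so any loop over the matches sees every acronym only when Global is .T.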

>Hi Randy,
>
>Thanks for the answer. I have read your co-authored article in FoxTalk about parsing and regular expressions, which I found very interesting. In the near future I will look at the article in more depth and will implement it in some way.
>
>Thanks for the suggestions.
>
>
>
>>From those examples, it looks like you're planning some character-by-character analysis coding to look for and react to patterns. This can get complex in a hurry, not to mention being hard to maintain.
>>
>>This is an area that can often be served by regular expression (RegEx) technology. For example, to find any occurrence of a single capital letter by itself, you simply specify a pattern of '\b[A-Z]\b'. To clarify a bit, '\b' is a special escape sequence meaning "word boundary". This includes spaces, beginnings and ends of sentences and paragraphs, etc.--you don't have to consider all the possibilities.
>>
>>I wasn't certain if your second example was the same rule, or different. If you're looking for any sequence with a series of capital letters by itself, the above pattern can be changed to '\b[A-Z]+\b', which will find occurrences of one or more capital letters between word boundaries.
>>
>>Sometimes, once patterns are found, you can use native RegEx "replace" functionality to accomplish transformation goals. Unfortunately, VFP doesn't support RegEx natively, and without 3rd party products, we're limited to the VBScript.RegExp that comes with MSIE. This RegEx lacks the native abilities (both nesting and "positive lookbehind assertions") to let you break down the patterns into the single characters that would let you handle this in one pass.
>>
>>In other words, you can fetch the occurrences "Z" and "JAM" easily enough, but you're still forced to do something more to insert a period after each letter (in the multi-letter case anyway). So, I'm left with the conclusion that for this use case, without a better RegEx implementation available, you're probably equally well off with pure VFP coding. Close call.
>>
>>-- Randy
>>
>>
>>>>What is your actual use case? That is, what do you need to do with each character (print, store, operate in some other way)?
>>>
>>>I have some names with initials. I want to delimit the initials with a dot,
>>>for example Z al Azhar has to become Z. al Azhar; JAM Janssen becomes J.A.M. Janssen. I have the initials in a distinct field.
>>>
>>>>Is performance an issue? What characters can appear in the string (any control character, etc.)?
>>>
>>>No, performance is not an issue; only characters in the alphabet can occur. I was just thinking that there was a handy function in FoxPro to do this.
