Random text in memo field

Message

23/03/2023 06:07:14

Thomas Ganss
Main Trend
Frankfurt, Germany

22/03/2023 14:51:02

Denis Chasse
Ultimax
Montréal, Québec, Canada

General information

Forum: Visual FoxPro
Category: Coding, syntax and commands
Title: Re: Random text in memo field

Miscellaneous

Thread ID: 01686399
Message ID: 01686403

The emboldened part would be a red flag for me to work as defensively as possible ;-)

Assuming you will be working on a few hundred thousand memo lines
(and that speed is no issue compared to being correct):

The ChrTran() approach might mangle product names if products like
Galaxy S20 FE, S300 rockets or 7up are included.
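
To make the risk concrete, here is a small sketch using a made-up sample line (the product name and price are assumptions, not data from this thread):

```foxpro
* Hypothetical sample line: description followed by a price.
LOCAL lcLine, lcDescOnly
lcLine = "Galaxy S20 FE 799$"

* Strip every digit and dollar sign, as in the quoted CHRTRAN() suggestion.
lcDescOnly = CHRTRAN(lcLine, "0123456789$", "")

* The "20" of "S20" is removed together with the price.
? lcDescOnly
```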

I'd work from the right side of each line and pipe each memo field line into a cursor
with the source line, the currently extracted name, the price part, the separation point and a couple of flag fields.

That way you can eyeball the results, filter out anything known to work well and refine your approach.
If the data is important, fill the flag fields in advance, such as whether a line has
n words (GetWordCount())
n currency denominators
n numeric-only words
n numeric+currency words
n numeric+whitespace+currency word tuples
a currency denominator surrounded by white space
perhaps also flagging numeric+point+separator characters

and so on. That way odd lines with symbols for $ and cent will be found if they are lurking in your memos.
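
A minimal sketch of such a checking cursor, assuming VFP 9 (for the numeric flags of ALINES()). The table Products, its memo field Notes, the cursor crsCheck and all field names are hypothetical, and splitting at the rightmost space is only a first guess that the flag fields are meant to help verify:

```foxpro
* Sketch only: assumes a table "Products" with memo field "Notes" is open.
LOCAL ARRAY laLines[1]
LOCAL lnLines, lnI, lcLine, lnSep, lnWords, lnNumWords, lnW, lcWord

* cSource  = original memo line (longer lines would be truncated)
* cName    = name part extracted so far
* cPrice   = candidate price part
* nSepPos  = position of the split point
* nWords   = GETWORDCOUNT() of the line
* nDollars = number of "$" characters
* nNumeric = number of words made of digits only
CREATE CURSOR crsCheck ( ;
    cSource  C(240), ;
    cName    C(200), ;
    cPrice   C(40),  ;
    nSepPos  I, ;
    nWords   I, ;
    nDollars I, ;
    nNumeric I )

SELECT Products
SCAN
    * Flags 1 + 4: trim each line and skip empty lines.
    lnLines = ALINES(laLines, Products.Notes, 1 + 4)
    FOR lnI = 1 TO lnLines
        lcLine = laLines[lnI]
        * Work from the right: find the rightmost space.
        lnSep = RAT(" ", lcLine)
        lnWords = GETWORDCOUNT(lcLine)
        lnNumWords = 0
        FOR lnW = 1 TO lnWords
            lcWord = GETWORDNUM(lcLine, lnW)
            * A word is digits-only if nothing is left after removing digits.
            IF EMPTY(CHRTRAN(lcWord, "0123456789", ""))
                lnNumWords = lnNumWords + 1
            ENDIF
        ENDFOR
        INSERT INTO crsCheck ;
            (cSource, cName, cPrice, nSepPos, nWords, nDollars, nNumeric) ;
            VALUES (lcLine, ;
                IIF(lnSep > 0, LEFT(lcLine, lnSep - 1), lcLine), ;
                IIF(lnSep > 0, SUBSTR(lcLine, lnSep + 1), ""), ;
                lnSep, lnWords, OCCURS("$", lcLine), lnNumWords)
    ENDFOR
ENDSCAN
```

With the cursor filled, something like BROWSE FOR nDollars <> 1 OR nNumeric > 0 would bring up the lines that deserve a closer look; which conditions make sense depends on your data.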
When cleaning manually entered data (especially with Excel in between), expect heavy facepalm moments.
Been there, smacked myself, as the entry author was not available ;-)

Don't forget to ask in advance whether the result table must also be cleaned of positional or typing errors:
McDonald's == McDonalds == Mcdonalds == Mc Donald's == Mcdonlds ?
Cheese (gr.) == Cheese (grated) == Grated Cheese ?
Zucchini == Zuccini == Zuchini ?
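
If the answer is yes, a rough comparison key can at least group the variants that differ only in case, spacing and punctuation. The helper NameKey() below is hypothetical and only a sketch: it will not catch real misspellings (Mcdonlds, Zuccini) or reordered words (Grated Cheese), which need manual review or a fuzzier match.

```foxpro
* Both variants reduce to the same key, so this prints .T.
? NameKey("Mc Donald's") == NameKey("McDonalds")
* A real typo keeps a different key, so this prints .F.
? NameKey("Mcdonlds") == NameKey("McDonalds")

* Hypothetical helper: builds a rough comparison key from a product name.
FUNCTION NameKey(tcName)
    * Upper-case, then drop spaces, apostrophes, periods and hyphens.
    RETURN CHRTRAN(UPPER(ALLTRIM(tcName)), " '.-", "")
ENDFUNC
```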

Experience from importing and cleaning several million rows each quarter, a few years ago.
thomas

>>>Data that will go in the table could be entered manually by a user or pasted in after the user copied (Ctrl-C) from an Excel sheet.
>>
>>If you know for sure that the price part will contain only digits and a dollar sign, you could use CHRTRAN() to eliminate those characters:
>>
>>DescOnly = CHRTRAN(MemoField,'0123456789$','')
>>
>>However, if you might have commas and periods, it gets harder because those can appear in the descriptions, too. In that case, loop through the possible start characters for the price, using AT() to find out where that part starts. Then grab the front of the string.
>>
>>One other idea. If you can be sure the price part has no embedded spaces, use RAT() to find the rightmost space and keep everything in front of that.
>>
>>Tamar
