Random text in memo field
Message
From
23/03/2023 06:07:14


To
22/03/2023 14:51:02
General information
Forum:
Visual FoxPro
Category:
Coding, syntax and commands
Miscellaneous
Thread ID:
01686399
Message ID:
01686403
Views:
70
The bolded part would be a red flag for me to work as defensively as possible ;-)

Assuming there are a few hundred thousand memo lines to process
(and that speed is no issue compared to being correct):

The CHRTRAN() approach might mangle product names if products like
Galaxy S20 FE, S300 rockets or 7up are included.
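
A quick way to see the problem, as a minimal sketch. The product names are the ones mentioned above; the prices attached to them are made up for illustration.

```foxpro
* Sketch only: the sample lines are hypothetical.
LOCAL lcLine

lcLine = "Galaxy S20 FE $699"
* The price is removed, but so is the "20" inside the product name.
? CHRTRAN(lcLine, "0123456789$", "")    && prints "Galaxy S FE "

lcLine = "7up 2$"
* The leading digit of the name is lost as well.
? CHRTRAN(lcLine, "0123456789$", "")    && prints "up "
```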

I'd work from the right side of each line and pipe each memo field line into a cursor
with the source line, the currently extracted name, the price part, the separation point and a couple of flag fields.
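
Roughly what I have in mind, as a sketch under assumptions: the cursor name, the field names and the sample memo text are all hypothetical, and the split at the rightmost space is only a first guess to be refined.

```foxpro
* Sketch only: crsParse, its fields and the sample text are hypothetical.
LOCAL lcMemo, lnLines, lnI, lcLine, lnSep
LOCAL ARRAY laLines[1]

lcMemo = "Galaxy S20 FE 699$" + CHR(13) + CHR(10) + ;
         "7up 1.25" + CHR(13) + CHR(10) + ;
         "Cheese (grated)"

* cSource = original line, cName = name extracted so far,
* cPrice = candidate price part, nSepPos = split position (0 = none),
* lChecked = set by hand once a line has been verified.
CREATE CURSOR crsParse (cSource C(254), cName C(254), cPrice C(30), nSepPos I, lChecked L)

lnLines = ALINES(laLines, lcMemo)
FOR lnI = 1 TO lnLines
    lcLine = ALLTRIM(laLines[lnI])
    IF EMPTY(lcLine)
        LOOP
    ENDIF
    * Work from the right: everything after the rightmost space is the price candidate.
    lnSep = RAT(" ", lcLine)
    IF lnSep > 0
        INSERT INTO crsParse (cSource, cName, cPrice, nSepPos) ;
            VALUES (lcLine, RTRIM(LEFT(lcLine, lnSep - 1)), SUBSTR(lcLine, lnSep + 1), lnSep)
    ELSE
        * No space at all: keep the whole line as the name and leave the price empty.
        INSERT INTO crsParse (cSource, cName, nSepPos) VALUES (lcLine, lcLine, 0)
    ENDIF
ENDFOR

BROWSE
```

The third sample line has no price, so "(grated)" ends up in the price column. That is exactly the kind of row you want to see while eyeballing.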

That way you can eyeball the results, filter out anything known to work well and refine your approach.
If the data is important, fill flag fields in advance, such as:
number of words, per GETWORDCOUNT()
number of currency denominators (symbols such as $)
number of numeric-only words
number of numeric+currency words
number of numeric+whitespace+currency word tuples
currency denominator surrounded by white space
perhaps a flag for numeric+point+separator characters
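
A few of these flags could be computed per line like this. It is a sketch only: the variable names and the sample line are hypothetical, and "$" stands in for whatever currency symbols actually occur in the data.

```foxpro
* Sketch only: names and the sample line are hypothetical.
LOCAL lcLine, lcWord, lnI, lnWords, lnCurr, lnNumOnly, lnNumCurr, llCurrAlone
lcLine = "Galaxy S20 FE 699 $"

lnWords = GETWORDCOUNT(lcLine)
lnCurr  = OCCURS("$", lcLine)
* Pad with spaces so a symbol at either end of the line also counts.
llCurrAlone = AT(" $ ", " " + lcLine + " ") > 0

STORE 0 TO lnNumOnly, lnNumCurr
FOR lnI = 1 TO lnWords
    lcWord = GETWORDNUM(lcLine, lnI)
    DO CASE
    CASE LEN(CHRTRAN(lcWord, "0123456789", "")) = 0
        * Nothing left after removing digits: a numeric-only word.
        lnNumOnly = lnNumOnly + 1
    CASE LEN(CHRTRAN(lcWord, "0123456789$", "")) = 0 ;
            AND LEN(CHRTRAN(lcWord, "0123456789", "")) < LEN(lcWord)
        * Only digits and "$", with at least one digit: numeric+currency word.
        lnNumCurr = lnNumCurr + 1
    ENDCASE
ENDFOR

? lnWords, lnCurr, lnNumOnly, lnNumCurr, llCurrAlone
```

In practice these values would go into flag fields of the parsing cursor, so you can filter on combinations that look suspicious.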

and so on. That way odd lines with symbols for $ and cent will be found if they are lurking in your memos.
When cleaning manually entered data (esp. with Excel in between), expect heavy facepalm moments -
been there, smacked myself, as the author of the entries was not available ;-)

Don't forget to ask in advance whether the result table must also be cleaned of positional or typing errors:
McDonald's == McDonalds == Mcdonalds == Mc Donald's == Mcdonlds ?
Cheese (gr.) == Cheese (grated) == Grated Cheese ?
Zucchini == Zuccini == Zuchini ?
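
If the answer is yes, a normalized comparison key is a cheap first pass. This is a sketch only: the array name is hypothetical, and the list of characters to strip is an assumption to be adjusted to your data.

```foxpro
* Sketch only: laNames and the stripped character list are assumptions.
LOCAL lnI
LOCAL ARRAY laNames[5]
laNames[1] = "McDonald's"
laNames[2] = "McDonalds"
laNames[3] = "Mcdonalds"
laNames[4] = "Mc Donald's"
laNames[5] = "Mcdonlds"

FOR lnI = 1 TO ALEN(laNames)
    * Upper-case the name and drop spaces, apostrophes, periods and hyphens.
    ? PADR(laNames[lnI], 15), UPPER(CHRTRAN(laNames[lnI], " '.-", ""))
ENDFOR
```

The first four names collapse to MCDONALDS, while the typo Mcdonlds yields MCDONLDS and stays separate. Real typos and reordered words (Grated Cheese vs. Cheese (grated)) need something fuzzier; the built-in SOUNDEX() and DIFFERENCE() functions might be a starting point, but their results need checking by hand.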

Experience from importing and cleaning several million rows each quarter, a few years ago.
thomas

>>>Data that will go in the table could be entered manually by a user or with a paste because the user did a copy (ctrl-C) from an excel sheet.
>>
>>If you know for sure that the price part will contain only digits and a dollar sign, you could use CHRTRAN() to eliminate those characters:
>>
>>
>>DescOnly = CHRTRAN(MemoField,'0123456789$','')
>>
>>
>>However, if you might have commas and periods, it gets harder because those can appear in the descriptions, too. In that case, loop through the possible start characters for the price, using AT() to find out where that part starts. Then grab the front of the string.
>>
>>One other idea. If you can be sure the price part has no embedded spaces, use RAT() to find the rightmost space and keep everything in front of that.
>>
>>Tamar