Random text in memo field
Message
From: 23/03/2023 06:07:14
To: 22/03/2023 14:51:02
Forum: Visual FoxPro
Category: Coding, syntax & commands / Miscellaneous
Thread ID: 01686399
Message ID: 01686403
The bolded part would be a red flag for me to work as defensively as possible ;-)

I'd expect to be working on a few hundred thousand memo lines
(with correctness mattering far more than speed).

The CHRTRAN() approach might mangle product names that contain digits,
such as Galaxy S20 FE, S300 rockets or 7up.
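To see why, here is a small Python sketch (Python only for illustration, since the thread is about VFP) that mimics CHRTRAN(MemoField,'0123456789$','') with str.translate() on a few hypothetical memo lines:

```python
# Mimic VFP CHRTRAN(MemoField, '0123456789$', ''): delete every digit and '$'.
STRIP_PRICE_CHARS = str.maketrans("", "", "0123456789$")


def chrtran_strip(line: str) -> str:
    """Remove digits and dollar signs anywhere in the line, as CHRTRAN() would."""
    return line.translate(STRIP_PRICE_CHARS)


# Hypothetical memo lines: product name followed by a price.
samples = ["Galaxy S20 FE $899", "S300 rockets $45", "7up $2"]
for line in samples:
    print(repr(chrtran_strip(line).strip()))
```

All three names lose their digits along with the price: "Galaxy S FE", "S rockets" and "up".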

I'd work from the right side of each line and pipe each memo line into a cursor
with fields for the source line, the currently extracted name, the price part, the separation point and a couple of flags.
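A rough sketch of that right-to-left split, again in Python for illustration (in VFP the same idea would presumably map to ALINES() feeding a cursor). The price-token pattern and the two flags are assumptions, not rules from the thread:

```python
import re
from dataclasses import dataclass, field

# Assumption: a price token is digits with optional '.'/',' and an optional '$',
# or a lone '$' (e.g. "$899", "2.50", "1,299.00$", "$").
PRICE_TOKEN = re.compile(r"^\$?\d[\d.,]*\$?$|^\$$")


@dataclass
class ParsedLine:
    source: str      # original memo line, kept so results can be eyeballed
    name: str        # currently extracted product name
    price: str       # text judged to be the price part
    split_pos: int   # 0-based offset where the price part starts
    flags: dict = field(default_factory=dict)


def parse_line(line: str) -> ParsedLine:
    """Walk words from the right and stop at the first one that is not price-like."""
    stripped = line.rstrip()
    split_pos = len(stripped)
    spans = [m.span() for m in re.finditer(r"\S+", stripped)]
    for start, end in reversed(spans):
        if PRICE_TOKEN.match(stripped[start:end]):
            split_pos = start
        else:
            break
    name = stripped[:split_pos].rstrip()
    price = stripped[split_pos:]
    # Hypothetical flags; add more as odd lines turn up
    flags = {"no_price": price == "", "no_name": name == ""}
    return ParsedLine(line, name, price, split_pos, flags)


def parse_memo(memo: str) -> list[ParsedLine]:
    """One record per non-empty memo line, like rows in a VFP cursor."""
    return [parse_line(text) for text in memo.splitlines() if text.strip()]


memo = "Galaxy S20 FE $899\n7up 2.50 $\nS300 rockets 45"
for rec in parse_memo(memo):
    print(f"{rec.name!r:18} {rec.price!r:10} {rec.split_pos}")
```

Lines that set no_price or no_name (or a name that still ends in a number, like "Room 101") are the ones to look at by hand.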

That way you can eyeball the results, filter out anything known to work well and refine your approach.
If the data is important, fill the flag fields in advance, e.g. counts or flags for:
- words (GETWORDCOUNT())
- currency denominators
- purely numeric words
- numeric+currency words
- numeric + whitespace + currency word tuples
- a currency denominator surrounded by whitespace
- perhaps numeric + decimal point + separator characters

and so on. That way, odd lines that use symbols for $ and cents will be found if they are lurking in your memos.
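One possible way to precompute such flags per line, as another Python sketch; the currency set and the regular expressions are hypothetical and would need to match your actual data:

```python
import re

# Assumption: the currency markers seen in this particular data set
CURRENCY_WORDS = {"$", "usd", "¢", "cent", "cents"}
ONLY_DIGITS = re.compile(r"^\d+$")
NUMBER = re.compile(r"^\d[\d.,]*$")                      # digits with . or , separators
NUMBER_WITH_CURRENCY = re.compile(r"^(\$\d[\d.,]*|\d[\d.,]*[$¢])$")
SPACED_SYMBOL = re.compile(r"(?<!\S)[$¢](?!\S)")         # whitespace or line edge on both sides
SEPARATED_DIGITS = re.compile(r"\d[.,]\d")               # e.g. "2.50" or "1,299"


def line_flags(line: str) -> dict:
    """Precompute counts/flags for one memo line so odd lines can be filtered."""
    words = line.split()                 # whitespace split, roughly like GETWORDCOUNT()
    lower = [w.lower() for w in words]
    return {
        "word_count": len(words),
        "currency_words": sum(w in CURRENCY_WORDS for w in lower),
        "numeric_words": sum(bool(ONLY_DIGITS.match(w)) for w in words),
        "numeric_currency_words": sum(bool(NUMBER_WITH_CURRENCY.match(w)) for w in words),
        "number_then_currency": sum(
            bool(NUMBER.match(a)) and b in CURRENCY_WORDS
            for a, b in zip(lower, lower[1:])
        ),
        "spaced_currency_symbol": bool(SPACED_SYMBOL.search(line)),
        "separated_digits": bool(SEPARATED_DIGITS.search(line)),
    }


print(line_flags("7up 2.50 $"))
print(line_flags("Cheese 3 99 cents"))
```

A line such as "Cheese 3 99 cents" (two bare numbers followed by a currency word) stands out immediately once these counts are in a column.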
When cleaning manually entered data (especially with Excel in between), expect heavy facepalm moments:
been there, and smacked myself, as the author of the entries was not available ;-)

Don't forget to ask in advance whether the result table must also be cleaned of positional or typing errors:
McDonald's == McDonalds == Mcdonalds == Mc Donald's == Mcdonlds ?
Cheese (gr.) == Cheese (grated) == Grated Cheese ?
Zucchini == Zuccini == Zuchini ?
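If the answer is yes, one option is a normalized key plus fuzzy matching. The Python sketch below uses the standard difflib module with an assumed cutoff of 0.85. It merges the McDonald's and Zucchini variants, but it would not merge reordered or abbreviated names like Cheese (gr.) and Grated Cheese, which would still need a hand-maintained synonym table:

```python
import difflib
import re


def name_key(name: str) -> str:
    """Case-, apostrophe- and whitespace-insensitive key: "Mc Donald's" -> "mcdonalds"."""
    return re.sub(r"[\s'’]", "", name.lower())


def cluster_names(names: list[str], cutoff: float = 0.85) -> dict[str, str]:
    """Map each raw name to the first-seen spelling whose key is identical or close.

    The 0.85 cutoff is an assumption; review the mapping before trusting it.
    """
    canonical_by_key: dict[str, str] = {}
    mapping: dict[str, str] = {}
    for name in names:
        key = name_key(name)
        if key not in canonical_by_key:
            close = difflib.get_close_matches(key, list(canonical_by_key), n=1, cutoff=cutoff)
            canonical_by_key[key] = canonical_by_key[close[0]] if close else name
        mapping[name] = canonical_by_key[key]
    return mapping


raw = ["McDonald's", "McDonalds", "Mcdonalds", "Mc Donald's", "Mcdonlds",
       "Zucchini", "Zuccini", "Zuchini", "Cheese (gr.)", "Grated Cheese"]
for name, canonical in cluster_names(raw).items():
    print(f"{name!r:16} -> {canonical!r}")
```

The first spelling seen becomes the canonical one, so feeding in a curated list of known-good names first gives more predictable results.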

(Experience from importing and cleaning several million rows each quarter, a few years ago.)
thomas

>>>Data that goes into the table could be entered manually by a user, or pasted after the user copied it (Ctrl+C) from an Excel sheet.
>>
>>If you know for sure that the price part will contain only digits and a dollar sign, you could use CHRTRAN() to eliminate those characters:
>>
>>DescOnly = CHRTRAN(MemoField,'0123456789$','')
>>
>>However, if you might have commas and periods, it gets harder because those can appear in the descriptions, too. In that case, loop through the possible start characters for the price, using AT() to find out where that part starts. Then grab the front of the string.
>>
>>One other idea. If you can be sure the price part has no embedded spaces, use RAT() to find the rightmost space and keep everything in front of that.
>>
>>Tamar
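Tamar's two ideas (RAT() on the last space, AT() on the possible price start characters) can be sketched in Python like this to test them against the problem names above. Note that str.rfind()/str.find() are 0-based and return -1 where RAT()/AT() return 0:

```python
def desc_before_last_space(line: str) -> str:
    """RAT() idea: keep everything left of the rightmost space.

    Only safe if the price part never contains an embedded space.
    """
    stripped = line.rstrip()
    pos = stripped.rfind(" ")   # like RAT(" ", line) but 0-based, and -1 instead of 0 when absent
    return stripped[:pos].rstrip() if pos >= 0 else stripped


def desc_before_price_start(line: str, start_chars: str = "$0123456789") -> str:
    """AT() idea: cut at the earliest occurrence of any character that can start a price."""
    positions = [p for p in (line.find(c) for c in start_chars) if p >= 0]
    return line[:min(positions)].rstrip() if positions else line.rstrip()


for line in ["Cheese (grated) $3.99", "Galaxy S20 FE $899", "7up 2.50 $"]:
    print(repr(desc_before_last_space(line)), repr(desc_before_price_start(line)))
```

Both return "Cheese (grated)" for the clean first line. For the others, the AT() variant cuts "Galaxy S20 FE" down to "Galaxy S" and "7up" to an empty string, while the RAT() variant leaves part of the price in "7up 2.50" because that price contains a space.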