Level Extreme platform
Subscription
Corporate profile
Products & Services
Support
Legal
Français
Algorithm question
Message
 
To
17/07/2011 03:55:33
General information
Forum:
Visual FoxPro
Category:
Coding, syntax & commands
Environment versions
Visual FoxPro:
VFP 9 SP2
OS:
Windows XP SP2
Network:
Windows 2008 Server
Database:
Visual FoxPro
Miscellaneous
Thread ID:
01518379
Message ID:
01518387
Views:
74
>>I have a basic algorithm question and know that there will be a very simple and efficient solution which replaces the looping and keeping track of variables version I currently have.
>>
>>I have a cursor with a list of key values such as -> AAAABBBBBCCCDDE (alpha value is simply for illustration)
>>
>>I want to produce a cursor with those values spread as evenly as possible by frequency, eg (done quickly by hand so not precise):
>>BACBADBECBADBCA
>>
>>In a nutshell, my current methodology is to track how often each element should appear in the list (eg. B should be one in every 3 (15/5)) and then insert based on that.
>>
>>I'm confident that there is someone out there who will make me slap my head in disgust by listing the simple & elegant solution that I should have come up with in the first place :)
>
>Here's one possibility that's IMO harder to explain than to code:
>
>* Set up source of data:
>CREATE CURSOR Source ;
>	( KeyVal C( 1 ) ;
>	, NumLeft I ;
>	, NumUsed I )
>
>INSERT INTO Source ;
>	( KeyVal ;
>	, NumLeft ;
>	, NumUsed ) ;
>	VALUES ;
>		( "A"  ;
>		, 4 ;
>		, 0 )
>
>INSERT INTO Source ;
>	( KeyVal ;
>	, NumLeft ;
>	, NumUsed ) ;
>	VALUES ;
>		( "B"  ;
>		, 5 ;
>		, 0 )
>
>INSERT INTO Source ;
>	( KeyVal ;
>	, NumLeft ;
>	, NumUsed ) ;
>	VALUES ;
>		( "C"  ;
>		, 3 ;
>		, 0 )
>
>INSERT INTO Source ;
>	( KeyVal ;
>	, NumLeft ;
>	, NumUsed ) ;
>	VALUES ;
>		( "D"  ;
>		, 2 ;
>		, 0 )
>
>INSERT INTO Source ;
>	( KeyVal ;
>	, NumLeft ;
>	, NumUsed ) ;
>	VALUES ;
>		( "E"  ;
>		, 1 ;
>		, 0 )
>
>* Guts of routine starts here:
>SELECT Source
>SUM NumLeft ALL TO lnInsertCount  && this line not necessary for 2nd possible index expression, below
>
>INDEX ON ( lnInsertCount * NumLeft ) - NumUsed TAG TagName DESCENDING
>* INDEX ON NumLeft * ( NumLeft / ( NumUsed + NumLeft ) ) TAG TagName DESCENDING && See Note 10 below
>GO TOP IN Source
>
>lcResult = ""
>
>DO WHILE Source.NumLeft > 0
>	lcResult = lcResult + Source.KeyVal
>	
>	REPLACE NumLeft WITH NumLeft - 1 ;
>		, NumUsed WITH NumUsed + 1 ;
>		NEXT 1 IN Source
>	
>	* Go to the (possibly new) top record:
>	GO TOP IN Source
>
>ENDDO
>
>?lcResult  && BABCABDCABEDCAB
>
>This is a variation on a bubble sort algorithm. The idea is:
>
>1. Create a "Source" cursor containing the key values ( "KeyVal" ) and their counts ( "NumLeft" ). This cursor also has another column ( "NumUsed" ) that contains the number of times each KeyVal has been used/inserted in the final result. This value is always zero to start.
>
>2. The critical thing is how the source cursor is indexed. For a first cut, you could simply INDEX ON NumLeft. This has problems which we'll get into later but let's leave it at this for now for the purposes of illustration.
>
>3. With that index expression, the top row in the cursor has the KeyVal "B" and NumLeft = 5.
>
>4. We enter the loop. We update the output (could be appending to a string as shown, or INSERTing a new row into another cursor).
>
>5. We update the top row of the Source cursor - decrement its NumLeft, and increment its NumUsed. Then we GO TOP in the Source cursor to go to the (possibly new) top row.
>
>6. Why do we have a NumUsed column? To help break ties. After the first pass through the loop, KeyVals B and A both have NumLeft = 4. One "fair" way to break the tie is to select the one that has the lowest NumUsed. As a second cut of the key expression, we could INDEX ON NumLeft - NumUsed. There is another subtle problem with that which we'll discuss later but that illustrates the idea.
>
>7. With the "improved" index expression, KeyVal "A" is now at the top of the list at the start of the second pass through the loop. It gets added to the output and its NumLeft and NumUsed values get updated.
>
>8. The index expression and the updates to its values causes the "next" KeyVal to always "bubble up" to the top row of the Source cursor. So, we can keep using the top row of the Source cursor for the next KeyVal to be used, until its NumLeft value is zero.
>
>With this algorithm:
>
>- "B" won't get used at all until NumLeft = 4 for "A"
>- "C" won't get used at all until NumLeft = 3 for "A" and "B"
>- "D" won't get used at all until NumLeft = 2 for "A", "B" and "C"
>- "E" won't get used at all until NumLeft = 1 for "A", "B", "C" and "D"
>
>However, once the NumLefts get down to those values, the new KeyVals will be the first used at that NumLeft level, because their NumUsed values are zero.
>
>9. Why is there a problem with using the index expression NumLeft - NumUsed? Because we can get ties such as (theoretical example only)
>
>KeyVal   NumLeft   NumUsed     IndexExpression
>
>  A         0         2               -2
>  B         1         3               -2
>
>In that case there is no guarantee the KeyVal "B" would bubble to the top of the Source cursor, so the loop could terminate prematurely (because NumLeft = 0 for KeyVal "A"). In fact, with the test data you provided, using that index expression yields a final result only 11 characters long, instead of 15. We always want values with NumLeft greater than zero to bubble to the top. One way to do this is to weight the NumLeft value by multiplying it by some value. My first thought was the multiplier should just be the number of rows in the Source cursor, but I suspect this may fail on initial NumLeft values that are significantly larger than the number of KeyVals. On second thought I decided to use the sum of all the initial NumLefts, i.e. the total number of appends/INSERTs that will be required. Maybe the MAX of the initial NumLefts would be good enough, I don't know.
>
>Like I said, easier to code than explain ;-)
>
>This bubble-sort yields a result ( "BABCABDCABEDCAB" ) that's different from your hand-coded example, but I believe it's mathematically "fair". Whether "fair" is the same as "evenly distributed" is up to you.
>
>10. The code for this algorithm is fairly simple and it's easy to experiment by changing the index expression. For example, changing it to the commented-out line in the code above gives another result:
>
>Index Expression 1 Result: BABCABDCABEDCAB
>Index Expression 2 Result: BABCADBCEABDCAB
>
>The second index expression factors in the % unused; the effect seems to be that lower-frequency KeyVals appear earlier in the result string.


Thanks Al, the basic methodology of coming up with an index expression and going to the TOP of the source cursor is already nicer than what I had going.Your solution is good for what I need insofar as the algorithm as well - there is no perfect answer and nothing is going to be much better as far as I can see.
Cheers,
Jamie
Previous
Reply
Map
View

Click here to load this message in the networking platform