SQL breakdown of data - Level Extreme

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

SQL breakdown of data

Message

From

29/05/2017 12:22:55

Steve Elster
San Diego, California, United States

28/05/2017 13:14:20

Steve Elster
San Diego, California, United States

General information

Forum:

Visual FoxPro

Category:

Coding, syntax & commands

Title:

Re: SQL breakdown of data

Miscellaneous

Thread ID:

01648857

Message ID:

01651604

Views:

>>>>>>>>>>>>>>>Am I supposed to start the program with the first line, because I get an error when I do:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>;with counts as (select Id, count(Id) as frequency
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>I posted you code to try in SQL Server Management Studio if you're using SQL Server.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>What is your back-end?
>>>>>>>>>>>>>
>>>>>>>>>>>>>I do a lot of work in VFP. Can you suggest a statistics package that I can use within VFP? I've never used one. Thanks, Steve
>>>>>>>>>>>>
>>>>>>>>>>>>I don't know of any. Someone else made that suggestion, hopefully he'll clarify.
>>>>>>>>>>>
>>>>>>>>>>>I have clarified this question, I now have some code in VFP that does at least part of the job, but would still like to know if this can be done solely with SQL, and in VFP? For I suspect that an SQL solution will be easier to expand to include more "levels." I start the question with a reference to "two levels."
>>>>>>>>>>>
>>>>>>>>>>>Consider a table with 2 columns, both numeric: The Qty column contains orange juice package sizes in gallons. The mprices columns shows the cost paid for each unit of gallon. My question: What quantity (in # of gallons) are people most often buying and what is most common price that is being paid. Both cases require dividing the numbers into ranges, and then counting what is happening for each. The result is sort of two histograms with bins: one for qty being purchased, the other for range of prices paid. The kind of report I am looking for is produced by the below code in: myHist.dbf. Apparently I cannot upload prg for dbf files so the code follows:
>>>>>>>>>>
>>>>>>>>>>So you want to know which Qty appears most often and, for each Qty, which price is most common. Correct. Here's a stab at it.
>>>>>>>>>>
>>>>>>>>>>

>>>>>>>>>>SELECT Qty, COUNT(*) AS QtySold ;
>>>>>>>>>>   FROM YourTable ;
>>>>>>>>>>   GROUP BY Qty ;
>>>>>>>>>>   INTO CURSOR csrQtys
>>>>>>>>>>
>>>>>>>>>>SELECT Qty, MAX(QtySold) AS SoldMost ;
>>>>>>>>>>   FROM csrQtys ;
>>>>>>>>>>   INTO CURSOR csrSoldMost
>>>>>>>>>>
>>>>>>>>>>SELECT Qty, MPrices, COUNT(*) AS PriceCount ;
>>>>>>>>>>   FROM YourTable ;
>>>>>>>>>>   GROUP BY Qty, MPrices ;
>>>>>>>>>>   INTO CURSOR csrPriceCounts
>>>>>>>>>>
>>>>>>>>>>SELECT Qty, MPrices, MAX(PriceCount) AS PriceMost ;
>>>>>>>>>>   FROM csrPriceCounts ;
>>>>>>>>>>   GROUP BY Qty, MPrices ;
>>>>>>>>>>   INTO CURSOR csrPriceMost
>>>>>>>>>>

>>>>>>>>>>
>>>>>>>>>>We could cut that down to two queries total by making the first query in each pair into a derived table in the second. That is, we could put the first query (without the INTO clause) into the FROM of the second.
>>>>>>>>>>
>>>>>>>>>>Hope I understood what you were looking for and that this helps.
>>>>>>>>>>
>>>>>>>>>>Tamar
>>>>>>>>>
>>>>>>>>>Thank you! The problem remains, however, that because no prices or Qty's repeat, the counts are always 1. For, the above does not create a range of Qty's and prices. So the question is unanswered: How many times do people select, a Qty between, for instance, between .25 and 1.00 gallons and what ranges of prices do they pay for a certain Qty of juice, and how often do select a price that lies within that range of prices? In order words, one has to specify the number of bins for each field, count the occurrences within each, and then filter them to see which is the most "popular." Steve
>>>>>>>>>
>>>>>>>>>Examples of it being done:
>>>>>>>>>1. using a function such as round() but then the values better lie within a certain range? http://blog.shlomoid.com/2011/08/how-to-quickly-create-histogram-in.html
>>>>>>>>>2. https://gist.github.com/wolever/9164392
>>>>>>>>>3. Or, automate excel?
>>>>>>>>>
>>>>>>>>>I changed the field name of mPrice to myPrices to make things easier to read. The same change would have to be made in range1.prg, (code posted above.)
>>>>>>>>>
>>>>>>>>>

>>>>>>>>>SELECT Qty, COUNT(*) AS QtySold ;
>>>>>>>>>   FROM myPrices ;
>>>>>>>>>   GROUP BY Qty ;
>>>>>>>>>   INTO CURSOR csrQtys
>>>>>>>>>
>>>>>>>>>brow
>>>>>>>>>
>>>>>>>>>SELECT Qty, MAX(QtySold) AS SoldMost ;
>>>>>>>>>   FROM csrQtys ;
>>>>>>>>>   GROUP BY Qty, QtySold ;
>>>>>>>>>   INTO CURSOR csrSoldMost
>>>>>>>>>
>>>>>>>>>brow
>>>>>>>>>
>>>>>>>>>SELECT Qty, myPrices, COUNT(*) AS PriceCount ;
>>>>>>>>>   FROM myPrices ;
>>>>>>>>>   GROUP BY Qty, myPrices ;
>>>>>>>>>   INTO CURSOR csrPriceCounts
>>>>>>>>>
>>>>>>>>>BROWSE
>>>>>>>>>
>>>>>>>>>SELECT Qty, myPrices, MAX(PriceCount) AS PriceMost ;
>>>>>>>>>   FROM csrPriceCounts ;
>>>>>>>>>   GROUP BY Qty, myPrices ;
>>>>>>>>>   INTO CURSOR csrPriceMost
>>>>>>>>>  
>>>>>>>>>browse
>>>>>>>>>

>>>>>>>>
>>>>>>>>Just went back and read the thread. Seems to me you could create a cursor with the various ranges you want to test and then group the data into those ranges and do a count. Something like this:
>>>>>>>>
>>>>>>>>

>>>>>>>>CREATE CURSOR QtyRange (iID I, nLow N(6,2), nHigh N(6,2)
>>>>>>>>INSERT INTO QtyRange VALUES (1, .25, 1)
>>>>>>>>INSERT INTO QtyRange VALUES (2, 1.25,2)
>>>>>>>>INSERT INTO QtyRange VALUES (3, .25, .5)
>>>>>>>>INSERT INTO QtyRange VALUES (4, .51, 1)
>>>>>>>>* etc., to try different sets
>>>>>>>>
>>>>>>>>SELECT QtyRange.iID, COUNT(*) AS RangeCount;
>>>>>>>>   FROM YourTable ;
>>>>>>>>      JOIN QtyRange ;
>>>>>>>>          ON YourTable.Qty BETWEEN QtyRange.nLow and QtyRange.nHigh ;
>>>>>>>>   GROUP BY QtyRange.iID ;
>>>>>>>>   INTO CURSOR csrRangeCount
>>>>>>>>

>>>>>>>>
>>>>>>>>And then go on from there.
>>>>>>>>
>>>>>>>>Tamar
>>>>>>>
>>>>>>>Thanks! The following gets counts for the ranges in the "first" bin for a histogram, in this case for the qty column, using the ranges in nLow through nHigh. And idea as to how to get a count for the 2nd column, namely for a range of prices? Steve
>>>>>>>
>>>>>>>

>>>>>>>CREATE CURSOR QtyRange (iID I, nLow N(6,2), nHigh N(6,2))
>>>>>>>mID = 1
>>>>>>>mMin =  .1
>>>>>>>mMax =  1.0
>>>>>>>mDiff = mMax - mMin
>>>>>>>mBins = 10
>>>>>>>mIncre = mDiff/mBins
>>>>>>>mStep = mIncre
>>>>>>>mLow = mMin
>>>>>>>FOR r = .1 TO mBins mStep	
>>>>>>>	INSERT INTO QtyRange VALUES (mId, mLow, mLow + mStep)
>>>>>>>	mID = mID + 1
>>>>>>>	mLow = mLow + mStep
>>>>>>>endfor
>>>>>>>
>>>>>>>SELECT qtyRange.iID, qtyRange.nLow, qtyRange.nHigh, COUNT(*) AS RangeCount;
>>>>>>>   FROM qtyRange ;
>>>>>>>      JOIN myPrices ;
>>>>>>>          ON myPrices.Qty BETWEEN QtyRange.nLow and QtyRange.nHigh ;
>>>>>>>   GROUP BY QtyRange.iID, qtyRange.nLow, qtyRange.nHigh ;
>>>>>>>   INTO CURSOR csrRangeCount
>>>>>>>

>>>>>>
>>>>>>I think you want prices by Qty, right? So use your existing data to create the relevant ranges. You can find out the current min and max for a Qty value like this:
>>>>>>
>>>>>>

>>>>>>SELECT Qty, MIN(MyPrice) AS MinPrice, MAX(MyPrice) AS MaxPrice;
>>>>>>    FROM MyPrices ;
>>>>>>    GROUP BY Qty
>>>>>>

>>>>>>
>>>>>>and then divide that range up. If you want to look at unit price rather than total price (or maybe you're already storing a unit price?), do the math in the query, and you probably don't need the Qty in there.
>>>>>>
>>>>>>Once you have a range, you can do the same kind of thing as for the Qty values.
>>>>>>
>>>>>>Tamar
>>>>>
>>>>>Thank you, Tamar. Your suggestion to use "between" is a real eye opener for creating ranges for bins. I think it might be helpful if I create the kind of preliminary report that I am hoping to get, using SQL as versus hard-coding it with VFP, as I did above with range1 and 2. Here it is: P.S. I could not create a table---here---with columns that line up. So I have attached a snapshot that illustrates the report. It appears at the top of this message, on my screen.
>>>>
>>>>So the image is what you want to get. Obviously, you'd either need two tables, one for the left side and one for the right in a 1-many set-up, or the data on the left would be repeated in each related record. If you know what the price ranges of interest are, then do the same thing for the prices that you did for the quantities and create a cursor listing them. Then you can use the two cursors to drive the query. Something like (showing only the new code):
>>>>
>>>>

>>>>CREATE CURSOR PriceRange (iID I, nLowPrice N(6,2), nHighPrice N(6,2))
>>>>INSERT INTO PriceRange VALUES (...)
>>>>...
>>>>
>>>>
>>>>SELECT qtyRange.iID, qtyRange.nLow, qtyRange.nHigh, PriceRange.nLowPrice, PriceRange.nHighPrice, COUNT(*) AS PriceCount ;
>>>>   FROM myPrices ;
>>>>      JOIN qtyRange ;
>>>>        ON myPrices.Qty BETWEEN QtyRange.nLow and QtyRange.nHigh ;      
>>>>      JOIN PriceRange ;
>>>>         ON myPrices.MyPrice BETWEEN PriceRange.nLowPrice AND PriceRange.nHighPrice ;
>>>>   GROUP BY 1, 2, 3, 4, 5 ;
>>>>   INTO CURSOR csrResults
>>>>

>>>>
>>>>Couple of side notes. First, if you need to do this regularly and the ranges are reasonably stable, you may want to create tables to hold them rather than populating cursors on the fly.
>>>>
>>>>Second, it appears that you're fairly new to SQL. I wrote a book about VFP's SQL (ebook-only) that you can get from Hentzenwerke. It takes you through all the SQL commands in VFP in some detail. My website also contains a number of papers that cover aspects of VFP's SQL. Go to http://tomorrowssolutionsllc.com/conferencepapers.php and choose SQL in the Categories dropdown.
>>>>
>>>>Tamar
>>>
>>>Thank you. I am going to get Taming Foxpro's SQL and will be reading your articles, and get more familiar with SQL and what FoxPro can do!!! One other question: What if I want "the data on the left would be repeated in each related record?" (This can make it easier to filter the resulting table.) Can an SQL statement do that as well? I have attached a JPG, showing exactly what I mean? Thanks again, Steve
>>
>>Run the two queries that generate the counts and then do a join to get it all into one cursor.
>>
>>Tamar
>
>A big THANK YOU! The following works. Step 6 includes the counts of the prices for each prices range. But I could not figure out how to also include the counts of each range for the QTY's. Is there a way to do this? As a temporary solution, I used step 7. I think this is a big deal because it shows a way to construct what one might call multiple layers of frequency counts / histograms.
>Thanks again, Steve
>
>

>
>CLOSE all
>SET SAFETY off
>
>*** 1 ***
>CREATE TABLE myPrices(qty n(10,6), myPrices n(12,2) )
>FOR r = 1 TO 5000
>	INSERT INTO myPrices VALUES (RAND(), RAND()*1000000 )
>endfor
>INDEX on 100000 - myPrices TO priceIndex
>GO top
>priceTop = myPrices
>GO bott
>priceBott = myPrices
>GO top
>BROWSE
>wait
>
>*** 2 ***
>CREATE table qtyRange (qtyID I, qtyLow N(6,2), qtyHigh N(6,2))
>mID 	= 1
>mMin 	=  .1
>mMax 	=  2.0
>mDiff 	= mMax - mMin
>mBins 	= 10
>mIncre 	= mDiff/mBins
>mStep 	= mIncre
>mLow 	= mMin
>FOR r 	= mMin TO mBins mStep	
>	mHigh = mLow + mStep
>	INSERT INTO qtyRange VALUES (mId, mLow, mHigh )
>	mID = mID + 1
>	mLow = mLow + mStep
>endfor
>
>*** 3 *** *********** prices stuff
>CREATE table prRange (prID I, prLow N(6,2), prHigh N(6,2))
>mID2 	=  1
>mMin2 	=  priceBott
>mMax2 	=  priceTop
>mDiff2 	= INT(mMax2 - mMin2)
>mBins2 	= 3
>mIncre2 = mDiff2/mBins2
>mIncre2 = INT(mincre2)
>mStep2 	= mIncre2
>mLow2 	= mMin2
>mHigh2  = mLow2 + mStep2
>FOR r = 1 TO mBins2 mStep2	
>	INSERT INTO prRange VALUES (mId2, mLow2, mHigh2)
>	mID2 	= mID2  + 1
>	mLow2 	= mLow2 + mStep2
>	mHigh2 	= mLow2 + mStep2
>endfor
>*BROWSE
>*wait
>
>*** 4 *** counting QTY's for each range
>SELECT qtyRange.qtyID, qtyRange.qtyLow, qtyRange.qtyHigh, COUNT(*) AS qtyCount;
>   FROM qtyRange, myPrices;
>      where myPrices.Qty 	 BETWEEN QtyRange.qtyLow and QtyRange.qtyHigh 	;
>	  GROUP BY qtyRange.qtyID, qtyRange.qtyLow, qtyRange.qtyHigh;
>	  into table qtyRange2
>SELECT qtyRange2
>brow
>wait
>
>*** 5 -- counts of prices within each range
>CLOSE all
>SELECT prRange.prID, prRange.prLow, prRange.prHigh, COUNT(*) AS prCount;
>   FROM prRange, myPrices;
>      where myPrices.myPrices BETWEEN prRange.prLow and prRange.prHigh 	;
>	  GROUP BY prRange.prID, prRange.prLow, prRange.prHigh;
>	  into table prRange2
>SELECT prRange2
>BROWSE
>wait
>
>*** 6 *** includes count of prices ranges for each QTY range
>SELECT 	qtyRange.qtyID as qtyID, qtyRange.qtyLow, qtyRange.qtyHigh,;
>		prRange.prID as prID, prRange.prLow, prRange.prHigh, COUNT(prRange.prID) AS prCount ;
>   FROM myPrices ;
>      JOIN qtyRange ;
>        ON myPrices.Qty 		BETWEEN QtyRange.qtyLow 			and QtyRange.qtyHigh ;      
>      JOIN prRange ;
>         ON myPrices.MyPrices 	BETWEEN prRange.prLow 	AND prRange.prHigh ;
>   GROUP BY 1, 2, 3, 4, 5, 6;
>   INTO table mAll
>SELECT mAll
>brow
>wait
>
>*** 7 ***  Add in qty counts to the above
>SELECT ;
>mAll.qtyID, mAll.qtyLow, mAll.qtyHigh,	qtyRange2.qtyCount ,;
>mAll.prID,  mAll.prLow,  mAll.prHigh,   mAll.prCount	; 
>FROM mAll, qtyRange2 ;
>WHERE mAll.qtyLow = qtyRange2.qtyLow AND mAll.qtyHigh = qtyRange2.qtyHigh;
>INTO TABLE tall
>brow
>

I have tried to do the same thing, but adding another layer; but don't get the report I was expecting. I added a year's column to myprices tables. Gave it some values that is 1993 - 2017; got the counts for each year. But, when I try to add in yrRange to the above query, the report does is not what I want. I have attached it. As can be seen, the report contains only one price range, whereas from the code I would have thought there would have been three prRange entries for each qtRange. Here is the code, just the part with the 3 joins: Should I only expect this kind of nested report to work two joins deeps? Thank you, Steve

*** 6a *** prices ranges for each yr and QTY range
SELECT 	 yrRange.prID  as  yrID,  yrRange.yrLow,  yrRange.yrHigh  ,;
	 	qtyRange.qtyID as qtyID, qtyRange.qtyLow, qtyRange.qtyHigh ,;
		prRange.prID  as  prID,  prRange.prLow,  prRange.prHigh  ,;
		COUNT(*) AS prCount ;
   FROM myPrices 	;
      JOIN  yrRange ;
        ON  myPrices.yr BETWEEN  yrRange.yrLow AND yrRange. yrHigh ;      
      JOIN qtyRange ;
        ON myPrices.Qty BETWEEN QtyRange.qtyLow AND QtyRange.qtyHigh ;      
      JOIN prRange  ;
         ON myPrices.MyPrices 	BETWEEN prRangeprLow AND prRange.prHigh ;
   GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9;
   INTO table mAll1234
SELECT mAll1234
brow
wait

The following does generate the kind of report I am looking for; but this assumes that joins can only be nested two-deep?

*** 6aa1 *** prices ranges for each yr and QTY range
SELECT 	 prRange.prID  as  yrID,  yrRange.yrLow,  yrRange.yrHigh  ,;
	 	qtyRange.qtyID  as qtyID, qtyRange.qtyLow, qtyRange.qtyHigh ,;
		COUNT(*) AS qtyCount ;
   FROM myPrices 	;
      JOIN  yrRange ;
        ON  myPrices.yr BETWEEN  yrRange.yrLow and  yrRange.yrHigh ;      
      JOIN qtyRange ;
        ON myPrices.Qty BETWEEN QtyRange.qtyLow and QtyRange.qtyHigh ;      
   GROUP BY 1, 2, 3, 4, 5, 6;
   INTO table mALL
SELECT mALL
*brow
*wait

*** 6aa2 *** prices ranges for each yr and QTY range
SELECT 	mAll.yrID,  mAll.yrLow, mAll.yrHigh,;
		mAll.qtyID, mAll.qtyLow, mAll.qtyHigh, mALL.qtyCount ,;
		prRange.prID, prRange.prLow, prRange.prHigh  ,;
		COUNT(*) AS prCount ;
   FROM mALL,myPrices ;
      JOIN  prRange;
        ON myPrices.myPrices BETWEEN prRange.prLow and prRange.prHigh ;      
   GROUP BY 1, 2,3,4,5,6,7,8,9,10;
   INTO table mALL_2
SELECT mALL_2
brow
wait

Map

View

Click here to load this message in the networking platform