HOW TO REJECT BAD DATA????
Forum: Visual FoxPro
Category: Troubleshooting / Miscellaneous
Thread ID: 00019169
Message ID: 00019221
Views: 38
>>
>> >>Say I have a set of data that is the price of a home.
>> >>
>> >>100,000
>> >>125,000
>> >>92,000
>> >>175,000
>> >>135,000
>> >>10,000
>> >>500,000
>> >>
>> >>Well, I know that the 10,000 figure is really just for a lot. So when
>> figuring trends, I need to throw it out. I also know that the 500,000
>> figure is also way off, so it too needs to be thrown out.
>> >>
>> >>If each of these numbers were in a field in a dbf (one per record), how
>> could I programmatically throw out numbers that were way off? I have tried
>> different forms of average and standard deviation, but nothing seems quite
>> right. I need some high school algebra or statistics lessons, I think.
>> >
>
>No. It depends on how you want to look at this. As a programmer, I would
>say that the specs are not complete and let the client sort this out.
>As a (honest, Bruce, are there? :-)) statistician, you would have to
>justify your sample. If it is representative, you should not have to
>leave anything out; if it is not, ah, then why conclude anything on its
>basis? Leaving something out would be acceptable only if you
>can justify it, meaning that by doing so you can better demonstrate the
>representativeness of your data. I doubt that this could be
>done programmatically since, by definition, it would be a judgement
>call, which until lately could not be made by a computer :-).
>
Yes, that's what I was getting at too. You shouldn't just arbitrarily toss data out because it doesn't look nice on your graph. But you might want to start before the data-gathering stage to ensure that you're sampling exactly what you want. That is, you have to establish the focus of your sample before collecting data, not afterward: what types of homes, in what neighborhoods, in what timeframe, etc.

And, as is commonly the case with samples that have very high variability (housing costs and salaries, for example), the median is a very good measure of central tendency. You basically get to ignore the extreme values while remaining fairly valid statistically... of course, you'll need a number of medians from subcategories by region, time period, etc., to have a nice-looking graph :~)
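
To make the point concrete, here is a minimal sketch using the seven prices from the original question. It is written in Python rather than Visual FoxPro, and the `summarize` helper is a made-up name for illustration only. It compares the mean and median of the full sample with those of the sample after the two suspect values are removed by hand.

```python
from statistics import mean, median

# The seven home prices quoted in the original question.
prices = [100_000, 125_000, 92_000, 175_000, 135_000, 10_000, 500_000]


def summarize(values):
    """Return the mean and median of a list of prices (hypothetical helper)."""
    return {"mean": mean(values), "median": median(values)}


# Full sample, extremes included.
all_prices = summarize(prices)

# Same sample with the two values the poster wanted to discard removed by hand.
trimmed = summarize([p for p in prices if p not in (10_000, 500_000)])

print("all prices:", all_prices)
print("without 10,000 and 500,000:", trimmed)
```

With these numbers the median is 125,000 whether or not the two extremes are included, while the mean moves from roughly 162,429 to 125,400. That is the sense in which the median lets you disregard the extremes without actually deleting any data.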
The Anonymous Bureaucrat,
and frankly, quite content not to be
a member of either major US political party.