The 254/240 barrier in VFP
Message
From:       Cetin Basoz, Engineerica Inc., Izmir, Turkey (24/12/2002 09:12:05)
To:         Vikas Burman, Coveda Technologies Pvt Ltd, Chandigarh, India (23/12/2002 00:44:41)

General information
Forum:      Visual FoxPro
Category:   Databases, Tables, Views, Indexing and SQL syntax - Miscellaneous
Thread ID:  00735115
Message ID: 00735621
Views:      7
>Thanks buddy
>
>In fact, I thought of this approach earlier, but can we be "100%" sure that the checksum value returned by the function can always be relied on and will always be a one-to-one mapping with its larger counterpart (the text)?
>
>Can you explain the checksum story in VFP - or can you send me a link which explains the checksum in detail?
>
>Thanks a lot.
>Vikas

Vikas,
A checksum is a hashing algorithm. It is such a simple one that many other algorithms, like MD2...MD5, SHA, HMAC etc., were developed for higher-security services. The CryptoAPI section on MSDN has full descriptions, sample code and so on; check for the title "The Cryptography API, or How to Keep a Secret". Searching for 'hash' brings up a very long result set with some C sample code.
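Within VFP itself the built-in hash is SYS(2007). A quick sketch of what it does (the printed values are illustrative only, not guaranteed):

lcText = REPLICATE("A fairly long product description ", 10)
? SYS(2007, lcText)          && returns the 16-bit checksum as a short character string
? SYS(2007, lcText + "!")    && changing even one character normally changes the value
* VFP 9 also accepts extra parameters on SYS(2007) to request a CRC32
* instead of the 16-bit sum - see the SYS(2007) help topic for your version.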
This is from http://www.math.utah.edu/~beebe/software/filehdr/node15.html :
"Consequently, a lot of research has been done on algorithms for finding checksums, and some have even achieved international standardization. One of these standard algorithms is known as a CRC-16 checksum. CRC stands for cyclic redundancy checksum ,¸cyclic redundancy checksum¸checksumcyclic redundancy and the redundancy of following it with the word checksum is accepted practice. The CRC-16 checksum¸checksumCRC-16 is capable of detecting error bursts up to 16 bits, and 99 percent of bursts greater than 16 bits in length. The checksum number is represented as a 16-bit unsigned number, encompassing the range 0 ... 65535. Thus, there is roughly one chance in 65535 of an error not being detected, that is, of two different files having the same checksum."

and this is from http://mathworld.wolfram.com/CyclicRedundancyCheck.html :
"The method is not infallible since for an N-bit checksum, of random blocks will have the same checksum for inequivalent data blocks. However, if N is large, the probability that two inequivalent blocks have the same CRC can be made very small."


There are also other fast algorithms, like Adler-32, that create 32-bit checksums.
You might also want to check these :
www.4d.com/ACIDOC/CMU/CMU79786.HTM
www.topshareware.com/details.asp?556
www.dataman.com/files/S4/checksum.htm
www.zvon.org/tmRFC/RFC2960/Output/chapter19.html
www.relisoft.com/Science/CrcNaive.html
pascal.sources.ru/crc/pascrc32.htm

Even allowing that two different character streams might produce the same checksum, you could optimize your search by putting the indexed comparison in front, i.e.:

... where sys(2007,field) == sys(2007,lcSearch) and field == lcSearch ...
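
For completeness, a sketch of how that might be wired up (the table and field names below are invented for illustration): index the checksum rather than the 254-character field itself, and keep the checksum comparison first so the tag can narrow the candidates before the exact comparison runs. The PADL() is only a precaution to keep the index key a fixed width.

USE products                                             && hypothetical table with a C(254) field "longdesc"
INDEX ON PADL(SYS(2007, longdesc), 5, "0") TAG cksum     && one-time: key is at most 5 characters, far below the 240 limit

lcSearch = PADR("the exact text being searched for", 254)   && pad to the field width so both comparisons see identical strings
SELECT * FROM products ;
    WHERE PADL(SYS(2007, longdesc), 5, "0") == PADL(SYS(2007, lcSearch), 5, "0") ;
      AND longdesc == lcSearch ;
    INTO CURSOR crsHits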

Reading your purpose on the other branch, I don't think you need to index such a long text. Assign generated integer keys as the real product IDs and treat the user-entered product IDs as a kind of description. There wouldn't be too many products on any system, and even without an index that table could be searched fast. Combined with the idea above, you can have an indirect index for it (collisions, if they ever occur, would still only pull out a few records to check further); see the sketch below.
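
A sketch of that design, with all object names invented for illustration: a small generated integer is the real key, the long user-entered ID is just data, and the checksum tag acts as the indirect index. After the SEEK, the full value is re-checked so a rare collision cannot return the wrong record.

CREATE TABLE products (prod_ik I, userprodid C(254), descrip M)
INDEX ON prod_ik TAG prod_ik                               && real key: a generated integer
INDEX ON PADL(SYS(2007, userprodid), 5, "0") TAG idhash    && indirect index on the long user ID

* Look a product up by the long, user-entered ID.
lcWanted = PADR("SOME-VERY-LONG-USER-SUPPLIED-PRODUCT-ID", 254)
SET ORDER TO idhash
IF SEEK(PADL(SYS(2007, lcWanted), 5, "0"))
    * A collision could land on another record with the same checksum,
    * so confirm the full value among the few adjacent candidates.
    LOCATE FOR userprodid == lcWanted REST ;
        WHILE PADL(SYS(2007, userprodid), 5, "0") == PADL(SYS(2007, lcWanted), 5, "0")
    IF FOUND()
        ? "Found product", prod_ik
    ENDIF
ENDIF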
Cetin
Çetin Basöz

The way to Go
Flutter - For mobile, web and desktop.
World's most advanced open source relational database.
.Net for foxheads - Blog (main)
FoxSharp - Blog (mirror)
Welcome to FoxyClasses

LinqPad - C#,VB,F#,SQL,eSQL ... scratchpad