Comparing 2 tables; getting list of missing records
Message
From
07/08/2005 07:57:36
 
General information
Forum:
Visual FoxPro
Category:
Databases, Tables, Views, Indexing and SQL syntax
Environment versions
Visual FoxPro:
VFP 9
Miscellaneous
Thread ID:
01037464
Message ID:
01039145
Views:
27
Hi Craig,

>That's a very ingenious use of a crawler. I would have never thought to compile a word list in this way, but given your results so far it certainly seems effective.

Thanks for the compliment. I also feared that this might stall at only 10,000 words after crawling a few hundred pages, but I thought it was worth a try, even just for the fun of it. There is a lot of good linked content, and that's just one site. Of course, additional crawling will find new words less and less often.

Another issue I'm now trying to resolve is manually hyphenated words, e.g. "re-ply". I also caught some abbreviations like "zB" (which corresponds to "e.g."): it would be written "z.B.", and I stripped out the periods. That's also an issue when detecting sentence ends, since the period of an abbreviation could be wrongly interpreted as the end of a sentence.
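To make those two problems concrete, here is a minimal Python sketch (not the code I actually run). It assumes a hypothetical, deliberately tiny abbreviation list and a set of already known words; the names `ABBREVIATIONS`, `rejoin_broken` and `split_sentences` are made up for illustration. A broken word is only rejoined when the joined form is already known, and a trailing period does not end a sentence when the token is a listed abbreviation.

```python
# Hypothetical, deliberately tiny list of abbreviations written with periods.
ABBREVIATIONS = {"z.B.", "d.h.", "usw."}


def rejoin_broken(token, known_words):
    """Join 're-ply' to 'reply', but only if the joined form is already known."""
    if "-" in token:
        joined = token.replace("-", "")
        if joined.lower() in known_words:
            return joined
    # Unknown joined form: keep the token, it may be a real compound.
    return token


def split_sentences(text):
    """Split on whitespace; a token ending in . ! or ? closes a sentence
    unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without end punctuation
        sentences.append(" ".join(current))
    return sentences


print(rejoin_broken("re-ply", {"reply"}))
print(split_sentences("Das gilt z.B. hier. Und dort auch."))
```

The weak point is obvious: an abbreviation that really does end a sentence is missed, and the abbreviation list itself has to come from somewhere.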

There are some problems with a crawler, but it's rather effective nevertheless. Now I could pass all those words to MS Word and spell-check them there, but that would perhaps be a legal issue, although I wouldn't be taking the words from MS Word, only checking them. But in doing so I would be making use of the hard work of compiling such a spell-check word list. I'm quite aware how time-consuming that is, or would be if done manually. I also profit from the double-checking for misspellings that a news magazine like the Spiegel does, so I get correctly spelled words from their site with high confidence. I wouldn't crawl universalthread for that reason, although there may be fewer errors there than on a random personal homepage.

Concerning the legal side, the question is also whether I infringe the Spiegel's copyright. I'm not copying their articles, only the words. And even the word "Spiegel" is quite a general word, like "windows" is (Spiegel actually means mirror; by the way, Spiegel compares more to Time magazine than to the Daily Mirror); nevertheless there are some trademark rights on it. And apart from those general-purpose names, there are trademarked product names like Coca Cola or Mercedes Benz. They need to be on a blacklist, and it's quite hard to find and filter them out.
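The filtering step itself is the easy part. As a rough Python sketch, assuming the blacklist already exists as a set of single words (the function name `filter_blacklisted` and the sample entries are hypothetical), it is just a case-insensitive lookup:

```python
def filter_blacklisted(words, blacklist):
    """Return the words whose case-folded form is not on the blacklist."""
    blocked = {name.casefold() for name in blacklist}
    return [w for w in words if w.casefold() not in blocked]


# Hypothetical sample data, for illustration only.
crawled = ["Haus", "Mercedes", "Spiegel", "MERCEDES"]
print(filter_blacklisted(crawled, {"mercedes"}))
```

The hard part remains building the list: this does nothing for names that are also ordinary words, and multi-word names would need extra handling.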

Bye, Olaf.