How much faster is a SUN 5000 vs dual pentium pro 500Mhz
Message
From
06/12/1999 09:37:15
Kenneth Downs
Secure Data Software, Inc.
New York, United States
 
General information
Forum:
Visual FoxPro
Category:
Other
Miscellaneous
Thread ID:
00298518
Message ID:
00299131
Views:
21
Doug,

Answer first, theory following (code at the end). I had a nine-million-record table (record length approx. 100 bytes) to sort, and it was taking anywhere from 2 to 4 hours and was maddeningly unpredictable. The solution was to break the file up into about 15 pieces, index and copy each piece out, append the results together, index that, and copy that out. That cut my time to one hour ten minutes and made it perfectly predictable.

Why does it work? In simplest terms, as a file grows it reaches a point where the records are scattered so far from each other that no buffering or caching mechanism can help you. The heads must navigate to each and every record, and navigation time goes as the square of the number of records (double the file, quadruple the navigation time).

By breaking the file up and sorting the pieces, then appending those and sorting the result, you are at the last step sorting a file with a great many of its records lined up already, and buffers, caching and so on are more of a help. Moreover, your navigation time goes from the square of the number of records to a straight linear relationship to the number of records.
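In idealized terms (my own back-of-the-envelope sketch, not a rigorous model; n is the number of records, k the number of pieces, Nt the time to navigate the distance of one record, and c just a linear constant I am introducing for the merge passes):

\[
T_{\mathrm{single}} \approx N_t\,n^{2}, \qquad
T_{\mathrm{split}} \approx k \cdot N_t\!\left(\frac{n}{k}\right)^{2} + c\,n
  \;=\; \frac{N_t\,n^{2}}{k} + c\,n
\]

The c*n term is the final pass, roughly linear because the records arrive mostly in order, and once each of the k pieces is small enough to live in the buffers the quadratic term all but disappears, which is where the square-to-linear payoff comes from.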

All of this improves performance despite the odd fact that you are now reading and writing each record four times instead of just once.

Questions arise, however. Why did I pick 15? It was the practical result of running tests at 5, 10, 15, 20, 25, 30, 35, and 40 pieces. I did a more thorough mathematical analysis that takes into account the number of records n, the number of distinct values d, the time to navigate the distance of one record Nt, and so on, but that kind of analysis is hard to carry through because so many factors in the OS and hardware spoil the ideal scenario the equations describe. Fifteen worked, so I used it.
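If you want to repeat the piece-count experiment on your own data, a rough harness along these lines would do it (a hypothetical sketch: SortInBlocks is assumed to be the code below wrapped up as a program that takes the piece count as a parameter):

*-- Hypothetical timing harness; SortInBlocks.prg is assumed to wrap
*-- the block-sort code below, parameterized on the number of pieces.
FOR nPieces = 5 TO 40 STEP 5
  nStart = SECONDS()
  DO SortInBlocks WITH nPieces
  ? "Pieces:", nPieces, "Elapsed seconds:", SECONDS() - nStart
ENDFOR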

The code goes something like this (beware my hardcoded constants; I'm typing this from memory; and add code to delete the temporary files):
SELE MyTable
SET ORDER TO 0   &&---very important: read the table in physical order
LOCATE

*--- Size the blocks so that 15 of them cover the whole table
nRecordsPerBlock = INT(RECC()/15) + 2

FOR i = 1 TO 15
  lcI = ALLT(STR(i))

  *--- Reselect the source table; the work area changes later in the loop
  SELE MyTable
  IF i = 1
    LOCATE
  ELSE
    SKIP  &&--Else you get the last record again
  ENDIF
  COPY NEXT nRecordsPerBlock TO Temp1

  *--- Sort this block and write it out in index order
  SELE 0
  USE Temp1
  INDEX ON MyField TAG MyField
  COPY TO ("Temp" + lcI + "Inter")
  USE   &&--Close Temp1 so it can be overwritten on the next pass

  *--- Collect the sorted blocks into Temp1Inter
  IF i = 1
    SELE 0
    USE Temp1Inter
  ELSE
    SELE Temp1Inter
    APPEND FROM ("Temp" + lcI + "Inter")
  ENDIF

ENDFOR

*--- Final pass: the appended file is already mostly in order
SELE Temp1Inter
INDEX ON MyField TAG MyField
COPY TO MyFinalTable
You may of course choose to use SQL at the intermediate stage instead of COPY TO; I have not tested it yet (that's today's project).
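Untested, but that SQL variant would presumably replace the INDEX ON + COPY TO pair inside the loop with something like:

*-- Untested sketch: sort each block with SQL instead of INDEX + COPY TO
SELECT * FROM Temp1 ;
    ORDER BY MyField ;
    INTO TABLE ("Temp" + lcI + "Inter")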

Hope this helps!

>Thanks to each of you for your response. They are all exactly what I was looking for. The client was thinking about moving the back-end data processing to a faster server -- not thinking about the coding changes that would require. Currently they are running everything locally -- it is not even running across a network.
>
>Kenneth, you mentioned:
>1. If sorting a table of 3 million records, never let Fox do it. IOW, don't do an index and copy it out, and don't do a SQL SELECT...ORDER BY. There are tricks to vastly cutting the time it takes to sort large tables.
>
>Can you give me some hints in this area?
>
>Thanks again,
>Doug