Replacing Rushmore in 2020? The duck option :-)
Message
From: 18/03/2020 14:16:51
To: 20/02/2020 04:39:45
General information
Forum: Visual FoxPro
Category: Other / Miscellaneous
Thread ID: 01673095
Message ID: 01673716
Views: 85
Likes (1)
Hi Daniel,

I found part 2 of the article again (it has parts 1 and 3 as well), and found it to be very fair and balanced:
https://st2.ning.com/topology/rest/1.0/file/get/2808360794?profile=original

Look at the improvements gained in 4b and 7b.
Also: testing single-column queries is great for descriptive stats, but many statistical methods work on the interdependency of the column values of a specific row - think multiple regression or factor analysis. Therefore, IMO a more valid test would be building sums of multiples of columns, or perhaps values of columns modified/increased according to other columns - think of the modifying column as a physical or aggregate state, behaving in a notably different way.
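
A minimal sketch of what such a row-wise, multi-column test could look like, using the stdlib sqlite3 module; the column names (num_a, num_b, num_c, state), the weights, and the generated data are illustrative assumptions, not taken from the benchmark:

import sqlite3

# In-memory database with a hypothetical big_data table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big_data (num_a REAL, num_b REAL, num_c REAL, state TEXT)")
con.executemany(
    "INSERT INTO big_data VALUES (?, ?, ?, ?)",
    [(i * 0.5, i * 1.5, i * 2.5, "solid" if i % 2 else "gas") for i in range(100_000)],
)

# A row-wise interdependent expression: a weighted sum of columns, plus a
# term whose form depends on another column's value - closer to what
# multiple regression or factor analysis exercises than a single-column scan.
rows = con.execute("""
    SELECT state,
           AVG(num_a + 2.0 * num_b - 0.5 * num_c) AS weighted_sum,
           AVG(CASE WHEN state = 'gas' THEN num_a * num_b ELSE num_a END) AS state_dependent
    FROM big_data
    GROUP BY state
""").fetchall()
print(rows)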

Comparing VFP and SQLite as embedded data stores, it might be best to use VFP integer / double / datetime formats and compare them against SQLite's, as scientific values IMO more often than not fall into those formats, which SQLite otherwise has to convert from a text column format.
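
As a quick illustration of the storage-format point (a sketch based on SQLite's type-affinity rules, not on the original tests): the same numeric value lands as text in a TEXT column and as a binary double in a REAL column, and only the latter avoids a conversion on each numeric access.

import sqlite3

con = sqlite3.connect(":memory:")
# Same value, two column affinities: TEXT stores 3.14 as the string '3.14',
# REAL stores it as an 8-byte binary double.
con.execute("CREATE TABLE t (as_text TEXT, as_real REAL)")
con.execute("INSERT INTO t VALUES (3.14, 3.14)")
print(con.execute("SELECT typeof(as_text), typeof(as_real) FROM t").fetchone())
# -> ('text', 'real'); numeric work on the TEXT column pays a conversion each time.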

best regards
thomas

>Hi Thomas,
>
>You will find a more detailed output below. It contains the results for the VFP engine (still really great), the ubiquitous and equally impressive sqlite3, and the new OLAP-on-the-workstation beast in the making, duckdb.
>
>Sorry, that's no real benchmark, just a few tests that try to validate Python + an embedded SQL-based local DB engine as a replacement for VFP + cursors + in-memory index scenarios. The database scenario is pretty simple: ONE SINGLE TABLE with 100,000 records - a 10 MB text resource uploaded at runtime, except for sqlite, where there is no immediate "warp-speed" upload from Python as far as I can tell (I may be wrong; a possible bulk-load approach is sketched after the quoted results below).
>
>All these results (fox, sqlite, the new duck) are indeed equally impressive. But should duckdb continue to improve in terms of performance and usability, I'd certainly be inclined to believe it could become the perfect fit for large data analytics on the desktop. It is designed to take advantage of the plentiful memory of our recent machines. And I like that!
>
>Regards, Daniel
>
>
>FOXPRO test - plain old VFP9 - RAM usage: 60 MB in all cases
>data is fed into VFP arrays
>
>build in-memory structure 0.0200 sec.
>load 100K text file 0.5200 sec.
>build btree 0.1100 sec.
>select count(*) from big_data 10x avg: 0.0400 sec.
>select count(*) from big_data filtered on text 10x avg: 0.0500 sec.
>select * from big_data 10x avg: 0.3100 sec.
>select single numeric from big_data 10x avg: 0.0900 sec.
>select single numeric from big_data ordered 10x avg: 0.1600 sec.
>select top 10,000 * from big_data order by numeric no-index 10x avg: 0.4200 sec.
>select text_field,count(*) as cnt from big_data group by text_field order by 2 desc 10x avg: 1.4500 sec.
>select text and numeric from big_data avg: 0.1200 sec.
>select two numerical fields from big_data 10x avg: 0.1000 sec.
>select numeric field from big_data arbitrary textfield selection 10x avg: 0.1062 sec.
>100-K short SQL index-filtered calls avg: 11.8300 sec.
>
>SQLITE TEST - Python 3.8 sqlite3 - RAM usage: dynamically moving between 20 and 100 MB
>database is disk-based, since loading from scratch might be time-consuming
>data is fed into default Python structures
>
>select count(*) 10x avg: 0.0156 sec.
>select count(*) arbitrary textfield selection 10x avg: 0.0484 sec.
>select * from big_data in basic Python 10x avg: 0.8562 sec.
>select single numeric from big_data in numpy array 10x avg: 0.1578 sec.
>select function from big_data 10x avg: 0.1812 sec.
>select text and numeric from big_data 10x avg: 0.2109 sec.
>select two numerical fields from big_data 10x avg: 0.1969 sec.
>select top 10,000 * from big_data order by numeric no-index avg: 0.2281 sec.
>select text_field,count(*) as cnt from big_data group by text_field order by 2 desc 0.1094 sec.
>select numeric_field from big_data arbitrary textfield selection 0.1406 sec.
>100-K short SQL index-filtered calls avg: 27.0928 sec.
>
>DUCKDB TEST - Python 3.8, duckdb alpha version - RAM usage: 600 MB / more when not constrained by OS
>data is fed into Python structures or optimized numpy arrays and pandas dataframes
>
>build in-memory structure 0.0000 sec.
>load 100K lines, 10 MB csv - 1x: 0.5937 sec.
>build btree 0.0781 sec.
>select count(*) 10x avg: 0.0000 sec.
>select count(*) arbitrary textfield selection 10x avg: 0.0031 sec.
>select * from big_data in basic Python 10x avg: 0.4745 sec.
>select * from big_data in numpy array 10x avg: 0.1125 sec.
>select * from big_data into pandas frame 10x avg: 0.1531 sec.
>select single numeric from big_data in numpy array 10x avg: 0.0031 sec.
>select single numeric from big_data ordered in numpy array 10x avg: 0.0812 sec.
>select function from big_data in numpy array 10x avg: 0.0219 sec.
>select text and numeric from big_data in numpy array 10x avg: 0.0250 sec.
>select two numerical fields from big_data in numpy array 10x avg: 0.0047 sec.
>select top 10,000 * from big_data order by numeric no-index in numpy array avg: 0.0922 sec.
>select top 10,000 * from big_data order by numeric no-index in pandas frame avg: 0.1000 sec.
>select text_field,count(*) as cnt from big_data group by text_field order by 2 desc in numpy array 0.0156 sec.
>select text_field,count(*) as cnt from big_data group by text_field order by 2 desc in pandas frame 0.0156 sec.
>select numeric_field from big_data arbitrary textfield selection 0.0469 sec.
>100-K short SQL index-filtered calls avg: 33.2645 sec.
>
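
On the sqlite upload question above: a common way to get a reasonably fast bulk load from Python is a single enclosing transaction around executemany, with journaling relaxed during the load. A minimal sketch, assuming a comma-separated file big_data.csv whose four columns match the table (the file name and schema are illustrative, not from the benchmark):

import csv
import sqlite3

con = sqlite3.connect("big_data.db")
con.execute("PRAGMA journal_mode = MEMORY")  # relax durability for the load only
con.execute("PRAGMA synchronous = OFF")
con.execute("CREATE TABLE IF NOT EXISTS big_data (txt TEXT, num_a REAL, num_b REAL, num_c REAL)")

with open("big_data.csv", newline="") as f:
    with con:  # one transaction instead of 100K autocommits
        con.executemany("INSERT INTO big_data VALUES (?, ?, ?, ?)", csv.reader(f))
con.close()

This will not match VFP's APPEND FROM, but it removes the per-row commit cost that usually dominates; the same idea (one transaction, one reused parameterized statement) is also what keeps the "100-K short SQL index-filtered calls" figures above from growing further. duckdb, for its part, can ingest CSV directly on the SQL side (read_csv_auto, in builds that support it), which lines up with the single load timing in its list.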