Replacing rushmore in 2020? The duck option:-)

Level Extreme platform

Subscription

Corporate profile

Products & Services

Support

Legal

Français

Replacing rushmore in 2020? The duck option:-)

Message

From

20/02/2020 04:39:45

Daniel Gadenne
France

19/02/2020 11:45:31

Thomas Ganss
Main Trend
Frankfurt, Germany

General information

Forum:

Visual FoxPro

Category:

Other

Title:

Re: Replacing rushmore in 2020? The duck option:-)

Miscellaneous

Thread ID:

01673095

Message ID:

01673116

Views:

Hi Thomas

You will find a more detailed output below. It contains the results for the VFP engine, still really great, the ubiquitous and equally impressive sqlite3 and the new OLAP-on-the-workstation beast in the making, duckdb.

Sorry that's no real benchmark, just a few tests that try to validate python + an embedded sql-based local db engine as a replacement to VFP + cursors + index in-memory scenarii. The database scenario is pretty simple: ONE SINGLE TABLE with 100,000 records - 10Mb text resource upload at runtime except for sqlite where there is no immediate "warp-speed" upload from python as far as I check (I may be wrong).

All these results, fox, sqlite, the new duck, are indeed equally impressive. But, should duckdb continue to improve in terms of performance and usability, I'd certainly be inclined to believe it could become the perfect fit for large data analytics on the desktop. It is designed to take advantage of the plentiful memory structure of our recent machines. And I like that!

Regards, Daniel

FOXPRO test - plain old VFP9 - RAM usage: 60 Mb in all cases
data is fed into VFP arrays

build in memory structure  0,0200
load 100K text file 0,5200
build btree 0,1100
select count(*) from big_data 10x avg: 0,0400
select count(*) from big_data filtered on text 10x avg: 0,0500
select * from big_data 10x avg: 0,3100
select single numeric from big_data 10x avg: 0,0900
select single numeric from big_data ordered 10x avg: 0,1600
select top 10,000 * from big_data order by numeric no-index 10x avg: 0,4200
select text_field,count(*) as cnt from big_data group by text_field order by 2 desc 10x avg: 1,4500
select text and numeric from big_data avg: 0,1200
select two numerical fields from big_data 10x avg: 0,1000
select numeric field from bigdata arbitrary textfield selection 10x avg:0,1062
100-K short SQL index-filtered calls avg:11,8300

SQLITE TEST - python 3.8 sqlite3 - RAM usage: dynamically moving 20 and 100 Mb 
database is diskbased - since loading from scratch might be time-consuming
data is fed into default python structures

select count(*) 10x avg:0.0156 sec.
select count(*) arbitrary textfield selection 10x avg:0.0484 sec.
select * from big_data in basic python 10x avg:0.8562 sec.
select single numeric from big_data in numpy array 10x avg:0.1578 sec.
select function from big_data 10x avg:0.1812 sec.
select text and numeric from big_data 10x avg:0.2109 sec.
select two numerical fields from big_data 10x avg:0.1969 sec.
select top 10,000 * from big_data order by numeric no-index avg:0.2281 sec.
select text_field,count(*) as cnt from big_data group by text_field order by 2 desc 0.1094 sec.
select numeric_field from big_data arbitrary textfield selection 0.1406 sec.
100-K short SQL index-filtered calls avg: time:27.0928 sec.

DUCKDB TEST -python 3.8 duckdb alpha version - RAM usage: 600 Mb / more when not constrained by OS
data is fed into python structures or optimized numpy arrays and pandas dataframes
 
build in memory structure 0.0000
load 100k lines 10 MB csv - 1x: 0.5937345027923584 sec.
build btree 0.0781
select count(*) 10x avg:0.0000 sec.
select count(*) arbitrary textfield selection 10x avg:0.0031 sec.
select * from big_data in basic python 10x avg:0.4745 sec.
select * from big_data in numpy array 10x avg:0.1125 sec.
select * from big_data into pandas frame 10x avg:0.1531 sec.
select single numeric from big_data in numpy array 10x avg:0.0031 sec.
select single numeric from big_data ordered in numpy array 10x avg:0.0812 sec.
select function from big_data in numpy array 10x avg:0.0219 sec.
select text and numeric from big_data in numpy array 10x avg:0.0250 sec.
select two numerical fields from big_data in numpy array 10x avg:0.0047 sec.
select top 10,000 * from big_data order by numeric no-index in numpy array avg:0.0922 sec.
select top 10,000 * from big_data order by numeric no-index in pandas frame avg:0.1000 sec.
select text_field,count(*) as cnt from big_data group by text_field order by 2 desc in numpy array 0.0156 sec.
select text_field,count(*) as cnt from big_data group by text_field order by 2 desc in pandas frame 0.0156 sec.
select numeric_field from big_data arbitrary textfield selection 0.0469 sec.
100-K short SQL index-filtered calls avg: time:33.2645 sec.

Map

View

Click here to load this message in the networking platform