Replacing a Fox with a Duck... Beware that's a killer be
Message
From:        Daniel, 05/04/2023 05:40:29
To:          John Ryan, Captain-Cooker Appreciation Society, Taumata Whakatangi ..., New Zealand, 02/04/2023 16:24:42

General information
Forum:       Python
Category:    Other, Miscellaneous
Thread ID:   01686381
Message ID:  01686451
Views:       43
Hi John,

Thanks for following up on this non-VFP thread. Yes, I am transitioning to a new data-mangling environment after so many years, having started with Fox in 1985 or 1984 if I remember correctly. That's long ago. Worth taking the time to discuss the matter a bit!

>Right... since with 600 million rows, if there's (say) 200 bytes per row then presumably (unless my math is way off!) you'll need around 110GB Memory
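(Quick back-of-the-envelope check on that estimate, using your own numbers; only the GiB conversion is mine:)

rows = 600_000_000            # 600 million rows, as you quoted
bytes_per_row = 200           # your assumed average row width
total_bytes = rows * bytes_per_row
print(total_bytes / 1024**3)  # ~111.8 GiB, so yes, roughly the 110 GB figure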

I did check the benchmarking figures provided by the DuckDB team. I tend to trust them, based on what they promised and delivered over their first two years of existence. I started looking at them before they became a tech star on GitHub and elsewhere. You can trust them both as people and as an organization (not really funded, by the way!).

Back to the tests: I just realized that their base machine is a formidable Apple machine with brilliant memory access. You certainly understand why that matters! Both the amount of memory at hand and the speed of access count here.

You may be right about this 0.6-second time. There may be some errors there, or some sort of "lazy loading" at play. But as a whole, run a test yourself and I expect you will be blown away by the time it takes to load CSV (or better, Parquet) files from scratch and SQL-mangle them!
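For what it's worth, the kind of quick test I have in mind looks like this (a rough sketch only; the file name and column names are placeholders, not from any real benchmark):

import time
import duckdb

con = duckdb.connect()  # in-memory DuckDB database

start = time.perf_counter()
# DuckDB scans the Parquet file directly; no separate import step needed
rows = con.execute("""
    SELECT category, COUNT(*) AS n, AVG(amount) AS avg_amount
    FROM 'big_table.parquet'
    GROUP BY category
    ORDER BY n DESC
""").fetchall()
print(f"{len(rows)} groups in {time.perf_counter() - start:.2f} s")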

One thing is for sure: "big data" tech now accommodates "zero-copy" integration. That is what happens here between DuckDB and Apache Arrow, and it is an impressive piece of technology that is starting to power AI, genomic research, data science and more. Interesting times, as they say :-)

An already ancient introduction to "zero copy":
https://towardsdatascience.com/apache-arrow-read-dataframe-with-zero-memory-69634092b1a

The "zero-copy" integration between this modern days' fox (duckdb) and arrow (I'll use it in the coming years);
https://duckdb.org/2021/12/03/duck-arrow.html
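In Python it looks roughly like this (a minimal sketch; the table and column names are invented for illustration):

import duckdb
import pyarrow as pa

# Build an Arrow table in memory
arrow_tbl = pa.table({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})

con = duckdb.connect()
con.register("arrow_tbl", arrow_tbl)  # expose the Arrow table to DuckDB

# Query it with SQL and get the result back as an Arrow table;
# DuckDB reads the Arrow buffers in place instead of copying them
result = con.execute("SELECT id, amount * 2 AS doubled FROM arrow_tbl").fetch_arrow_table()
print(result)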

NumPy, Numba, DuckDB, Arrow... I have learnt quite a few things dipping my toe into the recent big-data waters à la Python :-)

Daniel