Windows 11. Pleasant surprise for everybody?
From: 03/12/2021 03:47:51
To: John Ryan, Captain-Cooker Appreciation Society, Taumata Whakatangi ..., New Zealand (02/12/2021 14:08:14)
Forum: Visual FoxPro
Category: Other > Miscellaneous
Thread ID: 01682764
Message ID: 01682896
Views: 82
Hi John,

"So am I correct to imagine Duckdb as a temporary repository - like a VFP cursor- for data munging, rather than a data store? Except available on Linux and other ecosystems as well as Windows?"

You could certainly say so. The difference is the speed at which you can "start a full and consistent multitable SQL engine from scratch", i.e. in literally a fraction of a second.

You can also, of course, feed the beast with poorly organized data in a very flexible (and possibly massive) way, using in-memory data you have prepared yourself in Python via pyarrow or pandas.
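
A minimal sketch of what that looks like from Python (assuming the duckdb, pandas and pyarrow packages are installed; the table names and sample data are purely illustrative):

# Query in-memory pandas / pyarrow data with DuckDB's Python API.
import duckdb
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({"customer": ["A", "B", "A"], "amount": [10.0, 25.5, 7.25]})
tbl = pa.Table.from_pandas(df)           # the same data as an Arrow table

con = duckdb.connect()                   # in-memory database, up in a fraction of a second
con.register("sales_pd", df)             # expose the pandas DataFrame as a view
con.register("sales_arrow", tbl)         # expose the Arrow table as a view

result = con.execute(
    "SELECT customer, SUM(amount) AS total FROM sales_arrow GROUP BY customer"
).fetchdf()                              # results come back as a pandas DataFrame
print(result)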

Pandas, already more than 10 years old, is the old beast (for both download and upload). Pyarrow is the new kid on the block, a lean "big-data-level" bridge in its Python version; pyarrow knows next to no limit in terms of data size. Call it a data-lake feeder. Check this (whilst staying aware of database size...):

https://duckdb.org/2021/12/03/duck-arrow.html

Imagine loading your CSVs at a much higher speed, then crunching much faster as well if you run your queries across gigabytes. Handling record-centric searches is paradoxically not faster, even a bit slower than on a typical record-centric SQL engine such as SQLite, Rushmore or plain-vanilla Oracle or SQL Server engines. Why? That's clear: data is organized, and possibly compressed, at column level...
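
For instance, something like this (the file name 'orders.csv' and its columns are purely hypothetical):

# Load a CSV straight into DuckDB and aggregate it.
import duckdb

con = duckdb.connect()
totals = con.execute("""
    SELECT region, SUM(amount) AS total
    FROM read_csv_auto('orders.csv')     -- DuckDB infers the schema for you
    GROUP BY region
    ORDER BY total DESC
""").fetchdf()
print(totals)

# A single-record lookup like this is where a row-oriented engine can still be quicker:
one_row = con.execute(
    "SELECT * FROM read_csv_auto('orders.csv') WHERE order_id = 42"
).fetchdf()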

This is really a new enabling technology, fully available from Python, R, Rust and a couple of other dev environments. Alas, no Fox support! The community, alas, is no longer strong enough to build such a bridge into DuckDB :-(

This "sort-of-dbf-like engine" is much faster overall for sure. But the capability down the line, once downloaded, at cursor level (numpy arrays instead of vfp data storage level: cursor or fox array) is also way easier.

By the way, I recently tested Cython, sort of an equivalent of (but improved) FLL-building, to unlock C-level speed in Python (Python interpreter speed is really VFP-like). Not bad if you need "pedal-to-the-metal" CPU speed, but that's still a bit too C-ish for me... So yesterday I also tested - dev has not started here! - multiprocessor support for complex data crunching from the Python interpreter with numba.

Impressive: I could run a multicore-enhanced, numba/llvmlite-compiled calculation algorithm at speeds I'd never dreamed of achieving. 100% core utilization with full LLVM support. More than the C-level speed I could reasonably achieve myself, since the parallelization code was mostly transparent. Well, I'd never dream of this in VFP or any other robust interpreter-based dev environment! When you check the llvmlite project on GitHub, you understand what this speed is built upon :-)
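
A minimal sketch of that kind of multicore crunching (the calculation itself is only a placeholder; assumes numba and numpy are installed):

# Parallel, LLVM-compiled loop with numba.
import numpy as np
from numba import njit, prange

@njit(parallel=True)                     # compiled via LLVM (llvmlite); loop runs across all cores
def column_stats(values):
    total = 0.0
    for i in prange(values.shape[0]):    # prange splits the iterations over the available cores
        total += np.sqrt(values[i]) * 0.5
    return total

data = np.random.rand(10_000_000)
column_stats(data[:10])                  # first call triggers JIT compilation
print(column_stats(data))                # subsequent calls run at compiled, multicore speed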

"FWIW, Mr Chen's latest effort extends maximum dbf (and cursor) size into the thousands of terabyte range- bigger than any currently available hard drive. But still insecure, unfortunately."

Glad to hear this news. A valuable resource for our community! Yep, I still have a big VFP-based app as my main dev effort...

Daniel