Replacing a Fox with a Duck... Beware that's a killer beast!
From: 17/03/2023 05:52:13
To: All
Forum: Python
Category: Other
Title: Replacing a Fox with a Duck... Beware that's a killer beast!
Thread ID: 01686381
Message ID: 01686381
Hi, all of you die-hard VFP-ers still lurking here :-) Here is a short VFP-er's view on the current Python "big data" ecosystem.

As a die-hard VFP-er, I have been quietly following the recent changes in the Python "big data" ecosystem for over a year and a half. The need is mine at this stage, but I hope I can share the results with the customers of a current VFP workstation-based, poor man's OLAP application.

The need is to replace the database engine that currently drives this application:
1) data is stored in zipped files,
2) it is opened as our old friend, the temporary VFP cursor.

The performance was really more than decent in view of the competition when the project was started 20 years ago. Now it is clearly not up to the growing requirements in terms of data mangling (mostly massive statistical calculations on massive data sets).

I have been following the Python big data ecosystem for quite a while and recently discovered what could definitely deliver an "order of magnitude" boost to our projects. I really mean a boost like the ones we discovered when:
1- fox was invented,
2- server-based relational engines became popular and affordable.

From this year onwards, I intend to start using a combination of:
- the "pyarrow" ecosystem as a way to load data into memory fast (and save it, of course),
- the "duckdb" engine as a replacement for our dear old "fox",
- "python", of course, as the glue (in our case, highly demanding dataviz).
A minimal sketch of how these pieces fit together follows below.

You can load gigabytes of disk-based data into an in-memory local SQL engine in a snap. What is a snap here? Seconds. Megabytes load in a fraction of that. And all of this in a smart and effective way, inside an interpreter-centric Python environment that is not that far from a VFP prompt... Yep, that is for analytics, certainly not for transactional systems of course! A quick example of that kind of loading follows.
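Here is a hedged sketch of that kind of bulk loading, assuming the data lives in gzip-compressed CSV exports; the file pattern "history_*.csv.gz" is hypothetical.

# A minimal sketch of direct loading; the file pattern is illustrative only.
import duckdb

con = duckdb.connect()  # in-memory, nothing to install server-side

# DuckDB reads compressed CSV files directly; a glob pattern covers many files.
con.execute("""
    CREATE TABLE history AS
    SELECT * FROM read_csv_auto('history_*.csv.gz')
""")

# From here on it is plain SQL, interactively, much like a VFP command window.
print(con.execute("SELECT COUNT(*) FROM history").fetchone())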

As a VFP-er and a relational DB user, the "order of magnitude" speedups I have been through are mind-boggling, to say the least. I have already mentioned what makes these "local engines" so impressive, but I am more impressed by the day. The technology is still moving forward fast:
- smart vector-based memory handling,
- impressive "zero-copy" data access,
- clever use of multicore engines: your CPU will get hot!
- and, as a result of the previous points, sheer OLAP power: not your mother's DBMS, nor your grandpa's fox! (See the small sketch after this list.)
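For the curious, here is a hedged sketch of the zero-copy and multicore angle: a DuckDB query over a Parquet file can hand its result back as an Arrow table, keeping the data columnar instead of converting it row by row through Python objects, and the worker thread count can be set explicitly. The file "big_table.parquet", its "amount" column, and the 8-thread setting are illustrative assumptions.

# A minimal sketch; file, column and thread count are hypothetical.
import duckdb

con = duckdb.connect()
con.execute("SET threads TO 8")  # let DuckDB spread work across several cores

# Query a Parquet file directly and fetch the result as an Arrow table;
# the columnar result lands in Arrow buffers rather than Python row objects.
arrow_result = con.execute("""
    SELECT *
    FROM 'big_table.parquet'
    WHERE amount > 1000
""").fetch_arrow_table()

print(arrow_result.num_rows)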

The good news is that this technology is affordable:
1) the "speed boost" is not strictly tied to recent hardware. Old legacy boxes will also take advantage of this stuff. But of course this technology will make smart use of recent multicore CPUs and large amounts of RAM!
2) most of this ecosystem is essentially "open source", free to use under comfortable MIT-like licensing schemes. I recently went to the first "duckdb" conference, and a lot more is coming in terms of workstation-based power within this evolving "big data" Python ecosystem.

Stay tuned, the workstation is coming back!

Daniel