Dragn,
>Just to check - I think I have observed earlier, in maybe 2.0 or 2.6, that it used three-byte pointers until some point (merge level 2 or whatever it was called) and then switched to four-byte pointers. Now I see four-byte pointers on even small cdxes. I figure they changed it long ago without notifying me :)
Internally, the CDX might not have used 4-byte pointers back then. When disk space was not so readily available, I'm sure they were saving every byte they could. That constraint has largely been lifted by the capacity of current drives.
>I figure this comes from reserved space in each page. This could actually be used to roughly calculate the percentage of this reserved space - since your fields were largely incompressible, their overall size can be calculated, but only roughly. What I think is inducing an error into this calculation would be the increased number of level 1, level 2 etc nodes which should be there now that it splits pages earlier.
We'll have to spelunk some more into the CDX to find out the answers.
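In the meantime, here is a rough Python sketch of the back-of-the-envelope estimate you describe. Everything in it is an assumption on my part rather than anything taken from the CDX format itself: the function name, the idea that each key costs its full length plus one pointer, and the header size you pass in. It treats the keys as incompressible and ignores the level 1, level 2 etc. nodes, so it will overstate the reserved space for exactly the reason you mention.

```python
def estimate_reserved_fraction(file_size, record_count, key_length,
                               pointer_size=4, header_size=0):
    """Roughly estimate the share of a .cdx file that is not key payload.

    Assumptions (hypothetical, not taken from the CDX specification):
      - every key occupies key_length bytes, i.e. nothing is compressed
      - every key carries one pointer of pointer_size bytes
      - interior (non-leaf) nodes are ignored, so the result is an
        upper bound on the space reserved inside the pages
    """
    usable = file_size - header_size
    if usable <= 0:
        raise ValueError("file_size must be larger than header_size")

    # Bytes the keys and their pointers would need if packed tightly.
    payload = record_count * (key_length + pointer_size)

    # Whatever is left over is reserved space plus interior-node overhead.
    return 1.0 - payload / usable


if __name__ == "__main__":
    # Made-up numbers purely for illustration, not a measurement.
    fraction = estimate_reserved_fraction(
        file_size=1_000_000, record_count=50_000, key_length=10)
    print(f"estimated reserved fraction: {fraction:.2f}")
```

Running the same estimate on an index built by the old version and one built by the new version, from the same table, would at least show how much the reserved share moved, even if the absolute numbers are off.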
>Apart from our .cdx files being bulkier now, I think we've gained some speed and stability.
I agree.