Audit trail thought-experiment
Robert,

Yesterday I was that "fast" in replying that I can hardly imagine my lines were readable. Now, without responding to your response yet, I'll try to be somewhat clearer on what I did in the far past (an intrinsic time-based system) and what we use now for the audit trail. Remember, both have very different purposes, but the audit trail could be used for the time-based system:

Time-based system

This was in fact rather simple; the objective back then was to see how a normal database could be turned into a database which could provide its data at any point in past - or future (!) - time. BTW, it was written in dBase III+ (yes, that long ago).

As this was a normal (small) system, I adjusted all the indexes (primary and foreign) to have a date/time as the rightmost part. Obviously this implies the date/time is not an entity, though formally it might be. Might. It is always difficult with things like date/time to decide whether it is an entity or not, but since real-time activity must allow for any date/time, i.e. not needing to be pre-defined, IMO it is just not an entity. That this is right is proven by the technical solution, which is the only solution within the consistency of *this* system (see below).

Now note that this has nothing to do with transactions or the like, and that just *any* record (via any index) is made unique (though rightmost) by the date/time its content was created (or deleted).

A system like this only appends records, and *never* does any replace or delete, hence grows and grows.

So, it's just the normal number of tables, as before.

Now each Seek, Skip, Go Top etc. has to be replaced by "some" time-integrated facility. For example, assuming that the normal situation is retrieving at the current system date/time, a Seek looks for the key needed, concatenated with the current date/time (to 1/1000 sec). Assuming this exact time will not exist (the chance is about nil), there will be an EOF, and a Go Recno(0) - Skip -1 will give the most (!) current record for the current date/time.
Note, for now I'm not talking about Seeks on a leftmost part of any logical key.
So, keep in mind that this exact same way of working allows for records (with the same key) to be there with a newer date/time, IOW put there for the future. Example: a system administering you getting a new job next month, where this same system allows retrieval of who works where next month, just by setting the retrieval date/time one month ahead. Thus, this functionality is intrinsically there, where it usually needs extra entities and extra functionality.
Obviously there is no difference at all in the technical system between setting the retrieval date/time to now, next month, or one year ago, made precise to the 1/1000 sec when needed.
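As a sketch of the technique: this seek-with-timestamp lookup can be modelled in a few lines of Python (dBase III+ being hard to run these days; all class and variable names here are mine, not from the original system). Entries are kept ordered by (key, date/time), and a read "as of" some moment finds the insertion point for (key, T) and steps back one entry - the equivalent of the Seek, Go Recno(0), Skip -1 sequence.

```python
import bisect
from datetime import datetime

class TemporalTable:
    """Append-only store: one entry per (key, date/time), never replaced
    or deleted, so the table only grows - just like the dBase system."""

    def __init__(self):
        self._keys = []   # sorted list of (key, timestamp) tuples
        self._vals = []   # parallel list of values

    def put(self, key, value, ts=None):
        ts = ts or datetime.now()
        i = bisect.bisect_right(self._keys, (key, ts))
        self._keys.insert(i, (key, ts))
        self._vals.insert(i, value)

    def get(self, key, as_of=None):
        """Most current value for `key` at time `as_of` - past, present,
        or future; a logical delete is just a stored None ("blank")."""
        as_of = as_of or datetime.now()
        i = bisect.bisect_right(self._keys, (key, as_of))   # the "Seek"
        if i and self._keys[i - 1][0] == key:               # the "Skip -1"
            return self._vals[i - 1]
        return None                                         # key not born yet

# The job-next-month example: a future-dated record is retrieved
# simply by moving the retrieval date/time one month ahead.
jobs = TemporalTable()
jobs.put("peter", "job A", datetime(2001, 1, 1))
jobs.put("peter", "job B", datetime(2001, 11, 1))   # starts next month
jobs.get("peter", as_of=datetime(2001, 10, 10))     # -> "job A"
jobs.get("peter", as_of=datetime(2001, 12, 1))      # -> "job B"
```

Note the symmetry: retrieving last year's situation is exactly the same call with an older `as_of`.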

Of course - and also intrinsically - there is an audit trail in there as well, though extra sorting will be needed on the date/time fields, or extra indexes on them. Note that this will not be retrievable in real time, because all the transactions are scattered over the various tables, and some run will be needed to form the audit trail.

Audit-trail

The one we created, and use right now (in our ERP app), was not so much created for an "audit trail" as for data replication/duplication to other servers. In the end, of course, this just gives the audit-trail functionality, including roll-forward, and also roll-back.
Precisely because it was designed for replication, it aims at as little datacom as possible, knowing that a system doing 1M transactions per day must be able to blast those transactions to the other systems within that same day.

Although we don't use any DBC or remote tables (so, just normal native DBFs), we are able to define any table as "transactional", with as little overhead as possible, knowing that each table defined as transactional implies the call of a function at each append, replace and delete.
Within a table, each field can be defined as transactional as well, and within that, each "data" value can again be defined as transactional or not (this is at key level).
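Such a per-table / per-field declaration could be modelled like this (a purely illustrative Python sketch with invented names; the real system is native-DBF FoxPro and handles this at system level):

```python
# Invented declaration of what participates in the transaction log:
# a table can be transactional, and within it individual fields too.
TRANSACTIONAL = {
    "customer": {"name": True, "credit_limit": True, "notes": False},
    "order":    {"status": True, "total": True},
}

def is_transactional(table, field=None):
    """True when the table (or a specific field of it) is logged."""
    fields = TRANSACTIONAL.get(table)
    if fields is None:
        return False      # table not declared -> no logging hook at all
    if field is None:
        return True       # table level: a hook fires on append/replace/delete
    return fields.get(field, False)
```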
This has not so much to do with your audit trail, Robert, but the next part has (answering some of your questions as well):

The function dealing with the registering of the transaction is called before the transaction itself. So what I am saying is: before an append is performed, this function is called, dealing with the previous transaction. Why?

Just because of what you say or imply: the most current version of a record is in the table itself, and need not be in any transaction record, as long as the run putting out the transactions to the other system knows that the latest version is indeed in the normal table.
This also has to do with the Delete transaction: if the function dealing with the transaction were called after the delete, the old version of the record would be unknown. BTW, please note that our transaction system is completely transparent to the developer; all is dealt with at system level except for one thing: the call of the function, which does not receive any data from the record in a parameter or whatever. So it is just the call of the function, which sorts out everything by itself, knowing the actual alias and the current Recno() (not when appending - I won't explain that here). So a delete could not be coped with if the function were called afterwards.

Now, without explaining the technical details further (there is very much to say about it, knowing that it took a few man-years to create all this), the function called of course creates this real audit trail.
This one table contains changed data only, but for one table record it contains all the fields changed by the one replace etcetera. Since this implies a variable record length, it is all put in an ASCII file (low-level writing), giving an overhead of less than 1/10 sec for dealing with one replace etc.
So, one transaction record contains the changed fields and data for one table only, and is made unique by the date/time the replace etc. was performed, including the user-id as well and of course the transaction type (i/c/d).
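The shape of such a transaction record can be modelled roughly as follows (a Python sketch with my own names; simplified in that it logs each operation's changed fields at the moment of the call, instead of the deferred deal-with-the-previous-transaction scheme described above, and it appends dicts to a list rather than variable-length records to an ASCII file):

```python
import time

CURRENT_USER = "peter"   # placeholder for the real user-id

class AuditedTable:
    """Model of a 'transactional' table: the logging hook runs before
    the replace/delete itself, so the old version is still readable."""

    def __init__(self, name, log):
        self.name, self.log, self.rows = name, log, {}

    def _log(self, op, pk, changed):
        # One transaction record per operation: changed fields only,
        # made unique by date/time, plus user-id and type (i/c/d).
        self.log.append({"ts": time.time(), "user": CURRENT_USER,
                         "table": self.name, "pk": pk,
                         "type": op, "fields": changed})

    def insert(self, pk, fields):
        self._log("i", pk, dict(fields))
        self.rows[pk] = dict(fields)

    def replace(self, pk, fields):
        old = self.rows[pk]
        changed = {k: v for k, v in fields.items() if old.get(k) != v}
        self._log("c", pk, changed)        # changed data only
        old.update(fields)

    def delete(self, pk):
        self._log("d", pk, dict(self.rows[pk]))  # old version still known
        del self.rows[pk]
```

Calling the hook before the delete is the point: afterwards, the old record would be gone.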

Note that this way of working will always imply some "difficulties", i.e. redundancy, in the run putting out the transactions (the audit-trail list), because one part comes from the transaction table and one part comes from the normal tables (the most current version).

A few years ago we started a module for data mining, which indeed used the transaction table as "the database", knowing that history data is important to such a module. So yes, this one table can just as well be used as "the database", though having the latest version of any record in there is rather convenient then.
Forming an actual record comes down to starting at the bottom for the table concerned and working the way up until an insert is encountered, or until all the fields have been encountered. Depending on the amendments made to the one record, this obviously takes more or less time.
Again, this transaction table could be used for a time-based system, just by not starting at the bottom (index on date/time) but seeking the date/time etc., then again looking upwards for all the field data.
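That bottom-up collection could look as follows (a Python sketch; it assumes each log entry is a dict with `ts`, `table`, `pk`, `type` ('i'/'c'/'d') and a `fields` mapping of changed values - my own layout, not the actual ASCII file format). Passing `as_of` gives the time-based variant: entries newer than the requested date/time are skipped before the upward walk starts.

```python
def record_as_of(log, table, pk, as_of=None, all_fields=None):
    """Rebuild one record from the transaction log: start at the bottom
    (or at `as_of`) and walk upwards, keeping the newest value seen per
    field, until the insert is reached or all fields are collected."""
    rec = {}
    for entry in reversed(log):                    # newest entries first
        if entry["table"] != table or entry["pk"] != pk:
            continue
        if as_of is not None and entry["ts"] > as_of:
            continue                               # too new for the requested time
        if entry["type"] == "d":
            return None                            # record was deleted by then
        for field, value in entry["fields"].items():
            rec.setdefault(field, value)           # newer value wins
        if entry["type"] == "i":
            break                                  # insert reached: complete
        if all_fields and set(rec) >= set(all_fields):
            break                                  # every field collected
    return rec or None
```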

Note that the roll-forward as well as the roll-back features imply that the time-based features must be in there, though not as a real-time facility.

General

Why was I too fast yesterday?
Because my lines about attributes becoming entities were in fact about a mixture of the two systems. Thus, my dBase system didn't have that, and my transaction system has it, but doesn't have the time-based features implemented (well, it is not really used as such).

Last note: never start a one-table thing as a whole database with a native DBF, because you will run into the 2GB limit. I mean, your transactions - or audit trail - may be cleared once in a while, while the DB obviously may not ...

Robert, what you can do with the above I don't know, but again, I am this enthusiast as well, and I'm 100% sure you can achieve what you put in your thoughts. Personally, I would go for the normal-table system if the objective were the time-based system, which gives the audit trail as well. Note that re-creating a database out of transactions only is illogical with respect to development, and you just lose a dimension (the table). But that too, IMO, will just work, and for 100% sure will work fast enough, as long as all fields are put into the transaction record (eliminating the need for collecting all the field data, which is now spread over the transaction records), and only the transaction record has to be found, in the same way as described under my time-based system (Seek, Go Recno(0) - Skip -1).
And hey, if you put your thoughts into something physical anyhow, don't forget to incorporate the date/time in your Seek. IOW, you would have the time-based system with about the same effort.

Regards,
Peter





>Peter,
>
>It's nice to hear from a fellow enthusiast - working all the time doesn't always leave space in the day for thinking about the theory behind the work one does.
>
>Your suggested approach is very interesting - the approach I took was to have an audit-trail record *for each field*, almost as you state, with an identical time-stamp for all the fields participating in the same transaction (i.e. the datetime field is not unique), the additions being the control information (including the PK of the record being trailed). This is not truly normalised, as the timestamp is not unique, in order to gather together all the values for a transaction. Having thought about this, making the timestamp the primary key (and hence unique) would be valid, assuming the transaction to be serialisable and atomic (which is already possible with current technology): to reconstruct the record would be a case of filling in the field values for a virtual record by scanning back from a time T[now] to a maximum depth of T[insert] for the virtual primary key.
>
>Also of course, you are right, the 'OLD' value does NOT need to be stored. Neither, in fact, does the type of transaction (I/U/D), if we use a NULL/DELETED value for all the deleted fields: an 'insert' is an update to a non-existent record. All that is needed is datetime/table/field/value/virtualPK/user.
>
>I was trying to think of how to eliminate the table.field combination, but I think this needs to be retained: one might also need to identify the originating database, so to generalise (using 'handle' to indicate some combination of table/field-name, etc.):
>
>datetimeStamp/handle/value/virtualPK/User
>
>The HANDLE field could be dropped by having it as a record in the table, prior to the recorded changes, as could the virtualPK (and user).
>
>We end up with:
>datetimestamp/Value
>
>So, UPDATE XYZ SET custname = 'fred' WHERE pk = 99
>
>would become (when stored):-
>
>10-OCT-2001#12:01:60505 user:brian
>10-OCT-2001#12:01:60506 XYZ
>10-OCT-2001#12:01:60507 99
>10-OCT-2001#12:01:60508 custname
>10-OCT-2001#12:01:60509 fred
>
>5 records per field, in effect.
>
>Hmmm..."ThoughtExperiment Server version 1.0" need only have one table and one field-type (all values could be stored as strings). I can feel an article coming on.....
>
>One difficulty with this is, as you state, that the transactions must be scanned to build the record. From your description, you propose storing a timestamp as a foreign key into a table for the attributes of a record - is that correct?
>
>So:
>
>Table XYZ
>PK, custname (value is 10-OCT-2001#12:01:60509)
>
>Value table
>10-OCT-2001#12:01:60509 fred
>
>... and one can build the record that way. If I've got this right, how are you finding previous values?
>
>Alternatively, are you inserting into the same table (XYZ) with a datetimestamp+PK as the primary key? That's interesting in that it is very efficient even with current technology.
>
>
>
>>
>>Robert,
>>
>>Just normalize your database to the time aspect, as if all entities are made unique by the date/time; when this is worked out, each attribute will become an entity, and the data in the former attribute will be the attribute itself. So, all is shifted one "dimension". What do I say?
>>
>>Well, that you are IMO completely right in these theoretical thoughts, and that it's just a very different solution, optimized more than what I said above, having all the data constantly duplicated from the logical point of view. Yeah, logically, but not technically. Remember, where I say that all attributes become entities, each instance of the attribute's data will be in a new record, which is practically the same as your audit-trail records.
>>Now one addition you can make :
>>
>>When my "approach" is taken as a basis (being sort of more logical), your keys will have the date/time in them, while your way allows for a physical record sequence, the newest at the bottom. Anyhow, what I'd like to say is that my approach allows for this beautiful time-based system, in which a COMPLETE DATABASE can be viewed at a certain moment in time, giving the 100% consistent situation at that time. F.e., 8 years ago you were married to your ex-wife Marianne {g}, and at that same time your neighbour from back then had 1 child only. This can all be solved at system level, without the developers knowing about it.
>>Note that your audit trail can provide the same functionality, though intrinsically you have to collect all the transaction records to get to the actual state of the data. Maybe I'm too fuzzy (not wanting to be extensive at the same time), but in the end you too can Seek at starting points, going to Recno(0) (etc.) to get the nearest records, once you put the appropriate indexes on the table(s).
>>
>>In a far past I created a test system working in this way, and you won't believe your eyes what info will be in the database, just because of the time-based phenomenon. Of course records will never be deleted, and a logical Delete just adds another record (per attribute), namely "blank", or: not there.
>>
>>Excuse me for this fast, fuzzy writing, but I'm just an enthusiast for these kinds of things.
>>BTW, right now our main app contains an audit trail of everything, indeed allowing for the reconstruction (or duplication, replication) of everything (roll-forward). So of course you are right.
>>
>>Cheers,