devRant - A fun community for developers to connect over code, tech & life as a programmer

Persisting derived values. Often a necessary evil for optimisation or privacy, while conflicting with concerns such as auditing.

Password hashing is the common example of a case considered necessary to cover security concerns.

It's also often a mistake to store derived values. Sometimes it's merely annoying. Sometimes it's data loss. Derived values often require careful maintenance, otherwise the actual number of comments in your database for a page is 10 but the stored value on the page record is 9. This becomes very important when dealing with money, where eventual consistency might not be enough.
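A minimal sketch of that comment-count drift, assuming hypothetical in-memory `comments` and `pages` tables and a write path that forgets to maintain the stored counter. None of these names come from a real schema; they only illustrate how the stored value and the source rows can disagree, and how recomputing from the source repairs it.

```python
# Hypothetical in-memory "tables"; all names here are illustrative assumptions.
comments = [{"page_id": 1, "body": f"comment {i}"} for i in range(9)]
pages = {1: {"comment_count": 9}}  # derived value stored on the page record


def add_comment_buggy(page_id, body):
    # Inserts the comment but forgets to update the stored counter.
    comments.append({"page_id": page_id, "body": body})


def actual_count(page_id):
    # Recompute the derived value from the source rows.
    return sum(1 for c in comments if c["page_id"] == page_id)


def reconcile(page_id):
    # Repair the stored value from the source of truth.
    pages[page_id]["comment_count"] = actual_count(page_id)


add_comment_buggy(1, "comment 9")
stored_before = pages[1]["comment_count"]  # stale stored value
actual = actual_count(1)                   # what the rows really say
reconcile(1)
stored_after = pages[1]["comment_count"]   # stored value after repair
```

The repair is only possible because the original rows still exist. Had only the counter been kept, there would be nothing to reconcile against.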

Annoying is when, given a and b with c = a + b, only b and c are stored, so you often have to run things backwards to get a.
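Running it backwards might look like the sketch below. The `record` layout and the `recover_a` helper are made up for illustration, and `Decimal` is used on the assumption that the values are money-like. It only works because addition happens to be invertible; most derivations are not.

```python
from decimal import Decimal

# Hypothetical stored record: a was never persisted, only b and c = a + b.
record = {"b": Decimal("2.50"), "c": Decimal("12.49")}


def recover_a(rec):
    # Invert c = a + b to get the missing original back.
    return rec["c"] - rec["b"]


a = recover_a(record)
```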

Given any processing pipeline such as A -> B -> C, with A being the original and C the final result, you technically only need C to serve that result. This applies to anything.

However, not all steps stay the same size or deflate. Summing values is an example of a deflate. Mapping values is an example of a stay. Combining all possible value pairs is an inflate, i.e. N * N, and tends to represent the true termination point for a pipeline as to what can be persisted.
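The three kinds of step can be sketched on a small list. The variable names are just labels for this example.

```python
from itertools import product

values = [1, 2, 3, 4]  # N = 4 original items

# Deflate: N values collapse to a single value.
deflated = sum(values)

# Stay: mapping keeps one output per input, so the size stays N.
stayed = [v * 2 for v in values]

# Inflate: every possible ordered pair, so the size grows to N * N.
inflated = list(product(values, repeat=2))
```

Here `inflated` holds 16 pairs for 4 inputs, which is why persisting the output of a step like that gets expensive quickly.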

I've quite often seen people exclude the original. Some amount of lossiness can be alright if what's dropped is genuine noise, and a one-way step can be alright if it serves some purpose.

If A is O(N) and C reduces to O(1) then it can seem to make sense to store only C, until someone also wants B -> D. Technically speaking, A is all you ever need to persist to cater to all dependencies.
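A rough sketch of why keeping only C falls over once D turns up. Here A is assumed to be raw strings, B the parsed integers, C a sum and D a max; these concrete steps are stand-ins chosen for the example, not anything prescribed.

```python
def derive(A):
    # A -> B: parse the raw input.
    B = [int(x) for x in A]
    # B -> C: deflate to a single total, O(1) to store.
    C = sum(B)
    # B -> D: the requirement that arrives later.
    D = max(B)
    return C, D


A = ["3", "1", "4", "1", "5"]
other_A = ["7", "7"]

C, D = derive(A)
other_C, other_D = derive(other_A)

# Both inputs give the same C but a different D, so D cannot be rebuilt
# from a stored C alone. Only the original A can serve both.
same_total = C == other_C
same_max = D == other_D
```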

I've seen every kind of mess with processing chains. People persisting the inflations while still being lossy. Giant linear chains where items should instead rely on a common ancestor. Things being applied only to be unapplied. Yes, ABCBDBEBCF etc., and then truncating A happens.

Extreme care needs to be taken with data and future proofing. Excess data you can remove. Missing code can be added. Data, however, once it's gone it's gone, and your bug is forever.

This doesn't seem to enter the minds of many developers who don't reconcile their execution or processing graphs with entry points, exit points, edge direction, size, persistence, etc.