Search - "padding the numbers"
-
Was just thinking of building a command line tool to ease development of some of my game assets (just packing them all together), and seeing as I want to use GameMaker Studio 2, I thought my obsession with JSON would be perfect for use with its ds_map functions, so let's start understanding the backend of these functions to tie them into my CL tool...
*Sees ds_map_secure_save*
Oh this might be helpful, easily save a data structure with decent encryption...
*Looks at saved output and starts noticing some patterns*
Hmm, this looks kinda familiar... Hmmm using UTF-8, always ends with =, seems to always have 8 random numbers at the start.. almost like padding... Wait... this is just base64!
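If you want to sanity-check that yourself, something like this minimal sketch does it (Python rather than GML, and the filename is just a placeholder): decode the save as base64 and see if recognisable data falls out.

```python
import base64

# Output of ds_map_secure_save; the path here is just a placeholder.
with open("savegame.dat", "rb") as f:
    blob = f.read()

decoded = base64.b64decode(blob)   # the trailing "=" is ordinary base64 padding
print(decoded[:80])                # if this prints recognisable data, it was never really encrypted
```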
Now yoyogames, I understand encryption can be hard, but calling base64 'secure' is like me flopping my knob on the table and calling it a subtle flirt...
-
You know what's cool about working for a company that uses GitHub for version control? My contributions on my profile are going through the roof, and I expect they'll make me look like an open source hero!
-
Hey, been gone a hot minute from devrant, so I thought I'd say hi to Demolishun, atheist, Lensflare, Root, kobenz, score, jestdotty, figoore, cafecortado, typosaurus, and the raft of other people I've met along the way and got to know somewhat.
All of you have been really good.
And while I'm here it's time for maaaaaaaaath.
So I decided to horribly mutilate the concept of bloom filters.
If you don't know what that is: you take two random numbers, m and p, both prime, where m < p, and use them to generate two numbers a and b that define a function. That function is a hash.
Normally you'd have say five to ten different hashes.
A bloom filter lets you probabilistically say whether you've seen something before, with no false negatives.
It lets you do this very space efficiently, with some caveats.
Each hash function should be uniformly distributed (any value input to it is likely to be mapped to any other value).
Then you interpret these output values as bit indexes.
So Hi might output [0, 1, 0, 0, 0]
while Hj outputs [0, 0, 0, 1, 0]
and Hk outputs [1, 0, 0, 0, 0]
producing [1, 1, 0, 1, 0]
And if your bloom filter has bits set in all those places, congratulations, you've seen that number before.
It's used by big companies like google to prevent re-indexing pages they've already seen, among other things.
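A tiny sketch of the above in Python, assuming integer inputs and the usual affine hash family h(x) = ((a*x + b) mod p) mod m; the constants and the filter size here are made up:

```python
import random

P = 2_147_483_647   # a large prime
M = 64              # number of bits in the filter

def make_hash():
    # h(x) = ((a*x + b) mod p) mod m, with a and b chosen at random
    a, b = random.randrange(1, P), random.randrange(P)
    return lambda x: ((a * x + b) % P) % M

hash_fns = [make_hash() for _ in range(5)]   # "five to ten different hashes"
bits = 0

def add(x):
    global bits
    for h in hash_fns:
        bits |= 1 << h(x)            # set the bit each hash points at

def probably_seen(x):
    return all(bits >> h(x) & 1 for h in hash_fns)

add(731)
print(probably_seen(731))   # True - no false negatives
print(probably_seen(999))   # usually False, but false positives can happen
```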
Well I thought, what if instead of using it as a has-been-seen-before filter, we mangled its purpose until a square peg fit in a round hole?
Not long after I went and wrote a script that 1. generates data, 2. generates a hash function to encode it, and 3. finds a hash function that reverses the encoding.
And it just works. Reversible hashes.
Of course you can't use it for compression strictly, not under normal circumstances, but these aren't normal circumstances.
The first thing I tried was finding a hash function h0, that predicts each subsequent value in a list given the previous value. This doesn't work because of hash collisions by default. A value like 731 might map to 64 in one place, and a later value might map to 453, so trying to invert the output to get the original sequence out would lead to branching. It occurs to me just now we might use a checkpointing system, with lookahead to see if a branch is the correct one, but I digress, I tried some other things first.
The next problem was 1. long sequences are slow to generate. I solved this by tuning the number of iterations of the outer and inner loops. We find h0 first, then h1, put all the inputs through h0 to generate an intermediate list, then put them through h1, and see if the output of h1 matches the original input. If it does, we return h0 and h1.
It turns out it can take inordinate amounts of time if h0 lands on a hash function that doesn't play well with h1, so the next step was 2. adding an error margin. It turns out something fun happens: if you allow a sequence generated by h1 (the decoder) to match *within* an error margin, then under a certain error value it'll find potential hash functions hn such that the outputs of h1 are *always* the same distance from their parent values in the original input to h0. This becomes our salt value k.
So our hash-function generator, called encoder_decoder() or 'ed' (lol, two-letter functions), also calculates the k value and outputs it along with the hash functions for our data.
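This isn't the actual ed() code, just a rough sketch of the idea, using the same affine modular hashes as above and treating the error margin as "the decoder is off by a constant k" (the salt); the prime and the data are arbitrary:

```python
import random

P = 251  # small prime keeps the random search quick

def rand_hash():
    return random.randrange(1, P), random.randrange(P)

def apply(h, x):
    a, b = h
    return (a * x + b) % P

def ed(data, tries=200_000):
    """Search for h0, h1 and a salt k with h1(h0(x)) == (x + k) mod P for every x."""
    for _ in range(tries):
        h0, h1 = rand_hash(), rand_hash()
        offsets = {(apply(h1, apply(h0, x)) - x) % P for x in data}
        if len(offsets) == 1:            # constant offset: reversible, and the offset is the salt
            return h0, h1, offsets.pop()
    return None                          # bounded search, so it can fail (rarely, at this size)

data = [random.randrange(P) for _ in range(20)]
h0, h1, k = ed(data)
encoded = [apply(h0, x) for x in data]
decoded = [(apply(h1, y) - k) % P for y in encoded]
assert decoded == data                   # round trip through h0 and h1 recovers the input
```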
This is all well and good, but what if we want to go further? With a few tweaks, along with taking the output values, converting them to binary, and left-padding each value with 0s, we can then calculate Shannon entropy in its most essential form.
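Something like this minimal sketch, assuming fixed-width binary and entropy measured over the individual bits (the width and values are placeholders):

```python
from collections import Counter
from math import log2

def bit_entropy(values, width=8):
    # each value as a fixed-width binary string, left-padded with 0s
    bits = "".join(format(v, f"0{width}b") for v in values)
    counts = Counter(bits)
    n = len(bits)
    return -sum(c / n * log2(c / n) for c in counts.values())   # H = -sum(p_i * log2(p_i))

print(bit_entropy([3, 200, 17, 42]))   # entropy per bit, 1.0 at most
```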
Turns out, with tens of thousands of values (and tens of thousands of bits), the output of h1 with the salt has a higher entropy than the original input. Meaning finding an h1 and h0 hash function for your data is equivalent to compression below the known Shannon limit.
By how much?
Approximately 0.15%
Of course this doesn't factor in the five numbers you need: a0 and b0 to define h0, a1 and b1 to define h1, and the salt value, so it probably works out to the same. I'd like to see what the savings are with even larger sets though.
Next I said, well what if we COULD compress our data further?
What if all we needed were the numbers to define our hash functions, a starting value, a salt, and a number to represent 'depth'?
What if we could rearrange this system so we *could* use the starting value to represent n subsequent elements of our input x?
And thats what I did.
We break the input into blocks of 15-25 items, b/c that's the fastest to work with and find hashes for.
We then follow the math to get a block, which is:
H0, H1, H2, H3, depth (how many items our 1st item will reproduce), & a starting value or 1st item in this slice of our input.
x goes into h0, giving us y. y goes into h1 -> z, z into h2 -> y, y into h3, giving us back x.
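A minimal sketch of that round trip, under the assumption (mine, not necessarily how ed() builds the block) that h2 and h3 are just the modular inverses of h1 and h0; how depth regenerates the following items is left out:

```python
P = 251

def apply(h, x):
    a, b = h
    return (a * x + b) % P

def inverse(h):
    a, b = h
    a_inv = pow(a, -1, P)             # modular inverse (Python 3.8+)
    return a_inv, (-a_inv * b) % P

h0, h1 = (7, 19), (11, 3)             # arbitrary encode pair
h2, h3 = inverse(h1), inverse(h0)     # their inverses, applied in reverse order

x = 42
y = apply(h0, x)
z = apply(h1, y)
assert apply(h2, z) == y
assert apply(h3, apply(h2, z)) == x   # x -> h0 -> y -> h1 -> z -> h2 -> y -> h3 -> x
```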
The rest is in the image.
Anyway, good to see you all again.
-
Crypto. I've seen some horrible RC4 thrown around and heard of 3DES also being used, but luckily didn't lay my eyes upon it.
Now to my current crypto adventure.
Rule no.1: Never roll your own crypto.
They said.
So let's encrypt a file for upload. OK, there doesn't seem to be a clear standard, but y'know, combine an asymmetric cipher to encrypt the key with a symmetric one. Should be easy. Take RSA and whatnot from some libraries. But let's obfuscate it a bit so nobody can reuse it.
Until today I thought the crypto was alright, but then there was something off. On two layers there were added hashes, timestamps or length fields, which enlarge the data to encrypt. Now it doesn't add up any more: through the added padding and hash verification, RSA from OpenSSL throws an error because the data is too long (about 240 bytes possible, but 264 pumped in). Probably the lib used just didn't notify, silently truncating stuff or resorting to other means. Still needs investigation.
But apart from that: why the fuck add your own hash verification, with weak non-cryptographic hashes(!), if the chosen RSA variant already has that with SHA-256? Why this sick generation of key material with some MD5 artistic stunts - is there no cryptographically safe random source on Windows? Why directly pump some structs (with no padding and magic numbers) into the file? Just so it's a bit more fucked up?
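For contrast, the boring textbook way to "encrypt a file for upload" is a handful of lines: a random AES key encrypts the data, RSA-OAEP wraps the key. A minimal sketch using the pyca/cryptography package (key sizes and blob layout are my own assumptions, not whatever this scheme does):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_blob(data: bytes, public_key) -> bytes:
    key = AESGCM.generate_key(bit_length=256)             # symmetric key from the OS CSPRNG
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, data, None)   # AEAD, so integrity comes for free
    wrapped = public_key.encrypt(                         # only the 32-byte key goes through RSA
        key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    return wrapped + nonce + ciphertext                   # wrapped key is 256 bytes for a 2048-bit RSA key

# usage
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
blob = encrypt_blob(b"file contents go here", private_key.public_key())
```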
Thanks, that worked.