Vision for your dev career?

Week 176 Group Rant

Demolishun

6y

I had a flash of inspiration. I would like to develop a method for analyzing unknown bitstreams of data. The method would work out the format of the data using trial-and-error machine learning algorithms, which would make it possible to determine the data types, byte formats, and meaning of a stream. It could be useful in data forensics. I would call the method heuristic translation machine learning. I am currently developing code that does this. It will be fun to learn about reinforcement learning algorithms.

joke/meme

wk176

rant

goal

Ranter

Comments

2

netikras

35628

6y

`man file`
:)
1

NoToJavaScript

4691

6y

Cool! But change the name "heuristic translation machine learning". Do we want a second HTML? ;p
0

Yamakuzure

1758

6y

Machine Learning is roughly about pattern matching.

The patterns come from known data.

This seems a bit different from looking at what *unknown* data might do or consist of.

And yes, 'file' does this already without needing terabytes of storage, dozens of CPUs, and gigabytes of RAM.
Although it only identifies known formats, of course.
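
For illustration, signature-based detection can be sketched in a few lines of Python. This is only a toy version of the idea, not how 'file' is actually implemented: the real tool uses a much larger magic database with offsets and nested tests. The table and the `identify` helper are hypothetical names made up for this sketch; the four signatures themselves are the well-known ones for those formats.

```python
# Toy signature table: leading bytes -> human-readable format name.
# (Hypothetical helper for illustration; 'file' uses a far richer database.)
MAGIC_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"\x1f\x8b": "gzip compressed data",
    b"PK\x03\x04": "ZIP archive",
}


def identify(data: bytes) -> str:
    """Return the name of the first known signature that prefixes data."""
    for magic, name in MAGIC_SIGNATURES.items():
        if data.startswith(magic):
            return name
    # Anything not in the table stays unidentified, which is the limitation
    # mentioned above: only known formats can be recognized this way.
    return "unknown"
```

Anything that does not start with one of the listed signatures comes back as "unknown", which is exactly the gap an unknown bitstream falls into.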
1

Demolishun

38364

6y

@Yamakuzure The data in question is a bitstream, not a bytestream. I have a data source where it is not known how the bits are encoded, or whether they even form complete bytes in some cases. I intend to search for characters in the Latin character set (the stream is old, around 40 to 50 years) and possibly other data types. I have no confidence that the data is complete, even for individual bytes. Yes, for byte-aligned data in an unknown format, other tools are available.
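
A rough sketch of what that search could look like, assuming the bits are available as a string of '0' and '1' characters and that the text, if present, is ASCII-like with 7 or 8 bits per character. It is plain brute force over alignments rather than machine learning, and every function name here is made up for the example.

```python
def bits_from_bytes(data: bytes) -> str:
    """Expand bytes into a string of '0'/'1' characters, MSB first."""
    return "".join(f"{b:08b}" for b in data)


def decode_at(bits: str, offset: int, width: int) -> bytes:
    """Cut the bitstream into width-bit symbols starting at a bit offset.

    Trailing bits that do not fill a whole symbol are ignored.
    """
    out = bytearray()
    for i in range(offset, len(bits) - width + 1, width):
        out.append(int(bits[i:i + width], 2))
    return bytes(out)


def printable_ratio(chunk: bytes) -> float:
    """Fraction of symbols that are printable ASCII, tab, LF, or CR."""
    if not chunk:
        return 0.0
    good = sum(1 for c in chunk if 32 <= c <= 126 or c in (9, 10, 13))
    return good / len(chunk)


def rank_alignments(bits: str, widths=(7, 8)):
    """Score every (width, offset) guess; best-looking guesses come first."""
    results = []
    for width in widths:
        for offset in range(width):
            chunk = decode_at(bits, offset, width)
            results.append((printable_ratio(chunk), width, offset))
    return sorted(results, reverse=True)
```

Calling `rank_alignments(bits)` returns (score, width, offset) tuples. A score near 1.0 only says that an alignment looks like text; it does not prove the guess is right, and a stream with missing bits would need the search repeated on shorter windows.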
1

Yamakuzure

1758

6y

@Demolishun Also try EBCDIC if that data is that old.
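
If it helps, Python ships EBCDIC codecs, so a candidate chunk can be scored as EBCDIC text directly. A minimal sketch, assuming code page 037 (US/Canada EBCDIC); the stream may well use a different code page such as cp500, and the `ebcdic_text_ratio` helper and its list of accepted punctuation are made up for this example.

```python
def ebcdic_text_ratio(chunk: bytes, codec: str = "cp037") -> float:
    """Fraction of bytes that decode to plain ASCII letters, digits,
    or common punctuation when read as EBCDIC.

    The codec is an assumption: cp037 is one of several EBCDIC code pages.
    """
    if not chunk:
        return 0.0
    # The EBCDIC code pages map all 256 byte values, so decoding never fails.
    text = chunk.decode(codec)
    allowed = " .,;:'\"-()\n"
    good = sum(
        1 for ch in text
        if ch.isascii() and (ch.isalnum() or ch in allowed)
    )
    return good / len(text)
```

Bytes that really are EBCDIC text score close to 1.0, while ASCII text read as EBCDIC scores low, so running both this and an ASCII check on the same chunk gives a quick hint about which family the encoding belongs to.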


devRant © 2021 Hexical Labs LLC