Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "datapoints"
-
This is the third part of my ongoing series "The Ballad of the Six Witchers and the Undocumented Java Tool".
In this part, we have the massive Battle of Sparks and Storms.
The first part is here: https://devrant.com/rants/5009817/...
The second part is here: https://devrant.com/rants/5054467/...
Over the last couple sprints and then some, The Witcher Who Writes and the Butchers of Jarfile had studied the decompiled guts of the Undocumented Java Beast and finally derived (most of) the process by which the data was transformed. They even built a model to replicate the results in small scale.
But when such process was presented to the Priests of Accounting at the Temple of Cash-Flow, chaos ensued.
This cannot be! - cried the priests - You must be wrong!
Wrong, the Witchers were not. In every single test case the Priests of Accounting threw at the Witchers, their model predicted perfectly what would be registered by the Undocumented Java Tool at the very end.
It was not the Witchers. The process was corrupted at its essence.
The Witchers reconvened at their fortress of Sprint. In the dark room of Standup, the leader of their order, wise beyond his years (and there were plenty of those), in a deep and solemn voice, there declared:
"Guys, we must not fuck this up." (actual quote)
For the leader of the witchers had just returned from a war council at the capitol of the province. There, heading a table boarding the Archpriest of Accounting, the Augur of Economics, the Marketing Spymaster and Admiral of the Fleet, was the Ciefoh Seat himself.
They had heard rumors about the Order of the Witchers' battles and operations. They wanted to know more.
It was quiet that night in the flat and cloudy plains of Cluster of Sparks and Storms. The Ciefoh Seat had ordered the thunder to stay silent, so that the forces of whole cluster would be available for the Witchers.
The cluster had solid ground for Hive and Parquet turf, and extended from the Connection River to farther than the horizon.
The Witcher Who Writes, seated high atop his war-elephant, looked at the massive battle formations behind.
The frontline were all war-elephants of Hadoop, their mahouts the Witchers themselves.
For the right flank, the Red Port of Redis had sent their best connectors - currency conversions would happen by the hundreds, instantly and always updated.
The left flank had the first and second army of Coroutine Jugglers, trained by the Witchers. Their swift catapults would be able to move data to and from the JIRA cities. No data point will be left behind.
At the center were thousands of Sparks mounting their RDD warhorses. Organized in formations designed by the Witchers and the Priestesses of Accounting, those armoured and strong units were native to this cloudy landscape. This was their home, and they were ready to defend it.
For the enemy could be seen in the horizon.
There were terabytes of data crossing the Stony Event Bridge. Hundreds of millions of datapoints, eager to flood the memory of every system and devour the processing time of every node on sight.
For the Ciefoh Seat, in his fury about the wrong calculations of the processes of the past, had ruled that the Witchers would not simply reshape the data from now on.
The Witchers were to process the entire historical ledger of transactions. And be done before the end of the month.
The metrics rumbled under the weight of terabytes of data crossing the Event Bridge. With fire in their eyes, the war-elephants in the frontline advanced.
Hundreds of data points would be impaled by their tusks and trampled by their feet, pressed into the parquet and hive grounds. But hundreds more would take their place. There were too many data points for the Hadoop war-elephants alone.
But the dawn will come.
When the night seemed darker, the Witchers heard a thunder, and the skies turned red. The Sparks were on the move.
Riding into the parquet and hive turf, impaling scores of data points with their long SIMD lances and chopping data off with their Scala swords, the Sparks burned through the enemy like fire.
The second line of the sparks would pick data off to be sent by the Coroutine Jugglers to JIRA. That would provoke even more data to cross the Event Bridge, but the third line of Sparks were ready for it - those data would be pierced by the rounds provided by the Red Port of Redis, and sent back to JIRA - for good.
They fought for six days and six nights, taking turns so that the battles would not stop. And then, silence. The day was won, all the data crushed into hive and parquet.
Short-lived was the relief. The Witchers knew that the enemy in combat is but a shadow of the troubles that approach. Politics and greed and grudge are all next in line. Are the Witchers heroes or marauders? The aftermath is to come, and I will keep you posted.4 -
In 2015 I sent an email to Google labs describing how pareidolia could be implemented algorithmically.
The basis is that a noise function put through a discriminator, could be used to train a generative function.
And now we have transformers.
I also told them if they looked back at the research they would very likely discover that dendrites were analog hubs, not just individual switches. Thats turned out to be true to.
I wrote to them in an email as far back as 2009 that attention was an under-researched topic. In 2017 someone finally got around to writing "attention is all you need."
I wrote that there were very likely basic correlates in the human brain for things like numbers, and simple concepts like color, shape, and basic relationships, that the brain used to bootstrap learning. We found out years later based on research, that this is the case.
I wrote almost a decade ago that personality systems were a means that genes could use to value-seek for efficient behaviors in unknowable environments, a form of adaption. We later found out that is probably true as well.
I came up with the "winning lottery ticket" hypothesis back in 2011, for why certain subgraphs of networks seemed to naturally learn faster than others. I didn't call it that though, it was just a question that arose because of all the "architecture thrashing" I saw in the research, why there were apparent large or marginal gains in slightly different architectures, when we had an explosion of different approaches. It seemed to me the most important difference between countless architectures, was initialization.
This thinking flowed naturally from some ideas about network sparsity (namely that it made no sense that networks should be fully connected, and we could probably train networks by intentionally dropping connections).
All the way back in 2007 I thought this was comparable to masking inputs in training, or a bottleneck architecture, though I didn't think to put an encoder and decoder back to back.
Nevertheless it goes to show, if you follow research real closely, how much low hanging fruit is actually out there to be discovered and worked on.
And to this day, google never fucking once got back to me.
I wonder if anyone ever actually read those emails...
Wait till they figure out "attention is all you need" isn't actually all you need.
p.s. something I read recently got me thinking. Decoders can also be viewed as resolving a manifold closer to an ideal form for some joint distribution. Think of it like your data as points on a balloon (the output of the bottleneck), and decoding as the process of expanding the balloon. In absolute terms, as the balloon expands, your points grow apart, but as long as the datapoints are not uniformly distributed, then *some* points will grow closer together *relatively* even as the surface expands and pushes points apart in the absolute.
In other words, for some symmetry, the encoder and bottleneck introduces an isotropy, and this step also happens to tease out anisotropy, information that was missed or produced by the encoder, which is distortions introduced by the architecture/approach, features of the data that got passed on through the bottleneck, or essentially hidden features.4
