Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "serialisation"
-
Okay, story time.
Back during 2016, I decided to do a little experiment to test the viability of multithreading in a JavaScript server stack, and I'm not talking about the Node.js way of queuing I/O on background threads, or about WebWorkers that box and convert your arguments to JSON and back during a simple call across two JS contexts.
I'm talking about JavaScript code running concurrently on all cores. I'm talking about replacing the god-awful single-threaded event loop of ECMAScript – the biggest bottleneck in software history – with an honest-to-god, lock-free thread-pool scheduler that executes JS code in parallel, on all cores.
I'm talking about concurrent access to shared mutable state – a big, rightfully-hated mess when done badly – in JavaScript.
This rant is about the many mistakes I made at the time, specifically the biggest – but not the first – of which: publishing some preliminary results very early on.
Every time I showed my work to a JavaScript developer, I'd get negative feedback. Like, unjustified hatred and immediate denial, or outright rejection of the entire concept. Some were even adamantly trying to discourage me from this project.
So I posted a sarcastic question to the Software Engineering Stack Exchange, which was originally worded differently to reflect my frustration, but was later edited by mods to be more serious.
You can see the responses for yourself here: https://goo.gl/poHKpK
Most of the serious answers were along the lines of "multithreading is hard". The top voted response started with this statement: "1) Multithreading is extremely hard, and unfortunately the way you've presented this idea so far implies you're severely underestimating how hard it is."
While I'll admit that my presentation was initially lacking, I later made an entire page to explain the synchronisation mechanism in place, and you can read more about it here, if you're interested:
http://nexusjs.com/architecture/
But what really shocked me was that I had never understood the mindset that all the naysayers adopted until I read that response.
Because the bottom-line of that entire response is an argument: an argument against change.
The average JavaScript developer doesn't want a multithreaded server platform for JavaScript because it means a change of the status quo.
And this is exactly why I started this project. I wanted a highly performant JavaScript platform for servers that's more suitable for real-time applications like transcoding, video streaming, and machine learning.
Nexus does not and will not hold your hand. It will not repeat Node's mistakes and give you nice ways to shoot yourself in the foot later, like `process.on('uncaughtException', ...)` for a catch-all global error handling solution.
No, an uncaught exception will be dealt with like any other self-respecting language: by not ignoring the problem and pretending it doesn't exist. If you write bad code, your program will crash, and you can't rectify a bug in your code by ignoring its presence entirely and using duct tape to scrape something together.
Back on the topic of multithreading, though. Multithreading is known to be hard, that's true. But how do you deal with a difficult solution? You simplify it and break it down, not just disregard it completely; because multithreading has its great advantages, too.
Like, how about we talk performance?
How about distributed algorithms that don't waste 40% of their computing power on agent communication and pointless overhead (like the serialisation/deserialisation of messages across the execution boundary for every single call)?
How about vertical scaling without forking the entire address space (and thus multiplying your application's memory consumption by the number of cores you wish to use)?
How about utilising logical CPUs to the fullest extent, and allowing them to execute JavaScript? Something that isn't even possible with the current model implemented by Node?
Some will say that the performance gains aren't worth the risk. That the possibility of race conditions and deadlocks aren't worth it.
That's the point of cooperative multithreading. It is a way to smartly work around these issues.
If you use promises, they will execute in parallel, to the best of the scheduler's abilities, and if you chain them then they will run consecutively as planned according to their dependency graph.
If your code doesn't access global variables or shared closure variables, or your promises only deal with their provided inputs without side-effects, then no contention will *ever* occur.
If you only read and never modify globals, no contention will ever occur.
Are you seeing the same trend I'm seeing?
Good JavaScript programming practices miraculously coincide with the best practices of thread-safety.
When someone says we shouldn't use multithreading because it's hard, do you know what I like to say to that?
"To multithread, you need a pair."18 -
I've optimised so many things in my time I can't remember most of them.
Most recently, something had to be the equivalent off `"literal" LIKE column` with a million rows to compare. It would take around a second average each literal to lookup for a service that needs to be high load and low latency. This isn't an easy case to optimise, many people would consider it impossible.
It took my a couple of hours to reverse engineer the data and implement a few hundred line implementation that would look it up in 1ms average with the worst possible case being very rare and not too distant from this.
In another case there was a lookup of arbitrary time spans that most people would not bother to cache because the input parameters are too short lived and variable to make a difference. I replaced the 50000+ line application acting as a middle man between the application and database with 500 lines of code that did the look up faster and was able to implement a reasonable caching strategy. This dropped resource consumption by a minimum of factor of ten at least. Misses were cheaper and it was able to cache most cases. It also involved modifying the client library in C to stop it unnecessarily wrapping primitives in objects to the high level language which was causing it to consume excessive amounts of memory when processing huge data streams.
Another system would download a huge data set for every point of sale constantly, then parse and apply it. It had to reflect changes quickly but would download the whole dataset each time containing hundreds of thousands of rows. I whipped up a system so that a single server (barring redundancy) would download it in a loop, parse it using C which was much faster than the traditional interpreted language, then use a custom data differential format, TCP data streaming protocol, binary serialisation and LZMA compression to pipe it down to points of sale. This protocol also used versioning for catchup and differential combination for additional reduction in size. It went from being 30 seconds to a few minutes behind to using able to keep up to with in a second of changes. It was also using so much bandwidth that it would reach the limit on ADSL connections then get throttled. I looked at the traffic stats after and it dropped from dozens of terabytes a month to around a gigabyte or so a month for several hundred machines. The drop in the graphs you'd think all the machines had been turned off as that's what it looked like. It could now happily run over GPRS or 56K.
I was working on a project with a lot of data and noticed these huge tables and horrible queries. The tables were all the results of queries. Someone wrote terrible SQL then to optimise it ran it in the background with all possible variable values then store the results of joins and aggregates into new tables. On top of those tables they wrote more SQL. I wrote some new queries and query generation that wiped out thousands of lines of code immediately and operated on the original tables taking things down from 30GB and rapidly climbing to a couple GB.
Another time a piece of mathematics had to generate all possible permutations and the existing solution was factorial. I worked out how to optimise it to run n*n which believe it or not made the world of difference. Went from hardly handling anything to handling anything thrown at it. It was nice trying to get people to "freeze the system now".
I build my own frontend systems (admittedly rushed) that do what angular/react/vue aim for but with higher (maximum) performance including an in memory data base to back the UI that had layered event driven indexes and could handle referential integrity (overlay on the database only revealing items with valid integrity) or reordering and reposition events very rapidly using a custom AVL tree. You could layer indexes over it (data inheritance) that could be partial and dynamic.
So many times have I optimised things on automatic just cleaning up code normally. Hundreds, thousands of optimisations. It's what makes my clock tick.4 -
I've been staffed on a old ongoing project, first day.
0. Compatibility has to be guaranteed down till IE9... ppf.
1. Front end made in XHTML+JS(jQuery)... bah, ok.
2. XHTML+JS is actually generated by PHP5.4, not a line is actually statically served... beh, funny, ok.
3. PHP files are the output of an XSLT transform of a bunch of XMLs... meh, seriously? Oooook.
4. XMLs are the product of the serialisation of a truck of stateful JavaEE6 DTOs populated magically (undocumented) with data coming from a SQL DB... WTF mode!!!
5. Session logics lives within PHP-land at point 2, front end makes ajax calls here that propagates to another WS out of our control that triggers -somehow- (undocumented) our Java backend at point 4 to generate new XMLs and then reach front end again. Kill me now.
Boss: look... it's too slow for the client, it's too heavy on our servers: fix it. Ah, and we sold 85% test coverage by October. You're the man for the job. (I'm a Node.js fullstacker and right now there's not even a testing scaffold, ofc).
Me: prod is on Linux or Windows?
Boss: RHEL7.
Me: rm -rf / as root. Done.
Boss: I know I know...
Me: ...
I think time has come...6 -
API team: "Hey they have a client demo on Monday! What about changing all of our object IDs from int to string on a Friday night without notifying our only subscribers?"1
-
Why do people keep inventing new object serialisations when there is already JSON and XML? Why do people think that a custom format and parser would be better than an industry standard and a parser that has years of development improvements? :-S
How would you teach / punish these people?2 -
XML and JSON endpoints while forcing Content-Type Text/Plain instead of allowing Web API to handle the serialisation. The f were these devs thinking.8
-
I just spent two days bson serialising a dictionary of dictionaries for mongodb in c#.
Finally got it running, hell yeah!!