Do all the things like ++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatarSign Up
Get a devDuck
Rubber duck debugging has never been so cute! Get your favorite coding language devDuckBuy Now
Search - "clustering"
Hi Lead Architect,
Oh? You want me to explain how database clustering works? I guess you're just testing me because I'm new and junior.
Oh, and also explain how load balancing works? And what a bastion host is?
What's the architectural intent of this project? Let's have a look at the documentation and diagrams you have been creating of your designs.
You don't have any? That's okay, you've only been leading the architect team on this project for a year now.
Why don't you just keeping asking the most junior dev on the team about how the fuck you are supposed to do your job. As if I know how to do your job when I have zero training and am just expected to know everything.
Oh, its 3pm and you're heading to the pub. That's cool, I'll just guess what I need to build.2
Some people are really getting high on this Agile shit. Probably because they learned some new bullshit bingo phrases - and it suits them: lots of vapory talk and expensive meetings and others will have to do the work anyway, while they can circlejerk on how to have shorter iterations to improve the time to market, increase the business value, inspect and adapt to faster deliver a minimal viable product - yeah, do the agile transformation, update to the digital age, you noobs. Throwing around some catchy phrases will let you compete with Google? Maybe need some blockchain or machine learning?
While you are clustering your post its, the coders who keep the ship afloat, sit in their legacy code base that's so bitrot they are mainly doing bugfix releases without a single feature for three fucking years. Consider this.5
I am a web developer and assigned in a project as Infrastructure Engineer AND Penetration Tester because no one is available. I survived that hellish experience, i learned clustering and other advance stuff on my own, studying even late at night, no training..just youtube videos. PM (who is currently has little to no involvement in this stage) has very little appreciation in what im doing(research, server estimates, diagramming, documentation, planning)2
Clustering high dimensional data - SOS!
I have to cluster documents. I computed the tf*idf, but I have ~45K docs with 28K words.
I did minibatch kmeans which works alright, but everything else runs forever. I need to compare 2 clustering algorithm, so I need at least another one that works. In this life time.
Graphs and clustering changed my perception of the world. I caught myself thinking of random shit in clusters and edges, and now I realized I have been doing this for a while.
•I caught myself thinking the following a few minutes ago:
While taking a shower, for unexplained reasons, I started visualizing different groups of friend’s throughout my life and how their connection/relationship to its group (centroid AKA leader) had a significant influence on how each individual behaved. After doing this for five groups - I proceeded to label them in classes of behaviors and noticed why each friend behaved a certain way. Wtf right? 😂4
RavenDB was by far the worst document storage "solution" I have ever had the displeasure of working with.
- Loading data crashed the service.
- Queries crashed the service.
- Monitoring applications crashed the service.
- It didn't support clustering or HA of any kind.
- Sometimes it just worked for no good reason.
- Often it broke for completely random reasons.11
If you ever have to deal with distributed computing in Java, jgroups.org and Atomix is your best friends. Use them.
Unless you hate your coworkers, then just invent a custom gossip protocol...
A very long rant.. but I'm looking to share some experiences, maybe a different perspective.. huge changes at the company.
So my company is starting our microservices journey (we have a 359 retail websites at this moment)
First question was: What to build first?
The first thing we had to do was to decide what we wanted to build as our first microservice. We went looking for a microservice that can be used read only, consumers could easily implement without overhauling production software and is isolated from other processes.
We’ve ended up with building a catalog service as our first microservice. That catalog service provides consumers of the microservice information of our catalog and its most essential information about items in the catalog.
By starting with building the catalog service the team could focus on building the microservice without any time pressure. The initial functionalities of the catalog service were being created to replace existing functionality which were working fine.
Because we choose such an isolated functionality we were able to introduce the new catalog service into production step by step. Instead of replacing the search functionality of the webshops using a big-bang approach, we choose A/B split testing to measure our changes and gradually increase the load of the microservice.
Next step: Choosing a datastore
The search engine that was in production when we started this project was making user of Solr. Due to the use of Lucene it was performing very well as a search engine, but from engineering perspective it lacked some functionalities. It came short if you wanted to run it in a cluster environment, configuring it was hard and not user friendly and last but not least, development of Solr seemed to be grinded to a halt.
Elasticsearch started entering the scene as a competitor for Solr and brought interesting features. Still using Lucene, which we were happy with, it was build with clustering in mind and being provided out of the box. Managing Elasticsearch was easy since there are REST APIs for configuration and as a fallback there are YAML configurations available.
We decided to use Elasticsearch since it provides us the strengths and capabilities of Lucene with the added joy of easy configuration, clustering and a lively community driving the project.
Even bigger challenge? Which programming language will we use
What we’ve noticed during researching various languages is that almost all actions done by the catalog service will boil down to the following paradigm:
- Execute a HTTP call to fetch some JSON
- Transform JSON to a desired output
- Respond with the transformed JSON
Actions that easily can be done in a parallel and asynchronous manner and mainly consists out of transforming JSON from the source to a desired output. The programming language used for the catalog service should hold strong qualifications for those kind of actions.
Another thing to notice is that some functionalities that will be built using the catalog service will result into a high level of concurrent requests. For example the type-ahead functionality will trigger several requests to the catalog service per usage of a user.
To us, PHP and .NET at that time weren’t sufficient enough to us for building the catalog service based on the requirements we’ve set. Eventually we’ve decided to use Node.js which is better suited for the things we are looking for as described earlier. Node.js provides a non-blocking I/O model and being event driven helps us developing a high performance microservice.
The beauty of microservices and the isolation it provides, is that you can choose the best tool for that particular microservice. Not all microservices will be developed using Node.js and Elasticsearch. All kinds of combinations might arise and this is what makes the microservices architecture so flexible.
Even when Node.js or Elasticsearch turns out to be a bad choice for the catalog service it is relatively easy to switch that choice for magic ‘X’ or component ‘Z’. By focussing on creating a solid API the components that are driving that API don’t matter that much. It should do what you ask of it and when it is lacking you just replace it.
Many more headaches to come later this year ;)3
App for hitchhiking: map for places(with clustering) is almost ready. Other features like offline work, routes and more are on the way!4
Whats the fucking purpose of our companys dev test and prod env. Dev always only has a single instance. Sometimes clustered services run as cluster on test. Producing headaches because the clustering behaviour couldnt be seen on a single instance and Prod lacks all the nice deployment tools off dev/test. Fuck thinking you could dev then test and prod without any major reconfiguration and headaches. And all because the Storage costs is RETARDEDLY expensive because the backup EVERYTHING with ridiculess overkill. That results in headaches when requesting new servers. Took an old Workstation from the shelves and made it my vm slave so at least i could reliably deploy to test.. Fuck this process
Hi guys, for school I need to use OPTICS (clustering algorithm) on at least two data sets for a project. For the first one, I'm thinking of doing some classic document clustering. But for the second one, I'd like to do something a bit funnier. I'm thinking of clustering faces. The goal would be to cluster faces that are similar, and the dataset would have many images of the same person to see if the clustering works. I think I must reduce the size of the image first.
Do you think it's feasible and how would you prepare the data, would you feed directly the matrix representation of the image?
"[Elasticache isn't a managed service because] You are still choosing instance size, setting up replication and clustering more or less without their help"
From their website:
"Amazon offers a fully managed Redis service, Amazon ElastiCache for Redis"
Just because it's configurable doesn't mean it's not managed.
working with Mapbox and so far everything works except displaying custom svg pointers for single points while clustering the multiples.
the documentation is only semi useful.
and my work-at-home coworker keeps meowing at me like that guy who wants to talk about TV shows whenever I'm working
I want a nap!
What do you think would be the effect of giving out awards for the best open source code on GitHub or whatever?
My theory is that awards would actually make people to stop working on less significant repositories (that clearly cannot win the awards) and focus on the major repositories, the most starred and forked ones so as to get a share of the prizes. Maybe there would be times when commits(which are way better than the current code) are not merged onto the main branch coz doing so would introduce another coder in sharing the prize. The clustering of everyone's efforts on the major repositories would leave the less significant but useful repositories neglected. Can't say the number of times I have copied code from these repositories. I think awards would be disruptive to the open-source Ecosystem. Am high and am out ppl. Go savage the comments. Wait do such awards exists...haha.2