Search - "data streaming"
-
The solution for this one isn't nearly as amusing as the journey.
I was working for one of the largest retailers in NA as an architect. Said retailer had over a thousand big box stores, IT maintenance budget of $200M/year. The kind of place that just reeks of waste and mismanagement at every level.
They had installed a system to distribute training and instructional videos to every store, as well as recorded daily broadcasts to all store employees, as a way of reducing management time spent with employees in the morning. This system had cost a cool 400M USD, not including labor and upgrades for round 1. Round 2 was another 100M to add a storage buffer to each store, because they'd failed to account for the fact that the stores' internet connections and the outbound pipe from the DC weren't capable of running the public-facing e-commerce and streaming all the video data to every store in realtime. Typical massive enterprise clusterfuck.
Then security gets involved. Each device at stores had a different address on a private megawan. The stores didn't generally phone home, home phoned them as an access control measure; stores calling the DC was verboten. This presented an obvious problem for the video system because it needed to pull updates.
The brilliant Infosys resources had a bright idea to solve this problem:
- Treat each device IP as an access key for that device (avg 15 per store).
- Verify the request IP, then issue a redirect to ANOTHER IP, unique to that device, that the firewall would allow ingress only to the video subnet
- Do it all with the F5
A few months later, the networking team comes back and announces that after months of work and tens of person-years they can't implement the solution, because iRules have a size limit and they would need more than 60,000 lines or 15,000 rules to implement it. Sad trombones all around.
Then, a wild DBA appears, steps up to the plate and says he can solve the problem with the power of ORACLE! A few months later he comes back with some absolutely batshit solution that stored the individual octets of an IPv4, with multiple nested queries to the same table to emulate subnet masking through some temp-table-spanning voodoo. Time to complete: 2-4 minutes per request. He too eventually gives up the fight, sort of, in that backhanded way DBAs tend to do everything. I wish I had paid more attention to that abortion, because the rationale and its mechanics were just staggeringly Rube Goldberg and should have been documented for posterity.
So I catch wind of this sitting in a CAB meeting. I hear them talking about how there's "no way to solve this problem, it's too complex, we're going to need a lot more databases to handle this." I tune in and gather all it really needs to do, since the ingress firewall is handling the origin IP checks, is convert the request IP to video ingress IP, 302 and call it a day.
While they're all grandstanding and pontificating, I fire up visual studio and:
- write a method that encodes the incoming request IP into a single uint32
- write an http module that keeps an in-memory uint32 -> string dictionary mapping request IP to response IP, converts the request IP, and 302s the call, with blackhole support (see the sketch after this list)
- convert all the mappings in the spreadsheet attached to the meetings into a csv, dump to disk
- write a wpf application to allow for easily managing the IP database in the short term
- deploy the solution to one of our stage boxes
- add a TODO to eventually move this to a database
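For the curious, a rough sketch of that lookup-and-302 idea. The original was a .NET HTTP module; this is a stdlib-only Python stand-in, so every name here is illustrative rather than the original code:

import csv
import ipaddress
from http.server import BaseHTTPRequestHandler, HTTPServer

# "request IP -> video ingress IP" mappings, loaded from the CSV dump
with open("mappings.csv") as f:
    TABLE = {int(ipaddress.IPv4Address(src)): dst for src, dst in csv.reader(f)}

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        # encode the caller's IP as a single uint32 and look it up
        key = int(ipaddress.IPv4Address(self.client_address[0]))
        dst = TABLE.get(key)
        if dst is None:
            self.send_error(404)  # blackhole: unknown IPs get nothing
            return
        self.send_response(302)
        self.send_header("Location", f"http://{dst}{self.path}")
        self.end_headers()

HTTPServer(("", 8080), Redirector).serve_forever()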
All this took about 5 minutes. I interrupt their conversation to ask them to retarget their test to the port I exposed on the stage box. Then watch them stare in stunned silence as the crow grows cold.
According to a friend who still works there, that code is still running in production on a single node to this day. And still running on the same static file database.
#TheValueOfEngineers
-
9 days.
9 fucking days without internet.
9 fucked up days with access only to a national intranet, where the only reachable things were websites with the privacy-respect policy of Facebook, and all your unencrypted data streaming under dictator hands.
-
Alright, so my previous rant got a way better response than I expected! (https://devrant.io/rants/832897)
Hereby the first project that I cannot seem to get started on too badly :/.
DISCLAIMER: I AM NOT PROMOTING PIRACY, I JUST CAN'T FIND A SUITABLE SERVICE WHICH HAS ALL THE MUSIC I WANT. I REGULARLY BUY ALBUMS. before everyone starts to go batshit crazy regarding piracy, this is legal in The Netherlands for personal use. I think that supporting the artists you love is very good and I actually regularly pay for albums and so on but:
- I want all the music from about every artist in my scene. Either on Deezer or on Spotify this is not available and I'm not gonna get them both (they both have about half of the music I want). Their services are awesome but I'm not going to pay for something if I can't listen to all the music I like, hell even some artists (on deezer mostly) only have half their music on there and it's mostly not better on Spotify.
- I'd happily buy all albums because I love supporting the artists I love, but buying everything is just way too fucking much. "Get a premium music streaming subscription!" - see the first point.
You can either agree or disagree with me but that's not what this rant is about so here we go:
The idea is to create a command-line program which will check your favourite youtube channels (sorry, haven't found a suitable non-google youtube replacement yet) every day through a cron job and look for new uploads. If there are any, it will download them, convert them to MP3 or whatever music format you'd like, and place them in the right folder. Example with a favourite artist of mine:
1. Script checks if there are any new uploads from Gearbox Digital (underground raw hardstyle label).
2. Script detects two new uploads.
3. Script downloads the files (I managed to get that done through the (linux only or also mac?) youtube-dl software) and converts them to mp3 in my case (through FFMPEG maybe?).
4. Script copies them to the music library folder but then the specific sub-folder for Gearbox Digital in this case.
You should be able to put as many channels in there as you want, I've tried this with the official YouTube Data API which worked pretty fine tbh (the data gathering through that API). The ideal case would be to work without API as youtube-dl and youtube-dlg do. This is just too complicated for me :).
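For what it's worth, youtube-dl also has an embedded Python API, so the whole cron script can stay in one language. A minimal sketch (the channel URL, paths and options below are placeholders, not a tested setup):

import youtube_dl  # pip install youtube-dl; ffmpeg must be on PATH for the mp3 step

ydl_opts = {
    "format": "bestaudio/best",
    # one sub-folder per channel, e.g. MusicLibrary/Gearbox Digital/<title>.mp3
    "outtmpl": "MusicLibrary/%(uploader)s/%(title)s.%(ext)s",
    # youtube-dl records finished video IDs here and skips them on the next run,
    # which is what turns this into a "new uploads only" cron job
    "download_archive": "downloaded.txt",
    "postprocessors": [{
        "key": "FFmpegExtractAudio",  # the conversion step, done by ffmpeg
        "preferredcodec": "mp3",
        "preferredquality": "192",
    }],
}

channels = ["https://www.youtube.com/user/GearboxDigital"]  # placeholder URL

with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    ydl.download(channels)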
So, thoughts?
-
I'm thinking about doing a live coding stream on twitch this saturday, late afternoon or evening (CET).
I've never done a live stream before.
Do you have any suggestions or interests?
I'm thinking about something like a small RESTful API with Angular4/TypeScript (frontend, single-page application) and CraftCMS/PHP (backend), with some basic theory about HTTP requests / responses, redirecting, data transfer and interfaces et cetera...
The duration will be around 2-4 hours, maybe longer if I have enough Mate & Beer.
But it's all just an idea at the moment. 😉
I will create an empty project for the stream on my Github and push to it during streaming, so you can pull it live or later.
-
I've optimised so many things in my time I can't remember most of them.
Most recently, something had to be the equivalent of `"literal" LIKE column` with a million rows to compare. It would take around a second on average per literal to look up, for a service that needs to be high-load and low-latency. This isn't an easy case to optimise; many people would consider it impossible.
It took me a couple of hours to reverse engineer the data and write a few-hundred-line implementation that would look it up in 1ms average, with the worst possible case being very rare and not too distant from this.
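The actual implementation was data-specific and isn't shown here, but to give a flavour of how a reverse LIKE lookup can get this fast: if most stored patterns begin with a literal chunk, you can bucket them by that prefix so each query scans a handful of candidates instead of a million rows. A toy sketch under that assumption:

import fnmatch
from collections import defaultdict

PREFIX_LEN = 4  # tune to the data

def build_index(patterns):
    # bucket each SQL LIKE pattern under its leading literal characters
    index = defaultdict(list)
    for p in patterns:
        literal_head = p.split("%", 1)[0].split("_", 1)[0]
        glob = p.replace("%", "*").replace("_", "?")  # LIKE -> fnmatch syntax
        index[literal_head[:PREFIX_LEN]].append(glob)
    return index

def matches(index, literal):
    # only patterns whose literal head is a prefix of the input can match,
    # so at most PREFIX_LEN + 1 buckets ever need scanning
    for i in range(min(PREFIX_LEN, len(literal)), -1, -1):
        for glob in index.get(literal[:i], ()):
            if fnmatch.fnmatchcase(literal, glob):
                return True
    return False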
In another case there was a lookup of arbitrary time spans that most people would not bother to cache, because the input parameters are too short-lived and variable to make a difference. I replaced the 50000+ line application acting as a middle man between the application and database with 500 lines of code that did the lookup faster and implemented a reasonable caching strategy. This dropped resource consumption by a factor of ten at least. Misses were cheaper and it was able to cache most cases. It also involved modifying the client library in C to stop it unnecessarily wrapping primitives in objects for the high-level language, which was causing it to consume excessive amounts of memory when processing huge data streams.
Another system would download a huge data set for every point of sale constantly, then parse and apply it. It had to reflect changes quickly but would download the whole dataset each time, containing hundreds of thousands of rows. I whipped up a system so that a single server (barring redundancy) would download it in a loop, parse it using C (much faster than the traditional interpreted language), then use a custom data differential format, a TCP data streaming protocol, binary serialisation and LZMA compression to pipe it down to the points of sale. The protocol also used versioning for catchup and differential combination for an additional reduction in size. It went from being 30 seconds to a few minutes behind to being able to keep within a second of changes. It was also using so much bandwidth before that it would reach the limit on ADSL connections and get throttled. I looked at the traffic stats after, and it dropped from dozens of terabytes a month to around a gigabyte or so a month for several hundred machines. From the drop in the graphs you'd think all the machines had been turned off, because that's what it looked like. It could now happily run over GPRS or 56K.
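The real thing was C with a binary wire format, but the core idea (ship versioned diffs, LZMA-compressed, instead of re-sending the dataset) is small. A toy sketch, with JSON in place of binary serialisation for readability and assuming each row carries an "id" key:

import json
import lzma

def make_delta(old_rows, new_rows, key="id"):
    # send only inserted/changed rows plus a list of deleted keys
    old = {r[key]: r for r in old_rows}
    new = {r[key]: r for r in new_rows}
    delta = {
        "upsert": [r for k, r in new.items() if old.get(k) != r],
        "delete": [k for k in old if k not in new],
    }
    return lzma.compress(json.dumps(delta).encode("utf-8"))

def apply_delta(rows, blob, key="id"):
    # each point of sale folds the delta into its local copy
    delta = json.loads(lzma.decompress(blob))
    merged = {r[key]: r for r in rows}
    for r in delta["upsert"]:
        merged[r[key]] = r
    for k in delta["delete"]:
        merged.pop(k, None)
    return list(merged.values())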
I was working on a project with a lot of data and noticed these huge tables and horrible queries. The tables were all the results of queries: someone had written terrible SQL, then to optimise it ran it in the background with all possible variable values and stored the results of the joins and aggregates in new tables. On top of those tables they wrote more SQL. I wrote some new queries and query generation that wiped out thousands of lines of code immediately and operated on the original tables, taking things down from 30GB (and rapidly climbing) to a couple of GB.
Another time a piece of mathematics had to generate all possible permutations, and the existing solution was factorial. I worked out how to optimise it to run in n*n, which, believe it or not, made a world of difference. It went from hardly handling anything to handling anything thrown at it. It was nice trying to get people to "freeze the system now".
I built my own frontend systems (admittedly rushed) that do what angular/react/vue aim for but with higher (maximum) performance, including an in-memory database to back the UI that had layered event-driven indexes and could handle referential integrity (an overlay on the database only revealing items with valid integrity) or reordering and repositioning events very rapidly using a custom AVL tree. You could layer indexes over it (data inheritance) that could be partial and dynamic.
So many times have I optimised things on automatic just cleaning up code normally. Hundreds, thousands of optimisations. It's what makes my clock tick.
-
I spent the last three weeks+ (literally THREE full weeks, weekends too) building something I thought was really cool, powerful, and useful. Made a blog post, posted a giant thread on the company Twitter.
Literally one person gave it a like.
I don't know why I give shit anymore, cuz nowadays if it isn't about getting rich quick, cHaTgPt, or some other made up hype, no one cares. Apparently I shouldn't either...
Meanwhile my 16 GIGABYTE RAM MAC, yes 16 GIGABYTE RAM can't even hold power while plugged in, and I'm still clowning around with an ancient iPhone 6 (actually one of my mom's old iphones) that barely stays above 20% battery for more than an hour...
And FINALLY, my FUCKING ISP is for sure screwing me, since I've been doing some hardcore data streaming and broadcasting: even though I pay $60+/month for that shit, it keeps dropping out and shit doesn't load... I mean wtf, this isn't 1990 dialup AOL anymore
When I step back I just feel like the world's biggest loser, maybe the world's biggest 🤡
-
It's a job description for an entry level Software Developer role at Apple.
Basically they want
- A Front end dev
- A back end dev
- An iOS expert
- A Gimp/photoshop designer
- A streaming data expert
- An Oracle/Teradata expert
- A Kafka expert (not shown in the picture)
And on top of that, the guy should be able to learn new tech fast.
Do they want a developer or a fucking terminator?
-
Protip: always account for endianness when using a library that does hardware access, like SDL or OpenGL :/
I spent an hour trying to figure out why the fucking renderer was rendering everything in shades of pink instead of white -_-
(It actually looked kinda pretty, though...)
I'd used the pixel format corresponding to the wrong endianness, so the GPU was getting data in the wrong order.
(For those interested: use SDL_PIXELFORMAT_RGBA32 as the pixel type, not the more "obvious" SDL_PIXELFORMAT_RGBA8888, when making a custom streaming SDL_Texture)
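A minimal sketch of the safe variant, shown here through PySDL2 since its calls map one-to-one onto the C API:

import sdl2  # pip install pysdl2; mirrors the C API

sdl2.SDL_Init(sdl2.SDL_INIT_VIDEO)
window = sdl2.SDL_CreateWindow(b"demo", sdl2.SDL_WINDOWPOS_CENTERED,
                               sdl2.SDL_WINDOWPOS_CENTERED, 640, 480, 0)
renderer = sdl2.SDL_CreateRenderer(window, -1, 0)

# RGBA8888 means "RGBA packed into one 32-bit int", so its byte layout flips
# with host endianness; RGBA32 means "R,G,B,A in memory byte order" and
# resolves to the right packed format for whatever CPU you're on.
texture = sdl2.SDL_CreateTexture(renderer,
                                 sdl2.SDL_PIXELFORMAT_RGBA32,  # not RGBA8888
                                 sdl2.SDL_TEXTUREACCESS_STREAMING,
                                 640, 480)
-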
I've made the json protocol. It's a protocol containing only json. No http or anything.
To parse a json object from a stream, you need a function that returns the length of the first object/array in all your received data. The result of that function is used to cut the right chunk out of the received json to deserialize.
For such a function, the json needs to be parsed, so I wrote that function in C to be used with my C server and Python client. I finally implemented a C function in Python that has a real benefit / use case. Otherwise you had to validate bit by bit with the python json parser, and that's slow while streaming. Some messages are quite big.
Advantage of this protocol is that it's full duplex.
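Side note: Python's stdlib can do that framing as well. json.JSONDecoder.raw_decode returns both the parsed object and the index where it ended, so a minimal pure-Python framer (the function name is mine) for comparison with the C version:

import json

_decoder = json.JSONDecoder()

def pop_messages(buffer: str):
    """Parse every complete JSON value off the front of `buffer` and
    return (messages, leftover); leftover is any incomplete tail."""
    messages, pos = [], 0
    while pos < len(buffer):
        while pos < len(buffer) and buffer[pos].isspace():
            pos += 1  # tolerate whitespace between messages
        if pos >= len(buffer):
            break
        try:
            obj, pos = _decoder.raw_decode(buffer, pos)
        except ValueError:  # truncated message: wait for the next recv()
            break
        messages.append(obj)
    return messages, buffer[pos:]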
I'm very happy!
-
Why is chromecast so stupid?
So I'm on vacation, in another country, in a hotel. I took my chromecast and downloaded offline music to be able to use the TV for at least some entertainment.
The wifi has a login page, which chromecast doesn't support. And it has client isolation, so I was prepared for it not to work through the hotel wifi. So I used another phone to create a hotspot, but with no internet, because roaming here is crazy expensive.
I thought that would work, but chromecast simply refuses to work if there is no internet access.
Why does it need internet if I'm streaming locally anyway?
So I temporarily activate data roaming, and hooray it works, so I quickly disconnect because I have no idea what this shit of a device will start sending to Google and how much I'm gonna pay for it. It works for 10 minutes then it crashes and needs internet again.
Most useless piece of crap I ever bought.
Should have brought my RPi instead but it's busy keeping my home alive and well while I'm gone. Should have ordered in bulk.
-
! rant
Sorry but I'm really, really angry about this.
I'm an undergrad student in the United States at a small state college. My CS department is kinda small but most of the professors are very passionate about not only CS but education and being caring mentors. All except for one.
Dr. John (fake name, of course) did not study in the US. Most professors in my department didn't. But this man is a complete and utter a****le. His first semester teaching was my first semester at the school. I knew more about basic programming than he did. There was more than one occasion where I went "prof, I was taught that x was actually x because x. Is that wrong?", knowing that what I was posing was actually the right answer. Googled to verify first. He said that my old teachings were all wrong and that everything he said was the correct information. I called BS on that, waited until after class to be polite, and showed him that I was actually correct. He denied it.
His accent was also really problematic. I'm not one of those people who feel that a good teacher needs a native accent by any standard (literally only 1 prof in the whole department doesn't), but his English was *awful*. He couldn't lecture for his life and me, a straight A student in high school, was almost bored to sleep on more than one occasion. Several others actually did fall asleep. This... wasn't a good first impression.
It got worse. Much, much worse.
I got away with not having John for another semester before the bees were buzzing again. Operating systems was the second most poorly taught class I've ever been in. Dr John hadn't gotten any better. He'd gotten worse. In my first semester he was still receptive when you asked for help, was polite about explaining things, and was generally a decent guy. This didn't last. In operating systems, his replies to people asking for help became slightly more hostile. He wouldn't answer questions with much useful information and started saying "it's in chapter x of the textbook, go take a look". I mean, sure, I can read the textbook again and many of us did, but the textbook became a default answer to everything. Sometimes it wasn't worth asking. His homework assignments because more and more confusing, irrelavent to the course material, or just downright strange. We weren't allowed to use muxes. Only semaphores? It just didn't make much sense since we didn't need multiple threads in a critical zone at any time. Lastly for that class, the lectures were absolutely useless. I understood the material more if I didn't pay attention at all and taught myself what I needed to know. Usually the class was nothing more than doing other coursework, and I wasn't alone on this. It was the general consensus. I was so happy to be done with prof John.
Until AI was listed as taught by "staff", I rolled the dice, and it came up snake eyes.
AI was the worst course I've ever been in. Our first project was converting old python 2 code to 3 and replicating the solution the professor wanted. I, no matter how much debugging I did, could never get his answer. Thankfully, he had been lazy and just grabbed some code off stack overflow from an old commit, the output and test data from the repo, and said it was an assignment. Me, being the sneaky piece of garbage I am, knew that py2to3 was a thing, and used that for most of the conversion. Then the edits we needed to make came into play for the assignment, but it wasn't all that bad. Just some CSP and backtracking. Until I couldn't replicate the answer at all. I tried over and over and *over*, trying to figure out what I was doing wrong and could find Nothing. Eventually I smartened up, found the source on github, and copy pasted the solution. And... it matched mine? Now I was seriously confused, so I ran the test data on the official solution code from github. Well what do you know? My solution is right.
So now what? Well I went on a scavenger hunt to determine why. Turns out it was a shift in the way streaming happens for some data structures in py2 vs py3, and he never tested the code. He refused to accept my answer, so I made a lovely document proving I was right using the repo. Got a 100. lol.
Lectures were just plain useless. He asked us to solve multivar calculus problems that no one had seen and of course no one did it. He wasted 2 months on MDP. I'd continue but I'm running out of characters.
And now for the kicker. He becomes an a**hole, telling my friends doing research that they are terrible programmers, will never get anywhere doing this, etc. People were *crying* and the guy kept hammering the nail deeper for code that was honestly very good, because "his was better". He treats women like delicate objects and it's disgusting. YOU MADE MY FRIEND CRY, GAVE HER A BOX OF TISSUES, AND THEN JUST CONTINUED.
Want to know why we have issues with women in CS? People like this a****le. Don't be prof John. Encourage, inspire, and don't suck. I hope he's fired for discrimination.
-
Had a bad day at work :( They gave me this code for some obscure streaming job and asked me to complete it. Only after 3 days did I realize that the LLD given to me was incorrect, as the data model had been updated. After another 2 days, I was able to debug the code and run it successfully: I could parse the tables and generate the required frame, but not stream it back to the output topic as per the LLD. That's where I needed help, but none of my emails/messages were replied to. The main guy, who is pretty technical, scheduled a code review session with me. I expected that I would run the code and he would spot something I might've missed and why my streaming function wasn't working. Instead, what happened was that he grilled me on each and every line of the code (which queried some obscure tables) and then got super mad at me, saying "Why are we having this code review session if your code is not complete?". I'm like, bruh, you asked for it, and yes, the main parsing logic is done and I'm just having this issue in the last part. And he's like "Why didn't you tell me earlier?". Wtf?! I left at least 5 emails and a dozen messages. He's like, this has to go live on Monday, and I'm like, OK, I'll work on the weekend. And he's like "Don't tell me all these things! You're not doing me a favor by working on weekends! How am I to ask my colleagues to connect with you separately on Saturday/Sunday? You should have done this on the weekdays itself. What were you doing this whole week?". Bruh, I was running the code multiple times and debugging it using print statements, all while you were ignoring my attempts to reach out to you. SMH 🤦♂️ I can go on and on about this whole saga.
-
FUCK YOU VODAFONE!
My internet broke down over the weekend, 4 minutes after I left my house on Friday at 9.04am.
Issue resolved after work at 9pm, broke down half an hour later.
Total silence. The provider's status page gave a 500 response.
THE NEXT DAY: Woke up because Netflix was working at 8.26am again.
Great, I get up. Can't wait to do anything I want again! I can code, game, watch porn, possibilities are endless!
I get stuff for breakfast. Come back and guess what? NO INTERNET!
Got confirmation of the existence of problems at noon. Something with streaming, possibly fixed by Monday.
Okay maybe streaming, but shouldn't the other stuff then work? I process.
Used up my mobile data. Tried again, now it works, but...
.. why.. so.. slow? It was worse than 3g. Time to get out lynx to Google.
It's Sunday. I woke up pissed, got myself a Coffee and tried to get some offline work done. I sipped, closed my eyes for a moment and opened chrome.
IT WORKED! omg, fuck yes! I almost cried I was that happy - can you believe it?
All fine and dandy on Monday. So surely the streaming thing will also be gone now, cool.
Today's day is Tuesday. Guess why I am writing this.
Fuck these apprentice nigga cunts graaaaa!!! Or their infrastructure will finally break down.
How is it even possible to do any work on a very important node at a time when probably everyone wants to chill out with the web?
-
Randomly one day, out of the blue:
Echelons: You now have Workplace, and it's a requirement you use it. Make it successful, because we paid and are paying a large dollar amount for it, and our competitors have reported success with it. We want email communications companywide cut by 50% within the first 60 days.
Management: Ok, excellent! We want to do XYZ.
Echelons: Nope, can’t do any of that.
Management: Ok, how about a, b, and c?
Echelons: Nope, nope, nope.
Management: Alright, let’s try 1,2, and 3.
Echelons: Nope, not possible.
Management: What can we do then? We need further direction at this point.
Echelons: One group for all departments, posts, and attachments only. PDF, .jpeg, .png files only. Everyone in the company must be registered within seven days and using the platform. Only mobile devices allowed.
Management: We have almost 10,000 employees, and the SSO aspect alone could take weeks and months.
Echelons: Insignificant as Facebook said it should be easy to deploy. Also, every post not created by admin will need to be manually approved and done so within 5-10 minutes after its submission 24/7, 365.
Management: Ok, solved. A little shaky, but it’s working. Can we increase the number of admins and moderators?
Echelons: Only 1700 employees have registered; the app has been up 14 days now? What’s wrong? Where’s the engagement? Effective immediately, all members of management must be creating and starting 4 to 7 posts daily, including weekends.
Management: Our registration process with the SSO client isn’t smooth and clean across all devices. We had to implement training to overcome this. Can we increase the number of admins and moderators? Can we make all members of management either administrators or at least moderator? Can we at least turn on live streaming and video formats?
Echelons: No! 10 admin and mods max. Yes to streaming and video.
Echelons: Progress update, please. Include ROI timeline and impactful usage data. This must pay for itself in the first six months and continue to pay for itself long term, along with showing XYZ company-wide growth quarterly.
Echelons: Hello?
Echelons: Hello?
Having Workplace shoved down your throat has been an interesting experience. Anyone have any exciting ideas or examples to share of what they have utilized with Workplace to increase employee engagement?
-
- Finish "Introduction to algorithms"
- Learn some genetic algorithms
- Get my hands dirty on reinforcement learning
- Learn more about data streaming applications (my current app is still using plain stupid REST to transport images). I don't know, maybe Kafka and RabbitMQ.
- Learn to implement some distributed system prototypes to get fitter at this topic. There must be more than REST for communicating between components.
- Implementing a searching module for my app with elastic search.
- Employ redis at some point for background tasks.
- Get my hands dirty on some operating system concepts (Interprocess Communication, I am looking at you)
- Take a look at Assembly (I don't want to do much with Assembly, maybe just implement one or two programs to know how things work)
- Learn a bit of parallel computing with CUDA to know what the hell Tensorflow is doing with my graphics card.
- Maybe finishing my first research paper
- Pass my electrical engineering exam (I suck at EE)
-
mann... either i am dumb or my team is a bunch of excited monkeys.
for the last 6 months my senior and this contract dev (both in Android) have been fussing about adding coroutine flows to our codebase: how our codebase "needs" them and how flows will make our codebase "better"
when i asked them why, they gave me even more shit about hot flows, cold flows, state flows, and how it's the latest "solution" from google.
So today, while going through another existential crisis in my free time, i decided to understand what these "flows" are.
and from what i understand, they are mainly for cases in which there is actively changing data and we want the latest updates without any event or trigger: streaming data, chat messages, location etc.
but we are a freaking insurance app! the user presses a button and we make an api call! what is the fucking problem here that isn't being solved by good old livedata and coroutines? There isn't any "live" api in the app as far as i know, and even if there is, the code should be modified for that 1 api.
why fuck the whole codebase for a usecase that isn't applicable to 99% of APIs?
also, if a flow is going to auto-trigger and call an api, how are we supposed to control it? like, say there is an offers api (there isn't) which gives us the latest offer products to show the user for 5 seconds, then refreshes. for this i will simply return
flow {
    while (true) {
        emit(offerApiResult()) // offerApiResult() is pseudocode for the api call
        delay(5000)
    }
}
but this is an infinite polling api! how to stop it when, say, the user presses a cross button or does some other interaction?
it seems useless as fuck.. i can achieve more controllable polling using the same while loop in a different location or some other solution that won't require me adding this weird api
-
I am having an introspective moment as a junior dev.
I am working in my 3rd company now and have spent the avg amount of time I would spend in a company (1 - 1.5 years)
I find myself in similar problems and trajectories:
1. The companies i worked for were startups of various scales : an edtech platform, an insurance company (branch of an mnc) and a b2b analytics company
2. These people hire developers based on domain knowledge and not innovative thinking, and expect them to build anything that the PMs deem growth/engagement-worthy (for e.g., i am bad at those memory/time-optimising programming / ds/algo problems, but i can make any kind of android screen/component, so me and people like me get hired here)
3. These people hire new PMs based on expertise in revenue generation and again, not on the basis of innovative thinking, coz most of the time these folks make tickets to experiment with buttons and text colors to increase engagement/growth
4. The system goes into chaos mode soon, since there are so many cross-operating teams and the PMs run around trying to boss every dev, qa and designer into adding their changes to the app.
5. meanwhile, due to multiple different teams working on different aspects, there is no common data center with up-to-date info on all flows, products and features. the product soon becomes a Frankenstein monster.
6. Thus these companies require more and more devs and QAs which are cogs in the system rather than innovative thinkers. the cogs in the system will simply come, dimwittedly add whatever feature is needed and go home.
7. the cogs in the system which also start taking the pain of tracking the changes and learning about the product itself become "load bearing cogs": i.e. the devs with so much knowledge of the product that they can be helpful in every aspect of the feature lifecycle.
such devs find themselves in no need for proving themselves, in no need for doing innovative work, and are simply promoted based on their domain knowledge and impact.
My question is simply this : are we as a dev just destined to be load bearing cogs?
we are doing the work which ideally a manager should be doing, i.e. maintaining confluence docs with end-to-end technical as well as business logic info for every feature/flow.
So is that the only definition of a Software Engineer in a technical product?
then how come innovations happen in companies like Meta, Microsoft, Google, OpenAI etc?
if i have to guess as a far observer, i would say their diversity across different fields helps them mix and match stuff and leads to innovative stuff.
For e.g., the android os team at google has helped add many innovative things to the google cloud product and vice versa.
same is with azure and windows. windows is now optimised to run on cloud machines, when at one point it was just a horrible memory-hogging and slow pc OS.
for small companies, 1 ideology/product/domain is their hero ideology/product/domain.
an insurance company tries to experiment with stuff related to insurance, health and vehicles, and the best innovation they come up with is "lets give the user a discount on premium if they do 5000 steps a day for a year".
edtech would say "lets do live streaming for children apart from static videos"
but the Android team at google said, "since the ai team is doing so well, lets include ai in various system apps and support device-level models" ~ a much larger innovation, as 2 domains combined to make a product
The small companies are not aiming to be an innovative product, they are just aiming to be a monopoly product. And this is kinda sad.
-
I recoded a REST endpoint that transfers large amounts of data from our db using a streaming response so it doesn't crash the server...
Pretty easy... Mostly just needed someone who knew wtf it was, or had a bit of curiosity and asked questions... rather than just keeping on doing what everyone else is doing...
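The stack isn't named here, but for illustration, the pattern in Flask (any framework that accepts a generator as the response body works the same way):

from flask import Flask, Response

app = Flask(__name__)

def rows():
    # placeholder for the real DB cursor; yields one chunk at a time
    for i in range(1_000_000):
        yield f"{i},some,row,data\n"

@app.route("/export")
def export():
    # a generator body is flushed to the client chunk by chunk, so the
    # full result set never has to sit in server memory at once
    return Response(rows(), mimetype="text/csv")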
Who hasn't seen logs updating in near real time in TeamCity, Jenkins... for the last 5yrs+... No one else ever wondered how it's done?
So yes solving a production issue with old technology and being called a genius... I guess is pretty satisfying?
-
what are the basics I should know about "data streaming" for working at video streaming companies as a future senior backend Golang developer?
-
I am trying to extract data from a Pub/Sub subscription, and finally, once the data is extracted, I want to do some transformation. Currently it's in bytes format. I have tried multiple ways to extract the data in JSON format using a custom schema, but it fails with an error:
TypeError: __main__.MySchema() argument after ** must be a mapping, not str [while running 'Map to MySchema']
**readPubSub.py**
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
import json
import typing

class MySchema(typing.NamedTuple):
    user_id: str
    event_ts: str
    create_ts: str
    event_id: str
    ifa: str
    ifv: str
    country: str
    chip_balance: str
    game: str
    user_group: str
    user_condition: str
    device_type: str
    device_model: str
    user_name: str
    fb_connect: bool
    is_active_event: bool
    event_payload: str

TOPIC_PATH = "projects/nectar-259905/topics/events"

def run(pubsub_topic):
    options = PipelineOptions(
        streaming=True
    )
    runner = 'DirectRunner'
    print("I reached before pipeline")
    with beam.Pipeline(runner, options=options) as pipeline:
        message = (
            pipeline
            | "Read from Pub/Sub topic" >> beam.io.ReadFromPubSub(subscription='projects/triple-nectar-259905/subscriptions/bq_subscribe')  # .with_output_types(bytes)
            | 'UTF-8 bytes to string' >> beam.Map(lambda msg: msg.decode('utf-8'))
            | 'Map to MySchema' >> beam.Map(lambda msg: MySchema(**msg)).with_output_types(MySchema)
            | "Writing to console" >> beam.Map(print))
    print("I reached after pipeline")
    result = message.run()
    result.wait_until_finish()

run(TOPIC_PATH)
If I use it directly below
message = (
    pipeline
    | "Read from Pub/Sub topic" >> beam.io.ReadFromPubSub(subscription='projects/triple-nectar-259905/subscriptions/bq_subscribe')  # .with_output_types(bytes)
    | 'UTF-8 bytes to string' >> beam.Map(lambda msg: msg.decode('utf-8'))
    | "Writing to console" >> beam.Map(print))
I get output as
{
'user_id': '102105290400258488',
'event_ts': '2021-05-29 20:42:52.283 UTC',
'event_id': 'Game_Request_Declined',
'ifa': '6090a6c7-4422-49b5-8757-ccfdbad',
'ifv': '3fc6eb8b4d0cf096c47e2252f41',
'country': 'US',
'chip_balance': '9140',
'game': 'gru',
'user_group': '[1, 36, 529702]',
'user_condition': '[1, 36]',
'device_type': 'phone',
'device_model': 'TCL 5007Z',
'user_name': 'Minnie',
'fb_connect': True,
'event_payload': '{"competition_type":"normal","game_started_from":"result_flow_rematch","variant":"target"}',
'is_active_event': True
}
{
'user_id': '102105290400258488',
'event_ts': '2021-05-29 20:54:38.297 UTC',
'event_id': 'Decline_Game_Request',
'ifa': '6090a6c7-4422-49b5-8757-ccfdbad',
'ifv': '3fc6eb8b4d0cf096c47e2252f41',
'country': 'US',
'chip_balance': '9905',
'game': 'gru',
'user_group': '[1, 36, 529702]',
'user_condition': '[1, 36]',
'device_type': 'phone',
'device_model': 'TCL 5007Z',
'user_name': 'Minnie',
'fb_connect': True,
'event_payload': '{"competition_type":"normal","game_started_from":"result_flow_rematch","variant":"target"}',
'is_active_event': True
}
Please let me know if I'm doing something wrong while parsing the data to JSON. Also, I am looking for examples of doing data masking and running some SQL within Apache Beam.
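In case it helps the next person who lands on this: after the decode step msg is still a str, and ** can only unpack a mapping, which is exactly what the TypeError says. Assuming the messages really are JSON, parsing them before unpacking should do it:

    | 'UTF-8 bytes to string' >> beam.Map(lambda msg: msg.decode('utf-8'))
    | 'Parse JSON' >> beam.Map(json.loads)  # str -> dict
    | 'Map to MySchema' >> beam.Map(lambda d: MySchema(**d)).with_output_types(MySchema)

One caveat: the printed output above shows single quotes and Python-style True, which json.loads would reject; if the producer is actually publishing Python reprs rather than JSON, ast.literal_eval would be the (uglier) fallback. Also, since the pipeline is built inside `with beam.Pipeline(...) as pipeline:`, it runs automatically when the block exits, so the `result = message.run()` / `wait_until_finish()` lines aren't needed (and `message` is a PCollection, which has no run method anyway).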