Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "database links"
-
Oh, man, I just realized I haven't ranted one of my best stories on here!
So, here goes!
A few years back the company I work for was contacted by an older client regarding a new project.
The guy was now pitching to build the website for the Parliament of another country (not gonna name it, NDAs and stuff), and was planning on outsourcing the development, as he had no team and he was only aiming on taking care of the client service/project management side of the project.
Out of principle (and also to preserve our mental integrity), we have purposely avoided working with government bodies of any kind, in any country, but he was a friend of our CEO and pleaded until we singed on board.
Now, the project itself was way bigger than we expected, as the wanted more of an internal CRM, centralized document archive, event management, internal planning, multiple interfaced, role based access restricted monster of an administration interface, complete with regular user website, also packed with all kind of features, dashboards and so on.
Long story short, a lot bigger than what we were expecting based on the initial brief.
The development period was hell. New features were coming in on a weekly basis. Already implemented functionality was constantly being changed or redefined. No requests we ever made about clarifications and/or materials or information were ever answered on time.
They also somehow bullied the guy that brought us the project into also including the data migration from the old website into the new one we were building and we somehow ended up having to extract meaningful, formatted, sanitized content parsing static HTML files and connecting them to download-able files (almost every page in the old website had files available to download) we needed to also include in a sane way.
Now, don't think the files were simple URL paths we can trace to a folder/file path, oh no!!! The links were some form of hash combination that had to be exploded and tested against some king of database relationship tables that only had hashed indexes relating to other tables, that also only had hashed indexes relating to some other tables that kept a database of the website pages HTML file naming. So what we had to do is identify the files based on a combination of hashed indexes and re-hashed HTML file names that in the end would give us a filename for a real file that we had to then search for inside a list of over 20 folders not related to one another.
So we did this. Created a script that processed the hell out of over 10000 HTML files, database entries and files and re-indexed and re-named all this shit into a meaningful database of sane data and well organized files.
So, with this we were nearing the finish line for the project, which by now exceeded the estimated time by over to times.
We test everything, retest it all again for good measure, pack everything up for deployment, simulate on a staging environment, give the final client access to the staging version, get them to accept that all requirements are met, finish writing the documentation for the codebase, write detailed deployment procedure, include some automation and testing tools also for good measure, recommend production setup, hardware specs, software versions, server side optimization like caching, load balancing and all that we could think would ever be useful, all with more documentation and instructions.
As the project was built on PHP/MySQL (as requested), we recommended a Linux environment for production. Oh, I forgot to tell you that over the development period they kept asking us to also include steps for Windows procedures along with our regular documentation. Was a bit strange, but we added it in there just so we can finish and close the damn project.
So, we send them all the above and go get drunk as fuck in celebration of getting rid of them once and for all...
Next day: hung over, I get to the office, open my laptop and see on new email. I only had the one new mail, so I open it to see what it's about.
Lo and behold! The fuckers over in the other country that called themselves "IT guys", and were the ones making all the changes and additions to our requirements, were not capable enough to follow step by step instructions in order to deploy the project on their servers!!!
[Continues in the comments]26 -
Hello devRanters! A new update on the privacy website as I've had time to work on that last weekend.
For the picture: everything except for the images (although the URI's are in the database) is coming directly from the DB!
Also got a thingy working which can show one app on a single page including it's pro's/cons etc that you see on the main page but also (still have to write that part though and no screenshots yet as I've only done the backend!) sources (links to proof etc), a description and a guide on how to use that app/service!
I'm finally getting somewhere :)29 -
The gift that keeps on giving... the Custom CMS Of Doom™
I've finally seen enough evidence why PHP has such a bad reputation to the point where even recruiters recommended me to remove my years of PHP experience from the CV.
The completely custom CMS written by company <redacted>'s CEO and his slaves features the following:
- Open for SQL injection attacks
- Remote shell command execution through URL query params
- Page-specific strings in most core PHP files
- Constructors containing hundreds of lines of code (mostly used to initialize the hundreds of properties
- Class methods containing more than 1000 lines of code
- Completely free of namespaces or package managers (uber elite programmers use only the root namespace)
- Random includes in any place imaginable
- Methods containing 1 line: the include of the file which contains the method body
- SQL queries in literally every source file
- The entrypoint script is in the webroot folder where all the code resides
- Access to sensitive folders is "restricted" by robots.txt 🤣🤣🤣🤣
- The CMS has its own crawler which runs by CRONjob and requests ALL HTML links (yes, full content, including videos!) to fill a database of keywords (I found out because the server traffic was >500 GB/month for this small website)
- Hundreds of config settings are literally defined by "define(...)"
- LESS is transpiled into CSS by PHP on requests
- .......
I could go on, but yes, I've seen it all now.12 -
CEO - So... We'll have a new side project, it's a small thing, I spoke with a guy, he needs just a small thing...
Me - Ok....... So, what do you need me for?
CEO - Not much, this guy started a project but wants your help on a small part, and you already did something similar for us, so it should be fast, just copy and paste and change a little bit. You probably know better than me.
Me - *Sigh* ... So `friend`, what do you need from me?
Friend - So, I made a crawler that is storing some information on a local file, I just need to add multiprocessing and multithreading, a producer-consumer system with a queue so I can automatically add new links every now and then, a failback system, so when a process doesn't finish, it should be re-queued and crawled later, store all the information on a database cluster (that is not set up), [......]
Me - And when is this supposed to be up and running?
Friend - Your CEO told me you could do this by the weekend. Can we finish this by Friday?
Me - *facepalm* FML13 -
Rant, but also want to know, how the fuck do you accidentally drop a database link. In production. On the most important database. Housing our biggest applications.8
-
So we ordered a piece of software from external software house becouse I was low on time and we needed it asap.
So. Long story short, their software was bugged as hell, they deny all the bugs and they have their BDD that they done and anything we say about it like "feature XYZ is broken on firefox" they will deny it "becouse it wasn't on BDD" or "let's get on call" (in which +- 6-7 people participate from their side and we of course have to pay them for this...)
So they fixed like 20% of bugs (mostly trivials/minors) Application is fairly small scope. You have integration with like 3 endpoints on arbitary API, user registration/login, few things to do in database (mainly math running from cron).
They done it in ASP so I don't know the language and enviroment so can't just fix it myself.
2 days ago (monday) they annoyed me to point where I just started to break things. For starters I found that every numeric input is vunrable to integer overflow (which is blocker). I figured most of fields are purefect opportunity to XSS (but I didn't bother to do JS... anything but not JS...). I figured I can embed into my name/surname/phone (none validated) anything in HTML...
So for now we have around 25 bugs, around 15 of them are blockers.
They figured it's somehow our fault that it's bugged and decided to do demo with us to show off how perfectly it works. I'm happy to break their demos. I figured I will register bunch users that have name - image with fixed/absolute position top:0;left:0 width/height 100% - this will effectively brick admin panel
Also I figured I can do some addotional sounds in background becouse why not. And I just dont know what to put in. It links to my server for now so I can freely change content of bricked admin panel.
I have curl's ready to execute in case they reset database.
I can put in GIFs or heck, even videos, dosen't really matter. Framework escapes some things for them so at least that. But audio/image/video works.
Now I have 2 questions:
- what image + audio combo will work the best (of course we need to keep it civil). Im thinking finding some meme with bugs or maybe nuclear logo image with some siren sound
- am I evil person?
Edit:
I havent stated this clearly:
"There is no BDD that describes that if user inserts malicious input server should deny it" - that's almost literally what we get from them....11 -
Woohoo!!! I made it to 1000++s :) Now I feel less newbie-like around here :)
So... I don't want to shit-post, so in gratitude to all you guys for this awesome community you've built, specially @trogus and @dfox, I'll post here a list of my ideas/projects for the future, so you guys can have something to talk about or at least laugh at.
Here we go!
Current Project: Ensayador.
It's a webapp that intends to ease and help students write essays. I'm making it with history students in mind, but it should also help in other discipline's essay production. It will store the thesis, arguments, keywords and bibliography so students can create a guideline before the moment of writting. It will also let students catalogue their reads with the same fields they'd use for an essay: that is thesis, arguments, keywords and bibliography, for their further use in other essays. The bibliography field will consist on foreign keys to reads catalogued. The idea is to build upon the models natural/logical relations.
Apps: All the apps that will come next could be integrated in just one big app that I would call "ChatPo" ("Po" is a contextual word we use in my country when we end sentences, I think it derived from "Pues"). But I guess it's better to think about them as different apps, just so I don't find myself lost in a neverending side-project.
A subchat(similar to a subreddit)-based chat app:
An app where people can join/create sub-chats where they can talk about things they are interested in. In my country, this is normally done by facebook groups making a whatsapp group and posting the link in the group, but I think that an integrated app would let people find/create/join groups more easily. I'm not sure if this should work with nicknames or real names and phone numbers, but let's save that for the future.
A slack clone:
Yes, you read it right. I want to make a slack clone. You see, in my country, enterprise communications are shitty as hell: everything consists in emails and informal whatsapp groups. Slack solves all these problems, but nobody even knows what it is over here. I think a more localized solution would be perfect to fill this void, and it would be cool to make it myself (with a team of friends of course), and hopefully profit out of it.
A labour chat-app marketplace:
This is a big hybrid I'd like to make based on the premise of contracting services on a reliable manner and paying through the app. "Are you in need of a plumber, but don't know where to find a reliable one? Maybe you want a new look on your wall, but don't want to paint it yourself? Don't worry, we got you covered. In <Insert app name> you can find a professional perfect to suit your needs. Payment? It's just a tap away!". I guess you get the idea. I think wechat made something like this, I wonder how it worked out.
* Why so many chat apps? Well... I want to learn Erlang, it is something close to mythical to me, and it's perfect for the backend of a comms app. So I want to learn it and put it in practice in any of these ideas.*
Videogames:
Flat-land arena: A top down arena game based on the book "flat land". Different symmetrical shapes will fight on a 2d plane of existence, having different rotating and moving speeds, and attack mechanics. For example, the triangle could have a "lance" on the front, making it agressive but leaving the rest defenseless. The field of view will be small, but there'll be a 2d POV all around the screen, which will consist on a line that fills with the colors of surrounding objects, scaling from dark colors to lighter colors to give a sense of distance.
This read could help understand the concept better:
http://eldritchpress.org/eaa/...
A 2D darksouls-like class based adventure: I've thought very little about this, but it's a project I'm considering to build with my brothers. I hope we can make it.
Imposible/distant future projects:
History-reading AI: History is best teached when you start from a linguistic approach. That is, you first teach both the disciplinar vocabulary and the propper keywords, and from that you build on causality's logic. It would be cool to make an AI recognize keywords and disciplinary vocabulary to make sense of historical texts and maybe reformat them into another text/platform/database. (this is very close to the next idea)
Extensive Historical DB: A database containing the most historical phenomena posible, which is crazy, I know. It would be a neverending iterative software in which, through historical documents, it would store historical process, events, dates, figures, etc. All this would then be presented in a webapp in which you could query historical data and it would return it in a wikipedia like manner, but much more concize and prioritized, with links to documents about the data requested. This could be automated to an extent by History-reading AI.
I'm out of characters, but this was fun. Plus, I don't want this to be any more cringy than it already is.12 -
Client: can you filter boats by location?
Me: Let me see... As you know, there are three remote systems that feed data into your database. I'd have to make a connection between the location records. But I can't rely on coordinates, name, ID or anything else. You'd have to manually create those links for me by remote systems records IDs. Telling me that record XY from system A is identical to record YX from system B, etc...
Client: How many records are we talking about?
Me: 504.
Three days later...
Client: Got it, is that enough for you in excel?
Me: Let me see... Very nice work, I can work with that.
Client: I almost died on it!
An hour later...
Me: Got it, test it and let's run it on the production version.
Client: It works beautifully.
A minute later...
Can we filter the ships by ports?
Me: Let me see... Yes, it's theoretically possible, but it's the same situation as with places...
Client: How many records are we talking about?
Me: 12,647.
Skype relayed to me the sound of something heavy falling, something grunting. Something dying.3 -
So I started getting email notifications telling me about transactions made using my credit card. But I DON'T have a credit card in the first place.
Instead of trying to call customer care and pressing an endless array of buttons, I drive to the bank. I tell them the situation and they check every database they have but they couldn't find any trace of a card connected to my account. Turns out their database somehow had cross-links in their database.
How does the one of the biggest banks in the country possibly have such an issue. Worst part is that it's been a day and they still haven't fixed it -_-7 -
Ok, so I basically spent my weekend trying to work out why the fuck my python docker container would not connect to my mariadb docker container. Tried fucking everything, bridged network, host network, links (even though theyre deprecated), you name it. It would NOT WORK!
In my despair I finally turned to StackOverflow. There I was told 5min after posting the question that the reason was probably that mysql is a quite heavy service, which takes a bit to start up.
I thought to myself "Oh, get the fuck outta here, that can't be it, shit's way too easy to work!"
I tried it nevertheless by adding a 10sec delay before querying the database AND THE MOTHERFUCKING PIECE OF SHIT ACTUALLY WORKS!! So, I essentially just lost a weekend because I was too impatient... I think I'm gonna punch some trees now.4 -
Well those are the FAQs... But where the Fuck are the answers?
I guess it's just oracle being oracle
And yes I have checked on mobile and desktop with 4 different browsers.
And no those aren't links
https://oracle.com/database/...7 -
I really think there should be a subject in every CS course to teach us how to handle/work-under Grade-A assholes and dumbfucks. Not that it would help, but atleast warn us on what we are getting into.
In my opinion, development is not *that* hard or frustrating but is made so by these shitty people. But again, what do I know.
I was scolded by my boss for using for-loop to iterate through an array recently. Apparently for-loop is not used in real world projects and this iteration should be done "in-memory". My colleagues and I are still trying to understand and process that.
I was asked to add fitbit integration to a project within 2 hours just because I had "already done it a week ago" in *another* project. Luckily, it was then given to a "senior" developer who took 4 days for it and essentially copy-pasted my work without much changes, ofcourse it stopped working every now and then.
I am given unreal deadlines on my tasks, on technologies I haven't worked on before, and then expected to churn out production ready code with no bugs in them.
My boss literally just sends me the links of 1st three google results on the problems I encounter and report, after humiliating me ofcourse. Yes, I did google it and yes I went through all I could find from Google forums to GitHub issues. When the library/plugin author himself says that this feature is not yet available, don't expect me to develop it in 2 hours you dumbfuck.
And for the love of God, please stop changing the data model every single day and justify it with agile development. Think before making any changes to it. Ever heard of Join queries? Foreign keys? Or any other basic database concepts.
We reached a point where each branch in the repo had different data model. Not kidding. And we were a team of just 4 developers. Atleast inform us when you change models after discussing it with your shit for knowledge "senior" developer, so we don't have to redo it all over again. The channels on slack are not for sharing random articles only.
I am just waiting to complete my year here.
I should have known what I got myself into the day he asked me to remove the comments I had added to explain what my code does. Why you ask? Because "we don't write comments". -
When your co-worker uses needless terminology. It’s your day off and you’re texting from bed.
cw: Do you have access to the email client?
me: You mean the work email? Yes.
cw: Did you set up access to the database or an FTP protocol for userX?
me: You mean an admin account? Yes.
cw: Were we planning on adding more calls to action on projectX?
me: You mean site links? Yes.2 -
Imagine being a software engineer:
You invent a new CMS called "WordPress"
and you decide to store all internal links as complete URLs in the fucking database.
PEAK BRAIN USAGE22 -
Dev Diary Entry #56
Dear diary, the part of the website that allows users to post their own articles - based on an robust rights system - through a rich text editor, is done! It has a revision system and everything. Now to work on a secure way for them to upload images and use these in their articles, as I don't allow links to external images on the site.
Dev Diary Entry #57
Dear diary, today I finally finished the image uploading feature for my website, and I have secured it as well as I can.
First, I check filesize and filetype client-side (for user convenience), then I check the same things serverside, and only allow images in certain formats to be uploaded.
Next, I completely disregard the original filename (and extension) of the image and generate UUIDs for them instead, and use fileinfo/mimetype to determine extension. I then recreate the image serverside, either in original dimensions or downsized if too large, and store the new image (and its thumbnail) in a non-shared, private folder outside the webpage root, inaccessible to other users, and add an image entry in my database that contains the file path, user who uploaded it, all that jazz.
I then serve the image to the users through a server-side script instead of allowing them direct access to the image. Great success. What could possibly go horribly wrong?
Dev Diary Entry #58
Dear diary, I am contemplating scrapping the idea of allowing users to upload images, text, comments or any other contents to the website, since I do not have the capacity to implement the copyright-filter that will probably soon become a requirement in the EU... :(
Wat to do, wat to do...1 -
I've been working on the ecommerce website from hell for over a year now. I should have heard the alarm bells when the studio who were running the project took a month to pay my deposit but still expected me to start working, but I explained that I wouldn't start without some form of security and they were cool with it, so I carried on.
It started off as a simple build with simple products, no product variations etc and a few links on the designs which appeared to lead to external links, and checkout and cart pages were nowhere to be seen. It wasn't a big money job so I just build them in as plain and straightforward as I could, in line with how the rest of the site looked. They then changed their mind about how they wanted these to look, and added loads of functionality to the site throughout the build, so by the end of the line, the scope of work had completely changed. I also had loads of disagreements in terms of design and useability, as their designs straight-up weren't going to function otherwise, plus every round of changes meant that I had to prolong the job further and fit it around work for other clients.
Fastforward a few more months and I get sent a really angry email with some of the client's complaints, including one that raised an issue with the user journey, and the finger of blame was pointed at me. The user journey had been a part of the designs from the start, and this was never raised as an issue for A WHOLE YEAR. They then said that it had to go live on Monday (three days after they sent email with these huge new structural changes). I told them I could no longer work on the project but was happy to waive the rest of my fee (3/4 of the total fee, when I had essentially completed the site, minus 2 minor bugs), so they could find another developer in the limited time they had. At first they refused to hire another developer, claiming that it would be too expensive, which made no sense, as for a few minor fixes and out of scope additions he could get paid a wage that would have otherwise paid for the majority of the work I had done on the site. I stood my ground and finally they found someone, so I sent over all of the files and database to their new developer and asked him to give me a heads up when I could remove the staging site from my server. The next day, I received an email from the studio asking me to fix some bugs the developer was requesting I fix so he could carry on with the site. They were basically asking me to work more, for free, to enable him to walk off with the majority of the money and do less work. They also forwarded a suuuuuper shitty, condescending email from him, listing all the things he thought was wrong with the site (he even listed 'no favicon' although they'd never supplied a graphic for this). He also wrote a paragraph at the bottom EXPLAINING MY JOB TO ME and telling me:
I get the feeling you like to write Javascript, while being one of the easiest languages to learn, it can also be one of the hardest to master. While I applaud you for writing Vanilla JS, it looks like you have a general problem with structuring your application.
Not sure if I'm being oversensitive here but it felt so patronising, and i couldn't even go for an angry walk to get it out my system because of social distancing lol.
Let a girl quarantine in peace!!!!!!2 -
Anyone else here who needs to deal with GDPR on the software level? I'll go nuts until we're compliant in every aspect.
I've been developing a consent library for the last few days. It even automatically links expressions of explicit consent to current screenshots of the relevant forms (because you need to do that too), and past records are immutable. Well, unless the whole database gets fucked somehow, then it's not.3 -
Without a doubt it has to be the internal company search engine/file finding tool @thewamz and I wrote.
The company has a wide UNC network with files scattered all over the place and they need a way to keep track of where the files get moved to (they can and do get moved). The original tool was written in Java/Tomcat and didn't use any frameworks or utilities beyond custom written ones, no orms, and the SQL was just raw strings. The program didn't take into account that files might be moved or deleted so it never removed anything from the database, it just kept adding files and never removing them.
It however never stores files itself, just links to files elsewhere on the UNC network.
It took six months to get it into what might be a stable beta or release candidate state. The user interface is good, very simple and intuitive, the whole thing was rewritten in python/django, there were issues with utf 8 (and mysql not fully supporting utf 8 in its own utf 8 mode), we added a regex search mode (which was sorely lacking), the search used to take up to fifteen minutes however we sped it up to less than a minute (worst case when a user simply puts "^$" as the regex search). It has a multi threaded design which does some checks to ensure it doesn't spawn too many threads and get stuck in constant Gil switching. Still some bugs to fix, like moving the processing of results returned by the server in a web worker so that the content widget doesn't lock up processing millions of search results and moving the back end to use asynchronous python might gain a performance boost. But on the whole I think the system is ready to replace the older system that all the users are frustrated with and constantly complain about.
However the annoying bit is... How to actually get the new system online, while I am responsible for the development of tools and their maintenance, I am not responsible for their initial deployment and that means I have no idea when (or even if) my new tool will even ever be released :/ -
So recently i got a message from aa person asking how to (these are exact words) ,
:break into insta's database using Sqlmap"
I then proceeded to tell them to "f*ck of ya c*nt ".
Afterwords it inspired me to write this rant
annoying classmates:" hahaha GuYS bEtER wAtcH OuT he's GonnaA hack Us"
me: " yea I can program I also do some ethical hacking and cybersecurity "
annoying classmates: "hahaH Bro your a Hacker OhHHhHHOOO BrO CaN yoU hACk inSta FoR mE I NEEd MoRe FolloWeRs "
me:" tf no one that's illegal and two it's waste of my time "
annoying classmates: "BrOooo CaN yoU gEt Me SoMe HacKs fOr CsGo"
me: "can you just please f*ck off , i'm not hacking for you everything you've asked me is extremely unethical and a huge waste of time, Also if you suck so bad at a game you need to cheat I recommend just stopping "
annoying classmates: "DUdE whAt ToolS dO i HVAE to DownLOad To Be A haCkEr"
me: *trying hard not to murder them* " I told you to f*ck off"
being a hackers isn't downloading tools it isn't typing at 90wpm into a terminal with green font its not about games or fame or anything its about coming up with creative solutions to problems , thinking outside the box its about individuality and breaking from the heard , looking at things from a different viewpoint,
it's about endlessly seeking knowledge.
It's about freedom though creation that's what being a hacker originally was. But because of big media and movie company's (and script kiddies) people now confuse hacker with cracker and think of us as jobless fat kids sitting in a dark room in there parents house breaking into bank accounts and buying drugs on the dark web (which people see to think there a hacker just because they can open tor browser. they then proceed to use google to look up "fresh onion links 2020") .
My classmates and really my generation has a huge case of smooth brain. They a think we can just look at someone and hack them they also seem to think using a gratify link to get a persons up is hacking and using the inspect element is hacking and that opening a terminal is hacking ! AHHHHHHHHHHHHHHHHHHHHH"
Anyways ima end this here thanks for reading :)5 -
My last rant did not go down well 😂
I need to clarify.
If you have no communication directly to another dev.... And your creation directly impacts theirs .... And you don't tell the middle man ... You are a twat especially if your aware of the launch that's basically right now !
To be clear.... This guy built their site and database everything .
I had to connect part of their new app to the site... A few pages they wanted in the app.
He changed the links ... On purpose I think to screw over the launch of the app to make me look bad
I can not communicate with him... The CEO hates him and won't talk to him either
So what am I to do?
Not be pissed off about a spiteful dickhead?
Of course I'm pissed
To make people understand you don't send out a lethal update that can fuck up the servers without telling no one... That their might be technical issues -
I've been working in web development and design since 2012 and, while I've worked with others in my field along the way, I've not forged any lasting friendships... maybe I'm just a shitty person? I've burned some bridges that's for sure. Anyway, it all boils down to: I have no one to bitch to when I want to stab someone in the abdomen over frustrations at my current job.
Enter devRant...
I'm coming up on the year anniversary at my new position and there is still a lot to take in. I replaced a "web guy" who had been doing it for 20 years. Anyway, his stuff is all a mess, and what's worse is that the problems I'm seeing stretch far beyond my own responsibilities. I'm in a group of "tech" people who have all been here for a decade or more, and they're almost all like the guy I replaced: set in their ways and years out of date.
There is one gentlemen who is managing a database application and each site links to his ASP (not .NET) pages. Each of these pages looks like the website they were linked from. He showed me how he achieves this and it's just insane - he uses a bunch of files (basically just text files) that make up different pieces of the page to recreate the look/feel of the website on this his server - just to serve the information from the database. God forbid the website changes 'cause then all his little files need to change.
When I suggested that I can just query the database myself and display all that information on the actual site instead of doing all these redundant steps, I get "I think we should stick with the way we've been doing it for now."
*stab stab stab*4 -
TLDR;
Side project update.
Made simple nlp library in python and published it’s first version to open source.
Now I can feed it with parsed pdf text.
See rant https://devrant.com/rants/2192388/...
Why ?
Cause during reading book about nltk I couldn’t find simple extendible way to provide support for polish language and I wanted to abstract stemming, word normalization, tokenizer etc. so I can provide ex. different conditions for separate text files and don’t write much code what is an asset when you work solo.
It’s about 12GB of pdf public accessible law data I am trying to handle ( at first ) which is about 35000 files from last 90 years.
So far I automated downloading web pages and pdf documents from them. Extracting data from web pages and saving it to database. Extracting text from pdf files. I have about 5-6 projects to do all of it above maybe at the end I will put it to some workflow manager like Luigi or just run it by cronjob.
First thing for website version 1.0 part is find correlation between all documents inside law text using nlp library by building custom conditions. Then just generate directory structure and html files with links between documents.
Website version 2.0 is already in my mind but it will be creepy to make it and will take at least 1-2 months and I want to publish fast.
I have some pdfs with only images instead of text and tesseract worked quite good with them so maybe I will try to process them when everything go live.
Learned a lot about pdf as now I know that font in pdf is not always providing unicode characters ( stupid form of obfuscation) so when you extract text you need to build glyph vector to text map for every font.
Pdf is full vector representation - just like svg - what is logic if you think a bit and know that some printers are running using postscript.
Let’s hope next update will be about flutter mobile app which started all of shit above. It’s almost ready ( except getting data from api I am trying to do and logo for release version ). It’s last piece of puzzle.3 -
When your team can't change from the oracle database because the dba team will be out jobs and database links will be lost (people havent heard of apis)... :feelsoracleman:
-
Has anyone heard about Bench and Frappe framework?
It is a very good python, open source framework.
It sets up almost everything for you when you want to start a project.
I found this framework, when I was searching for an open source ERP built on python, since Odoo went commercial. -_-
After searching, I found Frappe framework. It is a small community, but the framework has potential in my opinion.
Just wanted to pinpoint this very good framework.
Now every time I want to start a new project, I do not need to to all the set ups, like database, user authentication, user permission, etc. The framework does it for you.
NOTE: I am not and I do not have any connection with the devs and this company. I am just a user of this framework, and just wanted to suggest to you and take a look.
Links: https://frappe.io/ , https://erpnext.com/4