Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "ocr data"
-
I realize I've ranted about this before, but...
Fuck APIs.
First the fact that external services can throw back 500 errors or timeouts when their maintainer did a drunk deploy (but you properly handled that using caching, workers, retry handlers, etc, right? RIGHT?)...
Then the fact that they all speak a variety of languages and dialects (Oh fuck why does that endpoint return a JSON object with int keys instead of a simple array... wait the params are separated with pipe characters? And the other endpoint uses SOAP? Fuck I need to write another wrapper class around the client...)
But the worst thing: It makes developers live in this happy imaginary universe where "malicious" is not a word.
"I found this cloud service which checks our code style" — hmm ok, they seem trustworthy. Hope they don't sell our code, but whatever.
"And look at this thing, it automatically makes database backups, just have to connect to it to DigitalOcean" — uhhh wait...
"And I just built this API client which sends these forms to be OCR processed" — Fuck... stop it... there are bank accounts numbers on those forms... Where's that API even located? What company?
* read their privacy policy *
"We can not guarantee the safety of your personal data, use at your own risk [...] we are located in Russia".
I fucking hate these millennial devs who literally fail to get their head out of the cloud.
Somehow they think it's easier to write all these NodeJS handlers and layers around some API, which probably just calls ImageMagick + Tesseract on the other side.
If I wasn't so fucking exhausted, I'd chop of their heads... but they're like hydra, you seal one privacy breach and another is waiting to be merged, these kids just keep spewing their crap into easy packages, they keep deploying shitty heroku apps... ugh.
😖8 -
OCR (The exam board for my course) are fucking thick in the head when it comes to anything computing.
- I get a mark or two for saying open source software is worse than thier propritary counterparts
- ALL open source software forks must also be make open source. They spend so much time going over the legal stuff BUT HAVE NEVER HEARD OF OPEN SOURCE LICENCING!
- One exam paper had a not gate picture with 2 inputs...
- I have to differentiate between portable and handheld! YOU MEAN HANDHELD DEVICES ARE NOT PORTABLE!?!!?!?
- In level 2 education, OCR say 1 MB = 1024 KB - In level 3, they say 1 MB = 1000 KB, and 1 MiB = 1024 KiB, and expect you to differentiate. Why do you expect the wrong answer in level 2!?
- INFORMATION FORMATS AND STYLES ARE COMPLETELY DIFFERENT THINGS! If you look up synonyms for "style", "form" is there, and if you look up synonyms for "format", "style" is there.
- When asked for storage devices, I have to say "smartphone", "tablet", "desktop PC" - I mean yeah they store data but when you ask me for storage devices I will say "hard disk drive", "solid state drive", "SD card", etc. >.>
I could probably go on an on about this...
I sure do love being asked to copy-paste existing HTML/JS/CSS and being asked to just tweak it here and there, and then wait for other people's incompetence in copy-pasting... I sure do love being stuck with this sort of "education" ._.4 -
My company just acquired another company from some losers.
Gotta load their pittance database onto our thing.
Their entire "Technology Department" is one old fart.
One even older fart runs their accounting.
I asked the IT boomer for their accounting data.
He tells me to get the head accountant.
The head accountant says they do not have any historical accounting data.
I threaten to call the (equivalent of the) IRS on them.
They give up, admit that they do have some historical data. But they attempt to pull a "malicious compliance" on me, send me a pallet full of old receipts, on paper.
I do what I have done one hundred times before, I go to the closest community college (equivalent) and ask/bribe a teacher to offer the most trustworthy kids some pretty pennies to scan all those files for me.
A dozen of them barely took a week to do it using their not-so-bad camera phones.
It all for about the same price as a couple of older-but-still-good iPhones.
Then it's on to some simple OCR and data normalization tasks.
This morning I had another meeting with the losers, the first since I told them their "data" had just arrived in the mail (but a couple weeks after that). They log in for the meeting all smug, thinking we would ask for more time to load their data, and it would be my team's fault for any delays.
Then the regional business evaluator logs in and said he reviewed their financials yesterday and we have a lot to talk about.
I will remember their "just got punched in the gut" faces forever :)7 -
Me: why are we paying for OCR when the API offers both json and pdf format for the data?
Manager: because we need to have the data in a PDF format for reporting to this 3rd party
Me: sure, but can we not just request both json and PDF from the vendor (it’s the same data). send the json for the automated workflow (save time, money and get better accuracy) and send the PDF to the 3rd party?
Manager: we made a commercial decision to use PDF, so we will use PDF as the format.
Me: but ...4 -
Just another big rant story full of WTFs and completely true.
The company I work for atm is like the landlord for a big german city. We build houses and flats and rent them to normal people, just that we want to be very cheap and most nearly all our tenants are jobless.
So the company hired a lot of software-dev-companies to manage everything.
The company I want to talk about is "ABI...", a 40-man big software company. ABI sold us different software, e.g. a datawarehouse for our ERP System they "invented" for 300K or the software we talk about today: a document management system. It has workflows, a 100 year-save archive system, a history feature etc.
The software itself, called ELO (you can google it if you want) is a component based software in which every company that is a "partner" can develop things into, like ABI did for our company.
Since 2013 we pay ABI 150€ / hour (most of the time it feels like 300€ / hour, because if you want something done from a dev from ABI you first have to talk to the project manager of him and of course pay him too). They did thousand of hours in all that years for my company.
In 2017 they started to talk about a module in ELO called Invoice-Module. With that you can manage all your paper invoices digital, like scan that piece of paper, then OCR it, then fill formular data, add data and at the end you can send it to the ERP system automatically and we can pay the invoice automatically. "Digitization" is the key word.
After 1.5 years of project planning and a 3 month test phase, we talked to them and decided to go live at 01.01.2019. We are talking about already ~ 200 hours planning and work just from ABI for this (do the math. No. Please dont...).
I joined my actual company in October 2018 and I should "just overview" the project a bit, I mean, hey, they planned it since 1.5 years - how bad can it be, right?
In the first week of 2019 we found 25 bugs and users reporting around 50 feature requests, around 30 of them of such high need that they can't do their daily work with the invoices like they did before without ELO.
In the first three weeks of 2019 we where around 70 bugs deep, 20 of them fixed, with nearly 70 feature requests, 5 done. Around 10 bugs where so high, that the complete system would not work any more if they dont get fixed.
Want examples?
- Delete a Invoice (right click -> delete, no super deep hiding menu), and the server crashed until someone restarts it.
- missing dropdown of tax rate, everything was 19% (in germany 99,9% of all invoices are 19%, 7% or 0%).
But the biggest thing was, that the complete webservice send to ERP wasn't even finished in the code.
So that means we had around 600 invoices to pay with nearly 300.000€ of cash in the first 3 weeks and we couldn't even pay 1 cent - as a urban company!
Shortly after receiving and starting to discussing this high prio request with ABI the project manager of my assigned dev told me he will be gone the next day. He is getting married. And honeymoon. 1 Week. So: Wish him luck, when will his replacement here?
Deep breath.
Deep breath.
There was no replacement. They just had 1 developer. As a 40-people-software-house they had exactly one developer which knows ELO, which they sold to A LOT of companies.
He came back, 1 week gone, we asked for a meeting, they told us "oh, he is now in other ELO projects planned, we can offer you time from him in 4 weeks earliest".
To cut a long story short (it's to late for that, right?) we fought around 3 month with ABI to even rescue this project in any thinkable way. The solution mid February was, that I (software dev) would visit crash courses in ELO to be the second developer ABI didnt had, even without working for ABI....
Now its may and we decided to cut strings with ABI in ELO and switch to a new company who knows ELO. There where around 10 meetings on CEO-level to make this a "good" cut and not a bad cut, because we can't afford to scare them (think about the 300K tool they sold us...).
01.06.2019 we should start with the new company. 2 days before I found out, by accident, that there was a password on the project file on the server for one of the ELO services. I called my boss and my CEO. No one knows anything about it. I found out, that ABI sneaked into this folder, while working on another thing a week ago, and set this password to lock us out. OF OUR OWN FCKING FILE.
Without this password we are not able to fix any bug, develop any feature or even change an image within ELO, regardless, that we paid thausend of hours for that.
When we asked ABI about this, his CEO told us, it is "their property" and they will not remove it.
When I asked my CEO about it, they told me to do nothing, we can't scare them, we need them for the 300K tool.
No punt.
No finish.
Just the project file with a password still there today6 -
Obviously ai and autodocument recognition and data extraction is not usable yet
Excepting when it's a pdf not a scanned document or image
Ocr may be but shift the whole.image or bend it or remove a border from some white out
And then handwritten -
Need help with selecting a proper backend and website frameworks. After trying out a couple identity verification service providers we were dissapointed with their lack of support (takes weeks to do minimal changes).
So now we are having discussions about building in-house id verification system. We already have libraries for ios/android apps (ZOOM lib for face recognition and another lib for data extraction via OCR from document picture). So what we need is a proper backend and then a decent web framework with proper ux/ui design for our web/ios/android apps.
Currently thinking what kind of backend framework should we choose? Backend's main responsibility is for each client registered from website to assign an api key and to create a database/storage where his users would authenticate via clients app and upload a picture and a video.
Also wondering what kind of framework for website apps (main web app, dashboard app where we display pending verifications, and of course verification app) to choose. Should be go for angular?