40

My System Analysis professor wants to fail me because I refuse to store PDF files in the database in my project.

He wants me to store THE WHOLE BINARY FILE in the database instead of on the filesystem.

When I tried to explain why that would be bad, he interrupted me and began the "you think you know more than I do? I've been teaching this for X years" speech.

How do such people become professors?

Comments
  • 11
    Could just be me, but as far as I know, although it's not best practice, it's widely used, I thought :). Also, welcome!
  • 9
    Storing PDFs in databases makes sense, as it solves a few problems: one, it reduces space use for a large number of small files; two, it avoids limiting factors from the OS, like the maximum number of files in a directory. Databases have BLOB datatypes just for this purpose. However, there are downsides...
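    The BLOB approach the comment describes can be sketched in a few lines. This is a minimal example, assuming SQLite and a made-up `documents` table; the idea is the same for any database with a BLOB type:

    ```python
    import sqlite3

    # In-memory DB for demo purposes; a real app would use a file or a DB server.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, content BLOB)")

    # Stand-in for the bytes of a real PDF read from disk.
    pdf_bytes = b"%PDF-1.4\n...demo bytes..."
    conn.execute("INSERT INTO documents (name, content) VALUES (?, ?)", ("book.pdf", pdf_bytes))
    conn.commit()

    # Retrieving the file is just another query -- no filesystem paths involved.
    row = conn.execute("SELECT content FROM documents WHERE name = ?", ("book.pdf",)).fetchone()
    assert row[0] == pdf_bytes
    ```

    Note that the driver parameterizes the bytes like any other value; the file rides along in regular transactions and backups.
    
    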
  • 5
    Also solves a lot of issues with multi server setups and backups
  • 21
    Although the prof is right, he/she could have been more explanatory... He/she could explain why instead of just brushing it off with an air of "I'm better than thou"...
  • 1
    I understand that I can store them as BLOBs, but I believe it's a very bad idea in this case. My project is a lame library management system prototype. The PDFs are digital copies of books and documents, saved on the server and served to a desktop client via HTTP. You really think this would be a good case for storing the files in the database?
  • 6
    @UltimateZero Personally, I think so, yeah. Also, since Let's Encrypt is free, and if time isn't much of the essence, I'd recommend adding SSL!
  • 2
    Also, yeah, I'm only ranting because of the way he said it! Some people teach computer science classes like it's a history class; they don't welcome debates or alternative ways of doing something. That's the part I really hate.
  • 3
    @linuxxx thanks for the welcome btw!
    How is it a good idea? The only way I imagine it going is: each time a PDF is requested, it'll need to query the DB, read the whole file into memory, write it to a temp file (?), and serve it the way a normal static file would be served, byte ranges and all that...
  • 2
    @UltimateZero I don't know the details, but it's cross-platform-proof, just in case. There are other advantages, but I can't remember them right now 😅
  • 3
    Personally, I would store files < 1 GB as BLOBs. Modern databases can handle them pretty darn well. Plus, it's in the "same place" as the other application data.
  • 5
    If it's a school project, follow the advice of the teacher; if it's your personal project, do whatever you think is right.
  • 2
    There is a kind of cursor (I don't know its name right now) that iterates over the query result instead of caching it all at once. Also, you can serve the bytes as a binary file directly to the HTTP client by manually setting the headers, so it doesn't have to be written to a temp file before being sent to the user.
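    The no-temp-file idea can be sketched like this. A minimal example, assuming SQLite and a made-up `documents` table: the handler yields the stored bytes chunk by chunk straight to the HTTP response, after setting the headers by hand. (SQLite still fetches the whole value into memory; for truly incremental reads you'd use the cursor the comment mentions, e.g. a server-side cursor or the database's large-object API.)

    ```python
    import io
    import sqlite3

    CHUNK = 64 * 1024  # stream in 64 KiB pieces instead of one big write

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, content BLOB)")
    conn.execute("INSERT INTO documents (content) VALUES (?)", (b"x" * 200_000,))

    def stream_document(conn, doc_id, chunk_size=CHUNK):
        """Yield the stored bytes chunk by chunk, so the HTTP layer can
        write them to the socket without ever creating a temp file."""
        (blob,) = conn.execute(
            "SELECT content FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        buf = io.BytesIO(blob)
        while chunk := buf.read(chunk_size):
            yield chunk

    # Headers a handler would set before writing the chunks:
    headers = {
        "Content-Type": "application/pdf",
        "Content-Disposition": 'inline; filename="book.pdf"',
    }
    total = sum(len(c) for c in stream_document(conn, 1))
    ```

    Any framework that accepts an iterable response body (WSGI apps do, for instance) can consume such a generator directly.
    
    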
  • 2
    Regardless of who is right, the argument the professor used was not the right one. He should teach, not enforce "hierarchy"...
  • 2
    He should give you a reason instead of a mouthful of "I'm better than you"
  • 4
    Sorry to hear about your non-constructive debate with your instructor.
    Implementation choices always depend on the application and its expected behavior.
    A quick example would be seamless data migration or sharding, e.g. for load-balancing purposes. In such a case, storing BLOBs in a database would be rather convenient because it simplifies the distribution logic.
    I don't think a college project could use such an insight, but it seems your instructor is desperately trying to "teach you a thing or two". Nonetheless, I advise you to go with the instructor's flow, as they may have wrongly expressed a good intention.
  • 1
    You can just tell the prof to his face that he is (probably) wrong. Ask him something like "My idea was this and this, what do you think of it?" or "Couldn't we also do it like this... or am I missing something?"
  • 3
    I work for a company that has twenty years of PDF test files, because they built a PDF/PostScript/XPS interpreter.

    I'm involved with trying to design and build a next gen regression test system that can handle the sheer number of files we deal with.

    Trust me when I say we have major performance issues with filesystem access for the PDF files we have. We've done the research, crunched the numbers, and we want to put the files in a database!
  • 1
    You could use a separate file in the database for PDFs. Not sure which DB you're using, but in SQL Server that's pretty easy.

    For the most part, "those who can't or don't want to do, teach," although I had an amazing professor who became a professor because he had six kids who were just about to go off to college, so he did it for the free tuition. That dude was amazingly brilliant. Another became a prof once he retired. I think it all just depends.
  • 1
    @nmunro Is the company one whose name would suggest it "saves paper"? If so, you guys have a really amazing product. I've seen some of the stuff it can do and was truly impressed.
  • 1
    @ninjatini I have no idea what company you might be referring to but if you clicked my profile you'd see who I work for, it's no secret...
  • 1
    @nmunro Ah, I figured no one would offer up that info, so I never even looked. I was referring to papersave. They do some crazy stuff with OCR and integration into CRM systems. Sounded pretty similar.
  • 2
    For the most part, I always store files in the DB as opposed to the file server. Depending on the data being stored, using a table's foreign/primary key is a lot easier for referencing than using the file name or storing the file's path in the DB.
  • 1
    Years doesn't mean anything!
  • 2
    @ninjatini Nope, we build a raster image processor (well, a few, for different purposes), but we have thousands of test suites that usually contain thousands or even hundreds of thousands of PDF files.

    It's not exciting, but this thread has given me a potential way to save even more space. Since a PDF file is a header, binary data, and an xref table, our database solution could strip out the header (since all our PDF files are the same version, 1.4, for compatibility), and if the binary data sits at the same offset, even the xref table could be stripped out, storing only the internal binary image as a BLOB. When a file is downloaded, the header, binary blob, and xref table could be stitched back together and sent as a file.

    I mean, I know the header and xref table are small, but they're repeated in every single file. Over about a billion files, this would save some disk space!
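    The stitching idea above can be sketched as a split/reassemble round trip. This is purely illustrative: the header and xref bytes here are placeholders, and the scheme only works under the commenter's own assumption that every file shares an identical header and that the xref offsets line up (real xref tables contain byte offsets into the file).

    ```python
    # Shared pieces kept once, not per file (assumed identical across the corpus).
    SHARED_HEADER = b"%PDF-1.4\n"                 # common header (assumption)
    SHARED_XREF   = b"xref\n0 1\n...\n%%EOF"      # placeholder trailer/xref bytes

    def split(pdf: bytes) -> bytes:
        """Strip the shared header and trailer, keeping only the unique body
        that would actually be stored in the BLOB column."""
        assert pdf.startswith(SHARED_HEADER) and pdf.endswith(SHARED_XREF)
        return pdf[len(SHARED_HEADER):len(pdf) - len(SHARED_XREF)]

    def stitch(body: bytes) -> bytes:
        """Reassemble a servable file from the stored body on download."""
        return SHARED_HEADER + body + SHARED_XREF

    original = SHARED_HEADER + b"<binary image data>" + SHARED_XREF
    assert stitch(split(original)) == original  # lossless round trip
    ```

    The saving per file is just `len(SHARED_HEADER) + len(SHARED_XREF)` bytes, which is why it only pays off at the billion-file scale the comment describes.
    
    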