Project that made you learn cool stuff?

Ranter

duckWit

5559

Comments

25

xewl

4055

7y

BLOB
Big Load Of Bullshit
2

scroach

1204

7y

Well what if you want to do a fulltext search?
4

duckWit

5559

7y

@scroach in my case, the text was compressed before being inserted into the db to save space. A full-text search would poorly perform in that regard because it would have to decompress each entry just to be able to read it and to see if it matched hits from a search. If you do that one row at a time, that's called RBAR: "Row By Agonizing Row".

Any design that does RBAR is poor, which is why you don't loop in SQL one record at a time (cursors, I'm looking at you). If you do, you are denying the query optimizer from choosing an execution plan leveraging the full power of SQL and your code will be slow.

Second point, it's best to use a full-text search engine for that. At the very least, a schema design previously mapping out all documents' words to match in an index would be better. Then there's a one-to-many relationship between the documents and all the index words they contain. Searches would be faster that way b/c entire doc contents are not scanned each time, just the index.
2

scroach

1204

7y

Oh lol I didn't understand that right XD of course then there's no good way of searching in db.
So you had 90gig compressed data? How did you reduce that so drastically? Was the compression shit?

I just don't see a generic reason why saving that much data in a database would be bad. If you save them in files you have to tackle them separately in backups and may run into inconsistencies.. but as always it mainly depends on use case and environment
0

duckWit

5559

7y

@scroach the size of the backup files of the database is what dropped dramatically. The backup files no longer have those giant transaction logs reflecting all those huge document insertions.

Also, saving them off to compressed files on disk is much better for us because we can leverage a third-party cloud storage provider who is cheaper than consuming our own server resources. Then, when one of the files is needed, we recall it back down to our server temporarily. It's a much better design.
0

Awlex

17532

7y

@xewl

Related Rants

Add Comment

rant

wk128