Comments
Lensflare (14h): If OCR works better than direct text, you could write a script to convert the text into an image at the preferred size.
Feels like treating the symptoms, though.
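A minimal sketch of that kind of text-to-image conversion, assuming Pillow; the page size, font, and wrap width are placeholders:

import textwrap
from PIL import Image, ImageDraw, ImageFont

def text_to_image(text: str, path: str, size=(1240, 1754), margin=60):
    # Render plain text onto a white page-sized canvas for a vision model.
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()  # swap in a real TTF for legibility
    wrapped = textwrap.fill(text, width=90)
    draw.multiline_text((margin, margin), wrapped, fill="black", font=font)
    img.save(path)

text_to_image("Chapter 1. It was a dark and stormy night...", "page_001.png")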
cuddlyogre (14h): @Lensflare I've got a Python script that spits out the pages as PNGs, with the option of merging pages (see the sketch after this comment). It seems to work best with one page at a time, and it works better than processing an equivalent amount of text. It also seems to be easier on the context window.
I have a 32GB 5090. I've tried Gemma 3 27B, Mistral Small 3.2 24B, and Qwen VL 30B, and they all seem to prefer processing images over the text. I think it's because they scan the image each time instead of filling the context with the text, so they have more room to work with.
I put the entire thing into the Grok API and it does better, but it used up more than 3 million tokens in about 2 hours of testing, so that's not very scalable.
I am really looking forward to some unknown person becoming a trillionaire when they figure out non-proprietary AI processing hardware.
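A rough sketch of that page-to-PNG export, assuming PyMuPDF; the DPI is a guess to tune per model:

import fitz  # PyMuPDF (pip install pymupdf)

def pdf_to_pngs(pdf_path: str, dpi: int = 150) -> list[str]:
    # One PNG per page; merging would just concatenate these images.
    doc = fitz.open(pdf_path)
    paths = []
    for i, page in enumerate(doc):
        pix = page.get_pixmap(dpi=dpi)
        out = f"page_{i:03d}.png"
        pix.save(out)
        paths.append(out)
    doc.close()
    return paths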
cuddlyogre (14h): @Lensflare Reasoning vision models like Magistral read the entire document into the context, so it fills up pretty quickly; I don't know how well they perform over several pages compared to non-reasoning models.
retoor (13h): I understand everything you say.
For the older messages you want to keep, make summaries of them. Summaries of summaries :P So it's just compression (sketch below).
But how many documents are we talking about? Consider gpt-4.1-nano or something; it has a 1 million token context window, is flat-out cheap, and summarizes well imho.
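A rough sketch of that summaries-of-summaries compression; summarize() is a stand-in for whatever model call you actually use:

def summarize(text: str, max_chars: int = 2000) -> str:
    # Placeholder: call your LLM here; truncation stands in for a real summary.
    return text[:max_chars]

def compress(messages: list[str], batch: int = 10) -> str:
    # Summarize each message, then summarize groups of summaries
    # until a single top-level summary remains.
    layer = [summarize(m) for m in messages]
    while len(layer) > 1:
        layer = [summarize("\n".join(layer[i:i + batch]))
                 for i in range(0, len(layer), batch)]
    return layer[0] if layer else ""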
Lensflare (13h): @retoor
> I understand everything you say.
Ah, good. Because I understand nothing. It goes way over my head :)
cuddlyogre (13h): @retoor I'm trying to get this working locally so that I don't have to send my or a client's material to a proprietary LLM, in case they don't trust those providers or want to fine-tune on their own data.
I'm testing with a 15k story I wrote so that I know for sure where it gets things wrong. The chunking strategy I have works decently even with 7B and smaller models, with larger models giving more precise summaries (sketch after this comment). It stays fast and accurate this way, whereas sending the entire document at once leads to errors and hallucinations.
I've tested the chunking strategy on huge open models like OSS 120B, and one more that was even larger that I can't recall, on RunPod, and the results aren't so much better than a ~30B model that I can justify the cost, so I'm sticking with local.
I'm really hoping that using images can extend the context for general use. For non-reasoning models, this seems to be a promising experiment.
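A sketch of that per-chunk summarization loop, assuming a local OpenAI-compatible server (llama.cpp / Ollama style); the URL, model name, and glossary line are made-up placeholders:

import requests

GLOSSARY = "Terms: 'the Order' = the antagonist faction."  # hypothetical example

def summarize_chunk(chunk: str) -> str:
    # Terms and definitions ride along in the system prompt, as described above.
    r = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "model": "local-model",
            "messages": [
                {"role": "system",
                 "content": "Summarize this chunk faithfully. " + GLOSSARY},
                {"role": "user", "content": chunk},
            ],
        },
        timeout=300,
    )
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]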
jestdotty (13h): AI seems to take out all the interesting parts of a text and leave just really boring, reductive stuff.
I want reverse AI summaries: just the interesting parts.
We worked with RAGs at work.
After a few months, we stopped working with RAGs at work.
It's a braindead system that cannot be perfected; you only ever hit some percentage of accuracy.
Semantic search SUCKS for specific information. For example, if you have a bunch of data that says "my phone number is xxx-xxx-xxyz" and then you ask "What is Sandy's phone number?", it will say "I have no fucking clue!" because RAGs suck dick.
The best approach is hybrid: have a RAG that searches both a semantic index AND a traditional index (toy sketch at the end of this comment). This way, you get both semantic and literal matches.
But it's still just throwing more money at the bullshit and hoping it grows into a flower. For us, it never did, and we got bored of spending money.
Good luck!
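A toy sketch of that hybrid scoring: blend a literal keyword match with semantic similarity so exact strings like "xxx-xxx-xxyz" still get found; embed() stands in for a real embedding model:

import math

def embed(text: str) -> list[float]:
    # Placeholder: a real embedding model goes here; this is a toy hashed bag-of-words.
    vec = [0.0] * 16
    for tok in text.lower().split():
        vec[hash(tok) % 16] += 1.0
    return vec

def keyword_score(query: str, doc: str) -> float:
    # Literal overlap catches exact matches like phone numbers.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(query: str, doc: str, alpha: float = 0.5) -> float:
    # alpha weights literal matches against semantic similarity; rank docs on this.
    return alpha * keyword_score(query, doc) + (1 - alpha) * cosine(embed(query), embed(doc))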

The hoops you have to jump through to summarize a document, even using LLMs with hundreds of billions of parameters, are insane. Even when you get something that "works" with RAG, all you are really getting is a summary of a distilled version of the document, not a pure summary.
I've got a script that breaks documents down into manageable chunks with an overlap, so meaning isn't lost between paragraphs (sketch below), and it works decently enough, especially when you add terms and definitions to the system prompt for things it has trouble with. But the context window is still a problem, so you have to discard older entries, which means you can't correct previous items based on new entries.
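A minimal version of that overlapping chunker; chunk and overlap sizes are arbitrary and need tuning per model:

def chunk_text(text: str, chunk_chars: int = 4000, overlap: int = 400) -> list[str]:
    # Each chunk re-reads the tail of the previous one so meaning
    # isn't lost across paragraph boundaries.
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap
    return chunks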
Using vision models to OCR an image instead of reading in a text document seems to work a bit better, but it relies on the image being the right size, and you can't load in too many at a time.
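One way to handle the sizing constraint before handing pages to a vision model, assuming Pillow; the maximum side length is a guess:

import base64, io
from PIL import Image

def prepare_page(path: str, max_side: int = 1536) -> str:
    # Downscale oversized pages in place (aspect ratio preserved),
    # then base64-encode for an image payload.
    img = Image.open(path)
    img.thumbnail((max_side, max_side))
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode()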