3 · sleek · 326d

ChatGPT is too politically correct, and I hate that I'm paying for an API that refuses certain prompts because it considers them inappropriate, or because it thinks it shouldn't give me its analysis on a certain subject.

Has anyone dabbled with using an open-source LLM and made their own lite version of ChatGPT, minus all the restrictions?

I know it's not gonna be as good, but at the very least it'll be free from the constraints.

Comments
  • 3
    so what kinda stuff are you asking it?
  • 4
    Honestly, you can run Nous-Hermes-2-SOLAR-10.7B or, even better, Mixtral-8x7B on a local machine at this point using just the CPU. You will need a lot of RAM. Not sure how much SOLAR takes up, but Mixtral takes about 24GB or so (depending on the quant type), so your RAM will need to cover it.

    Or, if you run it through llama.cpp, you can offload part of the model to your GPU, so it only has to fit in RAM + GPU memory combined (see the sketch at the end of this comment).

    Either way, once you get it running, you have a fully uncensored, fully offline LLM with really good output (both of those rival GPT-3.5, with Mixtral getting close to GPT-4, except it's not multi-modal yet AFAIK), and even on CPU they run at several tokens a second, which is perfectly usable (you can get SOLAR responding almost instantly on a Mac M2 device).

    I'd recommend getting an LLM running locally and seeing what you can get away with; it's fun, open, and free.
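
    Here's a minimal sketch of that CPU + GPU split, assuming the llama-cpp-python bindings and a GGUF quant you've already downloaded (the model path and layer count are placeholders, not a tested config):

    ```python
    # Illustrative only: load a quantized GGUF model and offload some layers to the GPU.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mixtral-8x7b-instruct.Q4_K_M.gguf",  # whichever quant you grabbed
        n_gpu_layers=20,   # layers pushed to VRAM; the rest stay in system RAM
        n_ctx=2048,        # context window
    )

    out = llm("Explain quantization in one paragraph.", max_tokens=200)
    print(out["choices"][0]["text"])
    ```

    More offloaded layers means faster generation but more VRAM used; set n_gpu_layers=0 to stay fully on CPU.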
  • 1
    @Hazarth links please! Relevant to my interests.

    Yes, I want to be spoonfed. Ty
  • 3
    @NeatNerdPrime If you're on Linux, the simplest and fastest way to get running is by using Ollama:

    https://ollama.ai/

    You can then get either solar:

    https://ollama.ai/library/solar

    or Mixtral 8x7b:

    https://ollama.ai/library/mixtral

    You might figure out how to get it running on WSL2 on Windows yourself. Or you can try GPT4All; it doesn't have all the up-to-date models built in, but you can always get Solar or Mixtral from Hugging Face (needs an account).

    Though the best performance, in my experience, comes from using the latest llama.cpp:

    https://github.com/ggerganov/...

    but you will have to compile that yourself, and it's just a CLI interface.

    Honestly, the best way to get LLMs running right now is Linux, so if you're on Windows you're gonna have to do some extra work to get things running (except GPT4All, which just works; Solar and Mixtral will still work there too, but you have to fetch them from HF first, in GGUF format). Once Ollama is up, you can also hit its local API from code; see the sketch below.
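
    As a quick sketch (assuming the standard Ollama install and a model you've already pulled, e.g. `ollama pull solar`), the local server listens on port 11434 and you can call it from Python:

    ```python
    # Illustrative only: query a locally running Ollama server over its HTTP API.
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "solar", "prompt": "Why is the sky blue?", "stream": False},
    )
    print(resp.json()["response"])
    ```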
  • 1
    I've been messing around with GPT4All, which nicely packages a UI with automatic model downloads. You can also get additional models from Hugging Face (huggingface.co). There are Python bindings too, if you'd rather script it; see the sketch below.
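
    A rough sketch using the gpt4all Python package (the model filename is just an example; it gets downloaded on first use if it's not already on disk):

    ```python
    # Illustrative only: run a local model through the GPT4All Python bindings.
    from gpt4all import GPT4All

    model = GPT4All("mistral-7b-instruct-v0.1.Q4_0.gguf")
    print(model.generate("Summarize what a GGUF file is.", max_tokens=150))
    ```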
  • 0
    @Hazarth many thanks, the information is yummy!
  • 0
    @Hazarth

    Hello bro, I want to ask: does the model only use resources when we use it?
  • 1
    @novatomic Depends on the runner. I think both Ollama and GPT4All only use resources while running and then free most of them. But if you're running it with llama.cpp, it stays loaded in memory until you stop it.
  • 1
    Yea, no matter what I do, I can't get it to admit that the reason I'm single is because the Rothschilds paid every woman ever to ignore me. Nor will it admit France was a mistake.
  • 0
    If I were to use an LLM, it would be to strip it down to use way fewer resources and to understand its inner workings!
  • 0
    @Hazarth I ran Llama 2 and it REFUSED to comply with my prompts :(

    It's not about the prompts themselves; I just want a completely open LLM with no restrictions, no firewall, no woke-filter.

    Will try Mixtral to see if it complies.
  • 0
    Most LLMs are just shit tons of matrix multiplication, after all.

    The links between neurons can be represented by their weights (probabilities) as a giant table (a matrix); see the toy example below.
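
    A toy illustration of that idea (shapes made up; real LLMs stack thousands of much bigger layers like this):

    ```python
    # One "layer" is just activations multiplied against a weight table, plus a nonlinearity.
    import numpy as np

    x = np.random.rand(1, 8)    # activations entering the layer
    W = np.random.rand(8, 16)   # the "giant table" of connection weights
    h = np.maximum(0, x @ W)    # matrix multiplication + ReLU
    print(h.shape)              # (1, 16)
    ```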