6

They keep training bigger language models (GPT et al.). All the researchers appear to be doing this as a first step, and then running a compression step afterwards. One way they do this is to train a smaller network using the bigger network as a teacher (distillation). Another way is to drop some parameters and nodes and test whether the smaller version performs roughly the same, on the theory that some initializations and configurations start out, just by happenstance, efficient (like finding a "winning lottery ticket").

My question is why aren't they running these two procedures *during* training and validation?

If [x] is a good initialization or larger network and [y] is a smaller candidate network, then after each round of training and validation we evaluate a potential [y] against [x]. If the result is acceptable and [y] is a good substitute, [y] becomes [x], and we repeat the entire procedure.

The idea is not to optimize mere training and validation loss, but to bootstrap a sort of meta-loss that spans the whole course of training, amortizing the loss function over the entire run.
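Roughly what I mean, as a toy sketch in PyTorch. The shrink rule (halve the hidden width), the 2% acceptance tolerance, the crude MSE distillation term, and the random data are all illustrative placeholders, not anyone's published procedure:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_net(hidden):
        return nn.Sequential(nn.Linear(20, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def run_epoch(model, xb, yb, teacher=None, lr=1e-2):
        # one pass: ordinary cross-entropy, plus a distillation term if a teacher is given
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        opt.zero_grad()
        logits = model(xb)
        loss = F.cross_entropy(logits, yb)
        if teacher is not None:
            with torch.no_grad():
                t_logits = teacher(xb)
            loss = loss + F.mse_loss(logits, t_logits)   # crude distillation term
        loss.backward()
        opt.step()

    @torch.no_grad()
    def val_loss(model, xb, yb):
        return F.cross_entropy(model(xb), yb).item()

    # toy data standing in for the real train/validation sets
    torch.manual_seed(0)
    x_tr, y_tr = torch.randn(256, 20), torch.randint(0, 2, (256,))
    x_va, y_va = torch.randn(64, 20), torch.randint(0, 2, (64,))

    x = make_net(hidden=128)        # [x]: the current (larger) network
    tolerance = 0.02                # accept [y] if it is within 2% of [x]'s validation loss

    for _ in range(5):
        run_epoch(x, x_tr, y_tr)                 # normal training step
        base = val_loss(x, x_va, y_va)           # normal validation

        y = make_net(hidden=max(8, x[0].out_features // 2))   # candidate [y]: half the width
        run_epoch(y, x_tr, y_tr, teacher=x)                   # distill from [x] as teacher
        if val_loss(y, x_va, y_va) <= base * (1 + tolerance):
            x = y                                             # acceptable substitute: [y] becomes [x]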

Anyone seen this in the wild yet?

Comments
  • 3
    Incidentally, I'm surprised someone hasn't added 'tuneable' personality settings, like those demonstrated by TARS in Interstellar.

    If not simple, it should at least be straightforward.

    1. Do parameter-tuning-based training on the model, so it can summarize various aspects of an input or generated output along various numeric ratings: humor, candor, persuasion, wittiness, etc.

    2. When an output is generated in response to an input, run sentiment analysis on it.

    3. Feed the result back in (not like Auto-GPT) and rewrite to move the generated output closer to the desired global personality settings of the script (sketched below).
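
    Roughly, as a sketch of that loop. call_llm, the 0-10 scale, and the prompt wording are placeholders for whatever model or API you'd actually use; step 1's rating ability could come from fine-tuning or just from prompting:

        TARGET = {"formality": 6, "humor": 3, "candor": 8}   # desired global personality settings

        def call_llm(prompt: str) -> str:
            raise NotImplementedError("plug in your model / API call here")

        def rate(text: str, trait: str) -> int:
            # step 2: ask the model (or a separately tuned rater) to score one trait 0-10
            reply = call_llm(f"Rate the following text for {trait} on a 0-10 scale. "
                             f"Answer with a single integer.\n\n{text}")
            return int(reply.strip())

        def apply_personality(draft: str, max_passes: int = 3) -> str:
            # steps 2-3: rate the draft, then rewrite it toward TARGET until it's close enough
            for _ in range(max_passes):
                scores = {trait: rate(draft, trait) for trait in TARGET}
                if all(abs(scores[t] - TARGET[t]) <= 1 for t in TARGET):
                    return draft
                wanted = ", ".join(f"{v}/10 {t}" for t, v in TARGET.items())
                draft = call_llm(f"Rewrite the following text to be {wanted}, "
                                 f"preserving its factual content:\n\n{draft}")
            return draft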
  • 3
    @Wisecrack Modifying LLMs on the fly is uncommon. Personality traits are usually configured with very expensive and slow manual feedback. This is suspected to be part of the reason why Bing turned out a sociopath; Microsoft decided that they didn't have time for alignment.
  • 1
    @lorentz I'm thinking more along the lines of customizing the existing model with some additional training for sentiment analysis.

    You might have it generate an internal reply that then gets fed back in, rated "9/10 in formality", and then fed back into the LLM with part of the prompt being "rewrite the following input to be 6/10 formal, with a 3/10 rating for humor."

    Think of it as a top-k type setting, only done entirely within the prompt instead of within training.

    I might actually try it, considering I have access to GPT-4.
  • 1
    @Wisecrack I see. I worry that if you pipe answers through processing steps with something as coarse as an LLM, the valuable information content will erode quickly.
  • 1
    @lorentz It might be interesting to add a step where you break the potential output into a series of bullet points, just the facts, and only then summarize, before applying sentiment analysis and rewriting.

    A final step might include feeding the output back in as input and trying to get matching bullet points out. If those match the initial bullet points semantically, then you would emit the original rewritten summary, complete with the sentiment transform.

    It's easy to visualize as a behavior-tree API acting as a sort of middleware in front of the core LLM.
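
    A sketch of that round-trip check, reduced to a linear pipeline just to show the data flow. call_llm and the "ask the model whether the fact lists match" step are placeholders, not a real middleware API; a real behavior tree would branch on the failure case instead of just falling back:

        def call_llm(prompt: str) -> str:
            raise NotImplementedError("plug in your model / API call here")

        def extract_facts(text: str) -> list[str]:
            # break the text into terse, just-the-facts bullet points
            reply = call_llm("List the factual claims in the following text as terse "
                             "bullet points, one per line:\n\n" + text)
            return [line.strip("-• ").strip() for line in reply.splitlines() if line.strip()]

        def facts_match(a: list[str], b: list[str]) -> bool:
            # crude semantic check: ask the model whether the two lists say the same things
            reply = call_llm("Do these two lists of claims convey the same facts? "
                             "Answer YES or NO.\n\nList A:\n" + "\n".join(a) +
                             "\n\nList B:\n" + "\n".join(b))
            return reply.strip().upper().startswith("YES")

        def guarded_rewrite(draft: str, style_instruction: str) -> str:
            original_facts = extract_facts(draft)
            rewritten = call_llm(f"{style_instruction}\n\n{draft}")   # sentiment/personality transform
            if facts_match(original_facts, extract_facts(rewritten)):
                return rewritten        # the facts survived the rewrite: release it
            return draft                # otherwise fall back to the plain draft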