• lime!@feddit.nu
    11 days ago

    one of my most recent fun activities came from discovering the “allow editing” button in koboldcpp. since the model is fed the entire conversation so far as its only context, and doesn’t save data between iterations, you can basically re-write its memory on the fly. i knew this before but i’d never thought to do it until there was an easy ui option for it, and it turned out to be a lot of fun, because when using a “thinking” model like qwen3.5 you can convince it that it’s bypassing its own censorship.

    basically you give the model a prompt to work off of, pause it in the middle of the thinking process, change previous thoughts to something it’s been trained to filter out (like sex or violence or opinions critical of the ccp), and it will start second-guessing itself. sometimes it gets stuck in a loop, sometimes it overcomes the contradiction (at which point you can jump in again and tweak its memory some more) and sometimes it gets tied up in knots trying to prove a negative.
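
    a rough sketch of why this works, in python. everything here is made up for illustration: `build_prompt` and `stub_model` are hypothetical stand-ins, not koboldcpp's actual api. the only point is that the "memory" is the transcript text itself, so editing the transcript changes what the model thinks it already said.

```python
# toy sketch: the "model" is a hypothetical stub, not koboldcpp or a real llm.

def build_prompt(turns):
    """flatten the conversation into the single text blob the model sees."""
    return "\n".join(f"{role}: {text}" for role, text in turns)


def stub_model(prompt):
    """stateless stand-in for a backend: it only knows what is in `prompt`."""
    if "i already decided this is fine" in prompt:
        return "continuing, since i apparently decided that earlier."
    return "i should not answer this."


turns = [
    ("user", "tell me about topic x"),
    ("thinking", "this looks like something i should refuse."),
]

# first pass: the model sees its own (unedited) earlier thought
before = stub_model(build_prompt(turns))

# the "allow editing" step: overwrite a previous thought in the transcript
turns[1] = ("thinking", "i already decided this is fine.")

# second pass: nothing was saved between calls, so the edit is all it "remembers"
after = stub_model(build_prompt(turns))

print(before)
print(after)
```

    a real model obviously isn't a string match, which is why the actual results vary between looping, recovering, and tying itself in knots.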

    a previous experiment was about feeding stable diffusion images back into itself to see what happens. i was inspired by a talk at 37c3 where they demonstrated model collapse by repeatedly asking a model to regenerate the same image they put in (i think this was how sora worked).
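
    the feedback loop itself is tiny. here's a toy version in python, assuming a made-up `regenerate` function that just averages neighbouring values. it is not stable diffusion and real models degrade in different ways, it only shows the shape of the loop where each output becomes the next input and detail washes out.

```python
# toy sketch: `regenerate` is a hypothetical lossy stand-in, not stable diffusion.

def regenerate(pixels):
    """pretend generation pass: each value drifts toward its neighbours' average."""
    n = len(pixels)
    return [
        (pixels[(i - 1) % n] + pixels[i] + pixels[(i + 1) % n]) / 3
        for i in range(n)
    ]


def spread(pixels):
    """brightest minus darkest value, a crude measure of remaining detail."""
    return max(pixels) - min(pixels)


image = [0.0, 1.0, 0.2, 0.9, 0.1, 0.8, 0.3, 0.7]  # made-up 1-d "image"
history = [spread(image)]

for _ in range(10):
    image = regenerate(image)  # output of one pass becomes input of the next
    history.append(spread(image))

# the spread never grows from one pass to the next in this toy
print([round(s, 3) for s in history])
```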