Community-driven efficient LLM development - Possible?

andrew0 ( andrew0@lemmy.dbzer0.com ) · 2 months ago

Oh, I remember this! Thanks for sharing, very nice find! Could be a worthwhile approach once we have the data :)

andrew0 ( andrew0@lemmy.dbzer0.com ) · 2 months ago

Apologies for the late reply! Busy days :D

I agree with you. Crowd-sourcing this type of research would be a completely different goal than what the AI Horde was built for, and would probably not be sustainable with part-time / volunteer researchers. Perhaps it’s best for us to just wait until others make more substantial progress.

The goal would still have been inference for the Horde, but with sharing of feedback based on the model’s outputs, to align it more with the original one. However, after considering this approach more, I am afraid that the maths behind it makes it impossible to “reconstruct” the original model’s manifold, or at least capture the same behaviour in all use cases.

I came here to propose this idea because, to the best of my knowledge, this is only LLM community that actually pushes for sharing of resources. However, I have seen a few days ago a post on the LocalLlama community advocating for sharing of OpenCode sessions in order to crowd-source a fine-tuning dataset, so it seems that more people are having the same thoughts! :)

I will keep an eye out on other advancements, and if I actually end up having some time, perhaps I’ll return with some contributions. I agree with you that such a project mostly relies on inference, in which case the AI Horde is not the only one that can provide that capability. What we would need is deploying such a model on HuggingFace, and creating an API endpoint for sharing training data for people that are interested in contributing.

Thanks a lot for offering your thoughts, and taking the time to write such lengthy responses to me! I hope you have a nice weekend!

andrew0 ( andrew0@lemmy.dbzer0.com ) · 2 months ago

I hope things get better for you! I would have recommended you get out of your country before they cancel your passport or something, but I’m not sure if it gets any better in other places. Even Europe seems to be speed-running fascism, and it’s probably a matter of time before we follow in the US’ steps…

Stay strong, and if things get really hairy, consider living off-grid. Gather together with other people that go through this, and make an escape plan. I believe it’s becoming more and more sustainable nowadays with the advancements of solar panel technology to live outside civilization well.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 2 months ago

Indeed, the quantization described in the Microsoft paper (and even in this NanoQuant paper) severely messes up the behaviour of the model. Even in this newer paper, you’d still incur ~2x performance loss (which is better than what was reported by 1.68 bit paper, if true), in terms of perplexity. However, as per the other paper I have added in the edited post, it is possible to further align a quantized model with the original one. In the end, LLMs are just fancy math that seek to maximize human preferences, and most of the bigger models were just better trained at doing that. With this approach, all we would have to do is just further refine the LoRA weights until we can match the behaviour of the unquantized model, which wouldn’t be that expensive if all we have to do is fine-tune a few million parameters. It might be that at the beginning we’re seeing worse performance compared to a 3B parameter model, but with more refinement we can further unlock some of the original performance.

Regarding the use of the Horde, I believe that behaviour alignment can’t be done without actually using it. Just like corpo-AI are giving away their models so that they can further get data, we could have a similar, but much more compute-efficient, community-driven approach. Models by the people, for the people, if you will. Furthermore, as I mentioned, I think this would be the only community that has the compute and desire to push improvements on such an idea long-term, as it isn’t profit-driven.

Let’s say that this whole experiment starts with an extreme case, the MiniMax M2.5 model, and we abstract away from any architectural fancy stuff. At ~230B parameters, we would have a 1-bit model size of ~28.75 GB, and, as per Table 2 of NanoQuant, ~23 GB if we were to prune 20% of the weights. This would be enough to fully fit it on a 24GB VRAM GPU. Following this, we could get a well-balanced list (i.e., easy, medium, hard) of reasoning tasks, and fine-tune the LoRA layer to match the output. Heck, we could even tailor this to specific tasks, such as role-playing, coding, etc. It will be a long-term experiment where we might serve two answers (depending on Horde availability), one generated by the quantized model + LoRA and another that is regularly deployed. The user could then choose the model they prefer, and use that information later for further training.

This would indeed be quite cumbersome to set up, and could very well be wasted time. Users might even opt out from this because it could take too much time to help. But hey, I still think it would be a cool experiment to see if consumers could actually use these larger models on regular hardware, and get close to the original performance without paying for all the compute that is needed.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 2 months ago

The models themselves would indeed be costly to train if you were to go for the regular approach. You would have to “upscale” the weights to be fp32 from binary, which would make the models only trainable on the usual amount of GPUs. That is because the training process relies on back-propagation, which only makes sense if your operations are differentiable. Since addition is not differentiable, your binary weights would only be updated by 0, so no change.

However, LoRA (16-bit) QLoRA (4/8-bit) fine-tuning can be done on a single GPU, assuming you can fit the model on it. Everything is frozen, except for a separate small network, which is updated during training. This can have BF16 or F32 precision, and would be trained as you would a regular network.

What I am suggesting is to actually leverage bigger models that come out, and attempt to compress them using the proposed algorithm (if it actually scales to bigger models). From there, we could employ some tricks to improve performance, think latent reasoning, community-driven RLHF only on the (Q)LoRA layers, etc. With time, we would be able to pool together a dataset and a pipeline that can be applied to any open-weight model that is released.

But it does sound a bit easier than it would be in practice. This heavily relies on re-purposing the Horde to also store training data (with user consent, of course), user scores, and later introduce a training queue.

andrew0 ( andrew0@lemmy.dbzer0.com ) · edit-2 2 months ago

Community-driven efficient LLM development - Possible?

andrew0 ( andrew0@lemmy.dbzer0.com ) · 3 months ago

Wero is being rolled out slowly in Western Europe. I believe it’s already a thing in Germany, France, Belgium, and followed soon by the Netherlands.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 6 months ago

That’s pretty cool! Does anyone know if this allows one to play anti-cheat enabled games? Would be interesting to know if we can spoof HWID stuff with this to make it look like we’re playing on an actual Windows device.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 8 months ago

Tesla’s bang for buck is horrible. You get a shitty car made from the worst plastic possible, and on top of that they don’t even have good quality control. The only thing that differentiated Tesla from the competition previously was the battery technology, but they no longer have that edge nowadays.

The Norwegians are probably getting them because they got used to it, and probably don’t want to rely on Chinese cars. Beats me why they would select a Tesla nowadays over the European brands.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 9 months ago

This is true only if the decisions were made independently. If you allow people to make a decision after they’ve seen the metrics, this no longer holds.

Here’s an example of the first. You go at a farmer’s market with a cow and you ask everyone to write on a piece of paper what they think the weight is. If you get the replies and average them, you will find that the mean of all answers will be quite close to the real answer. A mix of non-experts and experts will iron out a good answer somehow.

Now take the average experience of going to a restaurant. One might have just opened recently, has great food and great staff, but only 5 reviews, at an average of 3.8 or something. Another restaurant nearby has been open for 3-4 years, and has 1000 reviews, at maybe 3.9. People will usually follow the one with more reviews because they think it’s the safer option due to the information available. However, if you were to hide this and ask them to choose by just looking at the venue and the menu, they would probably choose the first one.

Group dynamics are quite interesting, and the psychology behind this is quite funky sometimes :D

andrew0 ( andrew0@lemmy.dbzer0.com ) · 11 months ago

Didn’t this guy just go on French national TV and say that Macron is a dictator?

I think if this guy gets voted in, Romania might say bye bye to its EU funds.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

You’re right! Sorry for the typo. The older nomic-embed-text model is often used in examples, but granite-embedding is a more recent one and smaller for English-only text (30M parameters). If your use case is multi-language, they also offer a bigger one (278M parameters) that can handle English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, Chinese (Simplified). I would test them out a bit to see what works best for you.

Furthermore, if you’re not dependent on MariaDB for something else in your system, there are also some other vector databases I would recommend. Qdrant also works quite well, and you can integrate it pretty easily in something like LangChain. It really depends on how much you want to push your RAG workflow, but let me know if you have any other questions.

andrew0 ( andrew0@lemmy.dbzer0.com ) · edit-2 1 year ago

Have a look at Ollama embeddings. Easy to set up and the models are much smaller than a typical LLM.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

Romania has previously jumped into a war, only to change sides later. I wouldn’t be surprised if they end up taking the bait before the upcoming presidential elections. From what I’ve heard, the far right candidate that’s left in the race is betting that he will get the votes of everyone that voted for Calin Georgescu. His platform? Being a boot licker for Trump.

Troubling times for the Balkans.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

Thanks for the SherpaTTS suggestion. I really like the GLaDOS voice <3

I am not sure which phone you use, but are you able to set FUTO Voice as the default “Voice input” in the Android settings? I played around with a few apps, which show up. However, FUTO is not an option here :(

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

Thanks for the suggestion! I gave this a try, but it seems that it won’t register any voice 🤔 However, it seems like it shows up in my settings, so it’s a good sign. I’ll try to get it to work :D

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

Thanks! I was actually looking at this, but I gave up because I couldn’t really figure out how to get a multilingual model running through Obtainium. I’ll try again :D

andrew0 ( andrew0@lemmy.dbzer0.com ) · edit-2 1 year ago

Open Source Text-to-Speech and Speech-to-Text on Android?

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

Some other EU countries have had their own struggles helping Nazi Germany, and some still have parties that support far-right characters. That doesn’t mean that the countries themselves are led by nazis. Furthermore, as the guy above mentioned, the leadership of Ukraine never followed through to regard this guy as a hero today. However, I find the following quite interesting:

A poll conducted in early May 2021 by the Democratic Initiatives Foundation together with the Razumkov Centre’s sociological service showed that 32% of citizens considered Bandera’s activity as a historical figure to be positive for Ukraine, as many considered his activity negative; another 21% consider Bandera’s activities as positive as they are negative.

So, right before Russia invaded Ukraine, people were against this guy. However, as soon as Ukraine’s independence was threatened by the same entity that this guy advocated against, people changed their opinion. The poll was taken immediately after the invasion, which would be a bit of a confounding variable here.

I chalk it up to socio-economic issues due to the Soviet Union, which led to poor education in many areas of Ukraine. See East Germany, where a majority of people have voted for AfD. Oh, and let’s not forget the overall negative sentiment against Russia after they invaded in 2014.

Nevertheless, would you argue that invading Ukraine to “denazify” them makes sense in this context? You mentioned atrocities from WW2, but that’s not being done today. Whatever deaths that were happening pre-2022 conflict were due to the Donbas War, which Russia also instigated. What reason would you then have to support Russia in this conflict? It is pretty clear that they are pulling in many arguments to justify their expansionist wishes.

Don’t get me wrong, I am also of the opinion that the US should gtfo of Europe, but I do not see a reason to excuse whatever Putin’s regime has been doing.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

You keep throwing the word “nazi” around quite a lot. Explain what it is, and why Ukraine is ruled by nazis now.

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

Russia has been increasing trade with China in the past few years. I’m afraid that, even without US support now, Russia would still be able to carry on for a while longer :(

andrew0 ( andrew0@lemmy.dbzer0.com ) · 1 year ago

Ok, but there are laws involved here. In Romania, you can’t be president if you are under 35 years old, or, among others, if you have a criminal record. The people that were stopped from running for president weren’t barred because they went against the mainstream parties, but because they openly promoted personalities that were doing the equivalent of the Holocaust in Romania. This is punishable by law by up to 3 years in jail, and they’re being actively investigated.

The lady in this post was previously denied her run in the summer of last year, and she kept quiet about it until now because they probably told her they won’t pursue it further if she steps back. She took the deal, probably because she realises that she’d rather keep grifting on Facebook than spend 3 years in jail.