Roadmap June 2023 #1729
-
Are there Metal-like zero-copy mechanisms in either of these frameworks (Vulkan / WebGPU)? It seems like a necessity for integrated GPUs.
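For reference, Vulkan can get close to Metal's shared storage on integrated GPUs through host-visible memory. A minimal sketch, assuming an already-created `VkDevice`/`VkPhysicalDevice`; the function names are mine and error handling is omitted:

```cpp
// Sketch: allocating a host-visible ("zero-copy") Vulkan buffer.
#include <vulkan/vulkan.h>
#include <cstdint>

// Find a memory type index that has all requested property flags (helper).
static uint32_t find_mem_type(VkPhysicalDevice physDev, uint32_t typeBits, VkMemoryPropertyFlags flags) {
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(physDev, &props);
    for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
        if ((typeBits & (1u << i)) && (props.memoryTypes[i].propertyFlags & flags) == flags) {
            return i;
        }
    }
    return UINT32_MAX; // no suitable memory type
}

// On integrated GPUs, host-visible memory is typically also device-local,
// so mapping the allocation gives the CPU a direct pointer into memory the
// GPU reads - no staging copy needed.
void * create_zero_copy_buffer(VkDevice device, VkPhysicalDevice physDev, VkDeviceSize size,
                               VkBuffer * outBuf, VkDeviceMemory * outMem) {
    VkBufferCreateInfo bufInfo = {};
    bufInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufInfo.size  = size;
    bufInfo.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
    vkCreateBuffer(device, &bufInfo, nullptr, outBuf);

    VkMemoryRequirements req;
    vkGetBufferMemoryRequirements(device, *outBuf, &req);

    VkMemoryAllocateInfo allocInfo = {};
    allocInfo.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize  = req.size;
    allocInfo.memoryTypeIndex = find_mem_type(physDev, req.memoryTypeBits,
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
    vkAllocateMemory(device, &allocInfo, nullptr, outMem);
    vkBindBufferMemory(device, *outBuf, *outMem, 0);

    void * mapped = nullptr;
    vkMapMemory(device, *outMem, 0, size, 0, &mapped);
    return mapped; // CPU writes here are visible to the GPU
}
```

WebGPU is stricter: buffers are exposed through `mappedAtCreation` or `mapAsync`, so true zero-copy is not generally guaranteed there.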
Maybe the recent MeZO (forward pass only) training paper is relevant to this effort? https://github.com/princeton-nlp/MeZO
-
@niklaskorz I saw your comment here - any thoughts on the task "Add GPU backend prototypes following the Metal example" with Vulkan / WebGPU?
-
For `llama_state`: is it safe to say that all the states touched in […] would be covered? What would become of […]?

Context: I plan to put this behind a gRPC service, so per-client state is needed. Currently, state-switching is done via […].

Edit: actually, there are also those metric variables like […].
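For the per-client scenario, a minimal sketch of state switching with the state API that already exists in `llama.h` (`llama_get_state_size` / `llama_copy_state_data` / `llama_set_state_data`); the client registry and ids are illustrative assumptions:

```cpp
// Sketch: per-client state switching using the existing llama.cpp state API.
// Error handling is omitted for brevity.
#include "llama.h"

#include <cstdint>
#include <map>
#include <vector>

static std::map<int, std::vector<uint8_t>> g_client_states; // client id -> saved state

// Snapshot the context state (KV cache, RNG, logits, embeddings) for a client.
void save_client_state(llama_context * ctx, int client_id) {
    std::vector<uint8_t> buf(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, buf.data());
    g_client_states[client_id] = std::move(buf);
}

// Restore a previously saved snapshot before serving the client's next request.
void load_client_state(llama_context * ctx, int client_id) {
    auto it = g_client_states.find(client_id);
    if (it != g_client_states.end()) {
        llama_set_state_data(ctx, it->second.data());
    }
}
```

Note that these buffers include the full KV cache, so per-client snapshots can be large; keeping them on disk or compressing them may be necessary at scale.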
-
Should we consider switching ggml.c to ggml.cpp, so that we can leverage templates instead of macros to simplify the code?
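To illustrate the trade-off with invented names (not actual ggml code): C needs a macro to stamp out one function per element type, while C++ can express the same family as a single template:

```cpp
// Illustrative only - not actual ggml code.
// C-style approach: a macro stamps out one vec_add per element type.
#define DEFINE_VEC_ADD(type)                                          \
    static void vec_add_##type(const int n, type * z, const type * x, \
                               const type * y) {                      \
        for (int i = 0; i < n; ++i) z[i] = x[i] + y[i];               \
    }

DEFINE_VEC_ADD(float)
DEFINE_VEC_ADD(double)

// C++ approach: one template covers every element type, and the compiler
// instantiates only the variants that are actually used.
template <typename T>
static void vec_add(const int n, T * z, const T * x, const T * y) {
    for (int i = 0; i < n; ++i) z[i] = x[i] + y[i];
}
```

The template version also gives real type checking and debuggable symbols, at the cost of giving up the plain-C build that `ggml` currently guarantees.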
-
Hey, could batch inference be a task to add for next month, maybe? I think it would really help with using this at scale.
-
https://github.com/ziwang-com/AGM/issues/155
-
Hey @ggerganov, I'm having trouble finding references to text-to-speech in the new roadmap project. Is that still in the plan?
-
The latest update was @PABannier implementing Meta's Encodec codec with `ggml`.
-
New roadmap format as a GitHub project: ggml : roadmap
Outdated below
Previous: Roadmap May 2023
News

The `ggml` project has been funded: […]

Tasks
Refactoring pass
Didn't get to this in May - should do this in June
"There is a lot of code duplication in
ggml.c
which probably can be simplified with a good set of macros. The goal is to keep the code size manageable, while we avoid reaching "macro hell""Integrate recent efforts for training
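One way to get "a good set of macros" without descending into macro hell is the X-macro pattern. A hypothetical sketch with invented names (nothing here is actual `ggml.c` code): one central type list drives every per-type table, so adding a type touches a single line instead of several switch statements.

```cpp
#include <cstdio>

// The single source of truth: every per-type definition derives from this list.
#define TYPE_LIST(X)  \
    X(F32, float)     \
    X(F64, double)    \
    X(I32, int)

// Generate one enum entry per type...
enum tensor_type {
#define X(name, ctype) TYPE_##name,
    TYPE_LIST(X)
#undef X
    TYPE_COUNT
};

// ...and a parallel table of element sizes, guaranteed to stay in sync.
static const size_t type_size[TYPE_COUNT] = {
#define X(name, ctype) sizeof(ctype),
    TYPE_LIST(X)
#undef X
};

int main() {
    printf("F64 element size: %zu\n", type_size[TYPE_F64]); // prints 8
    return 0;
}
```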
Integrate recent efforts for training
Amazing work by @xaedes continues to impress: Train Text from scratch #1652
Ultimately, with the ability to train mini models, I am interested in making a small prototype of the following idea for faster inference: Combine large LLM with small LLM for faster inference #630 (comment)
Integrate recent efforts in improving the threading of `ggml`
Some very good points and analysis in: Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage #1632
Will look into integrating most of the stuff into `ggml` to try and improve the CPU performance further.
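For readers unfamiliar with the "spin + wait/notify" idea referenced above, a hypothetical sketch of the pattern (not the actual #1632 code): workers spin briefly on an atomic flag for low-latency wakeups, then fall back to a condition variable so idle threads do not burn CPU.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

std::atomic<bool>       work_ready{false};
std::mutex              mtx;
std::condition_variable cv;

void worker_wait() {
    // Phase 1: spin for a bounded number of iterations - cheap if work
    // arrives quickly, which is the common case between graph nodes.
    for (int i = 0; i < 10000; ++i) {
        if (work_ready.load(std::memory_order_acquire)) return;
    }
    // Phase 2: block on the condition variable to yield the core.
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return work_ready.load(std::memory_order_acquire); });
}

void submit_work() {
    {
        std::lock_guard<std::mutex> lock(mtx);
        work_ready.store(true, std::memory_order_release);
    }
    cv.notify_all(); // wake any workers that reached phase 2
}
```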
Extend Metal shaders to support other quantizations + optimize performance
Currently, the Metal implementation supports just `Q4_0` and `F16`. Also, the existing implementation is probably far from optimal. More info: llama : Metal inference #1642
Very good field for contributions.
Implement inference of new models
There are already some very interesting models that should be supported by `ggml`:
- Segment Anything Model (SAM)
  Still working on the Encoder - progress is a bit slow due to several new operators involved, but I think it is slowly working out: examples : add sample SAM inference ggml#74
- Falcon
- Bark (text-to-speech)
Advance the community effort for a unified `ggml` model format
This work has been recently initiated and aims to provide a future-proof file format for `ggml` models: ggml : unified file format ggml#220
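To make "future-proof" concrete, a hypothetical sketch of what a self-describing container could look like; the field names, layout, and semantics here are illustrative assumptions, not the ggml#220 proposal:

```cpp
// Hypothetical container layout - illustrative only, not the ggml#220 spec.
#include <cstdint>

struct model_file_header {
    uint32_t magic;     // file identifier constant
    uint32_t version;   // bumped on incompatible layout changes
    uint64_t n_tensors; // number of tensor records that follow
    uint64_t n_kv;      // number of key/value metadata pairs
};

// Each key/value pair stores hyperparameters, tokenizer data, etc. as typed
// values, so loaders can skip keys they do not understand - this is what
// makes the format extensible without breaking old readers.
struct model_kv_pair {
    uint64_t key_len;    // followed by key_len bytes of UTF-8 key
    uint32_t value_type; // int / float / string / array ...
    // value bytes follow, with a length prefix for variable-size types
};
```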
Add `llama_state`
See past roadmaps - I have been postponing this for quite some time. See Roadmap May 2023 #1220 (reply in thread) if interested in giving it a try.
Add GPU backend prototypes following the Metal example
For example, it would be interesting if we could add WebGPU or Vulkan backends in a similar way to what we did with Metal. I'm completely unfamiliar with the details of these frameworks, but I'm hoping that people might be interested in giving it a try.