Roadmap June 2023 #1729
-
Are there Metal-like zero-copy mechanisms in either of these frameworks (Vulkan / WebGPU)? It seems like a necessity for integrated GPUs.
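For reference, Vulkan can get close to Metal's shared storage on integrated GPUs through host-visible memory. A minimal sketch, assuming an already-created `VkDevice`/`VkPhysicalDevice`; the function names are mine and error handling is omitted:

```cpp
// Sketch: allocating a host-visible ("zero-copy") Vulkan buffer.
#include <vulkan/vulkan.h>
#include <cstdint>

// Find a memory type index that has all requested property flags (helper).
static uint32_t find_mem_type(VkPhysicalDevice physDev, uint32_t typeBits, VkMemoryPropertyFlags flags) {
    VkPhysicalDeviceMemoryProperties props;
    vkGetPhysicalDeviceMemoryProperties(physDev, &props);
    for (uint32_t i = 0; i < props.memoryTypeCount; ++i) {
        if ((typeBits & (1u << i)) && (props.memoryTypes[i].propertyFlags & flags) == flags) {
            return i;
        }
    }
    return UINT32_MAX; // no suitable memory type
}

// On integrated GPUs, host-visible memory is typically also device-local,
// so mapping the allocation gives the CPU a direct pointer into memory the
// GPU reads - no staging copy needed.
void * create_zero_copy_buffer(VkDevice device, VkPhysicalDevice physDev, VkDeviceSize size,
                               VkBuffer * outBuf, VkDeviceMemory * outMem) {
    VkBufferCreateInfo bufInfo = {};
    bufInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufInfo.size  = size;
    bufInfo.usage = VK_BUFFER_USAGE_STORAGE_BUFFER_BIT;
    vkCreateBuffer(device, &bufInfo, nullptr, outBuf);

    VkMemoryRequirements req;
    vkGetBufferMemoryRequirements(device, *outBuf, &req);

    VkMemoryAllocateInfo allocInfo = {};
    allocInfo.sType           = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize  = req.size;
    allocInfo.memoryTypeIndex = find_mem_type(physDev, req.memoryTypeBits,
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT);
    vkAllocateMemory(device, &allocInfo, nullptr, outMem);
    vkBindBufferMemory(device, *outBuf, *outMem, 0);

    void * mapped = nullptr;
    vkMapMemory(device, *outMem, 0, size, 0, &mapped);
    return mapped; // CPU writes here are visible to the GPU
}
```

WebGPU is stricter: buffers are exposed through `mappedAtCreation` or `mapAsync`, so true zero-copy is not generally guaranteed there.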
Maybe the recent MeZO (forward pass only) training paper is relevant to this effort? https://github.com/princeton-nlp/MeZO
-
@niklaskorz I saw your comment here - any thoughts on the task "Add GPU backend prototypes following the Metal example" with Vulkan / WebGPU?
-
For `llama_state`: is it safe to say that all the states touched in […] would be covered? What would become of […]?

Context: I plan to put this behind a gRPC service, so per-client state is needed. Currently, state-switching is done via […].

Edit: actually, there are also those metric variables like […].
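For the per-client scenario, a minimal sketch of state switching with the state API that already exists in `llama.h` (`llama_get_state_size` / `llama_copy_state_data` / `llama_set_state_data`); the client registry and ids are illustrative assumptions:

```cpp
// Sketch: per-client state switching using the existing llama.cpp state API.
// Error handling is omitted for brevity.
#include "llama.h"

#include <cstdint>
#include <map>
#include <vector>

static std::map<int, std::vector<uint8_t>> g_client_states; // client id -> saved state

// Snapshot the context state (KV cache, RNG, logits, embeddings) for a client.
void save_client_state(llama_context * ctx, int client_id) {
    std::vector<uint8_t> buf(llama_get_state_size(ctx));
    llama_copy_state_data(ctx, buf.data());
    g_client_states[client_id] = std::move(buf);
}

// Restore a previously saved snapshot before serving the client's next request.
void load_client_state(llama_context * ctx, int client_id) {
    auto it = g_client_states.find(client_id);
    if (it != g_client_states.end()) {
        llama_set_state_data(ctx, it->second.data());
    }
}
```

Note that these buffers include the full KV cache, so per-client snapshots can be large; keeping them on disk or compressing them may be necessary at scale.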
-
Should we consider switching ggml.c to ggml.cpp, so that we can leverage templates instead of macros to simplify the code?
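To illustrate the trade-off with invented names (not actual ggml code): C needs a macro to stamp out one function per element type, while C++ can express the same family as a single template:

```cpp
// Illustrative only - not actual ggml code.
// C-style approach: a macro stamps out one vec_add per element type.
#define DEFINE_VEC_ADD(type)                                          \
    static void vec_add_##type(const int n, type * z, const type * x, \
                               const type * y) {                      \
        for (int i = 0; i < n; ++i) z[i] = x[i] + y[i];               \
    }

DEFINE_VEC_ADD(float)
DEFINE_VEC_ADD(double)

// C++ approach: one template covers every element type, and the compiler
// instantiates only the variants that are actually used.
template <typename T>
static void vec_add(const int n, T * z, const T * x, const T * y) {
    for (int i = 0; i < n; ++i) z[i] = x[i] + y[i];
}
```

The template version also gives real type checking and debuggable symbols, at the cost of giving up the plain-C build that `ggml` currently guarantees.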
-
Hey, could batch inference be a task to add for next month, maybe? I think it would really help with using this at scale.
-
https://github.com/ziwang-com/AGM/issues/155
-
Hey @ggerganov, I'm having trouble finding references to text-to-speech in the new roadmap project. Is that still in the plan?
-
The latest update was @PABannier implementing Meta's Encodec codec with `ggml`.
-
New roadmap format as a GitHub project: ggml : roadmap
Outdated below
Previous: Roadmap May 2023
News

The `ggml` project has been funded: […]

Tasks
Refactoring pass
Didn't get to this in May - should do this in June
"There is a lot of code duplication in
ggml.c
which probably can be simplified with a good set of macros. The goal is to keep the code size manageable, while we avoid reaching "macro hell""Integrate recent efforts for training
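One way to get "a good set of macros" without descending into macro hell is the X-macro pattern. A hypothetical sketch with invented names (nothing here is actual `ggml.c` code): one central type list drives every per-type table, so adding a type touches a single line instead of several switch statements.

```cpp
#include <cstdio>

// The single source of truth: every per-type definition derives from this list.
#define TYPE_LIST(X)  \
    X(F32, float)     \
    X(F64, double)    \
    X(I32, int)

// Generate one enum entry per type...
enum tensor_type {
#define X(name, ctype) TYPE_##name,
    TYPE_LIST(X)
#undef X
    TYPE_COUNT
};

// ...and a parallel table of element sizes, guaranteed to stay in sync.
static const size_t type_size[TYPE_COUNT] = {
#define X(name, ctype) sizeof(ctype),
    TYPE_LIST(X)
#undef X
};

int main() {
    printf("F64 element size: %zu\n", type_size[TYPE_F64]); // prints 8
    return 0;
}
```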
Integrate recent efforts for training
Amazing work by @xaedes continues to impress: Train Text from scratch #1652
Ultimately, with the ability to train mini models, I am interested in making a small prototype of the following idea for faster inference: Combine large LLM with small LLM for faster inference #630 (comment)
Integrate recent efforts in improving the threading of `ggml`
Some very good points and analysis in: Fine tune MUL_MAT, new threading (spin+wait/notify), speedup q_f32 BLAS by splitting COMPUTE stage #1632
Will look into integrating most of the stuff into `ggml` to try and improve the CPU performance further.
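For readers unfamiliar with the "spin + wait/notify" idea referenced above, a hypothetical sketch of the pattern (not the actual #1632 code): workers spin briefly on an atomic flag for low-latency wakeups, then fall back to a condition variable so idle threads do not burn CPU.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

std::atomic<bool>       work_ready{false};
std::mutex              mtx;
std::condition_variable cv;

void worker_wait() {
    // Phase 1: spin for a bounded number of iterations - cheap if work
    // arrives quickly, which is the common case between graph nodes.
    for (int i = 0; i < 10000; ++i) {
        if (work_ready.load(std::memory_order_acquire)) return;
    }
    // Phase 2: block on the condition variable to yield the core.
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return work_ready.load(std::memory_order_acquire); });
}

void submit_work() {
    {
        std::lock_guard<std::mutex> lock(mtx);
        work_ready.store(true, std::memory_order_release);
    }
    cv.notify_all(); // wake any workers that reached phase 2
}
```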
Extend Metal shaders to support other quantizations + optimize performance
Currently, the Metal implementation supports just `Q4_0` and `F16`. Also, the existing implementation is probably far from optimal. More info: llama : Metal inference #1642
Very good field for contributions.
Implement inference of new models
There are already some very interesting models that should be supported by `ggml`:
- Segment Anything Model (SAM)
  Still working on the Encoder - progress is a bit slow due to several new operators involved, but I think it is slowly working out: examples : add sample SAM inference ggml#74
- Falcon
- Bark (text-to-speech)
Advance the community effort for a unified `ggml` model format
This work has been recently initiated and aims to provide a future-proof file format for `ggml` models: ggml : unified file format ggml#220
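To make "future-proof" concrete, a hypothetical sketch of what a self-describing container could look like; the field names, layout, and semantics here are illustrative assumptions, not the ggml#220 proposal:

```cpp
// Hypothetical container layout - illustrative only, not the ggml#220 spec.
#include <cstdint>

struct model_file_header {
    uint32_t magic;     // file identifier constant
    uint32_t version;   // bumped on incompatible layout changes
    uint64_t n_tensors; // number of tensor records that follow
    uint64_t n_kv;      // number of key/value metadata pairs
};

// Each key/value pair stores hyperparameters, tokenizer data, etc. as typed
// values, so loaders can skip keys they do not understand - this is what
// makes the format extensible without breaking old readers.
struct model_kv_pair {
    uint64_t key_len;    // followed by key_len bytes of UTF-8 key
    uint32_t value_type; // int / float / string / array ...
    // value bytes follow, with a length prefix for variable-size types
};
```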
Add `llama_state`
See past roadmaps - I have been postponing this for quite some time. See Roadmap May 2023 #1220 (reply in thread) if interested in giving it a try.
Add GPU backend prototypes following the Metal example
For example, it would be interesting if we could add WebGPU or Vulkan backends in a similar way to what we did with Metal. I'm completely unfamiliar with the details of these frameworks, but I'm hoping that people might be interested in giving it a try.