• 2 Posts
  • 114 Comments
Joined 2 years ago
Cake day: November 10th, 2023



  • It sounds like a step further than open-webui; it’s an enterprise-grade client-server model for access to agents, workflows, and centralized knowledge repositories for RAG.

    In addition to a local chatbot for executive/admin use, I can see this being the backend for developers running Cursor or another AI-enhanced IDE, with local knowledge stores holding proprietary documents and running against large local models.

    I am also curious about time-sharing and prioritization of resources; I assume it would queue simultaneous requests. Presumably this would let you pool local compute more effectively, rather than giving each developer an A100 GPU that sits unused whenever they’re not working.

    Edit: Somewhat surprisingly, this whole stack does not yet include a local inference provider: it does everything except local models right now, and requests are forwarded to cloud inference providers (Anthropic, OpenAI, etc.). But it does have the backend started for rate limiting and queuing, and true “fully offline/local” operation is on the roadmap, just not there yet.


  • The more memory, the better. On a discrete GPU, you want to focus on the VRAM, but the Mac platforms have the benefit of integrated memory, which is shared between the system and graphics processor, so it can hold much larger models in memory than in VRAM alone.

    For comparison, an RTX 3080 with 12GB of VRAM gives a pretty decent token rate on models that fit in memory (typically 110 tok/s on a 7-11B parameter model like Mistral 7B).

    A Mac mini with 32GB of unified memory, however, could run a more advanced model like DeepSeek R1 32B at about 11 tok/s. Running the same 7B parameter model as the RTX 3080, though, it would still only generate 14-18 tok/s.

    So you really need to balance capability (larger, more advanced models) against speed. If you’re okay submitting a prompt and walking away, the Mac mini is great value due to its integrated memory. But if you’re just getting started, you may find it frustratingly slow.

    Finally, for comparison, most of the closed cloud models like Opus, ChatGPT, and MiniMax are closer to 700B parameters, an order of magnitude larger than what a layperson could run locally. These models are finally getting to the point where they’re ‘useful’ for complex tasks like unattended coding, but good luck getting 1TB of VRAM outside of a datacenter.
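The memory figures above follow from simple arithmetic: weight count times bytes per weight, plus some headroom for the KV cache and activations. Here is a rough back-of-the-envelope calculator; the 1.2x overhead factor is an assumption for illustration, not a measured value.

```python
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough RAM/VRAM estimate: weights only, times an assumed 1.2x
    overhead factor for KV cache and activations."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 32B model at 4-bit quantization needs roughly 19 GB, which is why it
# fits in a 32GB Mac mini's unified memory but not in 12GB of VRAM.
print(round(model_memory_gb(32, 4), 1))

# A ~700B model at 16-bit lands around 1.7 TB, hence the "good luck
# getting 1TB of VRAM outside a datacenter" remark.
print(round(model_memory_gb(700, 16)))
```

The same formula explains why quantization matters so much for local inference: dropping from 16-bit to 4-bit weights cuts the memory footprint by 4x at some cost in quality.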


  • Did you have anyone in a hiring position review your resume? Resume writing is an entire skill, and often, they need to be tailored to the organization where you are applying to work.

    There are a number of other factors, depending on who you talked to; do they have positions available? Is there a hiring freeze? Does the person you are talking with know the job requirements?

    If you really know the office, there is almost certainly someone local with hiring authority whose job it is to interface with the headquarters. You will need to apply through the HQ Human Resources system; they may have some authority to pull your resume from the applicant pool, but generally these are competitive positions and they are not allowed to hire directly.

    If they have contract opportunities, though, you should figure out who the vendor is and apply through the company’s website instead.



  • This is not correct.

    The Linux kernel has had support for the NTFS file system since 2021 (the ntfs3 driver). The problems detailed in the article you linked explicitly concern Proton and Steam, which need file names containing characters that are illegal under the NTFS specification, as well as symbolic links, which the spec does not support.

    Sure, you may bump up against these limitations in other apps, but it is a hard crash in Steam and Lutris, which is why the distro has the article.
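To make the character issue concrete, here is a small check against the reserved characters that Windows forbids in file names on NTFS volumes. The helper function is mine, written for illustration; Proton prefixes and some game data files use names like `c:\users` internally, which trip exactly these rules.

```python
# Characters reserved in file names on NTFS volumes under the Windows
# naming rules; Linux filesystems like ext4 allow all of them except '/'.
NTFS_ILLEGAL = set('<>:"/\\|?*')

def ntfs_safe(name: str) -> bool:
    """Return True if a bare file name is legal on an NTFS volume
    (no reserved characters, no control characters)."""
    return not (set(name) & NTFS_ILLEGAL) and all(ord(c) >= 32 for c in name)

print(ntfs_safe("savegame.dat"))  # True
print(ntfs_safe("what?.cfg"))     # False: '?' is reserved on NTFS
```

Symbolic links are the other half of the problem: a native Linux filesystem creates them with a plain `ln -s`, while on an NTFS mount that operation fails, and Proton relies on symlinks heavily when building its Wine prefixes.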