The thing to remember is that it's a LANGUAGE model. It knows how to translate concepts between languages, which is really useful. But it is not an expert system.
This isn't correct. LLMs are language models, yes, but that description understates them, primarily, I think, by underestimating how much of the world is encoded in language. They could not generate reasonable-seeming output without also containing sophisticated models of the world, models that are almost certainly far broader and deeper than any expert system we've ever developed. And the newer LLMs aren't just LLMs, either. They have a reasoning overlay that enables them to reason about what they "know". This is actually extremely similar to how our brains work. The similarity is not accidental.
The proper way to use LLMs is as interfaces to other systems, rather than as standalone things.
Maybe, but I think your description misstates the LLM's role in such a hybrid system. Rather than the LLM being "just" an interface, I think you would ask the LLM to apply its own knowledge and reasoning to use the expert system in order to answer your question. That is, I think the LLM ends up being more like a research assistant operating the expert system than a mere interface.
However, if you're going to do that, do you really even need the expert system? Its role is to provide authoritative information, but curating and compiling that sort of authoritative data is hard and error-prone. You probably don't want the LLM to trust the expert system absolutely, but to weigh what it says against other information. And if you're going to do that, why bother building the expert system at all? Just let the LLM search whatever information you'd have used to curate its data, and compare that against the knowledge implicit in its model and perhaps elsewhere.
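To make that concrete, here's a rough sketch of what I mean by giving the LLM search access instead of a curated expert system. It's purely illustrative: call_llm(), web_search(), and the message format are stand-ins I made up for this example, not any particular vendor's API.

    # Hypothetical sketch of the "LLM as research assistant" loop described above.
    # call_llm() and web_search() are stand-ins for whatever model API and
    # search backend you actually use.

    def web_search(query: str) -> list[str]:
        """Stand-in for a real search backend; returns snippets the model can weigh."""
        return [f"(snippet for: {query})"]

    def call_llm(messages: list[dict], tools: list[dict]) -> dict:
        """Stand-in for a model call that may either answer or request a tool."""
        # A real implementation would send `messages` and `tools` to the model API.
        # Here we just pretend the model asks for one search, then answers.
        if not any(m["role"] == "tool" for m in messages):
            return {"type": "tool_call", "tool": "web_search",
                    "arguments": {"query": messages[-1]["content"]}}
        return {"type": "answer",
                "content": "synthesized answer, citing the snippets it found"}

    def research_assistant(question: str) -> str:
        """Let the model decide when to search and how to weigh what it finds,
        rather than treating any single curated source as authoritative."""
        tools = [{"name": "web_search", "description": "search the open web"}]
        messages = [{"role": "user", "content": question}]
        while True:
            reply = call_llm(messages, tools)
            if reply["type"] == "tool_call" and reply["tool"] == "web_search":
                snippets = web_search(reply["arguments"]["query"])
                messages.append({"role": "tool", "content": "\n".join(snippets)})
            else:
                return reply["content"]

    print(research_assistant("Is this library affected by the vulnerability I'm tracking?"))

The point of the loop is the shift in authority: the model decides when to consult the search results and how much weight to give them, instead of deferring blindly to a curated knowledge base.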
Many of our current-generation systems have access to the web. I've been using Claude and found it extremely good at formulating search queries, analyzing the content of large numbers of relevant pages, and synthesizing a response from what it found. It annotates its output with links to the sources it used, too, enabling me to check its conclusions (I've yet to find a significant mistake, though it does miss important bits on occasion). It could be better, and could analyze a little more deeply, but it's already shockingly good, and I'm sure it will rapidly get better.
This seems like a much more sensible way to make LLMs better than backing them with exhaustively curated expert systems. Yes, they will make mistakes, much as a human research assistant would. But this approach will ultimately be easier to build and more flexible.
As an aside, I had a very interesting experience with Claude the other day. I needed to analyze a bit of code to determine whether it made a security vulnerability I had already identified unreachable, or whether an attacker could craft input that would produce the output needed to exploit the vuln. Claude did not immediately give me the right answer, but it pointed out important and relevant characteristics of part of the code and analyzed the result almost correctly. I pointed out errors in its conclusions; it corrected them while catching an oversight of mine; I caught further mistakes of its own, and so on. Over the course of about ten minutes, we jointly arrived at a full and completely correct answer (bugs in the bit of code did indeed block exploitation of the vuln), along with a concise but rigorous proof of why that answer was correct.
In sum: The interaction was pretty much exactly like working with a smart human colleague on a hard problem. Neither of us was right at the outset; we each drew (different) partially incorrect conclusions and were fuzzy about different parts of the problem. In discussion we each pointed out flaws in the other's reasoning, and in the process both came to understand the problem better, until we finally arrived at the correct conclusion. I contributed more than Claude did; I think I can safely say that, at least in the area of code analysis, I'm smarter than Claude (though Claude is faster). But this really wasn't a case of "rubber ducking": Claude contributed significant insights of its own.
LLMs today are not just stochastic word generators (assuming that phrase ever had any real meaning). If you think they are, you haven't used them much.