Hello,
I am looking for a way to run WebLLM in a CEF (Chromium Embedded Framework) browser instance and then make REST API calls to it.
The idea is this: WebLLM, which uses WebGPU, runs very well in a browser across different hardware and OS types. If I can start instances with different language models and make REST API calls to each instance, I could run LLMs on many commodity systems and let CEF manage the GPU across instances.
Running WebLLM in CEF is not a problem. The question and challenge is figuring out what is needed to make REST API calls to that instance.
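To make the question concrete, here is a rough sketch of one possible shape, not a known working solution. A web page cannot listen on a socket itself, so this assumes the host process (the CEF application or a small sidecar) owns the HTTP listener and forwards each request into the page over some channel such as a WebSocket or a CEF message bridge. The code shows only the page-side handler. Every name in it (`BridgeRequest`, `BridgeResponse`, `ChatEngine`, `handleBridgeRequest`, the `/v1/chat` path) is hypothetical, and `ChatEngine` is a stand-in for whatever wraps the real WebLLM engine.

```typescript
// Hypothetical envelope for a host <-> page bridge. These shapes are
// assumptions for illustration; they are not part of WebLLM or CEF.
interface BridgeRequest { id: number; path: string; body: unknown; }
interface BridgeResponse { id: number; status: number; body: unknown; }

interface ChatMessage { role: string; content: string; }

// Stand-in for the in-page model. In a real page this would wrap the
// WebLLM engine's chat completion call (assumption: check the WebLLM docs).
interface ChatEngine {
  complete(messages: ChatMessage[]): Promise<string>;
}

// Loose structural check: an array of objects with string role and content.
function isChatMessages(value: unknown): value is ChatMessage[] {
  return Array.isArray(value) && value.every(
    (m) => typeof m === "object" && m !== null &&
      typeof (m as ChatMessage).role === "string" &&
      typeof (m as ChatMessage).content === "string");
}

// Maps one forwarded REST-style request to one response envelope.
// The id is echoed back so the host can match replies to pending HTTP calls.
async function handleBridgeRequest(
  engine: ChatEngine,
  req: BridgeRequest,
): Promise<BridgeResponse> {
  if (req.path !== "/v1/chat") {
    return { id: req.id, status: 404, body: { error: "unknown path" } };
  }
  const messages = (req.body as { messages?: unknown } | null)?.messages;
  if (!isChatMessages(messages)) {
    return { id: req.id, status: 400, body: { error: "invalid messages" } };
  }
  try {
    const reply = await engine.complete(messages);
    return { id: req.id, status: 200, body: { reply } };
  } catch (err) {
    // Model or GPU failures surface as a 500 instead of a dropped request.
    return { id: req.id, status: 500, body: { error: String(err) } };
  }
}
```

With something like this, each CEF instance would load one model, and the host would route incoming HTTP requests to the matching page by port or by model name. Whether CEF handles GPU sharing well across several such instances is still an open question for me.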
Any thoughts would be greatly appreciated.
Thanks in advance