Complete solution for running large language models locally using Ollama
I took a detour down the rabbit-hole of local LLMs (worth it) and learned how to run models locally, serve responses to a GUI instead of the default terminal, tunnel via ngrok, and expose the backend to a public client.
It definitely felt like drinking from a firehose, and I leaned on AI heavily throughout, since there were so many moving pieces I wasn't familiar with.
Some of the more interesting things I learned along the way include:
- Running LLMs locally with Ollama
- Setting model hyperparameters
- Chunking model responses
- Model quantization
- ngrok tunnelling
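To give a flavour of the first few items, here's a minimal sketch of calling a local Ollama server with streaming enabled. It assumes Ollama is running on its default port (11434) and that a model named "llama3" has already been pulled; the model name and temperature value are just placeholders. Ollama streams newline-delimited JSON chunks, and hyperparameters go under the "options" key.

```python
import json
import urllib.request

def join_chunks(lines):
    """Stitch Ollama's streamed chunks back into one string.

    Each line is a JSON object carrying a 'response' fragment;
    the final chunk has 'done': true."""
    parts = []
    for line in lines:
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

def generate(prompt, model="llama3", temperature=0.7):
    """Send a streaming generate request to a local Ollama server."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": True,
        # Model hyperparameters live under "options" in the Ollama API.
        "options": {"temperature": temperature},
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response body is newline-delimited JSON, one chunk per line.
        return join_chunks(resp)

# Example (requires a running Ollama server):
# print(generate("Why is the sky blue?"))
```

Handling the raw chunked stream yourself like this is what Part 2 below digs into; in practice you'd forward each chunk to the GUI as it arrives rather than joining them first.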
If you're interested in doing this too, I've linked a 3-part walkthrough below. You can also check out the repo for some helpful documentation on how to choose the right model, install dependencies, configure your server, deploy your app, and manage model responses.
Part 1: Run LLMs Locally with Ollama CLI
Part 2: Handling Raw Bytes Stream from Ollama API Endpoint
Part 3: Exposing Your Local API for Remote Access w/ ngrok
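For the last step, the core of it is a single command: pointing an ngrok tunnel at the port your local API listens on. This assumes ngrok is installed, an authtoken is configured, and that your server runs on Ollama's default port (swap in your own port if you put a backend in front of it).

```shell
# Expose the local API (here, Ollama's default port) via an ngrok tunnel.
# ngrok prints a public forwarding URL you can hand to a remote client.
ngrok http 11434
```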