Recently I started reading up on how to host my own Large Language Models (LLMs). It's not just because I like messing with tech (which I do), but because of a practical problem: ChatGPT seems to have tightened its free-tier usage limits considerably. In the past, once you hit your quota on the latest chat model, it would downgrade you to a less powerful model, but you could keep using that one more or less indefinitely. In the past few days, though, after some relatively casual use, I hit the point where I couldn't use ChatGPT at all and had to wait for my usage limits to reset. I wasn't doing anything particularly critical, but this could have been quite disruptive had I been midway through one of my projects or needed some quick AI-assisted thinking. For context, when I was learning Linux and Docker, I could easily hit a hundred or more queries in a single session.

So I suppose the question is: what's the best and most cost-effective way to ensure uninterrupted access to an LLM? Self-hosting locally, or via some cloud service? Any suggestions?