Quick guide to catch up and get started with AI image and text generation as of April 2023

Rob
4 min read · Apr 3, 2023

I have been out of the AI world for a few years as I've focused on infosec, but since I achieved my goal of obtaining the OSWE and a bunch of other certs, I've spent the past two weeks just catching up with AI. And what a wild two weeks it's been! Let's jump into it.

I have focused on self-hosted AI tools, which I have run on my Nvidia 2080 and 3080, both of which have 8 GB of VRAM.

Image Generation

Stable Diffusion (SD) is the way to go. AUTOMATIC1111 has created a wonderful repository that can get you started right away: https://github.com/AUTOMATIC1111/stable-diffusion-webui
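Beyond the browser interface, the webui also exposes a REST API when you launch it with the `--api` flag, which makes it easy to script. Here is a minimal sketch, assuming the webui is running locally on its default port; the prompt and parameter values are just placeholders:

```python
import base64
import requests

# Assumes the AUTOMATIC1111 webui was started with --api and is
# listening on its default local address.
URL = "http://127.0.0.1:7860"

payload = {
    "prompt": "a watercolor painting of a lighthouse at dawn",
    "negative_prompt": "blurry, low quality",
    "steps": 25,
    "width": 512,
    "height": 512,
}

resp = requests.post(f"{URL}/sdapi/v1/txt2img", json=payload)
resp.raise_for_status()

# The API returns generated images as base64-encoded strings.
for i, image_b64 in enumerate(resp.json()["images"]):
    with open(f"output_{i}.png", "wb") as f:
        f.write(base64.b64decode(image_b64))
```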

The first thing you'll want to know about is how to get more models. The best place to start (and stop) is https://civitai.com/. They have a huge variety of models, and many can be directly downloaded, placed in your models folder (models/Stable-diffusion inside the webui repo), and be ready to go. Model makers and users can also include the prompts they used to create their pictures, so it is very easy to pick one you like and use it as a starting point.
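Model pages on Civitai expose a direct download link, so you can script the fetch too. A quick sketch; the API path and version ID below are assumptions, so grab the real link from the model page:

```python
import requests

# Hypothetical version ID: copy the actual download URL from the
# model's page on civitai.com. Checkpoints go in the webui's
# models/Stable-diffusion folder.
url = "https://civitai.com/api/download/models/12345"
dest = "stable-diffusion-webui/models/Stable-diffusion/my_model.safetensors"

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open(dest, "wb") as f:
        # Stream in 1 MB chunks so large checkpoints don't sit in RAM.
        for chunk in r.iter_content(chunk_size=1 << 20):
            f.write(chunk)
```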

Reddit is also a good source of inspiration. From using SD to create logos to cursed videos of Will Smith eating spaghetti, the r/StableDiffusion subreddit has plenty of inspiration and tutorials available.

While browsing Civitai, you might also notice the existence of LoRAs: small add-on models that apply targeted changes (a style, a character, a concept) on top of a standard checkpoint. AUTOMATIC1111's repo includes built-in support for LoRAs, so you can incorporate them just as easily as a standard model.
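In the webui, a LoRA is activated straight from the prompt. Assuming a hypothetical LoRA file named ghibli_style.safetensors dropped into the models/Lora folder, the syntax looks like this, where the trailing number is the strength:

```
a portrait of a knight in ornate armor, <lora:ghibli_style:0.8>
```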

The next step to improve your outputs is ControlNet. As the name suggests, this repo gives you tremendous control over what SD produces, from turning scribbles into finished images to posing characters precisely. Reddit users have even found ways to stack multiple ControlNet units to fix both poses and hands in their outputs.

[Image: turning scribbles into finished art with ControlNet]
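In the webui, ControlNet is installed as an extension, but the underlying idea is easy to see through the diffusers library instead. A sketch, not the extension's API; the model IDs are from the original ControlNet release, so treat the exact names as assumptions:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Canny-edge ControlNet: the edge map constrains the composition
# while the text prompt controls style and content.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

edges = load_image("scribble_or_edge_map.png")  # your conditioning image
image = pipe("a cozy cottage in a forest, oil painting", image=edges).images[0]
image.save("controlled_output.png")
```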

Text Generation and Chatbots

Web UI’s

I've experimented with three different clients here: KoboldAI, TavernAI, and oobabooga's webui. It was a hard choice, but I think I liked the latter the most, mainly because it was the only one that supported the leaked Facebook LLaMA model. If you manage to get your hands on it through various means (such as googling for it), note that the base model is not instruction-tuned, which makes it unsuitable to act as a chatbot; Stanford's Alpaca project fine-tuned the weights for exactly that purpose.

Unfortunately, Stanford didn't release the fine-tuned weights. Fortunately, they released how they did it, and someone reproduced the process for the 7B model here: https://github.com/pointnetwork/point-alpaca

Since this builds on a leaked model, the repo has to tiptoe around distribution a bit and ships encrypted files, but I've had no issues with it. I was able to run it on a 3080 with 8 GB of VRAM, although it was quite slow and I ran out of memory often.
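If you'd rather load the model outside a UI, Hugging Face transformers added LLaMA support in April 2023. A minimal sketch, assuming you already have decrypted weights in a local folder (the path is hypothetical) and the bitsandbytes package installed; a 7B model in 8-bit barely fits in 8 GB, which is consistent with the out-of-memory errors I kept hitting:

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Hypothetical path: point it at wherever the decrypted Alpaca 7B
# weights ended up on your machine.
MODEL_DIR = "./point-alpaca/result"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_DIR)
model = LlamaForCausalLM.from_pretrained(
    MODEL_DIR,
    load_in_8bit=True,   # requires bitsandbytes; ~7 GB for a 7B model
    device_map="auto",   # requires accelerate; spills to CPU if needed
)

prompt = "Explain what a LoRA is in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```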

If you do not intend to use the Facebook LLaMA model, then I would recommend Kobold instead. It has a large repository of available models it can download and is very flexible. TavernAI is designed more around cloud GPU backends and talks to them through an API, but the number of character cards available to get you started is astounding, so do give it a go if you want an uncensored character.ai. Those cards can also be exported to oobabooga's webui: sometimes directly, otherwise by copy-pasting the prompts into a character card editor and importing the resulting JSON file.
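Under the hood, a character card is just a small JSON file. Here is a minimal hand-written sketch; the field names follow my understanding of the de facto TavernAI format and should be treated as assumptions rather than an official spec:

```python
import json

# Minimal TavernAI-style character card (field names are my best
# understanding of the common format, not an official schema).
card = {
    "name": "Archivist",
    "description": "A meticulous librarian AI who answers in full sentences.",
    "personality": "patient, precise, slightly pedantic",
    "scenario": "You are asking the Archivist for help researching a topic.",
    "first_mes": "Welcome to the archive. What are we looking for today?",
    "mes_example": "<START>\n{{user}}: Who wrote this?\n{{char}}: Let me check the records.",
}

with open("archivist.json", "w") as f:
    json.dump(card, f, indent=2)
```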

I would recommend trying all of them, but so far I think oobabooga’s webui is primed to become the most comprehensive one.

Models

The leaked Facebook LLaMA model with the Stanford Alpaca weights is the most capable option, but it has high hardware requirements and needs the various workarounds described above to work. If you're looking for something with a gentler learning curve, try Pygmalion. It can be downloaded and imported directly from the Web UIs mentioned above, and the 6B model can run even on 8 GB of VRAM, although with some minor loss in quality. If you are looking for more models, KoboldAI has a list of everything it can easily import and download from its Web UI, so feel free to experiment.
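Since Pygmalion lives on the Hugging Face hub, it also works with the same transformers approach as the Alpaca sketch above, with no leaked-weight gymnastics. Loading in 8-bit is my assumption for how it squeezes into 8 GB:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pygmalion 6B straight from the Hugging Face hub. 8-bit quantization
# (via the bitsandbytes package) is what lets it fit on an 8 GB card.
tokenizer = AutoTokenizer.from_pretrained("PygmalionAI/pygmalion-6b")
model = AutoModelForCausalLM.from_pretrained(
    "PygmalionAI/pygmalion-6b",
    load_in_8bit=True,
    device_map="auto",
)
```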

Conclusion

That's it. With this quick tutorial, you should be ready to get started in a matter of hours. The rabbit hole goes much, much deeper: there are ways to chain LLMs together via tools such as LangChain or via extensions built into the UIs. LLMs are a very exciting space to be in right now, and the technology moves so fast that two months ago can seem like a decade ago in terms of capability. Go out there and experiment!

If you’ve enjoyed this article, feel free to follow me @robsware on Twitter.
