Artificial Intelligence (AI) is often presented as a complex field: a state of the art impossible to understand, models too large to train, incredible work in progress that could change anything, yet a black box inscrutable to anyone except a select few.
This is truly damaging to the field because it is a fascinating topic and, even though indeed nobody can understand it all, we can all benefit from tinkering with it, learning from it and possibly even using it.
It's also often showcased as impractical for a "normal" person with their "normal" computer. Consequently everything must be done in the "cloud", far away from our scrutiny.
This is also damaging as it is simply false. Some very large models are indeed too large to run on a single computer but most, including what was considered the state of the art just a couple of years ago, can be run locally. In fact the trend to scale might be problematic for the entire field.
Regardless of all those limitations, the goal here is to showcase that even though not everything can be done on your desktop, a lot can. Building from that, and learning how it works, can help to reconsider a potential feeling of helplessness.
Not only can you self-host AI models, use them and adapt them, but there is a whole community and set of tools to help you do so. This movement itself is very encouraging. AI does not have to be a black box. Your digital life does not have to be owned by someone else, even for the state of the art.
PS: this is also aligned with my own naive heuristic, made explicit since 2020: avoiding gadgets or services (free or not) that increase inequality by design, through technology or business model or both, would be a good starting point.
See also AgainstPoorArtificialIntelligencePractices for a less technical piece with 5 simple recommendations.
My own page on ML started in 2011, cf MachineLearning?action=diff#diff1320660278, with an earlier page on my other wiki in 2008, cf Seedea:Seedea/AImatrix?action=diff#diff1221994306
A lot more on https://x.com/search?q=%40utopiah%20openai
Prompted by recent (July 2024) news on Microsoft and Google completely busting their own energy goals due to AI.
Familiarity with self-hosting, e.g. the Linux command line (see Shell), and containerization, e.g. Docker. Ideally also familiarity with Python, the Programming language in which most popular AI work is currently done.
A desktop with a proper graphics card is recommended. Some solutions do not need one at all while others need the latest generation of GPU. It is also possible to rent such a configuration in the cloud if necessary, while ensuring that the cloud provider has terms of service and overall practices aligned with your needs.
Linux desktop with a latest-generation NVIDIA GPU, Docker installed and running with NVIDIA support. Note that this can be done rootless to ensure a bit more safety. Overall, do remember to back up your data regardless of what you are trying.
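To check that containers actually see the GPU, a minimal test, assuming the NVIDIA Container Toolkit is installed (the image tag may need adjusting to your CUDA version):

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

If the usual nvidia-smi table shows up, GPU-accelerated containers should work.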
Using a Telegram bot I am able to query models from any device while my desktop is turned on, e.g. here llama.cpp running Mistral https://twitter.com/utopiah/status/1720122249938628951 , and I am consequently considering a local-first alternative (no 3rd party relaying messages) via https://git.benetou.fr/utopiah/offline-octopus/issues/22
This way I can, for example, generate text from my mobile phone, whether on the same network or not.
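As a sketch of the local endpoint such a bot can relay to, assuming llama.cpp was built with its bundled server and a quantized Mistral model is already downloaded (the model file name here is an assumption):

# start the local HTTP server on the desktop (model path is an assumption)
./llama-server -m models/mistral-7b-instruct.Q4_K_M.gguf --port 8080
# then request a completion; replace localhost by the desktop hostname from another device
curl -s http://localhost:8080/completion -d '{"prompt": "Hello", "n_predict": 64}'

The bot then only has to forward incoming messages to that endpoint and send back the reply.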
Chaining example of speech-to-text, LLM, then text-to-speech for a "natural" (albeit slow) kind of "hands-free conversation": https://twitter.com/utopiah/status/1720475902218317930 (a sketch of such a pipeline follows below).
This could also be done behind a VPN or relying on Tailscale, without using Telegram or any chat program. For now, using a chat program makes sharing on a mobile phone a lot more convenient.
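A minimal sketch of that chain, assuming whisper.cpp, llama.cpp and the Coqui TTS image used later on this page are all available (binary names and model files are assumptions):

# speech-to-text: whisper.cpp, -nt strips timestamps so output is plain text
QUESTION=$(./whisper.cpp/main -m models/ggml-base.bin -f question.wav -nt 2>/dev/null)
# LLM: llama.cpp one-shot completion
ANSWER=$(./llama.cpp/llama-cli -m models/mistral-7b-instruct.Q4_K_M.gguf -p "$QUESTION" -n 128 2>/dev/null)
# text-to-speech: Coqui TTS in Docker, then play the result
docker run --rm -v ~/tts-output:/root/tts-output ghcr.io/coqui-ai/tts-cpu --text "$ANSWER" --out_path /root/tts-output/answer.wav
aplay ~/tts-output/answer.wav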
I had several heated discussions on social networks (cf e.g. https://lemmy.world/post/30563785/17396851 ), sparked recently by someone who didn't know the training cost (fair, we can't always know how everything we play with was made) but, more importantly, did not care.
That was shocking to me. There are countless articles explaining that "AI", however you define it, whatever you believe its impact might be, has a very tangible ecological impact. It takes a lot of connected computers with very energy-consuming parts, GPUs, TPUs, etc, and a lot of water to cool them down.
What is even more important IMHO is that training is precisely the part of the entire process, among research, programming, training, fine-tuning, inference, etc, that consumes the most energy. Assuming it does take about an order of magnitude more than the other steps, saying one does not "care" about it means they do not care about their environment.
I have a hard time believing that is possible, so I assume that what they mean is that training is basically insignificant because it is offset by positive impact.
In order to better estimate that, one could potentially try to factor in the one-off training cost versus the expected usage and per-query inference cost.
The hope here would be to have a few basic rules to facilitate choosing between equivalent models, or potentially deciding not to train models that will not be used enough to offset their costs.
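For instance, a back-of-the-envelope amortization, where every number is a hypothetical placeholder and only the reasoning matters:

TRAINING_KWH=1000000    # hypothetical one-off training cost
INFERENCE_WH=3          # hypothetical marginal cost per query
QUERIES=100000000       # hypothetical lifetime number of queries
echo "$TRAINING_KWH * 1000 / $QUERIES + $INFERENCE_WH" | bc
# 13 Wh per query here: amortized training (10 Wh) still dwarfs inference (3 Wh)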
Finally, the heuristic would be: among models of equivalent capability, prefer the one with the lowest energy cost, training included.
Note that the argument applies to future models as well as current ones. Even though we can not get the energy back for already trained models, selecting the most efficient one, energy-wise, might help set a trend for the training of future models.
There are already quite a few initiatives for that, e.g. coffee with Fair Trade certification or ISO 14001, Fair Materials in electronics, etc.
The point being that there are already mechanisms for feedback in other fields, and in ML there are already model cards with a co2_eq_emissions field, so why couldn't feedback also work in this field?
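For example that field can be read programmatically from the HuggingFace API when a model card declares it (the model id here is a placeholder):

curl -s https://huggingface.co/api/models/ORG/MODEL | jq '.cardData.co2_eq_emissions'

which makes comparing candidate models on that criterion scriptable.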
tts_models/fr/mai/tacotron2-DDC
for French voice generation in a small XR pedagogical game
docker run --rm -v ~/tts-output:/root/tts-output -v ~/.tts-models/:/root/.local/share/tts/ ghcr.io/coqui-ai/tts-cpu --text "Allez, on recommence." --out_path /root/tts-output/hello.wav --model_name tts_models/fr/mai/tacotron2-DDC
with volumes for the output and to preserve models between runs (faster synthesis time)
~/Prototypes/FlagEmbedding/wikidata.py
~/Prototypes/find-combinable-brick/gensim-tfidf.py
(May 2023)
tlm suggest with "a command to get my ip" as query returned uname -a
"stream": false
window.ai
object
ollama
today, not bad at all for 1.1Gb
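Assuming the "stream": false note above refers to ollama's HTTP API, a single blocking completion against a local model looks like this (the model name is an assumption, TinyLlama being roughly that size):

# "stream": false returns one JSON object instead of a stream of chunks
curl -s http://localhost:11434/api/generate -d '{"model": "tinyllama", "prompt": "Why is the sky blue?", "stream": false}'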
~/Prototypes/oakd-lite/bin/python3 ~/Prototypes/oakd-lite/depthai-python/examples/ColorCamera/autoexposure_roi.py
Short scripts in ~/bin to combine the different tools and their results more conveniently:
stt : using Whisper.cpp to convert an audio file to text (a sketch follows after this list)
screenocr : capture part of the screen, OCR the result, then Web search it
ocr-to-wikiembeddings : capture part of the screen then run a semantic search on it
monitor-voice-via-whispercpp-stream : using Whisper.cpp interactively
get-pages-from-embeddings : returns the top 10 wiki pages after embedding
santacoder : complete a code prompt via the HuggingFace API and clean the output
Relying on:
url-to-text : uses Readability to get the text content of a URL
yt-dlp : to get online videos, optionally with FFmpeg to extract the audio for transcription
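As an illustration of how short these scripts can stay, a sketch of what stt could look like (paths and model file are assumptions):

#!/bin/sh
# whisper.cpp expects 16 kHz mono WAV, so convert whatever we get first
TMP=$(mktemp --suffix=.wav)
ffmpeg -y -loglevel error -i "$1" -ar 16000 -ac 1 "$TMP"
# -nt strips timestamps so the output is plain text
~/whisper.cpp/main -m ~/whisper.cpp/models/ggml-base.bin -f "$TMP" -nt
rm -f "$TMP"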
Often models and inference come from Python code. Using Flask provides an HTTP (or even HTTPS) endpoint that makes it easy to integrate with a frontend, e.g. a WebXR page.
# consider a venv then ./bin/pip3 install flask
# res is the result from a Python function, any inference here with models already downloaded.
import uuid
from flask import Flask
app = Flask(__name__)

@app.route('/toimage/<prompt>', methods=['GET'])
def toimage(prompt):
    # save the generated image under a unique name so requests do not collide
    myuuid = uuid.uuid4().hex
    res[0].save('./static/'+myuuid+'.jpg','JPEG')
    return {'prompt': prompt, 'url': '/static/'+myuuid+'.jpg'}

@app.route('/totext/<query>', methods=['GET'])
def totext(query):
    return {'prompt': query, 'top10': res}

if __name__ == '__main__':
    print('can then expose as https via e.g ngrok http 5000')
    app.run()
Note this could also use Gradio instead.
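Once running, a quick way to test such an endpoint from the command line (Flask defaults to port 5000):

curl http://localhost:5000/totext/hello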