AI on your own hardware: what are the advantages and differences compared to cloud solutions?


The performance of artificial intelligence models is fundamentally affected not only by the choice of graphics card, but also by the amount of its video memory. In this review, we'll focus on why it's important to ensure models run on the GPU's Tensor Cores and how memory pressure impacts overall performance. You can read the full article on the PCTuning website.


Performance findings

After installing and running your models for the first time, make sure each model is actually using your graphics card for acceleration, specifically its Tensor Cores. If a model were mistakenly computed on the processor instead, it would slow down significantly, and on some systems it would not start at all.
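If you want to verify this yourself, a quick check from Python works; the following is a minimal sketch assuming a PyTorch installation with CUDA support (your chat frontend may report the same thing in its own logs). It confirms the GPU is visible and runs a half-precision matrix multiply, the kind of operation that RTX-class cards dispatch to their Tensor Cores.

import torch

# Fail early if no CUDA GPU is visible - the model would fall back to the CPU.
assert torch.cuda.is_available(), "No CUDA GPU detected"
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 4070"

# An FP16 matrix multiply on the GPU; on RTX-class cards this kind of
# half-precision math is dispatched to the Tensor Cores.
a = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
b = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
c = a @ b
torch.cuda.synchronize()  # wait for the GPU to finish before reading the result
print(c.shape, c.device)  # torch.Size([4096, 4096]) cuda:0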

I ran the entire test on an Nvidia RTX 4070 graphics card with 12 GB of VRAM. On it, AI models began writing almost immediately after I entered a prompt, and the total response time scaled with the length of the answer. The model generated a simple answer in a matter of seconds and wrote longer text faster than I could read it. Image generation took anywhere from a few seconds to tens of seconds per image, depending on the selected quality parameters.

Here you should distinguish between the raw performance of the card itself and the amount of its video memory. Each model's specifications state its video memory requirements; what happens if you exceed them? Some models will not start at all, but in my case, for example, Mistral-small (22B) still worked, it just responded more slowly.
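As a rough back-of-the-envelope check (my own sketch, not a method from the article), you can estimate whether a model's weights will fit in VRAM: multiply the parameter count by the bits per weight, then add some headroom for the KV cache and runtime buffers.

def estimate_vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.2) -> float:
    # Weights alone: parameters x bytes per weight; the overhead factor is a
    # guess covering the KV cache, activations, and runtime buffers.
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 1e9

# A 22B-parameter model at 4-bit quantization already overshoots a 12 GB card:
print(f"{estimate_vram_gb(22, 4):.1f} GB")  # ~13.2 GB needed vs 12 GB available

This matches the behavior above: a 22B model still starts, but part of it spills out of a 12 GB card and the response slows down.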

Memory is the first limit: if there is not enough of it, the AI model cannot fully utilize the Tensor Cores. Once the model fits entirely in VRAM, the performance of the Tensor Cores becomes the bottleneck instead; you will see them fully loaded for the few seconds it takes the model to finish its response in the chat.

When VRAM runs out

In practice, the lack of memory looked like this: the card's VRAM was filled almost to capacity, yet GPU utilization sat only in the low tens of percent, while the processor took over part of the work, running at about 70% rather than full load. The model's computation was thus spread across the entire system, resulting in a noticeably slower response. The same behavior occurred with the Solar-Pro model, which also has 22B parameters.
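You can catch this situation on your own machine by polling nvidia-smi while the model is answering; the sketch below assumes an Nvidia card with the standard driver tools installed. Nearly full memory combined with low GPU utilization is the telltale sign that part of the model has spilled over to the CPU.

import subprocess
import time

# Sample GPU utilization and memory once per second for ten seconds
# while the model generates a response.
for _ in range(10):
    line = subprocess.check_output(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader"],
        text=True,
    ).strip()
    print(line)  # e.g. "14 %, 11790 MiB, 12282 MiB" - VRAM full, GPU mostly idle
    time.sleep(1)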

Detailed instructions for running local AI, along with various examples, can be found here.


