| `GO_TAGS` | | Go tags. Available: `stablediffusion` |
| `HUGGINGFACEHUB_API_TOKEN` | | Special token for interacting with HuggingFace Inference API, required only when using the `langchain-huggingface` backend |
| `EXTRA_BACKENDS` | | A space-separated list of backends to prepare. For example `EXTRA_BACKENDS="backend/python/diffusers backend/python/transformers"` prepares the Python environment on start |
| `DISABLE_AUTODETECT` | `false` | Disable automatic detection of the CPU flagset on start |
| `LLAMACPP_GRPC_SERVERS` | | A comma-separated list of llama.cpp workers to distribute the workload across. For example `LLAMACPP_GRPC_SERVERS="address1:port,address2:port"` |
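
These variables are typically passed to the container at start time. A minimal sketch, where the port mapping and the chosen backend are only illustrative:

```bash
# Prepare the diffusers Python environment when the container starts
docker run -p 8080:8080 \
  -e EXTRA_BACKENDS="backend/python/diffusers" \
  quay.io/go-skynet/local-ai:master-ffmpeg-core
```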
If you wish to build a custom container image with extra backends, you can use the container image as a base image. For example, to prepare the diffusers backend:
```Dockerfile
FROM quay.io/go-skynet/local-ai:master-ffmpeg-core
RUN make -C backend/python/diffusers
```
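
The customized image can then be built and started as usual; a minimal sketch, where the image tag and port mapping are only illustrative:

```bash
# Build an image with the diffusers backend already prepared
docker build -t localai-with-diffusers .

# Run it like the stock image
docker run -p 8080:8080 localai-with-diffusers
```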
Remember also to set the `EXTERNAL_GRPC_BACKENDS` environment variable (or the `--external-grpc-backends` CLI flag) to point to the backends you are using (`EXTERNAL_GRPC_BACKENDS="backend_name:/path/to/backend"`), for example with diffusers:
```Dockerfile
FROM quay.io/go-skynet/local-ai:master-ffmpeg-core
RUN make -C backend/python/diffusers

# Point LocalAI at the prepared backend (the run script path follows the in-image backend layout)
ENV EXTERNAL_GRPC_BACKENDS="diffusers:/build/backend/python/diffusers/run.sh"
```
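
When running the binary directly instead of the container, the same mapping can be passed via the CLI flag; a minimal sketch, assuming the binary is named `local-ai` and the backend script lives at the path shown above:

```bash
# Register the externally prepared diffusers backend via the CLI flag
./local-ai --external-grpc-backends "diffusers:/build/backend/python/diffusers/run.sh"
```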
Several environment variables can be used to tweak parallelism.
Note that for llama.cpp you need to set `LLAMACPP_PARALLEL` to the number of parallel processes your GPU/CPU can handle. For Python-based backends (like vLLM) you can set `PYTHON_GRPC_MAX_WORKERS` to the number of parallel requests.
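
For example, both can be set when starting the container; a minimal sketch, where the values and the port mapping are only illustrative:

```bash
# Illustrative values: tune them to what your GPU/CPU can actually handle
docker run -p 8080:8080 \
  -e LLAMACPP_PARALLEL=4 \
  -e PYTHON_GRPC_MAX_WORKERS=4 \
  quay.io/go-skynet/local-ai:master-ffmpeg-core
```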
### Disable CPU flagset auto detection in llama.cpp
LocalAI will automatically discover the CPU flagset available on your host and use the most optimized version of the backends.
If you want to disable this behavior, set the `DISABLE_AUTODETECT` environment variable to `true`.
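
For example, with Docker this could look like the following sketch (the port mapping is only illustrative):

```bash
# Skip CPU flagset autodetection on start
docker run -p 8080:8080 \
  -e DISABLE_AUTODETECT=true \
  quay.io/go-skynet/local-ai:master-ffmpeg-core
```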