Database support #134
Hi @a18090 Thank you for using HomeGallery and sharing your issues here. And congratulations, you are the first person I know of who is using 600k media files. That is incredible! I am using only 100k and a friend is using 400k, so you are leading the list!

So you report that your database is corrupt? You can rebuild your database via the CLI by running the importer. The rebuild will rescan your files, extract all meta data and create previews (skipping them if they already exist), and then rebuild the database. See the internals section of the docs for details of the building blocks. Since I do not have experience with 600k files, that is the answer in theory; whether it will work on your machine I do not know.

Regarding another database backend like MySQL etc.: the design decision was not to use a standard RDBMS but to load the whole database into the browser to get a snappy user experience. This decision comes with a tradeoff: it can not scale endlessly. So I am curious how your experience is with 600k media. I would appreciate it if you share this information:

1. How is your general setup (OS or docker)?
2. What is your main workflow (desktop or mobile)?
3. How is the loading time of your database?
4. What is the general user experience doing searches in the UI (are the results shown "fast" enough)?
Hope that helps
Hi @xemle

Regarding video conversion, I am wondering if I can try to load the source file directly in a LAN environment, with the program only processing the EXIF data. The main problems I encountered are as follows:

Thanks again xemle. I don't know JavaScript; I tried using AI to help me add database support, but I found the result ridiculous since I only know Python programming.
Hi @a18090 thank you for the detailed answer. Were you able to fix your database issue?
I am surprised that the tag image works but the others do not. It needs some investigation to see why it is only partially working.
Currently, serving original files such as videos is not supported, even if the browser could play them back. This has been raised several times, e.g. in #96 and #25. Since I develop this gallery for my own needs, I have many old video formats that are not natively supported by the browser, and I prefer to keep the original files separate from the gallery files. Maybe some day there will be a plugin system that makes it easy to extend and customize this functionality.
Please try to reduce the
There is a search term for it:
Have you tried to run
So what is your biggest pain point with HomeGallery currently? Can you work with it? Did you try other self-hosted galleries? How do they perform on your large media set? What features do you prefer in HomeGallery? Which features do you like in others?
Hi @xemle My database doesn't seem to have been repaired, but it's not a serious problem; I'll probably exclude the video files next time I rebuild so it works faster (since I have almost 110,000 videos), haha. I am curious about the "year" and "tags" entry pages, so I will try these two pages again the next time I re-import and then take a look at the log and Chrome responses. The video problem is not very serious; after all, I can reduce the bit rate and preset when converting to increase the conversion speed. I'm going to try reducing api.server.concurrent to 1 and test again.

I like HomeGallery's face search and similar image search functions. These two functions are very helpful. I will try to restart HomeGallery again in the next few days. This time I may back up the database file regularly to reduce the risk. I've tried other apps and there are some issues; I feel like several of the self-hosted galleries I've used may have database issues.
Hi @xemle I was rebuilding the database and tested it directly. The server has 64 GB of memory, of which 18 GB is currently used.
Hi @a18090 Thank you for your report. From the logs I can not say a lot. From the rebuild command I can see that the

You can try to set 8 GB of heap in your gallery config for the database creation. Maybe that helps.
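As a rough sketch only (not an official HomeGallery option): any Node.js process honors the V8 heap flag passed through the NODE_OPTIONS environment variable, so giving the rebuild more heap could look like the line below. The `./gallery.js run import` invocation is just a placeholder for however you start the rebuild.

```sh
# Sketch: --max-old-space-size is a standard Node.js/V8 flag (value in MB).
# The command after it is a placeholder for your actual rebuild invocation.
NODE_OPTIONS=--max-old-space-size=8192 ./gallery.js run import
```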
To fix the issue on your side I need to investigate node's memory management further and check what can be improved to keep the memory consumption low.
Hi @xemle thank you for your reply,
It seems to be a problem during operation? I tried

During operation, the cached data goes directly into memory and is not written to database.db.
gallery.config.yml
Hi @a18090 Thank you for your logs. It seems that the heap memory consumption grows too much at the end of your logs. As I wrote earlier, I currently can not explain why such high consumption is required. My time to investigate such a case is also very limited at the moment, since I am using my spare time to work on the plugin system to open the gallery to custom functions. I am very sorry but I can not help here currently. I keep it in mind because it bugs me that the consumption is that high while in theory it should be slim.
I did some analysis regarding memory consumption. IMHO I was not able to detect a memory leak or a major memory issue, except that the database building process needs a lot of memory since it loads all the data into memory. My database with 100k entries requires 200 MB of uncompressed JSON data, and the database creation succeeds with a 750 MB heap size.

The process could be optimized in a streaming fashion, which should require less memory since it would not need to load the whole database into memory while creating it. This would enable building larger galleries with more than 400k images on less memory. The current workaround is to provide more heap space to the database building process. The server and client side still need to load the database into memory, but I guess that is more efficient since it is read only.

@a18090 Is this memory issue still relevant for you?
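To illustrate the streaming idea (a minimal sketch, not the gallery's actual code; the surrounding JSON format is simplified), entries can be serialized one by one into a gzip stream instead of stringifying the whole array at once, so peak memory stays close to a single entry:

```js
const fs = require('fs')
const zlib = require('zlib')
const { pipeline } = require('stream/promises')

// Yield the database JSON piece by piece instead of building one huge string.
async function* serializeEntries(entries) {
  yield '{"data":['          // simplified header; the real format has more fields
  let first = true
  for (const entry of entries) {
    yield (first ? '' : ',') + JSON.stringify(entry)
    first = false
  }
  yield ']}'
}

// Stream the serialized entries through gzip into the target file.
const writeDatabase = (entries, filename) =>
  pipeline(serializeEntries(entries), zlib.createGzip(), fs.createWriteStream(filename))
```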
Hi @xemle Memory is not very critical for me, because my server has 128 GB of memory and my computers and mobile phones generally have 12-16 GB, so most of the time it is fine. When accessing it with Chrome on the PC, I noticed that the memory usage gets very large. But I recently hit a problem that prevents me from using home-gallery: after running for a while, database.db becomes unreadable, as shown below
db.bak is the database.db I backed up some time ago.
I will resume running the test after the update, and I will clear the log and pay attention to
Hi @a18090 Thank you for the update. I guess you identified a bug, and it seems that the bug is quite old and my tests were not covering it. I will provide a fix in the next few days. So the error
Hi @xemle I found an interesting situation when I ran it again. When I open the site in Chrome on my laptop and enter the year tab, it gets stuck in a waiting/loading state. This is the command output log.
If I use Chrome on a desktop computer to access the site, this problem does not occur: there is no continuous log output, clicks are normal, and Chrome does not become unresponsive.

I suspect this may be a page problem. I'll try to check it out and see if I can help you (of course I'm not sure if this is possible, haha)
Hi @a18090 thank you for your update. From your latest console logs I read that the database can be loaded with 259386 entries, which sounds good. A friend of mine has 400k images, so that amount should work, too. Regarding your previous error
I found the issue. The error happens when the database can not be read. Unfortunately the error message with the cause is swallowed by this bug, so I can not tell why the database can not be read. A following fix will change that.

My best guess is that the database can not be read due to memory issues on the server component, especially while you are importing a new and even larger database. Then the server has to keep two large versions in memory for a short period of time. This issue could be fixed with the environment variable

I doubt that the database itself is corrupt, because on database creation the new database is written to a temporary file which is then renamed to the target database filename. This is a common way to provide a kind of atomic file creation; the rename should only happen in the non-error case.

Would you mind checking your database with
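For illustration, a minimal sketch of that temp-file-plus-rename pattern (an assumed shape, not the gallery's exact code):

```js
const fs = require('fs/promises')

// Write the new database to a temporary file first, then rename it over the
// target, so readers never see a half-written file. The rename only happens
// if the write succeeded; rename is atomic on the same filesystem.
const writeAtomic = async (filename, data) => {
  const tmp = `${filename}.tmp-${process.pid}`
  await fs.writeFile(tmp, data)
  await fs.rename(tmp, filename)
}
```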
Hi @xemle I've removed the problematic database and restored the backup, then rebuilt the database, and now by

But that number doesn't seem right. I looked for JPGs by listing the files and there are about 462626 files; the full file count including videos is roughly 597817.
Can you also check the file index with
```
(base) root@cdn:/data/ssd/glh/config# zcat tdl.idx | jq .data[].filename | wc -l
(base) root@cdn:/data/ssd/glh/config# zcat database.db | jq .data[].id | wc -l
(base) root@cdn:/data/hdd/tdl# find . | wc -l
(base) root@cdn:/data/hdd/tdl# find . -type f -name *.jpg | wc -l
```
Hi @a18090 thank you for the numbers. To summarize
The difference between JPG files and indexed files is that the gallery also indexes other files like meta files

The more important question is why there is a big gap between media files and index files, since there should not be so many meta files. My best guess is that the import was done in multiple steps. Currently the algorithm does not correctly recover the files after the import process is restarted (I did not find the time/interest to implement that yet). Therefore it is recommended to rerun the full import after all media files have been processed. Would you mind rerunning a full import via
Thanks, no problem, I'll try it.
Hi @a18090 I've updated the master with a stream based database creation. This should be less memory demanding, and your 400,000 images should be better to read and to update. Please try it out and report if you have any issues with the newest version
Hi @xemle This is the resource usage

I noticed an error occurred
I will also try multi-platform testing later. Thanks for your efforts!
I am in the same boat, already reaching 600k files (out of 2 million images), and I am already seeing the slowness. The impact on the browser is huge: it had no issues with fewer than 300k images, but as soon as I hit 500k it started slowing down. I am using the latest version and I am seriously considering moving the database into a database engine, or sending the browser only the information it needs. I keep seeing the browser update the positioning of the recently loaded images and slow down more and more.
So, my numbers:
I also excluded mkv and mp4 files because they were clogging my cache. I would love to set those to use the original file instead of transcoding it. But alas, I am in a privileged position to test it with a lot of files. :)
Hi @kryztoval Thank you for your input and your numbers. Your numbers are the highest reported so far. Congratulations, you rule the board!

As stated in #134 (comment), the memory consumption of the database creation was improved. Now I pushed an improvement of the offline database to the master.

However, the architecture and database design decision of the gallery was not to serve more than my own collection of 100k media entries. My original assumption was that all required data can be loaded into the browser, so that filtering and sorting of the media can be done quickly within the browser. Server requests, a server based database and database scaling were out of scope. And currently the server still needs to read the whole database file and keep and serve it in memory... So your 5.3M photos outnumber my assumption and experience by a factor of 50. This is a lot. It is very tricky to provide advice or even solutions on whether 5M images can be handled by the gallery or not.

So I would like to return to the start: what kind of problem would you like to solve, and how does HomeGallery fit your needs? Maybe there is an unseen possibility to customize it and strip down data. E.g. the AI features of face, object and similarity detection require a lot of bytes; by disabling them the overall performance should improve. But I do not know if that is an option...

@kryztoval I guess you were also testing other open source web galleries out there? How do they perform on your large media directory?
@xemle Yes, indeed. I understand. And yeah, the memory consumption of the database was improved a lot, I did see that. I do not really require that much; the thing home-gallery did incredibly well was its speed loading images. I would love it if it could also play the original video without converting it, but that is it. I tried several solutions and even thought of making my own, but I always struggled with the interface part, and your project nailed the interface almost to a T. I used the tags to mark actions on the filesystem, allowing me to delete files directly and remove them from the list of images with ease. I wanted it to be in a database so I could index the files myself. I will try to take a look at the interface and see if I can make it use a database instead of loading fully from memory. It sounds like a great challenge! I will keep on the lookout for other galleries too. Thank you so much! :)
Hi @kryztoval Thank you for your response and for your positive feedback about the gallery.

Regarding the database: the general challenge is that a query should be fast. To be fast, the data should be in fast memory. Depending on the data size, there will always be a threshold where the data can not be handled by a single machine due to memory limitations; if it does not fit into memory, the data needs to be distributed across multiple machines. HomeGallery serves a list of private images, and the assumption is that the amount of private images is limited, so the data should fit into memory. The further assumption is that the amount of images is so "small" that the data can be loaded into the browser memory, so that the database queries can be performed in the browser, even complex queries with different boolean expressions and similarity vector comparisons. The numbers of my setup are:
For larger image sets the question is: does this scale without changing the basic setup? It will scale up to a specific factor. My gut feeling is that it could scale by a factor of 4; a factor of 10 is critical. Your factor is 50, and I would be impressed if the current setup could handle that. My assumption is it won't. I also do not know if a database like sqlite or postgres on a single host can handle the current query complexity in a reasonable time.

So what are the options IMHO:

Reduce the data per entry: A database entry can be reduced to a bare minimum with
This sums up to about 200 bytes, a reduction by a factor of 10. The query would be limited to mainly date and tags, with no similarity. The question here is whether the query times are still sufficient in the browser: currently a query takes up to 200 ms, so a factor of 10 would be 2 seconds in the worst case.

The other option is to move the database handling to the server. Can the current setup handle a database with 5M entries in a reasonable time? I doubt it. The complexity of moving the data to a standard database like sqlite or postgres is high: it turns the current architecture upside down, introduces new dependencies, and both code paths need to be supported (a 5M database is more an edge case than the average for a private image gallery, so I would keep the browser based database feature).

One option could be to split the database into different gallery instances. NodeJS is single threaded while current systems provide several CPU cores. Splitting the gallery instances can be done by directories. That way you keep the query complexity, and the server component already supports the same query language. The query then needs to be split across multiple instances and the results need to be merged in one place. The merge needs to evaluate the sort order again, since some order keys, like the similarity result, are not transported in the current result. If the performance becomes poor, the gallery instances can also be distributed to different machines.

Splitting your database into multiple instances sounds promising to me and requires the following changes (a rough sketch of the query fan-out follows below):
Maybe also introduce search result paging which includes only part of the result but provides meta data like the total matching count, so that the view can simulate the "height" of the result page.

What do you think? Which solution do you have in mind? You said that you want to implement things on your own: how good are your programming skills?
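For illustration only (the endpoint and response shape below are assumptions, not the actual HomeGallery API), the fan-out and merge over several instances could look roughly like this:

```js
// Hypothetical sketch: send the same query to several gallery instances and
// merge the results. The /api/database?q= endpoint and the {data: [...]}
// response shape are placeholders for illustration.
const queryAll = async (hosts, query) => {
  const results = await Promise.all(hosts.map(host =>
    fetch(`${host}/api/database?q=${encodeURIComponent(query)}`).then(res => res.json())
  ))
  return results
    .flatMap(result => result.data)
    // Re-evaluate the sort order after merging; order keys like the
    // similarity score are lost here, so this sorts by date only.
    .sort((a, b) => (a.date < b.date ? 1 : -1))
}
```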
I looked at the structure of your database and I see that you are using JSON, so basically this is a document database. As such, I think the fastest way to put this into a database for me would be to use Mongo or CouchDB, because the documents would not need to be translated at all. I also took a look at the queries made by the browser to the api node, and I think I could make the api node gather the information from the document database too. MongoDB and CouchDB have the added benefit of being shardable and distributed in a semi-transparent way to the client. I think that for a single person a local instance of MongoDB would be way more efficient than having all the information in memory. I did a little project a few months back where I used static HTML files to interact with a RESTful service that forwarded queries either to the original API or gathered data from the MongoDB records; that Mongo server right now contains around 73807067 documents in just one document store and it responds at a decent pace. I was thinking that the api server would be a good place to leave this logic. My programming skills are ok, but your gallery has a lot of moving parts. It would take me a bit to get familiar with all the things it does.
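As a rough sketch of that idea (the only grounded parts are that database.db is a gzipped JSON file with a data[] array, as seen in the zcat/jq commands above; database, collection and field names are illustrative assumptions, not HomeGallery code), importing the entries into MongoDB could look like this:

```js
const fs = require('fs')
const zlib = require('zlib')
const { MongoClient } = require('mongodb')

// Load the gzipped database.db and bulk-insert its entries into MongoDB.
const importDatabase = async (file, url = 'mongodb://localhost:27017') => {
  const json = JSON.parse(zlib.gunzipSync(fs.readFileSync(file)).toString())
  const client = await MongoClient.connect(url)
  try {
    const entries = client.db('gallery').collection('entries')
    // Indexes on the fields the gallery queries most, e.g. date and tags (assumed field names)
    await entries.createIndex({ date: -1 })
    await entries.createIndex({ tags: 1 })
    await entries.insertMany(json.data)   // .data[] as seen in the jq commands above
  } finally {
    await client.close()
  }
}
```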
@xemle I kept playing with it. I ran the database without the images and... even at over 1 million entries it did not halt the browser. It seems that the thing slowing it down is the amount of loaded images. I am not sure how to tell React that once an image goes out of the IntersectionObserver, its src should be changed to a transparent gif or some other small image so the browser can free that image's memory, to test whether that improves the situation. The IntersectionObserver margin could be one extra viewport up and one extra viewport down and it would still keep enough images loaded. I do not think the database is the reason for this stutter in the browser now.
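A minimal sketch of that idea in plain DOM terms (not HomeGallery's React code; the img.thumbnail selector is a placeholder):

```js
// Swap thumbnails that scroll far out of view for a 1x1 transparent GIF so the
// browser can release the decoded bitmaps, and restore them when they come back.
const BLANK = 'data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7'

const observer = new IntersectionObserver(entries => {
  for (const entry of entries) {
    const img = entry.target
    if (entry.isIntersecting) {
      if (img.dataset.src) img.src = img.dataset.src   // restore the real thumbnail
    } else if (img.src !== BLANK) {
      img.dataset.src = img.src                        // remember the real URL
      img.src = BLANK
    }
  }
}, { rootMargin: '100% 0px' })                         // one extra viewport above and below

document.querySelectorAll('img.thumbnail').forEach(img => observer.observe(img))
```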
Awesome that you are trying different things. A NoSQL database like MongoDB or CouchDB sounds promising. Please keep me updated.
HomeGallery uses virtual scrolling, so only the images in the current view are rendered. You can check the DOM in the browser via the developer tools. Maybe these tools would also highlight performance bottlenecks.
If so, I am happy to help solve this bottleneck... If you continue working on this topic, I would also suggest switching this discussion to Discord. We can also have sessions via the voice channel.
Yes, I saw that, but the objects seem to still be referenced, or cached, because the memory used by the tab is over 1.5 GB for some reason. And when it gets a bit over that, Chrome will start having issues rendering anything; literally, even requests will not get queued. So I was thinking it would be worth a try to set the image to a transparent image to see if the browser can free it up earlier.
Ok, I will see you over there. :)
Many thanks for providing this project,
I encountered a very difficult problem when using it: the amount of data is too large (about 600,000 images and videos), and I frequently interrupt the process with CTRL + C. After re-running, it tells me that the local database file is damaged.
This is a nightmare. I'm wondering if I can put the database into MySQL or MariaDB, so that when I rerun it, my data won't be affected.
My guess is that the process was interrupted while it was writing, which is why this problem occurred, but I have so many images that each run may take a week.
Thanks again