@pamtbaau:
What’s wrong with copying the stack trace, removing sensitive data, and pasting it here with a bit of formatting to make it more readable?
Nothing wrong with that, will do that the next time.
all24.net is currently working.
@pamtbaau:
AFAIK, is_writable is only being called when Grav needs to write to the cache. If the system is completely stable (no changes to configs/pages), Grav wouldn’t need to. So, it seems like something is causing a change.
That is my understanding of caching as well. Interesting side note: After I set up all24.net—and you can see that the whole site has only one word of content—it blew up until it exceeded the virtual server quota. Also a little miracle.
@pamtbaau:
If you disabled cache for 1 site only, as suggested, and only that site remained alive, you would have a clear indication.
I have five sites in production with Grav. I disabled the cache on two sites and kept the other three with cache enabled. Only all24.net crashed.
@pamtbaau:
Why are your permissions using setuid? Yes, I had to look it up, since I have no idea what it is/does…
I took the permissions from the permissions section of the Grav troubleshooting guide.
@pamtbaau:
Have you been able to figure out if all sites fail in a certain time window?
At the moment this is totally unclear. I will update all sites to the most recent version of Grav and then start watching them.
@pamtbaau:
I don’t think Grav changes any permissions, so how do they get changed over time?
- Are you sure there is no cron running?
- Or could you have been infected?
- Do the sites have Admin?
- What if you remove it from all24.net?
I am sure that no cron jobs are running. In fact, I would be happy if the cache clearing worked. 😉 I am also quite sure that my system is not infected in any way, as the Grav sites are the only ones that have been crashing; everything else works like a charm. The sites are running in the user context of the virtual server.
What should I remove from all24.net?
So, what are my next steps? I will
- update all sites to the most recent version and
- try to check the sites on a regular basis, until I see at least one site crashing, and
- provide the stack trace.
Apart from that, I will consider commenting out the setuid flag in the permissions script.
Thank you for being my travelling companion on this interesting journey. 🤗
Update
After switching the site from FCGId to FPM (all other sites are using FPM), I got the Crikey! error again.
All sites are now 1.7.7 and all plugins are updated.
I will run ls -laR on the cache directory, fix the permissions like I did the last time (omitting the setuid flag, though), and run ls -laR again. If the site works afterwards, I hope the diff will give some hint.
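For the before/after comparison, a listing without timestamps and sizes gives a quieter diff than raw ls -laR output. The following is only a sketch: the function name snapshot_perms and the output paths are made up for illustration, and it assumes GNU find, since -printf is a GNU extension.

```shell
#!/bin/sh
# snapshot_perms DIR OUTFILE (hypothetical helper, not part of Grav)
# Records "mode owner:group path" for every entry under DIR, sorted by path,
# so that two snapshots can be compared with diff.
# Assumes GNU find: the -printf action is not available in every find.
snapshot_perms() {
    ( cd "$1" && find . -printf '%M %u:%g %p\n' | LC_ALL=C sort -k3 ) > "$2"
}

# Intended use (paths are placeholders):
#   snapshot_perms cache /tmp/perms-before.txt
#   ... fix permissions ...
#   snapshot_perms cache /tmp/perms-after.txt
#   diff /tmp/perms-before.txt /tmp/perms-after.txt
```

Only lines whose mode or ownership changed should then show up in the diff.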
Update 2
Here comes a new phenomenon (sorry): the ownership of the files and folders for all24.net is www-data:all24.net instead of all24.net:all24.net. All other sites' files are properly owned. Geez ...
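To see how far the wrong ownership reaches, find can list everything that does not belong to the expected user. This is a sketch; list_foreign_owned is a made-up helper name, and the user name in the usage comment is the one from this site.

```shell
#!/bin/sh
# list_foreign_owned DIR USER (hypothetical helper)
# Prints every file and directory under DIR that is NOT owned by USER.
# USER may be a user name or a numeric user ID.
list_foreign_owned() {
    find "$1" ! -user "$2" -print
}

# Intended use, run from the site root:
#   list_foreign_owned . all24.net
# Resetting the ownership afterwards would need root, for example:
#   chown -R all24.net:all24.net .
```

An empty result means every entry already belongs to the expected user.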
After running find . -type d -exec chmod 775 {} \; the site was working again. I also removed all setuid flags, which caused no problems.
Diffing the listings from before and after setting the directory permissions reveals that many directories were indeed not writable.
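The same checks can be made directly, without a diff. The helpers below are a sketch with made-up names: they list directories that lack owner or group write permission, list entries that carry the setuid bit, and clear that bit. The setuid bit is cleared symbolically with u-s because GNU chmod keeps it on directories when given a plain numeric mode such as 775.

```shell
#!/bin/sh
# All three helpers are hypothetical names, used here for illustration only.

# Directories under DIR that are missing owner-write or group-write (octal 220).
list_unwritable_dirs() {
    find "$1" -type d ! -perm -220 -print
}

# Files and directories under DIR that have the setuid bit (octal 4000) set.
list_setuid() {
    find "$1" -perm -4000 -print
}

# Clear the setuid bit wherever it is set under DIR.
clear_setuid() {
    find "$1" -perm -4000 -exec chmod u-s {} +
}

# Intended use, run from the site root:
#   list_unwritable_dirs .
#   list_setuid .
#   clear_setuid .
```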
Okay, let's try something new: As I dislike fixing things manually and producing new inconsistencies (like the directory permissions for the twig directory), I will now remove Grav and re-install it. At least all permissions should be fine then, and the ownership should be fine as well (unless this specific domain has some other weird configuration setting).