Over 1 Million lines of HTML cached in options table!

Home Forums Calendar Products Events Calendar PRO Over 1 Million lines of HTML cached in options table!

Viewing 8 posts - 1 through 8 (of 8 total)
  • Author
    Posts
  • #999300
    Dennis Freeze
    Participant

    Hey, guys. I started using the new caching option recently because my client’s site has a lot of events. Last night, I was just checking out their site in general, and noticed that the wp_options table was over 1 gigabyte in size. I exported the table as CSV from phpMyAdmin, and opened it up in Excel. (Did you know Excel stops at 1,048,5776 rows? Now I do.)

    After the usual stuff in the table, it started into cached HTML events — literally a million lines of HTML cached in there, and that’s only part of the exported file. And I do mean Million, as in 1,000,000-plus lines of cached HTML.

    I see you’re using WP Transients, but I can’t imagine it could possibly be saving over a million lines of HTML at a given time…. What’s going on here? I can give you a dump of the table if you care to see the gory details….

    Thanks,
    Dennis

    #999391
    Brook
    Participant

    Wow! That is a lot records.

    I can actually see how this would happen with Month Caching enabled (setting found in WP Admin > Events > Display). Basically when you enable that, the HTML for a given month is stored as a WP Transient. When someone tries to view that page, the server only needs 1 DB query for the HTML instead of the 30+ that are unfortunately required to render Month view. But, if a bot (computer) was crawling your website it could attempt loading every month from now until the year 85000 AD, one million months in the future. In such a scenario you would indeed get 1 million transients in your database :-/.

    In short, a computer or bot has likely crawled over a million pages, causing over a million months worth of cached pages to appear as transients. Did you try to spider your website with a bot? Typically search engines would not try anything close to a million months, especially when our robots meta tags tell them not to index empty month view pages. So, this is likely the result of a manual spidering. If you didn’t do it someone else might have. It might be worth stopping them with an IP block, that much crawling is sometimes the sign of a hacking attempt. If your server logs show a million+ requests from an IP you don’t recognize, a block is probably appropriate.

    How to resolve this

    First of all, I would make a backup of your database. Always a good idea before dabbling. Then, I would clear the transients using a strategy like this. That will shrink your table back down to a reasonable size.

    If you are worried about another bot crawling your site and causing the problem to resurface, then you could disable month view caching (WP-Admin > Events > Settings > Display).

    Finally, I am going to check with our developers and see if it makes sense to limit how far into the future/past we cache Month views for. Typically pages in the distant past or future don’t see a lot of users, so for the majority of people this will likely help in preventing this. That might be a good improvement we could plan.

    Does that all make sense? Will that work for you? Please let us know.

    Cheers!

    – Brook

    #999396
    Dennis Freeze
    Participant

    Brook,

    OK, that makes sense. I’ve been using WordFence to block malicious logins and stuff like that, and to somewhat throttle non-Googlebots. Blocking the bots seems like a good idea, although it’s always a shifting target. Several other sites of mine have been recently subjected to merciless pounding with bad login attempts, and they use their botnets to keep coming at you. It hasn’t been quite as bad with this site, but there’s a lot of bad traffic.

    Limiting the future-month requests seems like a really good idea — I can’t imagine limiting that to a couple of years would impact more than a few sites, and then it wouldn’t be a big deal. I vote for it!

    So, your message seems to be missing a link or something: “Then, I would clear the transients using a strategy like this. That will shrink your table back down to a reasonable size.” I’m thinking there was something between those sentences, not the old “left as an exercise for the reader” ploy. 🙂

    I really hope I don’t have to drop the caching — we have a lot of events on this site, and a lot of other custom code, so everything that helps speed is good. I’m going to put in one of the caching plugins soon — I guess I could disable your cache then, right?

    Thanks,
    Dennis

    #999912
    Brook
    Participant

    Thanks for your understanding Dennis! We definitely did not design this with the notion that it would pack a database with a gigabyte of data. But now that the need has arisen, I will talk to our devs about adding a throttle.

    So, your message seems to be missing a link or something: “Then, I would clear the transients using a strategy like this. That will shrink your table back down to a reasonable size.” I’m thinking there was something between those sentences, not the old “left as an exercise for the reader” ploy.

    Ahh yes. It was a link, but those don’t show up in the email transcripts. Here ‘s the link: http://stackoverflow.com/questions/10422574/can-i-remove-transients-in-the-wp-options-table-of-my-wordpress-install

    I really hope I don’t have to drop the caching — we have a lot of events on this site, and a lot of other custom code, so everything that helps speed is good. I’m going to put in one of the caching plugins soon — I guess I could disable your cache then, right?

    Installing a different caching solution will render ours moot. That’s why the caching is a toggle on our end. Most caching plugins do appear to be fully compatible with our calendar and its ajax requests. A lot of our users use W3 Total. But, it does have at least one setting which is incompatible: Minifying JS/CSS assets. Our assets are already minified, and minifiers often break things when run twice. As a bonus, this caching will will speed up other areas of the site as well. However, you might still find you have a giant cache if a lot of bots are hitting your site, causing the server to cache all of those pages. But, typically those caches are not in the DB they are stored as HTML files.

    Please let me know if you have any more questions. Thanks again for bringing this to our attention, it’s definitely something we will fix for future users. But, in the mean time hopefully a caching plugin will prove a good workaround for you.

    Cheers!

    – Brook

     

    #1000091
    Dennis Freeze
    Participant

    Brook,

    I haven’t used the WP Transients API, and I’m really surprised that it uses the options table instead of dumping things into a separate table. WordPress really does not like having that table get so large: I couldn’t even see the /wp-admin/options.php page — it was just blank. And I kept getting random white screens after updating things in the backend, although refreshing the page always restored it. A bit scary, though. All back to normal now….

    Thanks for the caching tip about double-minified assets, and for sticking with me to figure this out.

    Regards,
    Dennis

    #1000785
    Brook
    Participant

    You’re very welcome Dennis. And again, I apologize for the inconvenience. Sometimes you just don’t think about things like “what will happen if someone visits every month between now and 85000 AD?”

    I can understand your server not liking such an enormous table, and it impacting basic WordPress functions. For whatever reason that’s how WordPress stores them. I agree, it’s a bit odd. Does clearing the transients as outlined in that link help fix it for now?

    • Brook
    #1005257
    Support Droid
    Keymaster

    This topic has not been active for quite some time and will now be closed.

    If you still need assistance please simply open a new topic (linking to this one if necessary)
    and one of the team will be only too happy to help.

    #1130617
    Geoff
    Member

    Hey there,

    Just want to drop a quick note in here to let you know that The Events Calendar 4.2.1 is about to release any moment and it will include a patch for this issue. Please update and let us know if you continue to hit any trouble.

    Thanks so much for your patience while we worked on this!

    Cheers,
    Geoff

Viewing 8 posts - 1 through 8 (of 8 total)
  • The topic ‘Over 1 Million lines of HTML cached in options table!’ is closed to new replies.