404-errors in logs for future dates


  • #29717
    Ken
    Participant

    I have been noticing a bunch of 404 errors lately for what appears to be someone or somebot trying to access events in the future (like 2034). Should I just add a line to the robots.txt file so that events are not crawled?

    I have also noticed a few errors that appear when the pagination shrinks and pages that used to exist are no longer there. For instance, we try to schedule our events three months out, so a list of our events runs to about 9 pages of 10 events each. As the month ends and only two months of events are left, the list may paginate only 7 times, and then there are 404 errors for the already-indexed pages 8, 9 & 10. What do you suggest?

    #29734
    Barry
    Member

    Should I just add a line to the robots.txt file so that events are not crawled?

    That’s totally up to you. I’d hope that if a 404 header is being returned, the search engine (or whatever is doing the crawling) would stop trying to access those URLs.

    I’m not really sure there is anything we can do to help with that, or that it is really a problem with our plugin as such – but we’re totally open to any feedback or suggestions you yourself might have on this one 🙂

    As the month ends and there are 2 months left the pagination of the events goes down and maybe only paginates 7 times and then there are 404 errors for indexed pages for 8,9 & 10.

    I visited the URL you posted when you created this thread but couldn’t find an example of this behaviour – could you provide a specific URL for me to look at?

    Thanks!

    #29742
    Ken
    Participant

    For the future dates I’m not sure what to do. I was hoping there were others who might be experiencing the same thing.

    As for a URL that has the pagination (http://www.financialtools.com/events/category/web-based-training/)
    When the events pass, the pagination will get shorter until we add more events back in again.
    Maybe it would be best to just block bots from the events?
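
For anyone weighing the robots.txt option, here is a minimal sketch of how a blanket rule would behave, using Python's standard urllib.robotparser. The rule shown is a hypothetical example, not something taken from this site, and the helper name is made up for illustration. Note that a rule this broad would also keep crawlers away from the real, current event pages.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every crawler out of the /events/ tree.
ROBOTS_TXT = """\
User-agent: *
Disallow: /events/
"""


def is_crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the sample rules above allow user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for path in ("/events/2203-02-23/", "/events/category/web-based-training/", "/about/"):
        print(path, "allowed" if is_crawlable(path) else "blocked")
```

Under these assumed rules, everything below /events/ is blocked and the rest of the site stays crawlable.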

    #29743
    Barry
    Member

    Oh I see, sorry I completely misunderstood you first time round 🙂

    Yes, that is going to happen. What sort of system would you prefer? Your concern is that …/page/7 will be indexed but then cease to be accessible, right?

    #29744
    Ken
    Participant

    That is correct. What would be the best practice for eliminating the errors?

    #29762
    Barry
    Member

    That’s a good question and I’m not sure off the top of my head, so bear with me while I check in with the team and gather some thoughts.

    There was a similar thread a month or so ago (I seem to remember they wanted to stop Google from viewing past events in that particular case, I can’t remember why). The possible solutions included returning a different HTTP status code, such as 301 Moved Permanently, or just a 404 Not Found as discussed above.

    I wonder also if providing a canonical link to the main calendar page would be viable here.

    My own feeling here is that this may simply be an occasion where you have to do what’s best for the human (with sensible pagination) not the search engine, but I will check in and see if there are any other thoughts.

    #29763
    Barry
    Member

    Ken, I think the feeling here is that if this is an issue you feel really needs to be addressed, your best option is to catch those requests for non-existent upcoming event pages and turn them into 301 redirects back to the calendar page.
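
The decision behind that suggestion can be sketched roughly as follows. This is language-neutral logic written in Python, not the plugin's actual code: the function name, the calendar URL and the default of 10 events per page are all assumptions for illustration.

```python
import math
from typing import Optional, Tuple

CALENDAR_URL = "/events/"  # hypothetical address of the main calendar page


def respond_to_page(page: int, total_events: int, per_page: int = 10) -> Tuple[int, Optional[str]]:
    """Return (status, redirect_target) for a paginated event-list request.

    Pages that still exist are served normally with 200; pages beyond the
    current last page get a 301 back to the calendar instead of a 404.
    """
    last_page = max(1, math.ceil(total_events / per_page))
    if 1 <= page <= last_page:
        return 200, None
    return 301, CALENDAR_URL


if __name__ == "__main__":
    # With 70 upcoming events the list has 7 pages, so page 8 is redirected.
    print(respond_to_page(7, 70))
    print(respond_to_page(8, 70))
```

A 301 tells crawlers the old address has moved for good, so this fits best if the higher page numbers are not expected to come back soon.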

    #29786
    Ken
    Participant

    The thing is that they are not non-existent events:
    66.249.76.214 - - [19/Dec/2012:10:17:54 -0800] "GET /events/2203-02-23/ HTTP/1.1" 200 22737 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.76.214 - - [19/Dec/2012:10:24:01 -0800] "GET /events/2203-02-24/ HTTP/1.1" 200 22737 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.76.214 - - [19/Dec/2012:10:26:06 -0800] "GET /events/1961-11-12/ HTTP/1.1" 200 22737 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.76.214 - - [19/Dec/2012:10:26:08 -0800] "GET /events/1961-11-16/ HTTP/1.1" 200 22737 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.76.214 - - [19/Dec/2012:10:26:09 -0800] "GET /events/1961-11-07/ HTTP/1.1" 200 22737 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.76.214 - - [19/Dec/2012:10:26:11 -0800] "GET /events/1961-11-21/ HTTP/1.1" 200 22737 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    66.249.76.106 - - [19/Dec/2012:10:26:12 -0800] "GET /events/2008-01-10/ HTTP/1.1" 200 19077 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
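
Requests like these can be pulled out of an access log with a short script. The following is only a sketch: the function name and the two-year window are arbitrary choices for illustration, and the pattern assumes the /events/YYYY-MM-DD/ URL shape seen in the lines above.

```python
import re
from datetime import date
from typing import Iterable, List

# Matches day-view requests shaped like the ones in the log excerpt above.
DATE_REQUEST = re.compile(r"GET /events/(\d{4})-(\d{2})-(\d{2})/ ")


def out_of_range_requests(log_lines: Iterable[str], today: date, window_years: int = 2) -> List[date]:
    """Return the requested dates lying more than window_years from today's year."""
    flagged = []
    for line in log_lines:
        match = DATE_REQUEST.search(line)
        if not match:
            continue
        try:
            requested = date(int(match[1]), int(match[2]), int(match[3]))
        except ValueError:
            continue  # not a real calendar date, e.g. month 13
        if abs(requested.year - today.year) > window_years:
            flagged.append(requested)
    return flagged


if __name__ == "__main__":
    sample = [
        'GET /events/2203-02-23/ HTTP/1.1',
        'GET /events/2013-01-10/ HTTP/1.1',
        'GET /events/1961-11-12/ HTTP/1.1',
    ]
    for requested in out_of_range_requests(sample, today=date(2012, 12, 19)):
        print(requested.isoformat())
```

Fed the whole log file, this gives a quick list of how far out the crawler is wandering, which helps when deciding whether a robots.txt rule or a redirect is worth the effort.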

    #29790
    Barry
    Member

    Right; I mean it could be that Google is figuring out that the URLs map to dates and is intelligently trying to determine the date range since presumably nothing is linking to those URLs.

    I can’t imagine it is indexing non-existent content, though, so is there a real problem here?

    #29803
    Ken
    Participant

    Oh I see, it’s returning a 200, so it’s finding the pages and hopefully just not indexing them. I guess there is no ‘real’ problem then. The main issue is just the pagination (I’m just a little too OCD, I guess, trying to eliminate all the errors that I can).

    #29865
    Barry
    Member

    Hmm, I could have sworn I replied to you yesterday, yet it doesn’t seem to be showing up here.

    I’m not really sure what to suggest for pagination links. It seems to me that it’s a case where you have to choose between serving the human or the bot. Perhaps you could make them nofollow links?

  • The topic ‘404-errors in logs for future dates’ is closed to new replies.