How can I stop a hung Events Import job?
- This topic has 38 replies, 3 voices, and was last updated 7 years, 8 months ago by
Mark Evilsizor.
June 5, 2018 at 12:56 pm #1546797
Mark Evilsizor
Participant
Regarding 1-3 and 5, here is what I did:
– Delete all current events (they went into Ignored)
– Delete all events permanently from Ignored
– Delete the scheduled import
– Test one time import with http://www.lindahall.org, http://www.lindahall.org/event, http://www.lindahall.org/events
o it previewed and ingested 3 every time
o but never bringing in media
– Delete the events and repeat the tests with scheduled import
o it previewed and ingested 3 every time
o but never bringing in the media
So it appears to me that it is respecting the set date ranges. The 177 import must have had something to do with the prior version, or some lingering effect of my failed early attempts.
Once we get the media working, I am ready to try this again on my production sites.
Thanks,
Mark
June 6, 2018 at 9:01 am #1547496
Sky
Keymaster
Mark,
Great! I’m glad that took care of the date range issues. That just leaves the problem with the images.
I should note that I did see a few different results when checking the header information, after running it a few times. Once, I saw the 200 ok message, another time a permanent 301, and then the 403. I tried to use another online http header checker tool, several others actually, and got different results. But some seem to not support more than the main site url. But this is from the same tool I’ve been using to troubleshoot other image importing issues in the past, so not sure what’s going on.
I would normally look at resource headers in the Chrome Developer Tools, but it seems that it doesn’t work when loading the file in the browser directly, rather than as an asset on a complete webpage.
If I curl it, I get the following message:
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: http://curl.haxx.se/docs/sslcerts.html
If I add the flag to ignore the SSL check, it does return a 200. If I curl the http URL, I get a 301.
To be honest, I’m no expert in this particular aspect of web development, so I’m not sure how to interpret potential issues from the rest of the header information. I ran curl on my personal website, and got the same header results. shrug
When I import from our demo site https://wpshindig.com/events/ I do see the featured images being pulled in. Can you try running an import from that URL and see if the images are getting imported? We will at least know what side of things the issue is on.
Thanks,
Sky
June 6, 2018 at 9:22 am #1547509
Mark Evilsizor
Participant
I would expect you to get a 301 status when you try to access http, as we are issuing redirects to https.
Our certificates are definitely not self-signed; this has been validated by Chrome and other browsers as well as third-party checking sites. So in that case I am wondering whether a firewall between the box you ran curl on and our site is set up to do deep packet inspection, which would act as a man in the middle for the connection and could be presenting its own self-signed certificate.
I don’t have an explanation for the 403. If you are using a website to test our headers and see this, I would love to have the link so that I can look into it.
I did try 3 events from your test site https://wpshindig.com/events/ and the media were ingested properly.
I would be happy to run any debugging tool/setting from your ingest product if that would help your developers understand what it is about our media that is not handled by the TEC ingestion tool.
Mark
June 6, 2018 at 11:48 am #1547753
Sky
Keymaster
Mark,
I ran this by one of my superstar colleagues here, and he is finding this message when trying to import from your site:
PHP Warning: file_get_contents(https://www.lindahall.org/wp-content/uploads/sites/5/2018/03/Shefchik-d-2014-Scotland-002-Killin-042e-345-Day-Six_Mark-c-comp2-e1520260666500.jpg): failed to open stream: HTTP request failed! HTTP/1.1 403 Forbidden
We’ve looked at the records from the Aggregator server, and everything is a-ok on its end for the imports you’ve tried.
The consensus seems to be that there is something in your server configuration, perhaps in .htaccess that is preventing the images from being served properly to the client. Unfortunately, there’s not much more I can suggest to look into other than that.
Where is this site being hosted, out of curiosity?
I’ll ask around some more to see if there’s anything specific you could look for in your configuration that could be causing this, and let you know what I find. In the meantime, you might check out your php.ini and .htaccess to see if anything has been added to tighten up security etc that could be causing this.
Thanks for your patience,
Sky
June 6, 2018 at 1:04 pm #1547829
Mark Evilsizor
Participant
I was preparing to make a screencast of the sites I used to review the httpd response headers, to try and get at the details of this issue, when I think I found the cause. I tried some new tester websites and found one that gave me the 403 that you experienced. Then, by reviewing the Apache log, I saw what the problem is: the user-agent in the request from the aggregator is blank, and at least one layer of our website security blocks requests when the agent is blank.
Two attached images illustrate. The Apache log image shows my request getting the image, and the Mozilla… user agent, and then the request from the test site which has a blank agent and it is blocked.
The curl image shows the default curl getting the file, then when I force the user agent to blank, the request is denied.
It appears that a blank user agent is a sign of bad guys on the internet. Can you set your aggregator tool to send a legitimate value for its user agent in the request?
Mark
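Editor's note: the blank user-agent behaviour Mark describes can be sketched as follows. This is an illustration in Python only, not the Aggregator's actual code (the plugin itself is PHP). The agent string `EXAMPLE_USER_AGENT`, the helper names, and the example URL are all hypothetical, and the blocking function only imitates the kind of rule Mark describes, not his server's real configuration.

```python
from urllib.request import Request

# Hypothetical identifier; the real Aggregator's agent string is not given in this thread.
EXAMPLE_USER_AGENT = "ExampleEventImporter/1.0 (+https://example.com/importer)"


def build_image_request(url: str, user_agent: str = EXAMPLE_USER_AGENT) -> Request:
    """Build a GET request that always carries a non-empty User-Agent header."""
    if not user_agent.strip():
        raise ValueError("User-Agent must not be blank")
    return Request(url, headers={"User-Agent": user_agent})


def blocked_by_blank_agent_rule(headers: dict) -> bool:
    """Imitate a server-side rule that rejects a missing or blank User-Agent."""
    agent = ""
    for name, value in headers.items():
        if name.lower() == "user-agent":
            agent = value
    return agent.strip() == ""


# The request is only constructed here, never sent.
request = build_image_request("https://www.example.org/wp-content/uploads/photo.jpg")
print(request.get_header("User-agent"))
print(blocked_by_blank_agent_rule(dict(request.header_items())))  # passes the rule
print(blocked_by_blank_agent_rule({"User-Agent": ""}))            # would be rejected
```

The point of the sketch is that the fix lives on the client side: a request that names itself passes the rule, while a request with an empty agent is rejected regardless of what it asks for.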
June 7, 2018 at 10:34 am #1548679
Sky
Keymaster
Mark,
Great! Glad to hear that you further tracked down the cause of this. I have created a feature request ticket for this. However, the dev team will need to assess it and our next scheduled release is already pretty full. So, no guarantees about if and when this will happen.
Is it possible to temporarily change your configuration until this is looked at?
I am going to change this ticket to “pending fix” and someone will follow back up with you here once this has been a) completed and pushed out in a release, or b) declined as a modification.
Let me know if you have any other questions about this in the meantime.
Thanks,
Sky
June 7, 2018 at 11:54 am #1548754
Mark Evilsizor
Participant
I did a bit of research on whether or not current recommendations are to block blank agent HTTP requests and it looks like this is still a best practice. We have run the site for years and not had a problem with this rule, so I am reluctant to remove a layer of protection because of all the other bad guys it would allow to interact with my site.
Seems like it should be about 1-2 lines of code to put an agent description in the http request for the media.
So I probably can’t use this TEC purchase until this fix makes it in.
Mark
June 7, 2018 at 2:35 pm #1548900
Mark Evilsizor
Participant
This reply is private.
June 7, 2018 at 2:47 pm #1548911
Mark Evilsizor
Participant
Update: I deleted all the events and ran a preview on the existing scheduled import, and indeed it wants to get 177 events, of which only 3 are after the date set. The attached image illustrates the date parameter and the past dates of the events it wants to bring in.
I clicked cancel from the preview window, deleted the scheduled import, and created a new scheduled import. Now the preview shows 3, and it properly ingested them.
So for some reason, after a few days it wants to bring in historical events rather than respecting the date limit.
Mark
June 8, 2018 at 10:27 am #1549540
Sky
Keymaster
Mark,
There’s nothing more I can do regarding the image headers. The dev team will assess it when they can.
Regarding the import timeframe, I have set up a scheduled import to try. If I run it manually, I only get one event showing up. I will let it run over the weekend, and see what it does.
Thanks,
Sky
June 11, 2018 at 8:52 am #1550765
Mark Evilsizor
Participant
On my test server, there is no routine activity, so the WP Cron does not run. I just logged into it today, which triggered the TEC Aggregator to run. It picked up the past events (177).
How did yours run?
June 11, 2018 at 10:59 am #1550882
Sky
Keymaster
Hi Mark,
Mine did the same thing! I’ve created a bug ticket for this.
Thanks for reporting the issue. Someone will take a look and get back to you when it’s been fixed.
Let me know if you have any other questions in the meantime.
Thanks,
Sky
June 18, 2018 at 2:02 pm #1555698
Mark Evilsizor
Participant
Sky, the daily import job is running, and it is set to have a 3 month window so that each day it will import any events within the next 3 months from the source site.
From the source site (https://www.lindahall.org/events/ ) you can see that there is a September 13th event. However it has not been imported into the client site (https://www.lindahalllibraryfoundation.org/events/ ) even though the job is running without error.
Can you see if your client tied to my site is doing the same thing?
I am wondering if it is calculating 3 months from the first import date rather than 3 months from today’s date? Or perhaps this is related to the issue of the scheduled runs picking up all the historical events?
Thanks,
Mark
June 18, 2018 at 2:52 pm #1555728
Mark Evilsizor
Participant
Following up, if I do a one-time import, then the 9/13/2018 event is imported properly.
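Editor's note: Mark's question about whether the 3 month window is measured from the first import date or from today's date can be made concrete with a small sketch. It is written in Python purely for illustration and says nothing about how the Aggregator actually computes its window. The helper names are hypothetical, and the first-import date of June 7, 2018 is an assumption taken from the day Mark recreated the scheduled import.

```python
import calendar
from datetime import date


def add_months(day: date, months: int) -> date:
    """Return the date `months` later, clamped to the last day of a shorter month."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def in_window(event_day: date, anchor: date, months: int = 3) -> bool:
    """True if the event falls between the anchor date and anchor + `months`."""
    return anchor <= event_day <= add_months(anchor, months)


event_day = date(2018, 9, 13)    # the September 13th event on the source site
first_import = date(2018, 6, 7)  # assumed: day the scheduled import was recreated
today = date(2018, 6, 18)        # date of Mark's post

print(in_window(event_day, first_import))  # False: window ends September 7
print(in_window(event_day, today))         # True: window ends September 18
```

Under these assumed dates, a window frozen at the first import would miss the September 13th event while a window that rolls forward each day would include it, which is consistent with the one-time import picking the event up. It does not rule out the other explanation Mark raises.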
Mark
June 19, 2018 at 8:50 am #1556204
Sky
Keymaster
Thanks for the additional information, Mark.
I removed the scheduled import from my test site, so I’m not sure. While investigating your issue, we found that even our internal tool for inspecting imports that have been processed by our Aggregator server completely breaks when trying to look at these particular imports that are failing for you.
It’s obviously something that occurs only when the event is scheduled. You may need to manually run the import until our engineers can find the problem. We will let you know as soon as we find the issue.
Thanks,
Sky
- The topic ‘How can I stop a hung Events Import job?’ is closed to new replies.
