#1
Any good Python coders here? Especially with regard to web scraping?
I'm trying to scrape the torrent details from TTD to put into a database which I would make available online, in the event that TTD closes down.
Please see this thread for more details: http://www.thetradersden.org/forums/...d.php?t=202461

I've not used the requests package before, and I'm having problems. I'm starting from this page: http://www.thetradersden.org/forums/....php/f-11.html, then going into the list pages for some of the categories (e.g. Audio, Audio Inactive, Audio Pulled). The next-level pages list torrents, 250 per page, each with a link to the torrent detail thread. For example, http://www.thetradersden.org/forums/....php/f-12.html is page 1 of 147 listing Active Audio torrents.

Sometimes my scraping code retrieves an index page, but other times the request returns a status of 200 with an empty response. Trying to retrieve a torrent thread, e.g. http://www.thetradersden.org/forums/.../t-203252.html, always gives a 200 status and an empty response. If the request had failed, I would expect a non-200 status code, but I don't get one.

Could it be authentication? Caching? Is the TTD backend blocking scraping of torrent detail threads?

TIA

I realise that web scraping these pages may not be the best way to get the info; a better way would be a dump/extract of the backend database(s). If the scraping does work, I would be mindful of NOT scraping 100,000+ pages quickly.

The following members like this post: PanTau, Mr. Clumpy
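For anyone hitting the same "200 but empty" symptom, here is a minimal sketch of how the responses could be labelled while also pacing the requests. It is not the code from this thread: it uses the standard library's urllib.request in place of requests so it has no dependencies, and the names `FetchResult`, `fetch`, `diagnose`, and `crawl`, the User-Agent string, and the 5-second delay are all assumptions made for illustration.

```python
import time
import urllib.request
from dataclasses import dataclass


@dataclass
class FetchResult:
    status: int
    headers: dict
    body: bytes


def fetch(url, user_agent="ttd-archive-sketch/0.1"):
    """Fetch one URL. The User-Agent value is a made-up placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return FetchResult(resp.status, dict(resp.headers.items()), resp.read())


def diagnose(result):
    """Label a response so that '200 but empty' is not mistaken for success."""
    if result.status != 200:
        return f"http-error-{result.status}"
    if not result.body.strip():
        # Worth inspecting result.headers here (e.g. Content-Length,
        # Content-Encoding, Set-Cookie) for hints about why the body is empty.
        return "empty-body"
    return "ok"


def crawl(urls, fetch_fn=fetch, delay_seconds=5.0, sleep_fn=time.sleep):
    """Fetch URLs one at a time, pausing between requests to stay polite."""
    labels = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep_fn(delay_seconds)
        labels[url] = diagnose(fetch_fn(url))
    return labels
```

Passing `fetch_fn` and `sleep_fn` as parameters makes it possible to try the labelling logic on saved responses without touching the site. The same status and body checks would apply to a requests `Response` via `status_code` and `content`.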
#2
Re: Any good Python coders here? Especially with regard to web scraping?
What's happening is that your scraper is trying to grab the page data before it has loaded, because there's a JS file that loads before the content.
For example: http://www.thetradersden.org/forums/.../t-203252.html
If you open your browser's dev tools and go to the console, this is the error:
Quote:
Quote:
Quote:
https://selenium-python.readthedocs.io/waits.html
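The explicit-wait approach from the Selenium docs linked above could look roughly like the sketch below. It assumes Selenium and a Firefox driver are installed. The function names, the 10-second timeout, and the choice to wait for any `<a>` element are placeholders, since the real selector depends on what the thread pages contain once rendered.

```python
import re


def looks_empty(html: str) -> bool:
    """True if no visible text remains after dropping scripts, styles and tags."""
    without_scripts = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", html)
    text = re.sub(r"(?s)<[^>]+>", "", without_scripts)
    return not text.strip()


def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Load a page in a real browser and wait for content before reading it."""
    # Selenium is imported here so looks_empty() stays usable without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        # Explicit wait: poll until at least one link is present, or raise
        # TimeoutException after `timeout` seconds. The "a" locator is a
        # placeholder for whatever element marks the real content.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.TAG_NAME, "a"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Calling `looks_empty(fetch_rendered_html(url))` on a thread URL would then show whether the browser-rendered page has content where the plain HTTP request came back empty.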
#3
Re: Any good Python coders here? Especially with regard to web scraping?
Thanks so much for your reply.
I will study what you've suggested and see what happens. Would you be willing to engage in a direct conversation via email?
All the best,
Mike
#4
Re: Any good Python coders here? Especially with regard to web scraping?
Yes. Actually, after I closed the browser (I'm at work atm) I realized that I could probably just scrape it all tonight for you.
I will DM you my email.
The following members like this post: Mr. Clumpy
Tags: archive, python, ttd