The Traders' Den  

  The Traders' Den > Where we go to learn ..... > Technobabble
2024-09-13, 09:12 AM
mjcrossuk
Join Date: Mar 2007
Any good Python coders here? Especially with regard to web scraping?

I'm trying to scrape the torrent details from TTD into a database that I would make available online in case TTD closes down.
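A minimal sketch of the storage side, assuming SQLite (Python's built-in sqlite3 module) as the database. The table name `torrents` and its columns are hypothetical placeholders, since the actual fields on a torrent detail thread aren't listed here. Keying rows on the thread number and using `INSERT OR REPLACE` means re-running the scraper updates rows instead of duplicating them.

```python
import sqlite3

# Hypothetical schema: the real fields on a TTD torrent detail thread are not
# known here, so these columns are placeholders to adapt.
SCHEMA = """
CREATE TABLE IF NOT EXISTS torrents (
    thread_id  INTEGER PRIMARY KEY,   -- number from a URL such as t-203252.html
    category   TEXT NOT NULL,         -- e.g. 'Audio', 'Audio Inactive', 'Audio Pulled'
    title      TEXT,
    scraped_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def open_db(path: str = "ttd_archive.db") -> sqlite3.Connection:
    """Open (or create) the archive database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def save_torrent(conn: sqlite3.Connection, thread_id: int, category: str, title: str) -> None:
    """Insert one row, replacing any earlier row for the same thread."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO torrents (thread_id, category, title) VALUES (?, ?, ?)",
            (thread_id, category, title),
        )
```

Using `":memory:"` as the path gives a throwaway database, which is handy for trying the scraper out before pointing it at a real file.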

Please see this thread for more details: http://www.thetradersden.org/forums/...d.php?t=202461

I've not used the requests package before, and I'm having problems.

I'm starting from this page: http://www.thetradersden.org/forums/....php/f-11.html, then going into the list pages for some of the categories (e.g. Audio, Audio Inactive, Audio Pulled). Those pages list the torrents, 250 per page, each with a link to its torrent detail thread. For example, http://www.thetradersden.org/forums/....php/f-12.html is page 1 of 147 of the Active Audio torrents.
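Going by the example URLs, the detail threads have file names like t-203252.html, so the thread numbers on a listing page can be pulled out with the standard-library `HTMLParser` instead of a third-party parser. This is a sketch under that assumption: the regex and the names `ThreadLinkParser` and `extract_thread_links` are hypothetical and may need adjusting to the real markup. At 250 entries per page, 147 pages means at most 36,750 Active Audio threads to visit.

```python
import re
from html.parser import HTMLParser

# Matches hrefs ending in ".../t-203252.html" (or starting with "t-..."); the
# pattern is inferred from the example URLs and is an assumption.
THREAD_HREF = re.compile(r"(?:^|/)t-(\d+)\.html$")


class ThreadLinkParser(HTMLParser):
    """Collect (thread_id, link_text) pairs from <a href="...t-NNNN.html"> tags."""

    def __init__(self):
        super().__init__()
        self.links = []           # list of (thread_id, title) tuples
        self._current_id = None   # thread id of the <a> we are inside, if any
        self._text = []           # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            match = THREAD_HREF.search(href)
            if match:
                self._current_id = int(match.group(1))
                self._text = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_id is not None:
            self.links.append((self._current_id, "".join(self._text).strip()))
            self._current_id = None


def extract_thread_links(html: str):
    """Return the (thread_id, title) pairs found in one listing page."""
    parser = ThreadLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Links to other listing pages (the f-NN.html style) are ignored by the regex, so pagination would need its own handling.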

Sometimes my scraping code retrieves an index page, but other times the request returns a status of 200 with an empty response body.

Trying to retrieve a torrent thread, e.g. http://www.thetradersden.org/forums/.../t-203252.html, always gives a 200 status and an empty response.

If the request had failed, I'd expect a non-200 status code, but I don't get one.
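Worth spelling out: a 200 status only means the server answered, not that it sent a useful page, and requests doesn't raise on any status code unless you call `raise_for_status()` (which only covers 4xx/5xx). So an empty 200 has to be caught explicitly. A minimal sketch, where `check_page` and `ScrapeError` are hypothetical names and `resp` is anything shaped like a `requests.Response`:

```python
class ScrapeError(Exception):
    """A page could not be used: bad status code or an empty body."""


def check_page(resp, min_bytes=1):
    """Return resp.text, or raise ScrapeError for a non-200 status or an empty 200.

    resp only needs .status_code, .content, .text, .headers and .url,
    which a requests.Response provides.
    """
    if resp.status_code != 200:
        raise ScrapeError(f"{resp.url}: HTTP {resp.status_code}")
    if len(resp.content) < min_bytes:
        # The headers are the main clue to why the body is empty
        # (Content-Type, Content-Length, Set-Cookie, Server, ...).
        raise ScrapeError(
            f"{resp.url}: 200 OK but {len(resp.content)} bytes; "
            f"headers={dict(resp.headers)}"
        )
    return resp.text
```

Typical use would be `html = check_page(session.get(url, timeout=30))`. Logging the headers of the empty responses may show whether it's a login wall, a bot filter or a caching layer.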

Could it be authentication? Caching? Is the TTD backend blocking scraping of torrent detail threads?
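To test the authentication theory, one option is to compare an anonymous request with one that carries the cookies of a logged-in browser session. `requests.Session` keeps cookies between calls in the same way; this sketch uses only the standard library (`http.cookiejar` and `urllib.request`) so it runs without extra packages. The cookies.txt file (Netscape format, exported from the browser) and the User-Agent string are assumptions, not something TTD is known to require.

```python
import http.cookiejar
import urllib.request

# Hypothetical identifier; replace with something that identifies you.
DEFAULT_UA = "ttd-archive-script/0.1"


def make_opener(user_agent=DEFAULT_UA, cookies_txt=None):
    """Return (opener, jar): the opener sends user_agent and keeps cookies in jar.

    cookies_txt is an optional Netscape-format cookies file exported from a
    logged-in browser; without it the requests are anonymous.
    """
    jar = http.cookiejar.MozillaCookieJar()
    if cookies_txt:
        jar.load(cookies_txt, ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace urllib's default "Python-urllib/3.x" agent, which some sites filter.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener, jar


def fetch(opener, url, timeout=30.0):
    """Fetch url and return the raw body; urllib raises HTTPError on 4xx/5xx."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

If the same thread URL comes back empty without cookies but full with them, that would point at authentication rather than caching.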

TIA

I realise that web scraping these pages may not be the best way to get the info; a better way would be a dump/extract of the backend database(s). If the scraping does work, I would be mindful of NOT scraping 100,000+ pages quickly.
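For the "not too quickly" part, a fixed minimum delay between requests is probably enough. A sketch with a hypothetical `Throttle` class (the clock and sleep functions are injectable so it can be tested without waiting). At an assumed 2 seconds per page, 100,000 pages works out to roughly 55.6 hours of scraping.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive requests (a politeness delay)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between request starts
        self._clock = clock
        self._sleep = sleep
        self._last = None                 # time the previous request was allowed

    def wait(self):
        """Sleep if the previous request was too recent; return the delay used."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay


def estimated_hours(pages, seconds_per_page):
    """Rough total run time for a crawl at a fixed rate."""
    return pages * seconds_per_page / 3600
```

Calling `throttle.wait()` immediately before each request keeps the rate fixed no matter how long parsing takes.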

Tags
archive, python, ttd



All times are GMT -5.


Copyright ©2004 - , TheTradersDen.org - All Rights Reserved - Hosted at QuickPacket