
When I'm writing web scrapers I mostly just pivot between selenium (because the website is too "fancy" and definitely needs a browser) and plain requests calls (both in conjunction with bs4).
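
For context, this is the requests + bs4 pattern I mean (a minimal sketch; the URL and selector are just placeholders):

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL and selector, just to illustrate the fetch-then-parse pattern
resp = requests.get("https://example.com/listing", timeout=10)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
titles = [a.get_text(strip=True) for a in soup.select("h2 a")]
print(titles)
```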

But when reading about scrapers, scrapy is often the first Python package mentioned. What am I missing out on if I'm not using it?

[-] Wats0ns@programming.dev 2 points 1 year ago

The huge feature of scrapy is its pipelining system: you scrape a page, pass it to the filtering part, then to the deduplication part, then to the DB, and so on.

Hugely useful when you're scraping and extracting data; if you're only pulling down raw pages, I reckon it's less useful.
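
Roughly, the chain looks like this (a sketch, not your actual project: the class and field names are made up, but the `process_item` hook and `DropItem` are real scrapy API):

```python
# Hypothetical pipelines.py: each class is one stage; scrapy calls process_item
# on every scraped item, in the order configured in ITEM_PIPELINES.
from scrapy.exceptions import DropItem


class FilterPipeline:
    def process_item(self, item, spider):
        # Drop items missing the field we care about ("title" is a made-up field)
        if not item.get("title"):
            raise DropItem("missing title")
        return item


class DeduplicationPipeline:
    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        # Skip items already seen during this crawl
        if item["title"] in self.seen:
            raise DropItem("duplicate item")
        self.seen.add(item["title"])
        return item


class DatabasePipeline:
    def process_item(self, item, spider):
        # Here you'd insert the item into your DB of choice
        spider.logger.info("would persist: %r", item)
        return item
```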

[-] qwertyasdef@programming.dev 1 points 1 year ago

Oh shit that sounds useful. I just did a project where I implemented a custom stream class to chain together calls to requests and beautifulsoup.
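
Something like this generator chain, roughly (a sketch of the idea, not my actual stream class; URLs and selectors are placeholders):

```python
import requests
from bs4 import BeautifulSoup


def fetch(urls):
    # Stage 1: yield raw HTML for each URL
    for url in urls:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        yield url, resp.text


def parse(pages):
    # Stage 2: yield one record per matching element (selector is a placeholder)
    for url, html in pages:
        soup = BeautifulSoup(html, "html.parser")
        for link in soup.select("h2 a"):
            yield {"source": url, "title": link.get_text(strip=True)}


# Chain the stages together, stream-style
for record in parse(fetch(["https://example.com/page1"])):
    print(record)
```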

[-] Wats0ns@programming.dev 2 points 1 year ago

Yep, try scrapy. It also handles the concurrency of your pipeline items for you, configuration for every part, ...
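
For example, the pipeline order and the crawl concurrency are both plain settings (the "myproject" module path and pipeline class names match the hypothetical sketch above; the setting names themselves are real scrapy settings):

```python
# settings.py ("myproject" is a placeholder); lower numbers run earlier
ITEM_PIPELINES = {
    "myproject.pipelines.FilterPipeline": 100,
    "myproject.pipelines.DeduplicationPipeline": 200,
    "myproject.pipelines.DatabasePipeline": 300,
}

# Concurrency is handled by scrapy itself; you just tune the knobs
CONCURRENT_REQUESTS = 16
CONCURRENT_REQUESTS_PER_DOMAIN = 8
DOWNLOAD_DELAY = 0.25
```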
