this post was submitted on 17 Jun 2023
124 points (99.2% liked)
Lemmy.World Announcements
I heard that Reddit has a dedicated CDN each for Microsoft's and Google's scrapers. That's why those search engines work so well for finding Reddit posts. It will probably take some effort to feed data that well from the fediverse.
On that note, perhaps we should have a per-community as well as a per-post scrape/noscrape toggle. It might be difficult to get buy-in from all parties.
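To make the toggle idea concrete, here is a rough sketch of how a post-level setting could override a community default and end up as a robots meta tag on the page. The function name and both parameters are made up for illustration; nothing here is an existing Lemmy setting or API.

```python
from typing import Optional


def robots_meta(community_noscrape: bool, post_noscrape: Optional[bool] = None) -> str:
    """Build a robots meta tag for a post page (hypothetical helper).

    The post-level toggle wins when it is set; otherwise the
    community-level default applies.
    """
    noscrape = community_noscrape if post_noscrape is None else post_noscrape
    content = "noindex, nofollow" if noscrape else "index, follow"
    return f'<meta name="robots" content="{content}">'


# Community opts out, but one post explicitly opts back in.
print(robots_meta(community_noscrape=True))
print(robots_meta(community_noscrape=True, post_noscrape=False))
```

Letting the post override the community is just one possible rule; an instance could equally decide that a community-level noscrape is final.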
Whether a community actually gets to opt out of being scraped depends on the scraper respecting robots.txt and/or the page's robots meta tag.
Not all do, particularly the ones scraping for SEO purposes, so instances might need to add IP bans for scrapers that refuse to respect the restrictions set in those places.
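As a minimal sketch of both sides of this: a well-behaved scraper can check robots.txt with Python's standard `urllib.robotparser`, and an instance can fall back to an IP denylist for the ones that don't. The robots.txt content, the `lemmy.example` URLs, the `ExampleBot` user agent, and the banned range (a reserved documentation range) are all assumptions for illustration, not anything Lemmy.world actually serves or blocks.

```python
import ipaddress
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance might serve for an opted-out community.
ROBOTS_TXT = """\
User-agent: *
Disallow: /c/private_community/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Scraper side: return True if robots.txt permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Instance side: networks of scrapers that ignored the rules (placeholder range).
BANNED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def is_banned(client_ip: str) -> bool:
    """Return True if the client address falls inside any banned network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BANNED_NETWORKS)


print(allowed_by_robots(ROBOTS_TXT, "ExampleBot",
                        "https://lemmy.example/c/private_community/post/1"))
print(is_banned("203.0.113.7"))
```

In practice the ban would more likely live in the reverse proxy or firewall than in application code, but the check itself is the same.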