this post was submitted on 23 Jul 2023

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ


(As part of the Reddit migration, any time I'm only able to find info on Reddit, I'm reposting it to kbin/Lemmy.)

TL;DR - To get the page's OCR text from Newspapers.com, replace /image/ with /newspage/ in the URL of the page with the thumbnail.
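For anyone who'd rather script the swap than edit the address bar, here's a minimal Python sketch of that replacement. The function name and the numeric ID in the usage example are made up for illustration, and it only rewrites the URL string. It doesn't fetch anything, and whether the /newspage/ route still returns the OCR text is something you'd have to check yourself.

```python
from urllib.parse import urlsplit, urlunsplit


def to_newspage_url(url: str) -> str:
    """Swap the first 'image' path segment for 'newspage'.

    Hypothetical helper: it only rewrites the string, as described in the
    TL;DR, and leaves the host, query string and fragment untouched.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    for i, segment in enumerate(segments):
        if segment == "image":
            segments[i] = "newspage"
            break
    else:
        # No /image/ segment found, so this is not a thumbnail page URL.
        raise ValueError("URL path has no /image/ segment")
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    # The ID below is a placeholder, not a real page.
    print(to_newspage_url("https://www.newspapers.com/image/123456789/"))
```

Matching on whole path segments (instead of a blind string replace) avoids touching "image" if it happens to show up in the query string.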

EDIT: @godless pointed out that some libraries have access to Newspapers.com through a Library Edition portal. My local library has several newspaper archives, and I figured the first couple would be the most complete. Nope, but there was Newspapers.com Library Edition access buried under the fold. That worked!

Bonus tip - Also search for current info on close family members. The spokeo hit was due to searching his mother's name, and spokeo is too dumb to understand that deceased people don't move with their families to future homes. It treated his records like he was living ("Current" address, phone numbers, etc. were listed, even though they were for his sister, who's still alive).

And here's my rant/vent/story...

I was looking for an obituary in that nebulous early '90s time period where only some info is digitized. His family's having a memorial for him next week and I was hoping to bring a pic of the newspaper from his birthday and deathday, along with the obit. I had a general idea of the date of death, knew the city and funeral home, and his name minus middle initial. Sites like legacy.com refused to return a match. Even the state and county records sites were useless.

After a couple hours, I had only 2 partial hits. Bing Chat (yeah, I was surprised, too) said it found the obit, but it was locked behind a paywall. The newspaper that had it (which I checked earlier) said nothing was there. It appears that the obits are available going back to 2004. Dates before that were supposedly available in the paper's archive. The archive was 404. Or, rather, the entire domain was 404.

The second hit was on spokeo - one of those obnoxious sites that gives partial info and then wants you to subscribe to 3 different levels of services. But, from there I got his middle initial and the exact birthday and death date. That info helped.

I eventually made it to Newspapers.com, which threw up a paywall, but indicated it had the info. I did the usual: checking the source and CSS, reader mode, incognito, etc. Judging by the CSS, it looked like the image was probably there. Nope. The only info I could find on getting through that barrier was on Reddit. It doesn't lead to the paper image, but to the OCR text. Just replace /image/ with /newspage/ in the URL of the page with the thumbnail.

Good. It existed and was exactly where I was expecting through the whole search. Now to get the paper image that the text was extracted from... nope. Gotta sign up.

One last thing to try again, since Newspapers.com gave me the exact PAGE NUMBER.

I tried looking into the archives of the paper available in the library's database. It appears most obits (non-newsworthy ones) were excluded. My hypothesis is that the paper sold the archives to a site that stipulated that they must be excluded from other sources. It's the only explanation.

So, looks like I'll be visiting the library Monday to see if they have microfiche of the paper. WTF is going on that I can't find a major metropolitan newspaper's obit section in 2023? I can find 15 million pictures of influencers' breakfasts, but a 2x2 inch shred of paper is completely inaccessible. Not even a torrent out there of this stuff because who the fuck would make it hard to find an old newspaper?

(Forgot to mention that I used Google, Bing, DDG, and SearXNG. Bing was the most helpful, Google the least helpful.)

This shit right here is why I pirate - "great" business models. If there was a torrent of the entire decade's worth of that newspaper, it would have been easier to download that, compared to jumping through all these hoops.

[–] mettwurstkaninchen@feddit.de 15 points 1 year ago (1 children)

Here's a kind of secret and totally awesome way to access newspapers.com and other tools like JSTOR, ProQuest and Ancestry itself: If you're an active contributor to Wikipedia, you'll get access to the Wikipedia Library:

https://wikipedialibrary.wmflabs.org

You need an account that's at least 6 months old, 500 edits in total and 10 edits in the last month, but that is something you can easily get and you're also doing something for the greater good.

[–] azerial@lemmy.dbzer0.com 1 points 1 year ago

Oh yeah, I forgot about this. Lol. This is pretty easy to do with tools like AWB. You can mindlessly fix typos like I do.