this post was submitted on 11 Jun 2023

Hi, I want to share an easy way I found to archive Reddit content with ArchiveTeam Warrior on NixOS. You can set it up entirely in your Nix config!

Okay, first of all, you'll need to enable Docker or Podman. I chose Podman here:

virtualisation.podman = {
  enable = true;
  dockerCompat = true;

  defaultNetwork.settings.dns_enabled = true;
};
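
If you would rather use Docker, the equivalent is probably just the line below. This is a minimal, untested sketch: `virtualisation.docker.enable` is the standard NixOS option, but it is not part of the setup described in this post. Note that NixOS refuses to build when both `virtualisation.docker.enable` and `virtualisation.podman.dockerCompat` are set, so use one or the other.

```nix
# Alternative to the Podman block above (assumption: not what this post uses).
# Do not combine with virtualisation.podman.dockerCompat = true.
virtualisation.docker.enable = true;
```

With Docker, the `virtualisation.oci-containers.backend` value would need to be "docker" instead of "podman".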

After that, all you have to do is run the container image! Here's how I do it with Podman:

virtualisation.oci-containers.backend = "podman";
virtualisation.oci-containers.containers = {
  archive-team-warrior = {
    image = "atdr.meo.ws/archiveteam/reddit-grab";
    autoStart = true;
    cmd = ["YOUR_USERNAME_HERE_FOR_LEADERBOARD"];
    extraOptions = ["--network=host"];
  };
};

This doesn't start the full ArchiveTeam Warrior, only the Reddit grabber. That means you don't get the management website on port 8001; it just runs in the background without disturbing you. I think it's worth adding these 14 lines to your system configuration to help archive Reddit.
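
If you do want the web interface, a sketch of the full Warrior might look like the following. The image name `atdr.meo.ws/archiveteam/warrior-dockerfile` and the container name `archive-team-warrior-full` are assumptions based on ArchiveTeam's general Docker instructions, not something taken from the setup above, so double-check them before use.

```nix
# Hypothetical container name; pick whatever you like.
virtualisation.oci-containers.containers.archive-team-warrior-full = {
  # Assumed image for the full Warrior (not the Reddit-only grabber).
  image = "atdr.meo.ws/archiveteam/warrior-dockerfile";
  autoStart = true;
  # Publish the management website on the host at port 8001.
  ports = ["8001:8001"];
};
```

In that case you would pick the project and set your username in the browser at port 8001 instead of passing it via cmd.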

TootSweet@latte.isnot.coffee 1 points 1 year ago

I saw another post today about ArchiveTeam Warrior and on a lark started up a Docker container.

But it occurred to me that maybe today isn't the best day to be archiving things. Right?

With so many subreddits shut down, isn't ArchiveTeam going to get a whole lot of "this sub is private" messages rather than actual content?

Hopefully the mothership is smart enough to gracefully account for that. Maybe centrally, they keep track of those pages and "reassign" those pages back out to be fetched again after a good number of hours (days?) have passed.