This is an automated archive made by the Lemmit Bot.

The original was posted on /r/todayilearned by /u/tipoftheiceberg1234 on 2025-01-14 16:36:24+00:00.

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/todayilearned by /u/ObjectiveAd6551 on 2025-01-14 16:10:09+00:00.

Original Title: TIL 2015’s Star Wars: The Force Awakens is the most expensive movie ever made, with a total cost of $447 million. Disney reduced costs using the UK’s Film Tax Relief, receiving $86.6 million in reimbursements. The movie grossed $2.1 billion worldwide.

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/geocaching by /u/may-flowers1 on 2025-01-14 13:11:04+00:00.

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/selfhosted by /u/eightstreets on 2025-01-14 12:38:15+00:00.


About 3 weeks ago I decided to block OpenAI's bots from my websites, as they kept crawling them even after I explicitly stated in my robots.txt that I don't want them to.

I already checked the file for syntax errors, and there aren't any.
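For anyone unfamiliar, the standard robots.txt opt-out for OpenAI's crawlers looks like this (GPTBot and ChatGPT-User are the user agents OpenAI documents; this is the generic pattern, not my exact file):

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /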

After that I decided to block by User-Agent, only to find out they sneakily dropped the user agent string so they could keep crawling my website.

Now I'll block them by IP range. Have you experienced anything like this with AI companies?

I find it annoying, as I spend hours writing high-quality blog articles just for them to come along and do whatever they want with my content.
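For anyone wanting to do the same at the application layer, here's a minimal Python sketch of IP-range blocking using only the standard library. The CIDR ranges below are placeholder documentation ranges, not OpenAI's actual ones; substitute whatever ranges the crawler operator currently publishes:

import ipaddress

# Placeholder CIDRs (RFC 5737 documentation ranges) -- replace with the
# crawler operator's currently published ranges before using this.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Known OpenAI crawler user agent substrings, per their public docs.
BLOCKED_AGENTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot")

def is_blocked(remote_ip: str, user_agent: str) -> bool:
    """Return True if a request should be rejected."""
    ip = ipaddress.ip_address(remote_ip)
    if any(ip in net for net in BLOCKED_RANGES):
        return True
    # Also match by user agent in case the IP list is stale.
    return any(agent in user_agent for agent in BLOCKED_AGENTS)

# Example: is_blocked("192.0.2.10", "Mozilla/5.0") returns True.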

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/gardening by /u/Alarmed_Hedgehog5173 on 2025-01-14 11:45:39+00:00.

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/gardening by /u/AmphibianSimilar3899 on 2025-01-14 11:31:43+00:00.

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/gardening by /u/DebateDisastrous7121 on 2025-01-14 08:08:14+00:00.

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/armenia by /u/Weird-Round3987 on 2025-01-14 10:52:44+00:00.


Hey, I'm a 24-year-old male.

I was born and raised in Armenia.

I didn't have many friends growing up and was obsessed with making money on the internet. I found pretty good success with that at around 17-18, and immediately after I went off to travel the world and became a digital nomad.

So for the last 6 years I've mostly lived outside of Armenia, and the only people I know here are my family and relatives.

I kind of feel like an alien here and don't know where to start.

I want to make some cool friends, but it feels like most friendships here come from childhood or school, and outsiders aren't all that welcome.

I like Armenian girls too and would like to date, but it feels like there's no chance if you don't have friends here.

I've cold-approached quite a few times; it doesn't work that well, to say the least.

People have suggested I try some hobbies, but all of mine are sea sports, so there's not much for me to do here.

I'd like your advice on how to integrate into society here.

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/armenia by /u/Typical_Effect_9054 on 2025-01-14 08:16:16+00:00.

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/armenia by /u/Total_Pin_3983 on 2025-01-14 00:09:15+00:00.


If anyone has witness accounts or experiences of the genocide from their family or friends, would they be willing to share them?

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/machinelearning by /u/Pringled101 on 2025-01-13 11:59:22+00:00.


Hi! A friend and I have been working on a project called SemHash that I wanted to share. We found that text deduplication is more complex than it appears, so we built this to simplify the process.

Duplicate samples can skew model training, return redundant samples in RAG workflows, reduce generalization, and cause train-test leakage, leading to unreliable results. Techniques like MinHash handle exact or near-exact duplicates, but semantic deduplication also catches samples that are semantically redundant, which we believe is an important part of deduplication. Furthermore, with MinHash it's not trivial to see why something was removed, which we also think matters, so we've added explainability features that let you inspect why each record was dropped. We already found some interesting results on some well-known datasets in our benchmarks, which are included in the repo.

The package can be installed with pip install semhash, and the basic usage looks like this (this example assumes you have the datasets library installed):

from datasets import load_dataset
from semhash import SemHash

# Load a dataset to deduplicate
train = load_dataset("ag_news", split="train")["text"]
test = load_dataset("ag_news", split="test")["text"]

# Initialize a SemHash instance
semhash = SemHash.from_records(records=train)

# Deduplicate the train set
deduplicated_train = semhash.self_deduplicate().deduplicated

# Or deduplicate the test set against the train set
deduplicated_test = semhash.deduplicate(records=test).deduplicated
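On the explainability side, the idea is that the result object also exposes the removed records together with what they matched, so you can audit each removal. A rough sketch of what that inspection might look like; attribute names other than .deduplicated are assumptions here, not the confirmed API:

# Hedged sketch -- `.duplicates` and its fields are assumed names for the
# explainability output; only `.deduplicated` is shown above.
result = semhash.self_deduplicate()
for dup in result.duplicates:
    print("removed:", dup.record)
    print("matched against:", dup.duplicates)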

I’m very interested in hearing your thoughts on this! Is deduplication a part of your current ML workflows, and if so, what techniques do you use?

 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/todayilearned by /u/GetYerHandOffMyPen15 on 2025-01-14 15:06:40+00:00.

Original Title: TIL that Winston Churchill’s famous “Iron Curtain” speech was given at a college in rural Missouri with about 600 students. The college later purchased a ruined historic church from London, transported it stone by stone, rebuilt it and turned part of it into a Churchill museum.
