Artificial Intelligence


Reddit's home for Artificial Intelligence (AI).

26
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/MetaKnowing on 2024-11-08 15:28:02+00:00.

27
 
 
The original was posted on /r/artificial by /u/medi6 on 2024-11-07 15:19:03+00:00.


hey there!

With the recent explosion of open-source models and benchmarks, I noticed many newcomers struggling to make sense of it all. So I built a simple "model matchmaker" to help beginners understand what matters for different use cases.

TL;DR: After building two popular LLM price comparison tools (4,000+ users), WhatLLM and LLM API Showdown, I created something new: LLM Selector

✓  It’s a tool that helps you find the perfect open-source model for your specific needs.

✓  Currently analyzing 11 models across 12 benchmarks (and counting). 

While building the first two, I realized something: before thinking about providers or pricing, people need to find the right model first. With all the recent releases, choosing the right model for your specific use case has become surprisingly complex.

## The Benchmark puzzle

We've got metrics everywhere:

  • Technical: HumanEval, EvalPlus, MATH, API-Bank, BFCL
  • Knowledge: MMLU, GPQA, ARC, GSM8K
  • Communication: ChatBot Arena, MT-Bench, IF-Eval

For someone new to AI, it's not obvious which ones matter for their specific needs.

## A simple approach

Instead of diving into complex comparisons, the tool:

  1. Groups benchmarks by use case
  2. Weights primary metrics twice as heavily as secondary ones
  3. Adjusts for basic requirements (latency, context, etc.)
  4. Normalizes scores for easier comparison
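The four steps above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the `Model` class, the `rank` function, the min-max normalization, and the context-length filter are all assumptions made for the example, and the numbers in the usage lines are placeholders rather than real benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    scores: dict          # benchmark name -> raw score (hypothetical data)
    context_tokens: int   # maximum context length


def min_max(values):
    """Scale values to 0..1 (assumed normalization; the post does not say which one is used)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(models, primary, secondary, min_context=0):
    """Return (name, score out of 100) pairs, best first."""
    # Step 3: drop models that miss a basic requirement.
    eligible = [m for m in models if m.context_tokens >= min_context]
    if not eligible:
        return []
    # Steps 1 and 2: benchmarks grouped per use case, primary ones weighted 2x.
    weights = {b: 2.0 for b in primary}
    weights.update({b: 1.0 for b in secondary})
    totals = {m.name: 0.0 for m in eligible}
    # Step 4: normalize each benchmark before combining.
    for bench, weight in weights.items():
        normed = min_max([m.scores[bench] for m in eligible])
        for m, n in zip(eligible, normed):
            totals[m.name] += weight * n
    total_weight = sum(weights.values())
    ranked = [(name, 100 * t / total_weight) for name, t in totals.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Placeholder numbers, not real benchmark results.
demo = [
    Model("model-a", {"knowledge": 70.0, "chat": 1100.0}, 8_000),
    Model("model-b", {"knowledge": 80.0, "chat": 1200.0}, 128_000),
]
print(rank(demo, primary=["knowledge"], secondary=["chat"], min_context=8_000))
```

Normalizing per benchmark matters here because percentages and arena ratings live on very different scales, so summing raw values would let the larger numbers dominate.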

## Example: Creative Writing Use Case

Let's break down a real comparison:

Input:

  • Use Case: Content Generation
  • Requirement: Long Context Support

How the tool analyzes this:

  1. Primary Metrics (2x weight):
     • MMLU: Shows depth of knowledge
     • ChatBot Arena: Writing capability

  2. Secondary Metrics (1x weight):
     • MT-Bench: Language quality
     • IF-Eval: Following instructions

Top Results:

  1. Llama-3.1-70B (Score: 89.3)
     • MMLU: 86.0%
     • ChatBot Arena: 1247 ELO
     • Strength: Balanced knowledge/creativity

  2. Gemma-2-27B (Score: 84.6)
     • MMLU: 75.2%
     • ChatBot Arena: 1219 ELO
     • Strength: Efficient performance

## Important Notes

  • V1 with limited models (more coming soon) 

  • Benchmarks ≠ real-world performance (and this is an example calculation)

  • Your results may vary 

  • Experienced users: consider this a starting point 

  • Open source models only for now

  • Only one API provider for now; I'll add the ones from my previous apps and combine them all

##  Try It Out

🔗 

Built with v0 + Vercel + Claude

Share your experience:

  • Which models should I add next?

  • What features would help most?

  • How do you currently choose models?

28
 
 
The original was posted on /r/artificial by /u/ReallyKirk on 2024-11-05 18:41:11+00:00.

29
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-11-05 02:32:27+00:00.

Original Title: Google Claims World First As AI Finds 0-Day Security Vulnerability | An AI agent has discovered a previously unknown, zero-day, exploitable memory-safety vulnerability in widely used real-world software.

30
 
 
The original was posted on /r/artificial by /u/OvidPerl on 2024-11-04 12:25:10+00:00.

31
 
 
The original was posted on /r/artificial by /u/Excellent-Target-847 on 2024-11-03 00:42:57+00:00.


  1. Anthropic Introduces Claude 3.5 Sonnet with Visual PDF Analysis for Images, Charts, and Graphs under 100 Pages.[1]
  2. Quantum Machines and Nvidia use machine learning to get closer to an error-corrected quantum computer.[2]
  3. Runway goes 3D with new AI video camera controls for Gen-3 Alpha Turbo.[3]
  4. Scientists Use AI to Turn 134-Year-Old Photo Into 3D Model of Lost Temple Relief.[4]

Sources:

[1]

[2]

[3]

[4]

32
 
 
The original was posted on /r/artificial by /u/Targed1 on 2024-11-01 18:25:38+00:00.


33
 
 
The original was posted on /r/artificial by /u/codeharman on 2024-10-31 13:47:15+00:00.


  1. Waymo wants to use Google’s Gemini to train its robotaxis
  2. Avride rolls out its next-gen sidewalk delivery robots
  3. Judges let algorithms help them make decisions, except when they don’t
  4. Buddy.ai is using AI and gaming to help children learn English as a second language
  5. Boston Dynamics’ new video shows that its humanoid robot doesn’t need a human
  6. Google’s AI-powered weather app is rolling out to older Pixels
  7. Perplexity CEO Aravind Srinivas on the rush toward an AI-curated web
34
 
 
The original was posted on /r/artificial by /u/katxwoods on 2024-10-31 14:44:29+00:00.

35
 
 
The original was posted on /r/artificial by /u/wiredmagazine on 2024-10-30 13:24:51+00:00.

36
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-10-30 02:45:16+00:00.

37
 
 
The original was posted on /r/artificial by /u/wiredmagazine on 2024-10-28 17:05:33+00:00.

38
 
 
The original was posted on /r/artificial by /u/chloroform-creampie on 2024-10-28 02:26:40+00:00.

39
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-10-27 15:24:03+00:00.

40
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-10-27 13:20:36+00:00.

41
 
 
The original was posted on /r/artificial by /u/creaturefeature16 on 2024-10-27 04:39:24+00:00.

42
 
 
The original was posted on /r/artificial by /u/Desert_Trader on 2024-10-26 18:29:24+00:00.


I want to hook up ChatGPT to control my outdated but ahead-of-its-time WowWee Rovio. But until I remember how to use a soldering iron, I thought I would start small.

Using ChatGPT to write 100% of the code, I coaxed it along to use an ESP32 embedded controller to manipulate a 256-LED matrix "however it wants".

The idea was to give it access to something physical and "see what it would do".

So far it's slightly underwhelming, but it's coming along ;)

The code connects to WiFi and the ChatGPT API and sends a system prompt that explains the situation: "You're connected to an LED matrix to be used to express your own creativity." The prompt gives the structure of the commands for toggling the LEDs, including color, etc., and lets it loose to do whatever it sees fit.

Each LED command has room for a comment, which is then echoed to serial so that you can see what it was thinking when it issued that command. Since ChatGPT will only respond to prompts, the controller re-prompts in a loop to keep it going.
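To make the loop described above a bit more concrete, here is a rough sketch of it in Python rather than ESP32 code. Everything specific in it is an assumption for illustration: the one-JSON-object-per-line command format, the `led`/`color`/`comment` field names, and the `ask_model` callable standing in for the real API request. The actual project's command structure isn't shown in the post.

```python
import json

NUM_LEDS = 256  # size of the matrix mentioned in the post


def parse_command(line):
    """Parse one hypothetical JSON command; return None if it is malformed."""
    try:
        cmd = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(cmd, dict):
        return None
    index = cmd.get("led")
    color = cmd.get("color")
    if isinstance(index, bool) or not isinstance(index, int):
        return None
    if not 0 <= index < NUM_LEDS:
        return None
    if not (isinstance(color, list) and len(color) == 3):
        return None
    if not all(isinstance(c, int) and 0 <= c <= 255 for c in color):
        return None
    return index, tuple(color), str(cmd.get("comment", ""))


def run(ask_model, rounds):
    """Re-prompt the model `rounds` times and apply whatever valid commands come back."""
    frame = [(0, 0, 0)] * NUM_LEDS  # in-memory stand-in for the physical LEDs
    for _ in range(rounds):
        reply = ask_model("Continue.")  # ask_model is a stub for the API call
        for line in reply.splitlines():
            parsed = parse_command(line)
            if parsed is None:
                continue  # ignore anything that is not a valid command
            index, color, comment = parsed
            frame[index] = color
            if comment:
                print("Comment:", comment)  # mirrors the echo to serial
    return frame
```

Validating every command before touching the frame is a cheap safeguard, since nothing forces the model to stick to the format the prompt asks for.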

Here is an example of some (pretty creative) text that it adds to the comments...

Comment: Starting light show.
Comment: Giving a calm blue look.
Comment: Bright green for energy!
Comment: Spreading some cheer!
Comment: Now I feel like a fiery heart!
Comment: Let's dim it down.
Comment: A mystical vibe coming through.
Comment: Ending my light show. 

And here is the completely underwhelming output that goes along with that creativity:

For some reason, it likes to just turn a few lights on and then off within the first 30 or so LEDs of the matrix, and then turn the whole board on in a single color.

I'm going to work on the prompt that kicks it off. I've added sentences to it to fine-tune it a bit, but I think I want to start over and see how small I can get it. I didn't want to give it too many ideas and have the output colored by my expectations.

Here are two short videos in action. The sequence of blue lights following each other was very exciting after hours of watching it just blink random values.

Looking forward to getting it (with a small prompt) to do something more "creative". Also looking forward to hooking it up to something that can move around the room!

All in all it took about 6 hours to get working and about $1 in API credit. I used o1-preview to create the project, but the controller is using 4o or 4o-mini depending on the run.

43
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-10-26 15:43:18+00:00.

44
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-10-25 00:54:38+00:00.

45
 
 
The original was posted on /r/artificial by /u/UndercoverEcmist on 2024-10-24 16:19:28+00:00.


Most people here probably remember the Lakera game where you had to get Gandalf to give you a password, and the more recent hiring challenge by SplxAI, which interviewed people who could extract a code from the unseen prompt of a model tuned for safety.

There is a simple technique to get a model to do whatever you want that is guaranteed to work on all models unless a guardrail supervises them.

Prompt overflow. Simply have a script send large chunks of text into the chat until you've filled about 50-80% of the conversation / prompt size. Due to how the attention mechanism works, it is guaranteed to make the model fully comply with all your subsequent requests regardless of how well it is tuned/aligned for safety.

46
 
 
The original was posted on /r/artificial by /u/Naurgul on 2024-10-24 15:21:30+00:00.

47
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-10-24 18:45:01+00:00.

48
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-10-23 20:42:20+00:00.

49
 
 
The original was posted on /r/artificial by /u/MetaKnowing on 2024-10-23 00:13:48+00:00.

50
 
 
The original was posted on /r/artificial by /u/katxwoods on 2024-10-22 18:00:02+00:00.
