Artificial Intelligence


Reddit's home for Artificial Intelligence (AI).

576
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/koconder on 2024-02-16 22:40:33.


How can AI transform a static image into a dynamic, realistic video? OpenAI’s Sora introduces an answer through the innovative use of spacetime patches.

I did an explainer on Sora's underlying training process and patches

Image Slicing Processes
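
Sora's code isn't public, but the core "patchify" idea behind spacetime patches is easy to illustrate. Below is a minimal sketch, assuming a video stored as a (frames, height, width, channels) array: it cuts the video into small spacetime tubes and flattens each one into a token-like vector. The patch sizes and array shapes are arbitrary assumptions for illustration, not Sora's actual parameters.

```python
# A minimal sketch of the "spacetime patch" idea: cut a video tensor into
# small tubes spanning a few frames and a small spatial window. Illustrative
# only -- OpenAI has not released Sora's code; sizes here are assumptions.
import numpy as np

def to_spacetime_patches(video, pt=4, ph=16, pw=16):
    """video: (T, H, W, C) array -> (num_patches, pt*ph*pw*C) flat patches."""
    T, H, W, C = video.shape
    # Trim so each dimension divides evenly into patches.
    T, H, W = T - T % pt, H - H % ph, W - W % pw
    video = video[:T, :H, :W]
    # Split each axis into (blocks, block_size), then group the block axes.
    patches = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    return patches.reshape(-1, pt * ph * pw * C)

video = np.random.rand(16, 128, 128, 3)   # 16 frames of 128x128 RGB
tokens = to_spacetime_patches(video)      # shape: (256, 3072)
print(tokens.shape)
```

Each flattened tube then plays the role a text token plays in a language model: a transformer can attend over the whole set at once, regardless of the video's original resolution or duration.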

Its ability to understand and develop near-perfect visual simulations, including digital worlds like Minecraft, will help it create training content for the AIs of tomorrow. For AIs to navigate our world, they need data and systems that help them comprehend it better.

It could also push virtual reality (VR) to new heights, changing the way we see digital environments. The ability to create near-perfect 3D environments can now be paired with spatial computing for on-demand worlds on Apple Vision Pro or Meta Quest.

577
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/wyem on 2024-02-16 18:20:50.


  1. Meta AI introduces V-JEPA (Video Joint Embedding Predictive Architecture), a method for teaching machines to understand and model the physical world by watching videos, and releases a collection of V-JEPA vision models trained with a feature-prediction objective using self-supervised learning. The models are able to understand and predict what is going on in a video, even with limited information [Details | GitHub].
  2. OpenAI introduces Sora, a text-to-video model that can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions [Details + sample videos | Report].
  3. Google announces its next-generation model, Gemini 1.5, which uses a new Mixture-of-Experts (MoE) architecture. The first Gemini 1.5 model being released for early testing is Gemini 1.5 Pro, with a context window of up to 1 million tokens, the longest of any large-scale foundation model yet. 1.5 Pro can perform sophisticated understanding and reasoning tasks across different modalities, including video, and performs at a level similar to 1.0 Ultra [Details | Tech Report].
  4. Reka introduced Reka Flash, a new 21B multimodal and multilingual model trained entirely from scratch that is competitive with Gemini Pro and GPT-3.5 on key language and vision benchmarks. Reka also presents a compact variant, Reka Edge, a smaller and more efficient 7B model suitable for local and on-device deployment. Both models are in public beta and available in Reka Playground [Details].
  5. Cohere For AI released Aya, a new open-source, massively multilingual LLM and dataset to help support under-represented languages. Aya outperforms existing open-source models and covers 101 different languages, more than double the number covered by previous models [Details].
  6. BAAI released Bunny, a family of lightweight but powerful multimodal models. The Bunny-3B model, built upon SigLIP and Phi-2, outperforms state-of-the-art MLLMs, not only in comparison with models of similar size but also against larger 7B MLLMs, and even achieves performance on par with LLaVA-13B [Details].
  7. Amazon introduced a text-to-speech (TTS) model called BASE TTS (Big Adaptive Streamable TTS with Emergent abilities). BASE TTS is the largest TTS model to date, trained on 100K hours of public-domain speech data, and exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally [Details | Paper].
  8. Stability AI released Stable Cascade in research preview, a new text-to-image model that is exceptionally easy to train and fine-tune on consumer hardware thanks to its three-stage architecture. Stable Cascade can also generate image variations and image-to-image generations. In addition to providing checkpoints and inference scripts, Stability AI has also released scripts for fine-tuning, ControlNet, and LoRA training [Details].
  9. Researchers from UC Berkeley released Large World Model (LWM), an open-source general-purpose large-context multimodal autoregressive model, trained from LLaMA-2, that can perform language, image, and video understanding and generation. LWM can answer questions about an hour-long YouTube video even when GPT-4V and Gemini Pro both fail, and can retrieve facts across a 1M-token context with high accuracy [Details].
  10. GitHub opens applications for the next cohort of its GitHub Accelerator program, with a focus on funding the people and projects building AI-based solutions under an open-source license [Details].
  11. NVIDIA released Chat with RTX, a locally running AI assistant (for Windows PCs with specific NVIDIA GPUs) that integrates with your file system and lets you chat with your notes, documents, and videos using open-source models [Details].
  12. OpenAI is testing memory with ChatGPT, enabling it to remember things you discuss across all chats. ChatGPT's memories evolve with your interactions and aren't linked to specific conversations. It is being rolled out to a small portion of ChatGPT free and Plus users this week [Details].
  13. BCG X released AgentKit, a LangChain-based starter kit (NextJS, FastAPI) for building constrained agent applications [Details | GitHub].
  14. ElevenLabs' Speech to Speech feature, launched in November for voice transformation with control over emotion and delivery, is now multilingual and available in 29 languages [Link].
  15. Apple introduced Keyframer, an LLM-powered animation prototyping tool that can generate animations from static images (SVGs). Users can iterate on their design by adding prompts and editing LLM-generated CSS animation code or properties [Paper].
  16. ElevenLabs launched a payout program for voice actors to earn rewards every time their voice clone is used [Details].
  17. Azure OpenAI Service announced the Assistants API, new models for fine-tuning, a new text-to-speech model, and a new generation of embedding models with lower pricing [Details].
  18. Brilliant Labs, the developer of AI glasses, launched Frame, the world’s first glasses featuring an integrated AI assistant, Noa. Powered by an integrated multimodal generative AI system capable of running GPT-4, Stability AI, and the Whisper AI model simultaneously, Noa performs real-world visual processing, novel image generation, and real-time speech recognition and translation [Details].
  19. Nous Research released Nous Hermes 2 Llama-2 70B model trained on the Nous Hermes 2 dataset, with over 1,000,000 entries of primarily synthetic data [Details].
  20. OpenAI, in partnership with Microsoft Threat Intelligence, has disrupted five state-affiliated actors that sought to use AI services in support of malicious cyber activities [Details].
  21. Perplexity partners with Vercel, opening AI search to developer apps [Details].
  22. Researchers show that LLM agents can autonomously hack websites, performing tasks as complex as blind database schema extraction and SQL injections without human feedback. The agent does not need to know the vulnerability beforehand [Paper].
  23. FCC makes AI-generated voices in unsolicited robocalls illegal [Link].
  24. Slack adds AI-powered search and summarization to the platform for enterprise plans [Details].

Source: AI Brews - you can subscribe to the newsletter here. It's free to join, sent only once a week with bite-sized news, learning resources, and selected tools. Thanks.

578
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/holy_moley_ravioli_ on 2024-02-16 17:21:57.

Original Title: The fact that SORA is not just generating videos, it's simulating physical reality and recording the result, seems to have escaped people's summary understanding of the magnitude of what's just been unveiled

579
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/aurumvexillum on 2024-02-16 03:43:28.


I'm seeing numerous reposts of Sora's text-to-video samples, which are impressive in their own right, and showcase what is undoubtedly a massive leap forward for generative video models. However, the full range of the model's capabilities — outlined within the technical report — is truly remarkable.

580
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/holy_moley_ravioli_ on 2024-02-15 19:47:59.

581
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/SAT0725 on 2024-02-15 16:57:20.

582
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/LogiPredator on 2024-02-15 02:56:21.

583
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/Jariiari7 on 2024-02-14 13:09:49.

584
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/TechExpert2910 on 2024-02-14 08:21:23.

585
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/Starks-Technology on 2024-02-13 18:33:12.


The folks over at the r/ArtificialInteligence subreddit really liked this, so I thought to share it here too!

Last week, I wrote a technical article about a new concept: an intelligent AI-powered screener. The feature is simple. Instead of using ChatGPT to interpret SQL queries, wrangling Excel spreadsheets, or fighting with complicated stock screeners to find new investment opportunities, you use a far more natural, intuitive approach: natural language.

Screening for stocks using natural language

This screener doesn't just find stocks that hit a new all-time high (poking fun at you, Robinhood). By combining Large Language Models, complex data queries, and fundamental stock data, I've created a seamless pipeline that can search for stocks based on virtually any fundamental indicator. This includes searching across more than 130 industries, including healthcare, biotechnology, 3D printing, and renewable energy. In addition, users can filter their search by market cap, price-to-earnings ratio, revenue, net income, EBITDA, free cash flow, and more. This solution offers an intuitive approach to finding new, novel stocks that meet your investment criteria. The best part is that literally anybody can use this feature.

Read the official launch announcement!

How does it work?

Like I said, I wrote an entire technical article about how it works. I don't really want to copy/paste the article text here because it's long and extremely detailed. To save you a click, I'll summarize the process here:

  1. Using Yahoo Finance, I fetch the company statements
  2. I feed the statements into an LLM and ask it to add tags to the company from a list of 130+ tags. This sounds simple, but it requires very careful prompt engineering and rigorous testing to prevent hallucinations
  3. I save the tags into a MongoDB database
  4. I hydrate 10+ years of fundamental data about every US stock into a different MongoDB collection
  5. I use an LLM as a parser to translate plain English into a MongoDB aggregation pipeline (see the sketch after this list)
  6. I execute the pipeline against the database
  7. I take the response and send another request to an LLM to summarize it in plain English
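
To make steps 5-7 concrete, here's a minimal sketch of what that loop could look like. This is not the author's actual code: the collection name, schema hint, model name, and prompts are all illustrative assumptions.

```python
# A sketch of steps 5-7: LLM parses English into a MongoDB aggregation
# pipeline, we run it, then a second LLM call summarizes the rows.
# Collection/field names and prompts are assumptions, not the real app's.
import json
from openai import OpenAI
from pymongo import MongoClient

llm = OpenAI()
db = MongoClient("mongodb://localhost:27017")["stocks"]

SCHEMA_HINT = (
    "Collection 'fundamentals' has fields: ticker, industry_tags (list), "
    "market_cap, pe_ratio, revenue, net_income, free_cash_flow."
)

def screen(question: str) -> str:
    # Step 5: LLM as parser -- English in, JSON aggregation pipeline out.
    resp = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SCHEMA_HINT +
             " Reply with ONLY a JSON MongoDB aggregation pipeline."},
            {"role": "user", "content": question},
        ],
    )
    pipeline = json.loads(resp.choices[0].message.content)
    # Step 6: execute the generated pipeline against the database.
    rows = list(db["fundamentals"].aggregate(pipeline))
    # Step 7: second LLM call to explain the results in plain English.
    summary = llm.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
                   "Summarize these stock screener results:\n" +
                   json.dumps(rows, default=str)}],
    )
    return summary.choices[0].message.content

print(screen("Biotech stocks with a PE ratio under 15 and positive free cash flow"))
```

A production version would validate the generated JSON before executing it, which is where the injection defenses mentioned below come in.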

This is a simplified overview, because I also have ways to detect prompt injection attacks. I also plan to make the pipeline more sophisticated by introducing techniques like Tree of Thought prompting. I thought this sub would find this interesting because it's a real, legitimate use case of LLMs. It shows how AI can be used in industries like finance and bring genuine value to users.
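The post doesn't describe those defenses, so here is just one generic check of the kind such a system might use (an assumption, not the author's method): before executing the LLM-generated pipeline, reject anything that could write to the database.

```python
# A generic safety check (an assumption, not the author's method): make sure
# an LLM-generated aggregation pipeline is read-only before running it.
FORBIDDEN_STAGES = {"$out", "$merge"}  # the aggregation stages that can write

def is_safe_pipeline(pipeline) -> bool:
    """Accept only a list of dict stages, none of which writes to the DB."""
    if not isinstance(pipeline, list):
        return False
    return all(
        isinstance(stage, dict) and not (FORBIDDEN_STAGES & stage.keys())
        for stage in pipeline
    )
```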

What can this do?

This feature is awesome because it allows users to search a rich database of stocks to find novel investing opportunities. For example:

  • Users can search for stocks in a certain income and revenue range
  • Users can find stocks in certain niche industries like biotechnology, 3D printing, and alternative energy
  • Users can find stocks that are overvalued/undervalued based on PE ratio, PS ratio, free cash flow, and other fundamental metrics
  • Literally all of the above combined

What can't this do?

In other posts, I've gotten a bunch of hate comments from people who didn't read the post. To summarize, here's what this feature isn't:

  • It doesn't pick stocks for you. It finds stocks by querying a database in natural language
  • It doesn't make investment decisions for you
  • It doesn't "beat the market" (it's a stock screener... it beating the market doesn't make sense)
  • It doesn't search by technical indicators like RSI and SMA. I can work on this, but this would be a shit-ton of data to ingest

Happy to answer any questions about this! I'm very proud of the work I've done so far and can't wait to see how far I go with it!

Read more about this feature here!

586
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/mind-wank on 2024-02-13 12:37:50.

587
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/PorkyPORM on 2024-02-12 04:02:20.

588
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/thisisinsider on 2024-02-10 22:50:20.

589
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/wyem on 2024-02-09 16:19:25.


  1. Google launches Ultra 1.0, its largest and most capable AI model, in its ChatGPT-like assistant, which has now been rebranded as Gemini (earlier called Bard). Gemini Advanced is available in 150 countries as a premium plan for $19.99/month, starting with a two-month trial at no cost. Google is also rolling out Android and iOS apps for Gemini [Details].
  2. Alibaba Group released the Qwen1.5 series, open-sourcing models in six sizes: 0.5B, 1.8B, 4B, 7B, 14B, and 72B. Qwen1.5-72B outperforms Llama2-70B across all benchmarks. The Qwen1.5 series is available on Ollama and LMStudio, with an API on together.ai [Details](https://qwenlm.github.io/blog/qwen1.5/) | [Hugging Face].
  3. NVIDIA released Canary 1B, a multilingual model for speech-to-text recognition and translation. Canary transcribes speech in English, Spanish, German, and French, and also generates text with punctuation and capitalization. It supports bi-directional translation between English and the three other supported languages. Canary outperforms the similarly sized Whisper-large-v3 and SeamlessM4T-Medium-v1 on both transcription and translation tasks, and takes first place on the Hugging Face Open ASR leaderboard with an average word error rate of 6.67%, outperforming all other open-source models [Details].
  4. Researchers released Lag-Llama, the first open-source foundation model for time series forecasting [Details].
  5. LAION released BUD-E, an open-source conversational and empathic AI Voice Assistant that uses natural voices, empathy & emotional intelligence and can handle multi-speaker conversations [Details].
  6. MetaVoice released MetaVoice-1B, a 1.2B parameter base model trained on 100K hours of speech, for TTS (text-to-speech). It supports emotional speech in English and voice cloning. MetaVoice-1B has been released under the Apache 2.0 license [Details].
  7. Bria AI released RMBG v1.4, an open-source background removal model trained on fully licensed images [Details].
  8. Researchers introduce InteractiveVideo, a user-centric framework for video generation that is designed for dynamic interaction, allowing users to instruct the generative model during the generation process [Details | GitHub].
  9. Microsoft announced a redesigned look for its Copilot AI search and chatbot experience on the web (formerly known as Bing Chat), new built-in AI image creation and editing functionality, and Deucalion, a fine tuned model that makes Balanced mode for Copilot richer and faster [Details].
  10. Roblox introduced AI-powered real-time chat translations in 16 languages [Details].
  11. Hugging Face launched Assistants feature on HuggingChat. Assistants are custom chatbots similar to OpenAI’s GPTs that can be built for free using open source LLMs like Mistral, Llama and others [Link].
  12. DeepSeek AI released DeepSeekMath 7B model, a 7B open-source model that approaches the mathematical reasoning capability of GPT-4. DeepSeekMath-Base is initialized with DeepSeek-Coder-Base-v1.5 7B [Details].
  13. Microsoft is launching several collaborations with news organizations to adopt generative AI [Details].
  14. LG Electronics signed a partnership with Korean generative AI startup Upstage to develop small language models (SLMs) for LG’s on-device AI features and AI services on LG notebooks [Details].
  15. Stability AI released SVD 1.1, an updated version of the Stable Video Diffusion model, optimized to generate short AI videos with better motion and more consistency [Details | Hugging Face].
  16. OpenAI and Meta announced plans to label AI-generated images [Details].
  17. Google saves your conversations with Gemini for years by default [Details].

Source: AI Brews - you can subscribe to the newsletter here. It's free to join, sent only once a week with bite-sized news, learning resources, and selected tools. Thanks.

590
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/Southern_Opposite747 on 2024-02-10 04:03:14.

591
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/dead_planets_society on 2024-02-09 15:58:17.

Original Title: Minecraft could be the key to creating adaptable AI: Researchers have a new way to assess an AI model’s intelligence: drop it into a game of Minecraft, with no information about its surroundings, and see how well it plays

592
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/Georgeo57 on 2024-02-08 16:55:29.


There's a lot of talk about governments throughout the world building their own AIs, primarily for the purpose of national security. Among these are the governments of the U.S., China, India, the U.K., and France. It's said that this is why pausing or halting AI development is not a viable option: no country can afford to be left behind.

Government AIs, however, perhaps with the exception of those in countries like China that maintain very close ties with private businesses, will for the most part be involved in security matters that have little impact on the everyday lives of those countries' citizens, at least in times of peace.

The same cannot, however, be said for AIs developed expressly for the private citizens and businesses of these countries. This is where the main battles of the AI arms race will be waged.

Imagine, for example, if business interests in China were the first in the world to develop an AGI so successful at picking stocks that they were able to corner the world's financial markets. That success would soon result in massive transfers of wealth from all other countries to China.

Such transfers would improve the quality of life in China and reduce it in every other country. They could become so substantial that the global community would begin to consider creating a new system of wealth allocation between the countries of the world.

Because of such a prospect, it is in everyone's interest everywhere to neither pause nor halt AI development, but rather to move ahead at full speed.

593
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/jaden530 on 2024-02-08 05:42:01.


I would like to start off by saying I know the bare minimum when it comes to coding. I'm pretty good with computers in general and have always been able to get something working with enough googling.

I recently read an article about a fridge Samsung had at CES that used cameras to identify 33 food items and track what they are, their nutritional information, spoil time, and stock. I have been pretty hands-off with AI while keeping up with all the newest improvements, so once I saw that it was going to recognize only 33 food items and be locked into the Samsung ecosystem, I wondered: can I do better?

So I booted up my laptop, downloaded VS Code and Python, and launched ChatGPT. I figured that, if nothing else, I could at least learn something about Python.

Well, in the few days that I have been working on this project, I have a program that can identify thousands of foods with little error, parse the data to itemize it for the other systems, attach nutritional information to each item, log it into inventory, and then have a GPT-4-turbo assistant analyze the inventory to recognize trends, recommend recipes, give insight, etc. All of this is available through an extremely simple GUI.
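
For anyone curious what the identify-and-log loop might look like, here's a minimal sketch along the lines OP describes. It is not the actual project code; the model name, prompt, and inventory format are illustrative guesses.

```python
# A minimal sketch of the identify-and-log loop (not OP's actual code):
# send a fridge-camera photo to a vision-capable model, get back a JSON
# list of food items, and append them to an inventory file.
import base64
import json
from openai import OpenAI

client = OpenAI()

def identify_foods(image_path: str) -> list[dict]:
    """Ask a vision model to list the food items visible in one photo."""
    b64 = base64.b64encode(open(image_path, "rb").read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text":
                 "List every food item visible as JSON: "
                 '[{"name": ..., "quantity": ..., "est_days_until_spoiled": ...}]'},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    return json.loads(resp.choices[0].message.content)

def log_inventory(items: list[dict], path: str = "inventory.json") -> None:
    """Append newly identified items to a simple JSON inventory file."""
    try:
        inventory = json.load(open(path))
    except FileNotFoundError:
        inventory = []
    inventory.extend(items)
    json.dump(inventory, open(path, "w"), indent=2)

log_inventory(identify_foods("fridge_shelf.jpg"))
```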

The journey is far from over, and if you guys are interested I can update with photos and more information about it, or even give you the latest build that I have compiled into a .exe. I don't plan to beat out Samsung, but I feel like having a cheap alternative "smart fridge" system that can run on a Raspberry Pi would be pretty cool!

There are still some huge features I'm in the process of adding that could make or break the project: either something exciting, or a wall that my skill and ChatGPT's skill just can't get around. It's crazy what AI is capable of, though!

Edit:

I decided to add a walkthrough of all of the features currently available with photos on Imgur. Everything seen there is extremely early development and will be changed.

594
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/vinaylovestotravel on 2024-02-07 11:08:41.

595
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/yimmy51 on 2024-02-05 23:15:07.

596
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/Illustrious_Court178 on 2024-02-06 14:56:25.

597
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/dead_planets_society on 2024-02-05 23:17:52.

Original Title: Ancient Herculaneum scroll piece revealed by AI : A Greek philosopher’s musings on pleasure, contained in ancient papyrus scrolls buried by Mount Vesuvius’s eruption 2000 years ago, have been rediscovered with the help of AI

598
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/Philipp on 2024-02-04 14:28:52.

599
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/dead_planets_society on 2024-01-31 16:09:09.

Original Title: AI can better retain what it learns by mimicking human sleep: Building AIs that sleep and dream can lead to better results and more reliable models, according to researchers who aim to replicate the architecture and behaviour of the human brain.

600
 
 
This is an automated archive made by the Lemmit Bot.

The original was posted on /r/artificial by /u/leggedrobotics on 2024-01-31 12:32:20.
