this post was submitted on 02 Aug 2024
78 points (94.3% liked)


The TRACTOR program aims to automate the translation of legacy C code to Rust. The goal is to achieve the same quality and style that a skilled Rust developer would produce, thereby eliminating the entire class of memory safety security vulnerabilities present in C programs. This program may involve novel combinations of software analysis, such as static analysis and dynamic analysis, and machine learning techniques like large language models.

Highlights from the forum thread:

There's even a conspiracy theory that the Rust Foundation's 501 organization type was chosen so it can conduct lobbying. The implication being that the Rust Foundation is behind government recommendations to move toward memory safe languages. (Big Borrow-Checker, if you will).

Assuming a worst case scenario, this could be the worst thing to happen to Rust’s image. We end up with billions of lines of rewritten Rust code that is full of soundness and logic bugs, and that no one understands.

DARPA funds some projects on a "there is an infinitesimal chance of success, but if you succeed, it's a big deal" basis. Silent Talk is an example here - very unlikely to succeed, even at the beginning, but if you could hold a radio conversation without sound, that'd be a huge deal for special operations forces.

top 12 comments
[–] IllNess@infosec.pub 26 points 3 months ago (2 children)

I'm gonna guess this is going to be a major pain to debug.

[–] simple@lemm.ee 29 points 3 months ago (2 children)

Translating entire codebases with LLMs? What could POSSIBLY go wrong?

I also don't see how it would ever be possible to directly translate C to Rust. They're so fundamentally different that things are bound to not work the same.

[–] IllNess@infosec.pub 12 points 3 months ago (1 children)

I don't even understand how they are going to get around the memory security they are doing this translation for. Watch them have to break the security features of Rust just to make certain programs work.

[–] FaceDeer@fedia.io 9 points 3 months ago

I would expect that's part of the point: if a C program can't be converted to a language that doesn't allow memory violations, that probably indicates that there are execution pathways that result in memory violations.

[–] FaceDeer@fedia.io 9 points 3 months ago

What could go wrong with using human programmers to convert it?

If you're going to insist on perfection for something like this then you're probably never going to get anything done. Convert the program and then test and debug it just like you'd do with any newly written code. The idea is to make it easier to do that, not to make it so you don't have to do it at all.

[–] technocrit@lemmy.dbzer0.com 6 points 3 months ago* (last edited 3 months ago)

I’m gonna guess this is going to be a major ~~pain~~ profit to debug.

Some "AI" grifters gonna be showering in that state paper.

[–] fubarx@lemmy.ml 14 points 3 months ago

I've tried multiple times to convert existing code or create new code using LLMs. The first attempts are OK, but once you start refining the prompts, they all go off the rails.

Most of the time, the generated code uses old or deprecated libraries or APIs. You point that out and they correct it. But a few iterations later, you're refining something else and the old, deprecated calls come back. Once again, you point it out and it gets corrected.

Forget trying to correct it yourself by hand, because now it's diverged from the LLM context. And this can happen in multiple places in the code. Rinse. Repeat.

At some point you just give up. Either it's wrong or it will be wrong in different ways later. You have to read through every line to find strange, divergent errors. Over and over. It gets exhausting.

At the end, it feels like maybe you could have done it faster yourself, but the time has already been sunk.

[–] astronaut_sloth@mander.xyz 14 points 3 months ago (1 children)

I think this is an interesting idea. If they're able to pull it off, I think it will cement the usefulness of LLMs. I have my doubts, but it's worth trying. I'd imagine that the LLM is specially tuned to be more adept at this task. Your bog-standard GPT-4 or Claude will probably be unreliable.

[–] ByteOnBikes@slrpnk.net 19 points 3 months ago (1 children)

Having built code converters that auto-migrate code to a later version of the same language, I'm incredibly worried. We still had to manually verify everything.

I'm hopeful though that this does become the wave of the future. There's some serious legacy shit out there that doesn't have enough of a financial gain to revisit and rewrite.

[–] astronaut_sloth@mander.xyz 5 points 3 months ago

Yeah, they'll probably have to check everything. Though, I wonder if even just checking that everything is good to go would save time compared with manually rewriting it all. While it may not be a smashing success, it could still prove useful.

I dunno, I'm interested to see how this plays out.

[–] solrize@lemmy.world 5 points 3 months ago (1 children)

Maybe it would be easier to translate to Ada? That is, for C code that doesn't make heavy use of malloc/free. The idea of Rust's borrow checker, as I understand it, is to statically track the references to malloc'd memory to make sure that you never use-after-free or double-free. If your C code uses malloc in uncontrolled ways, then massaging it to satisfy a borrow checker sounds horribly difficult, and you should either give up or run it under a heavily instrumented environment like valgrind. If (as is typical of embedded code) it just does stuff with some fixed memory buffers and doesn't do much runtime allocation, then there isn't much for a borrow checker to look after, so you can use a safe language (Ada) that doesn't have borrow checking.

Disclaimer: I don't use Rust at the moment. Someday. I do like Ada despite its verbosity, but it's not that great at managing dynamic memory. It is starting to take on Rust influences to help with that.
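
To make the contrast above concrete, here is a minimal Rust sketch, not taken from the thread. The names `consume`, `heap_example`, and `fill_fixed` are made up for illustration. The first half shows heap memory whose ownership the compiler tracks, so a use-after-free is rejected at compile time; the second half shows the embedded-style case with a fixed buffer and no runtime allocation, where ownership tracking has very little to do.

```rust
/// Heap case: `Box` plays roughly the role of malloc, and dropping it
/// plays the role of free. (Hypothetical helper for illustration.)
fn consume(buf: Box<[u8]>) -> usize {
    buf.len()
    // `buf` is freed here, when it goes out of scope.
}

fn heap_example() -> usize {
    let buf: Box<[u8]> = vec![0u8; 16].into_boxed_slice();
    let n = consume(buf); // ownership moves into `consume`
    // let first = buf[0]; // would not compile: `buf` was moved (and freed)
    n
}

/// Embedded-style case: a fixed-size buffer, no runtime allocation.
/// There is no free to get wrong; indexing is still bounds-checked.
fn fill_fixed(buf: &mut [u8; 8], byte: u8) -> usize {
    for slot in buf.iter_mut() {
        *slot = byte;
    }
    buf.len()
}
```

Uncommenting the marked line in `heap_example` should produce a compile error rather than a runtime fault, which is the property the comment above describes.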

[–] asdfasdfasdf@lemmy.world 1 points 3 months ago

AFAIK you can get around this by using raw pointers / unsafe blocks in Rust, then have a human target those to rewrite it in a safe, structured way.
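
A rough sketch of that two-step approach, under the assumption that the original C routine was something like `int sum(const int *p, size_t n)`. The function names `sum_raw` and `sum_safe` are hypothetical and nothing here comes from TRACTOR itself. The first function is the kind of literal, pointer-based translation a tool might emit; the second is what a human might rewrite it into.

```rust
/// Step 1: literal translation that keeps C's raw pointer plus length.
/// Safety: the caller must guarantee `p` points to `n` readable `i32` values.
unsafe fn sum_raw(p: *const i32, n: usize) -> i32 {
    let mut total = 0;
    for i in 0..n {
        // Pointer arithmetic and dereference: the compiler cannot
        // check bounds or validity here.
        total += unsafe { *p.add(i) };
    }
    total
}

/// Step 2: hand-rewritten safe version. The slice carries its own length,
/// so no `unsafe` is needed and out-of-bounds reads are impossible.
fn sum_safe(values: &[i32]) -> i32 {
    values.iter().sum()
}
```

Both versions return the same result for valid input, e.g. `sum_safe(&[1, 2, 3, 4])` and `unsafe { sum_raw(data.as_ptr(), data.len()) }` for the same array. The `unsafe` keyword also gives reviewers something to search for when deciding which parts of a converted codebase still need human attention.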