
Gentoo bans AI-created contributions


By Joe Brockmeier
April 18, 2024

Gentoo Council member Michał Górny posted an RFC to the gentoo-dev mailing list in late February about banning "'AI'-backed (LLM/GPT/whatever) contributions" to the Gentoo Linux project. Górny wrote that the spread of the "AI bubble" indicated a need for Gentoo to formally take a stand on AI tools. After a lengthy discussion, the Gentoo Council voted unanimously this week to adopt his proposal and ban contributions generated with AI/ML tools.

The case against

In his RFC, Górny laid out three broad areas of concern: copyrights, quality, and ethics. On the copyright front, he argued that LLMs are trained on copyrighted material and that the companies behind them are unconcerned with copyright violations. "In particular, there's a good risk that these tools would yield stuff we can't legally use."

He questioned the quality of LLM output, though he did allow that LLMs might "provide good assistance if you are careful enough". But, he said, there's no guarantee contributors are aware of the risks. He minced no words about his view of the ethics of the use of AI. Górny took issue with everything from the energy consumption driven by AI to labor issues and "all kinds of spam and scam". The only reasonable course of action, he said, would be to ban the use of those tools altogether in creating works for Gentoo:

In other words, explicitly forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to create ebuilds, code, documentation, messages, bug reports and so on for use in Gentoo.

He added that this only extended to works created expressly for the Gentoo project, and did not encompass upstream projects using things like ChatGPT. Andreas K. Hüttel asked whether there were objections to packaging AI software for Gentoo. This did not elicit a response in favor or against on the list, but the AI policy page expressly mentions that the policy does not prohibit packaging AI-related software.

Is this necessary?

Rich Freeman wrote that he thought it made sense to consider the use of AI, but suggested the Gentoo developer certificate of origin (DCO) already had the necessary language to prohibit AI-generated contributions. "Perhaps we ought to just re-advertise the policy that already exists?" He also poked at the ethical case laid out by Górny, and suggested it would alienate some contributors even if the majority of the project was in favor. Freeman said it was not a bad idea to reiterate that Gentoo didn't want contributions that were just piped out of a GPT application into forums, bug reports, commits, etc., but didn't think that it required any new policy.

Ulrich Mueller replied that there is overlap with existing policy, but did not find it redundant and supported the idea of a clarification on how to deal with AI-generated code. Sam James agreed with the proposal but worried that it was "slightly performative [...] given that we can't really enforce it." Górny wrote that it was unlikely that the project could detect these contributions, or that it would want to actively pursue finding them. The point, he said, is to make a statement that they are undesirable.

Oskari Pirhonen wanted to know about cases where a contributor uses ChatGPT to help with writing documentation or commit messages (but not code) because they don't have "an excellent grasp of English". If those contributions explicitly called out AI-generated content, would those be acceptable? Górny said that would not help much, and dismissed the quality of content generated by ChatGPT. Mueller wanted to know where the line was: "Are translation tools like DeepL allowed? I don't see much of a copyright issue for these."

In a rare dissent, Matt Jolly responded that Gentoo would always have poor quality contributions, and could simply use common sense to filter out low-quality LLM material. "We already have methods for weeding out low quality contributions and bad faith contributors - let's trust in these and see what we can do to strengthen these tools and processes." He argued in favor of using LLMs for code documentation and asked why he had to type out an explanation of what his code does if an LLM can generate something that only requires some editing. The proposal, he said, was a bad idea and banning LLMs "at this point is just throwing the baby out with the bathwater". Guidelines would be fine, even a ban on completely AI-generated works, but he was opposed to "pre-emptively banning useful tools".

James replied that tools trained on Gentoo's current repository should be OK, as well as using LLMs to assist with commit messages. But, he said, a lot of FOSS projects were seeing too much AI spam and were not interested in picking the "possibly good" parts out.

David Seifert responded in support of the RFC and asked if it could be added to the next Gentoo Council meeting agenda. Górny said that he had been asked for a specific motion and provided this language:

It is expressly forbidden to contribute to Gentoo any content that has been created with the assistance of Natural Language Processing artificial intelligence tools. This motion can be revisited, should a case been made over such a tool that does not pose copyright, ethical and quality concerns.

Approved

Given the ratio of comments in favor of banning AI-generated contributions to objections to such a ban, it is not surprising that the council voted to accept Górny's proposal. Now the question is how Gentoo implements the ban. In an emailed response to questions, Górny said that Gentoo is relying on trust in its contributors to adhere to the policy rather than trying to police contributions to see if they were generated with AI/ML tools:

In both cases, our primary goal is to make it clear what's acceptable and what's not, and politely ask our contributors to respect that. If we receive contributions that contain really "weird" mistakes, the kind that [do not] seem likely to be caused by a human error, we're going to start asking questions, but I think that's the best we can do.

As AI/ML continues to dominate the tech industry's agenda, Gentoo is unusual in looking to shut it out rather than trying to join the party. How well the policy works, and how soon it is tested, will be interesting to see.




Gentoo bans AI-created contributions

Posted Apr 18, 2024 17:31 UTC (Thu) by gmgod (subscriber, #143864) [Link]

Beyond the potential copyright violations, there is also the waste of time associated with these tools for documentation and commit-message purposes.

Prompting an AI tool to do "say that the app was missing a feature about how to handle numbers in a commit message" is going to generate a novel's worth of text that the person with broken English won't be able to vet. And anything beyond that half-broken prompt will be assumptions on the AI side that humans will waste time reading and finding mostly consistent, until they read the code and figure out that the description might not even match! Wasting even more time.

I'm sorry but words have meaning. Using AI as a fluff generator is probably the worst disrespect you can show to your reader. I much prefer broken English.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 17:40 UTC (Thu) by snajpa (subscriber, #73467) [Link]

So much for theory. Now, any practical example of this AI-driven spammy contribution? In the projects I watch closely, the situation you and the article are describing is mostly theoretical. To me it seems like a signal that a project has enough contributors when it can raise barriers to avoid problems that aren't even there.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 3:39 UTC (Fri) by epg (subscriber, #34047) [Link]

It's already happening in the workplace and it sucks.

Now, the perpetrators weren't writing quality commit logs before either; but if you're going to commit shit, I'd rather it be a small amount, not a steaming pile.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 9:52 UTC (Fri) by gmgod (subscriber, #143864) [Link]

As said elsewhere in the thread, it's already happening at the workplace and it sucks.

It's not just code. Have you tried hiring anyone recently? Before ChatGPT, we could weed out most I-saw-there-was-a-light-here in seconds or minutes. For every 25 applications, we would shortlist 1-2 people and we would go from there. In the last round we shortlisted 14 people out of 31 applicants... And in spite of that the role wasn't filled in the end. It's not theoretical, it's a fucking nightmare.

Of course we've changed the way we advertise positions now, to make them less chatgptable...

Also, when was the last time you searched for something on the web, found a page that seemed to be exactly what you asked for, read the arm-long article, and finally figured out that, though it was very well written, you had just lost your time because the information was blatantly wrong ("recommendation" sites are typical)?

When was the last time you had a fruitful interaction with customer service on the first try? Does it not strike you as odd that an "agent" will be there with you? I had a very nice chat with an Amazon engineer; there are reasons for everything.

So please don't say it's theoretical. I don't mind people using AI to generate something before they vet it, if they have enough skills to do so appropriately. I do mind people not doing so.

Speaking of theoretical... I suspect you use AI in your contributions, then... Can you swear here that everything you generate with AI is correct and always committed as is?

Gentoo bans AI-created contributions

Posted Apr 19, 2024 10:22 UTC (Fri) by snajpa (subscriber, #73467) [Link]

> Can you swear here that everything you generate with AI is correct and always committed as is?

But I never said I let AI do whatever it wants. It's a very useful tool, but it is still just a tool. If we had reached the singularity by now, this discussion would look entirely different :) In my code, since I started with Copilot, most of the bytes are actually the output of their LLM. But if I didn't guide it, the bytes coming out of it wouldn't make much sense. Of course I'm still responsible.

It'd be an entirely different discussion if there were an LLM so advanced that I could just tell it what to do and it would submit the change without my intervention, possibly without me even having to read the commit... but the tech is not there and I suspect it'll take a while before it gets there... so for now it's IMHO pretty irrelevant *how* the contributor arrives at the result they want to contribute; it's the result that matters - and it is still *their* action to contribute that piece. LLMs might just be a new (and pretty strong) +1 reason to ban _some_ contributors who are wasting maintainers' time...

Gentoo bans AI-created contributions

Posted Apr 19, 2024 16:45 UTC (Fri) by rgmoore (✭ supporter ✭, #75) [Link]

When was the last time you had a fruitful interaction with customer service on the first try?

Tuesday. I got a knowledgeable human service rep who gave me the help I needed promptly. I've had plenty of miserable interactions with human service reps, though. I think the big difference is whether the company sees customer service as a cost center or a profit center. The reason the rep I encountered on Tuesday is good is because we're paying real money for a service contract, and the company knows we won't renew if they do a bad job.

The root cause of bad AI service is companies that see service as a cost to minimize. Today they've decided AI is the way to save money. A few years ago, it was offshoring to a call center in some low cost country. Before that it was underpaid domestic call centers mixed with endless message trees. The common denominator is unwillingness to pay for competent staff.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 9:02 UTC (Sat) by gfernandes (subscriber, #119910) [Link]

Except that LLM based AIs don't actually save any money at all!

Today's LLMs are *heavily* subsidised by the companies pushing them, thanks to the Crypto Bros VC lot pumping money into the NextBigThing(TM) since sliced bread (and of course, crypto!).

Actual costs per query are very high. Not surprising, considering that after the crypto collapse, Nvidia's "big market" is LLM-based AI!

Gentoo bans AI-created contributions

Posted Apr 20, 2024 14:50 UTC (Sat) by rgmoore (✭ supporter ✭, #75) [Link]

Today's LLMs are *heavily* subsidised by the companies pushing them, thanks to the Crypto Bros VC lot pumping money into the NextBigThing(TM) since sliced bread (and of course, crypto!).

The companies don't care about the real cost. They care about how much they're paying today. If some VC is willing to subsidize their customer support by paying 95% of the real cost of AI, they'll look at the 5% they have to pay. It's the same as taxi passengers switching to heavily subsidized Uber and Lyft. What mattered was what they paid out of pocket, not who was paying for the rest. AI companies turning off the money spigot at some point in the future is a future problem. They've already switched how they handle customer support a bunch of times, so they aren't afraid of having to do it again if costs change.

Not that the VCs care. Their business model isn't to build the next Google or Facebook. It's to convince a bunch of suckers in the stock market that they've built the next Google or Facebook and sell their stock before the whole house of cards collapses. Bringing in a bunch of companies to "prove" they have a source of future profits is just part of the process.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 1:34 UTC (Sun) by khim (subscriber, #9252) [Link]

> Not that the VCs care. Their business model isn't to build the next Google or Facebook. It's to convince a bunch of suckers in the stock market that they've built the next Google or Facebook and sell their stock before the whole house of cards collapses. Bringing in a bunch of companies to "prove" they have a source of future profits is just part of the process.

But that perpetuum mobile works on the willingness of the World Majority to pay for this whole stupidity. And they are less and less willing.

So that makes each individual company less susceptible, but it means that society as a whole would, eventually, lose the ability to buy stuff with money that is no longer worth anything… and then the whole thing would collapse.

Is it really the only way to return sanity to the whole process? Civil war, rivers of blood and the whole thing that would follow? Why would people want that?

Gentoo bans AI-created contributions

Posted Apr 20, 2024 15:04 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link]

> Today's LLMs are *heavily* subsidised by the companies pushing them

That's not actually true. OpenAI has $2B revenue and could turn a healthy profit.

Some companies like Microsoft are losing money right now, but it's mostly from front-loaded costs like model training and R&D. Once they are paid off, running AI models is not super-different from any other compute workload. We also have multiple companies working on various AI accelerators that are going to make model running significantly cheaper.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 2:48 UTC (Mon) by ssmith32 (subscriber, #72404) [Link]

No, it just hit $2 billion "annualized" revenue. Which means it hasn't actually made $2 billion.

https://www.investopedia.com/terms/a/annualized-income.asp

Not quite as bad as "covid adjusted earnings". But it's still BS.

Also, no, they can't make a profit.

"Altman has said OpenAI remains lossmaking because of the vast costs of building and running its models. The spending is expected to continue to outpace revenue growth as it develops more sophisticated models. The company is likely to need to raise *tens of billions* more in order to meet those costs."

https://www.ft.com/content/81ac0e78-5b9b-43c2-b135-d11c47...

Rumor is PhDs willing to shill for these companies pull in a million a year in actual revenue, so we're not just talking about the costs of fueling NVIDIA's market cap - 1000 engineers could be costing them $1 billion. And that's a real billion not an "annualized" one. And, of course, the executives can't be seen making less than the engineers, and Altman *has* to make more than everyone else, so.. there's your "vast costs" of training models ..

Gentoo bans AI-created contributions

Posted Apr 22, 2024 4:52 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> No, it just hit $2 billion "annualized" revenue.

Uh, "annualized" just means "projected until the end of the year, given the current income". It usually underestimates the income of growing companies, unless something terrible happens.

> "Altman has said OpenAI remains lossmaking because of the vast costs of building and running its models. The spending is expected to continue to outpace revenue growth as it develops more sophisticated models. The company is likely to need to raise *tens of billions* more in order to meet those costs."

They need money because they want to build newer and better models. This is how R&D works in _any_ industry. Pure model running can already be profitable, if all you want is just to run models. It will become even more profitable in the near future when new AI accelerators hit the market.

New model development is expensive, but it also will get cheaper. Both because of new specialized hardware and because training methods are improving.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 8:18 UTC (Mon) by atnot (subscriber, #124910) [Link]

I'm just going to point out how every point in this argument relies on the word "will". In an extremely speculative market that has been recognized even by proponents as probably a bubble.

"Will" is a word that means "doesn't". No matter how optimistic you are about the outlook, the facts are they're highly unprofitable right now. We don't know much about how big the operating vs r&d costs actually are, nor do we know how fruitful any of those r&d efforts might actually be. We do know that in the here and now, the cost of these things, financial and societal, is astronomical and that the actual value is minor. We know that for more than a year, we haven't seen any glimpses of the previous exponential improvements[1]. Especially given the track record of the tech industry in the last decade, just taking the boosters at their word here seems extremely foolish.

[1] https://garymarcus.substack.com/p/evidence-that-llms-are-...

Gentoo bans AI-created contributions

Posted Apr 22, 2024 17:53 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> I'm just going to point out how every point in this argument relies on the word "will".

OpenAI has actual income from actual paying customers. They can just stop the R&D, and they'll be hugely profitable (for a while).

> In an extremely speculative market that has been recognized even by proponents as probably a bubble.

It's speculative, but not in the sense you're thinking about. Nobody seriously doubts that the AI is here to stay, and that it's going to be hugely impactful. However, nobody also knows who is going to win the AI race. So every VC is making tons of bets, resulting in a somewhat frenzied environment.

They fully expect 99+% of their investments to go up in smoke, but if they make a successful bet, that remaining 1% will recoup the losses.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 22:09 UTC (Mon) by atnot (subscriber, #124910) [Link]

> OpenAI has actual income from actual paying customers. They can just stop the R&D, and they'll be hugely profitable (for a while).

Can they?

For one, as I said, we don't know how much of their losses are from operating their service vs research. But I am going to go out on a limb and say that if even just a single service was operating at a profit, they would have been very eager to tell us. That would, to my knowledge, be a first for any genAI offering and cement them as the clear leaders in the industry. It would also help reinforce the idea that AI is going to be incredibly profitable as that comes under fire. However they seem weirdly coy about their operating figures somehow.

For two, we don't know what those "paying customers" are actually doing. There's been a whole lot of demos and webtoys and "experimenting" and "trials" and pass-through APIs that hastily paste your parameters into some pre-written prompt. But there's been remarkably little actual success stories and useful business applications for it. I couldn't find any company stating they made a profit *using* AI either. I don't think they could stop r&d if they wanted to, because all of these companies testing out chatgpt solutions to problems that it can't actually solve are doing so based on the idea that these things soon enough will be able to do anything. The visible r&d spending is crucial to that.

I found it remarkable how, in a recent article I read, the Washington Post, despite clearly trying very hard, couldn't scrounge up anyone to balance the article with some positive news who didn't have to resort to constructs like "I think we will see" and "my expectations are" (https://archive.is/pIOra). It's almost funny how consistently all of the bad news is in the present tense and the good news is in the future tense. The constant excuses about how "it's just the early days" and "we'll see applications for it any day now" will also seem very familiar to anyone who has followed e.g. cryptocurrency news[1].

It may just be that they are earning a lot of money by teaching customers their tech isn't useful for them. The one thing I can find reliably is stories such as an animation company hiring a bunch of "prompt engineers" instead of artists to paint background mattes, failing at making minor revisions of the work, not understanding how animation works, and getting canned. This great success, of course, will show up (4x) in their next annualized revenue figures :)

[1] Which does bring up a fun comparison: local divisions of McDonald's and Hershey's did certainly pay a few thousand dollars (or rather, paid a marketing agency to pay a few thousand dollars) for a "crypto experience" on some crypto "metaverse" platform I don't remember the name of, at the height of the crypto/metaverse bubble. A bunch of brands minted NFTs too. However that money was, unsurprisingly, not the start of the enormous year-on-year brand spending on the blockchain that people claimed it was.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 22:17 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> Can they?

Of course. Pure model running is already highly profitable on general-purpose hardware at the prices that OpenAI charges. And it's going to be even better on special-purpose hardware.

> For two, we don't know what those "paying customers" are actually doing.

OpenAI certainly does. And a lot of customers are using ChatGPT on their own. I know non-native English speakers who are using OpenAI to correct spelling mistakes in emails. Or business analytics people using ChatGPT to write Python scripts to query data in Google Sheets. A very common use is to create TLDR versions of news articles and books.

I'm using ChatGPT to filter emails that are just CC-ed to me, but that don't need my personal attention, and then do a daily summary.

Are these ground-breaking mega-AI use-cases? Not really. But they are highly useful.
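
To give an idea of the kind of script involved, here is a rough sketch of the CC-filter-and-summarize approach, assuming IMAP access and the OpenAI Python client; the host, credentials, model name, and prompt are placeholders rather than my actual setup:

===
import email
import imaplib

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def fetch_cc_only(host, user, password, me):
    """Return (subject, body) pairs for unseen messages where I am only CC-ed."""
    imap = imaplib.IMAP4_SSL(host)
    imap.login(user, password)
    imap.select("INBOX")
    _, data = imap.search(None, "UNSEEN")
    results = []
    for num in data[0].split():
        _, parts = imap.fetch(num, "(RFC822)")
        msg = email.message_from_bytes(parts[0][1])
        if me in (msg.get("Cc") or "") and me not in (msg.get("To") or ""):
            body = b""
            for part in msg.walk():
                if part.get_content_type() == "text/plain":
                    body = part.get_payload(decode=True) or b""
                    break
            results.append((msg.get("Subject", ""), body.decode(errors="replace")))
    imap.logout()
    return results

def daily_summary(messages):
    """Ask the model for a short digest, flagging anything that needs attention."""
    digest_input = "\n\n".join(f"Subject: {subj}\n{body[:2000]}" for subj, body in messages)
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Summarize these CC-only emails as a short daily digest. "
                        "Flag anything that actually needs the reader's attention."},
            {"role": "user", "content": digest_input},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    mails = fetch_cc_only("imap.example.org", "me", "app-password", "me@example.org")
    if mails:
        print(daily_summary(mails))
===

A cron job running something like this once a day is all the "integration" it needs.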

Gentoo bans AI-created contributions

Posted Apr 22, 2024 23:16 UTC (Mon) by atnot (subscriber, #124910) [Link]

The point of the discussion was that these minor convenience functions are not solved by LLMs in a resource or cost effective way. I'm sure these things are useful to you, but the question is how many people would still be doing it if they had to pay what it actually cost.

If the answer is supposed to be "corporations" then, well, they can afford the true cost but don't have any worthwhile uses for it. If it's individuals, then sure, those may have some minor uses for it, but wouldn't pay the cost. And the end result is an overhyped technology that's just not useful to anyone unless we assume an endless chain of VCs pumping it forever, putting aside the immense environmental and societal costs for now.

Gentoo bans AI-created contributions

Posted Apr 23, 2024 0:27 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> The point of the discussion was that these minor convenience functions are not solved by LLMs in a resource or cost effective way.

With ChatGPT you pay directly for your use (on a literal per-character basis). This cost easily covers the model runtime cost. Why is it not cost-effective?

> If it's individuals, then sure, those may have some minor uses for it, but wouldn't pay the cost.

Now you're making assumptions. Why do you think regular people won't use AI once it becomes more user-friendly?

Gentoo bans AI-created contributions

Posted Apr 23, 2024 7:48 UTC (Tue) by atnot (subscriber, #124910) [Link]

> With ChatGPT you pay directly for your use (on a literal per-character basis). This cost easily covers the model runtime cost. Why is it not cost-effective?

Because, as pointed out, that's not true. The current pricing is subsidized under the assumption that a) the models will rapidly become obsolete anyway b) the lasting market share advantage will offset the losses. We don't know this for certain with OpenAI except by omission, but we know it for other offerings by more public companies that have nearly identical costs and pricing, e.g. Microsoft.

I think you may be underestimating how resource hungry these things are. Consider the person in another thread here who said they needed two 3090 GPUs to get acceptable output speed for programming. That's $2500 upfront and nearly 1kW continuous power draw just for some autocomplete. Datacenter inference systems will be somewhat more efficient, but the scale of hardware needed to perform these queries is just bonkers.

> once it becomes more user-friendly

You've answered your own question :) It's hard to imagine a more user-friendly interface than a text chat, but there is currently no clear route to improvements there either. The so-called hallucination is inherent and cannot be solved. That would require a system where facts are first-class citizens, instead of just crossing your fingers and hoping they are statistically likely. As recently shown, this type of model also requires exponential increases in training data for linear increases in capability, and we're already out of public data to train them on. It is generally questionable whether we can get much better than this by predicting tokens.

So if we're stuck with approximately what we have now, and we know the real costs are many times what the billing prices are, I think it'll be hard to find end-users who consider that a worthwhile investment.

Gentoo bans AI-created contributions

Posted Apr 23, 2024 17:30 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

> Because, as pointed out, that's not true. The current pricing is subsidized under the assumption

No, it's not. I know the financials of a couple of small AI companies; model running is expensive, but it can be done profitably. It's not feasible if you're doing any of the ad-funded "user is the product" crap, but it's doable if your customers actually pay you.

> I think you may be underestimating how resource hungry these things are. Consider the person in another thread here who said they needed two 3090 GPUs to get acceptable output speed for programming. That's $2500 upfront and nearly 1kW continuous power draw just for some autocomplete. Datacenter inference systems will be somewhat more efficient, but the scale of hardware needed to perform these queries is just bonkers.

The power draw is not continuous; you only need to do computation when you're running a query. That takes maybe a minute or so in total within an hour. In the OpenAI case, a single hardware node is shared across multiple customers.

The main cost that is not covered is R&D (model training and engineering salary).

> You've answered your own question :) It's hard to imagine a more use friendly interface than a text chat, but there is currently no clear route to improvements there either.

Chat is not great for UI systems, actually. It's too low-level, and you need to do context imports periodically. Just as with any other service, you need application support in many cases. My email classifier is a bunch of scripts that run on my home server, and it's certainly not a good general-purpose solution.

> The so-called hallucination is just inherent and can not be solved.

There are thousands of very smart people working on solving it. I'm pretty sure they'll think of something that will be good enough for practical purposes.

Gentoo bans AI-created contributions

Posted Apr 28, 2024 5:46 UTC (Sun) by ssmith32 (subscriber, #72404) [Link]

I've learned I can't argue with the wishes and religious fervor that permeates all those who buy into the VC-fueled Silicon Valley bubble. Pets.com probably had some amazing annualized numbers too, if you picked the right day to project from.. so did Enron, AIG, lots of folks...

But this

>Nobody seriously doubts that the AI is here to stay, and that it's going to be hugely impactful.

I can 100% say is 100% false.

At least one person, who has a graduate-level degree in computer science and has worked in the industry for more years than I care to say, does not think it will be hugely impactful.

And I know others that have at least voiced similar cynicism. Including some with graduate specializations directly in the field of CNNs/autoencoders/etc.

So, yeah, enjoy the ride. Slightly better auto-complete is nice, but hugely impactful, it is not. Oh, and remember the self-driving taxis? Still waiting...

Gentoo bans AI-created contributions

Posted Apr 28, 2024 6:10 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> I can 100% say is 100% false.

I meant VCs. Also, you should examine yourself for religious fervor.

> Oh, and remember the self-driving taxis? Still waiting...

If I had seen your reply earlier today, I would have written the reply from a self-driving taxi. Waymo exists, and it provides service in SF. It's also slowly expanding its service area.

Pretty much all new advances follow the https://en.wikipedia.org/wiki/Gartner_hype_cycle . The self-driving cars are in the trough of disillusionment, and are slowly climbing to the plateau of productivity.

Gentoo bans AI-created contributions

Posted Apr 28, 2024 11:31 UTC (Sun) by Wol (subscriber, #4433) [Link]

> But this

> > Nobody seriously doubts that the AI is here to stay, and that it's going to be hugely impactful.

> I can 100% say is 100% false.

I think you're reading this all wrong. I think it's going to be hugely impactful - and NOT in a good way.

As so often, the mathematicians (namely the guys writing all this software) seem to think that mathematics dictates the way the world behaves, not describes it. They're busily disappearing into an alternative universe, the problem being that they're trying to force everyone else to live in it ...

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 29, 2024 10:37 UTC (Mon) by NAR (subscriber, #1313) [Link]

I think it's going to be hugely impactful - and NOT in a good way.

I remember the good old days of the late 1990s, when on Linux-related mailing lists we were "competing" over how many Nigerian scam e-mails we were receiving in a month. There were separate competitions for the number of offers and for their total value. At the time we were sure that the "general population" was safe from this scam, as most people didn't speak English and these were obvious scams. Today the most exposed population still doesn't speak English, but the scammers can generate good enough Hungarian text (on the level of an uneducated native speaker) that can easily fool them, and they do fall for scams...

Gentoo bans AI-created contributions

Posted Apr 22, 2024 11:30 UTC (Mon) by hkario (subscriber, #94864) [Link]

R&D exists in every industry. That's how we got here as a civilisation.
But in normal industries, you sell a product and use the money from sales to fund more development.
In bubble economies it's the VCs that pony up the money, and the "product" is sold at a loss to create a market or just to capture it. It's unsustainable, but the investment class has the money to burn, so they play like that.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 13:50 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

R&D does come from ongoing operations in established industries, but it often has to come from somewhere else in order for those industries to establish themselves. There are plenty of industries that got their start from hobbyists spending their personal time and money on something that interested them, or from academic or government-funded research. As an example, computers got their start first as academic projects and then as military ones. They didn't get any interest from industry until that basic R&D was done for them, and even with that they got plenty of government money to help them grow. It's not totally out there for venture capitalists to put money into the initial R&D needed to get an industry off the ground, provided they see a big enough reward if the companies succeed.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 14:54 UTC (Mon) by atnot (subscriber, #124910) [Link]

A difference is that with more traditional investments the entity doing the investing is one that pretty directly stands to benefit from the R&D being done. For example, AT&T investing in developing the transistor, or the US government developing nuclear-resistant networks, as you mentioned. Conversely, VCs are remarkable for just how little they care what the actual technology is. Some even write gloating blog posts about how virtuous they think they are for basing investment decisions solely on how charismatic they think the founders are, without even looking at the slide decks or financials.

This makes sense, because at some point the industry realized that its primary way of getting paid is only rarely the technologies actually working out, and more commonly just using its elite connections to find a bigger sucker as fast as possible. This detachment from the effects of any of their investments may have some arguable advantages; Kodak inventing and then killing the digital camera comes to mind. But it also runs the risk of, say, growing an entire sector around repeatedly wasting a medium-sized country's GDP on things that nobody wants or needs and then trying to hype up everyone else to buy it off them for more.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 17:57 UTC (Mon) by Cyberax (✭ supporter ✭, #52523) [Link]

> But in normal industries, you sell a product and use money from sale to fund more development.

That model works for established industries. If you try to do that in a new industry, you can be outcompeted by VC-funded companies who can afford to fund better R&D.

This does create perverse incentives where you're working to please VCs rather than your customers, but it also results in a much faster R&D overall. Think about it as unintentional private funding of academic research, if this makes you feel better.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 18:34 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

> The root cause of bad AI service is companies that see service as a cost to minimize. Today they've decided AI is the way to save money. A few years ago, it was offshoring to a call center in some low cost country. Before that it was underpaid domestic call centers mixed with endless message trees. The common denominator is unwillingness to pay for competent staff.

You forgot one intermediate, which is what they were doing right before LLMs came along - many-to-one chatting. The idea is that each CSR is in four or five (or maybe more, I don't really know the specifics) chats at once, and doing manual timesharing between them (i.e. looking at this chat for a few seconds, then looking at the next one, etc.). Whenever a response is due, they have a series of canned replies they can quickly input (presumably with copy and paste and/or some bespoke software), and most of those are more polite ways of saying "I'll get back to you in a minute," but often phrased in a way that makes it sound like the CSR is doing something other than juggling several chats at once (such as pulling up your file or consulting internal documentation).

You can tell that you're dealing with this kind of customer support based on the following:

1. They subtly encourage you to chat rather than getting on the phone (e.g. the phone has a longer wait time, they won't just give you a phone number but require you to fill out a form first, the phone has one of those awful voice-response menu thingies, etc.).
2. The chat has a queue, but it rarely takes more than a few minutes.
3. Every time you send a message to the CSR, it takes 5-10 seconds (sometimes longer) before they reply.
4. When they do reply, their reply does not meaningfully advance the conversation, and you have to wait another 5-10 seconds for them to give you a real reply.
5. Everything they say reads like it was meticulously proofread and approved in advance by some committee. There are never abbreviations, typos, missing capitalization or punctuation, or other common hallmarks of "real" chatting.

The obvious problem here is that this is much "safer" than using an LLM (see for example the Air Canada fiasco where their chatbot made up a policy and the company was forced to honor it), while simultaneously being relatively cheap (you don't need as many CSRs because they're multiplexed over several chats at once) and providing an experience that is not that much worse than an LLM can provide (the CSR's responses are already grammatically correct and professional, because they've been pre-vetted, so the most the LLM can do is cut out the "give me a moment" intermediate replies).

Gentoo bans AI-created contributions

Posted Apr 22, 2024 9:25 UTC (Mon) by farnz (subscriber, #17727) [Link]

There's also been a truly horrible thing that I've seen pitched - speech recognition and speech synthesis feeding into a multi-chat CS system. You have the same one worker in a few chats thing, but you combine it with speech recognition feeding the chats, so that you get all the downside you describe, but while you're on the phone with "John in Florida" or similar.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 11:13 UTC (Sun) by josh (subscriber, #17465) [Link]

> we could weed out most I-saw-there-was-a-light-here

The only hit for this phrase is this article's comments section. From context it clearly means people who are completely unqualified or a complete mismatch but applied anyway, but what's the direct interpretation/origin of the phrase?

Gentoo bans AI-created contributions

Posted Apr 22, 2024 11:14 UTC (Mon) by gmgod (subscriber, #143864) [Link]

Oh yeah, it did not occur to me that that phrase wasn't a common one.

We have a colleague who, after interviewing a candidate so dreadful we wondered if they had shown up to the *wrong* interview, once said something like "the light was on so I just let myself in" as a satirical summary of what the candidate did (not actually) say.

That stuck, and we have many variants now, including "I saw there was light here" or "it looked warm in here", and they all mean the same thing. I don't know where that colleague got it from, whether it was divine inspiration or a French expression he brought with him.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 13:26 UTC (Fri) by LtWorf (subscriber, #124958) [Link]

I tried at work, where I had to write a document.

I gave some guidelines to ChatGPT and it generated 10x more text, which at some point contained my guidelines verbatim.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 12:11 UTC (Sat) by donald.buczek (subscriber, #112892) [Link]

> Prompting an AI tool to do "say that the app was missing a feature about how to handle numbers in a commit message" is going to generate a novel's worth of text that the person with broken English won't be able to vet.

Like all tools, LLMs can be used wrongly and correctly. You gave an example of a bad prompt and usage. I agree, that many people use LLMs wrongly with bad outcome and I see some dangers for society.

However, LLMs can also be used correctly. For example, you can pipe a text you've wrote through a LLM and prompt it to optimize the text or make suggestions for improvement. I sometimes use gpt4-turbo for that. In my experience the model is able to make good suggestions, fix errors and optimize the text while preserving its meaning.

I've tried a few local models but, unfortunately, these seem not yet to be on the same level as gpt4-turbo.

True, you can't trust the output, you need to manually review it sentence by sentence to make sure, it still says what you want to say and only that. But errors and hallucinations are the exception not the rule. Often the optimized text is just better and you can take it as is.

It is a service to your readers if your text is free of errors, well structured, polite and to the point. You can very much fine-tune what you get if you tell the model what you want and what recipients you address.

===

As an example, I've wrote the above text without thinking too much about it. Probably there are a bunch of typos and bad style in it. Now I asked gpt4-preview via the api to "Optimize the following text, which is a comment in lwn.net for clarity and conciseness."

Here's the output, which I intentionally left untouched for demonstration purpose.

===

LLMs, like any tool, can be misused or used effectively. Misuse can lead to negative outcomes and societal risks. However, when used correctly, such as refining or suggesting improvements to a text, LLMs like gpt4-turbo can enhance clarity, fix errors, and maintain the original meaning. While local models may not yet match gpt4-turbo's capabilities, it's crucial to review the LLM's output to ensure accuracy and avoid misinformation. Properly optimized texts benefit readers by being error-free, well-structured, and clear, especially when the model is guided by specific instructions and audience considerations.

===

If I was to use that model for my "real" reply, I would further work on it iteratively. I'm not happy that it didn't mention that you provided a bad example. I'm not happy about "Misuse can lead to .. societal risk" which is not what I wanted to say. I didn't specify what societal risk I see or that misuse is a cause of it. In the end, I would probably just drop the vague side-note, which would be an improvement.

Not being a native English speaker, I'm not sure about the "While local models ..., it's crucial to ..." construct. Does it make a connection between the statements of the two sub-sentences, which in my intent are totally independent? I would either research that question (learning something by the way) or replace the construct out of my uncertainty [1].

So, IMO, LLMs can very well be used to optimize text, for example commit messages. But you need to write the text yourself in the first place, then use the LLM, then carefully review the result. If you don't like it, amend the prompt, rinse and repeat.
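
To make that concrete, the whole pipeline can be a few lines of Python. The following is only a rough sketch, assuming the OpenAI Python client, with the model name and prompt as placeholders rather than my exact setup:

===
#!/usr/bin/env python3
"""Pipe a draft text through an LLM for copy-editing: draft on stdin, suggestion on stdout."""
import sys

from openai import OpenAI

PROMPT = ("Optimize the following text, which is a commit message, for clarity "
          "and conciseness. Preserve its meaning and do not add new claims.")

def polish(draft, model="gpt-4-turbo"):  # placeholder model name
    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": draft},
        ],
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    sys.stdout.write(polish(sys.stdin.read()) + "\n")
===

The important part is the loop around it: you still write the draft yourself, diff it against the suggestion, and only keep what you actually meant to say.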

[1] However, now that I've mentioned it, I think I don't need to research it to satisfy my curiosity, because I more or less expect a lecture from a certain member of this community :-)

Gentoo bans AI-created contributions

Posted Apr 20, 2024 13:36 UTC (Sat) by Wol (subscriber, #4433) [Link]

> [1] However, now that I've mentioned it, I think I don't need to research it to satisfy my curiosity, because I more or less expect a lecture from a certain member of this community :-)

Well, the LLM's output certainly feels "stilted", or off, somehow.

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 20, 2024 16:14 UTC (Sat) by donald.buczek (subscriber, #112892) [Link]

> Well, the LLM's output certainly feels "stilted", or off, somehow.

I'm a bit disappointed by your restrained reply, I hope I didn't offend you, which was surly not indented. That was supposed to be a harmless joke.

@gpt3-5 (with gpt4-turbo currently being unavailable to me for yet unknown reasons) : "The following text feels a bit stilted. Make it more colloquially without changing the meaning. Avoid passive constructs. Drop the reference to the risk for society. Drop the comparison of gpt4-turbo with local models.<ret><ret><text from above>"

===

LLMs are like any other tool - you can either use them right or mess things up. If you misuse them, you'll end up with a lot of problems. But if you use them the right way, like tweaking or suggesting improvements to a text, LLMs can make things clearer, fix mistakes, and keep the original meaning intact. It's super important to check the stuff LLMs churn out to make sure it's right and avoid spreading false info. When texts are optimized properly, it helps readers by being error-free, well-organized, and easy to understand, especially when the model follows specific instructions and considers the audience.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 7:36 UTC (Sun) by Wol (subscriber, #4433) [Link]

> > Well, the LLM's output certainly feels "stilted", or off, somehow.

> I'm a bit disappointed by your restrained reply, I hope I didn't offend you, which was surly not indented. That was supposed to be a harmless joke.

Well, my reply was - while true - intended to be tongue-in-cheek. And of course, when you're worried about your own English, and the LLM gets it worse, of course that's funny :-)

And no, I'm not offended. Speaking four languages to varying ability (and a brit to boot!), I know how hard it can be to express yourself in another language. It's when native speakers can't be bothered to try and get it right that I get upset. After all, English is reputed to be one of the hardest languages to learn (and then you've got American, Strine, Pidgin, and all the others with their quirks on top :-)

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 21, 2024 12:43 UTC (Sun) by donald.buczek (subscriber, #112892) [Link]

> > > Well, the LLM's output certainly feels "stilted", or off, somehow.
> > I'm a bit disappointed by your restrained reply, I hope I didn't offend you, which was surly not indented. That was supposed to be a harmless joke.
> Well, my reply was - while true - intended to be tongue-in-cheek. And of course, when you're worried about your own English, and the LLM gets it worse, of course that's funny :-)

"Indented". OMG, how comes you only see your mistakes after you send/post/print something? I should have had it reviewed by an LLM or spouse; the former would have pointed the error out. And thank you for you not picking on it.

Btw: We diverted a bit into whether LLMs can help when you need to communicate in a language which you are not perfect in. In my opinion, they can. But it's not only with foreign languages. I occasionally ask an AI to review my German texts, too, and most of the time the reviews are helpful and the suggestions are good. I often accept them.

And I've replied to a comment which mentioned AI-tooling for text (commit messages). I don't want to open a whole new discussion with some of the arguments being the same, but I do think that using code-trained AIs can be very helpful in the programming domain, too.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 15:35 UTC (Sun) by Wol (subscriber, #4433) [Link]

> "Indented". OMG, how comes you only see your mistakes after you send/post/print something?

> surly not indented

There's an even bigger typo in there - I'm really sure you didn't mean that !!! :-)

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 21, 2024 19:49 UTC (Sun) by donald.buczek (subscriber, #112892) [Link]

My error rate of 66% surely makes me surly.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 16:56 UTC (Sat) by atnot (subscriber, #124910) [Link]

> Well, the LLM's output certainly feels "stilted", or off, somehow.

AI writing always makes me feel like those eerie pictures of thousands of faces averaged together. Yes, that may be a face, and my brain recognizes it as somewhat pleasant, but it's definitely not human.

It's somewhat worse for LLMs because one voice they have been specifically trained with is corporate American "as per my last email" customer service politeness, which leaves everything coming out of them with the characteristic charm and wit of a moist car insurance sales brochure.

Now, I'm sure there are people out there whose writing could be improved by the vacuous input of a thousand first-year copywriting hires whose cover letters said they were "very enthusiastic about sewage lift pumps". But I really, really don't want to live in a world where that is normalized or encouraged, or where it is necessary to suck the life out of everything that way. At least, more so than it already is. A writer's voice is one of the great joys of reading, it's why I subscribe e.g. to this website, and it kills me inside a little every time I see someone think so little of the art of writing that they sterilize their voice this way of their own free will.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 18:44 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

The unfortunate reality is that much writing is already like this with or without LLMs. If you work for a large corporation, you both read and write* like this every day. The object of the game is not to produce good quality writing, it's to produce writing that communicates specific information or opinions, to as wide of an audience as possible, in a tone and style that is unlikely to confuse or offend anyone.

Of course, many people detest that game. But, on the other hand, money. Corporate jobs generally pay well and provide good benefits. If I have to write something bland and inoffensive every day, that's an entirely fair price of entry in my eyes.

* Freudian typo I actually made here: "writhe"

Gentoo bans AI-created contributions

Posted Apr 21, 2024 7:46 UTC (Sun) by Wol (subscriber, #4433) [Link]

> Of course, many people detest that game. But, on the other hand, money. Corporate jobs generally pay well and provide good benefits. If I have to write something bland and inoffensive every day, that's an entirely fair price of entry in my eyes.

That game is very good for corporates - it's very good at blame shifting.

And it's both encouraged - and damaging - in Science. It lends a "third person" air of authority which is often not deserved. Which is why the raid wiki style guide says "Write in the first person. Take personal responsibility for what you write". It's much pleasanter to read, as well :-)

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 21, 2024 15:39 UTC (Sun) by Wol (subscriber, #4433) [Link]

Oh - I'll add - Michael Faraday was a "first person" person. Which was one of the reasons he was a successful/famous scientist - he could communicate.

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 20, 2024 21:04 UTC (Sat) by donald.buczek (subscriber, #112892) [Link]

> customer service politeness

Yes, I understand. Nonetheless, in the domain of technical references and descriptions, I don't want to see an author's personality shine through, neither polite nor rude nor funny. Just condensed, right-to-the-point technical information. The text this subthread started with was a commit message.

But any other text, even text that should show emotion and personality, can be improved if unintentional errors are removed. LLMs are good with text; they can point out possible errors, from spelling to grammar to even more abstract levels. Spell checkers on steroids. You can very much choose to what degree and how exactly you use the tool. You don't need to let it rewrite your text into something without a soul.

---
Rephrase the following text into the style of a pirate from a fictional, funny, over the top pirate story. <above text>
---
Arrr, me hearties, listen well! When we be navigatin' the treacherous waters of technical manuscripts and parchments, I yearn not for a glimpse of the author's spirit, be it courteous, churlish, or full of jest. Give me straight, no-nonsense treasure maps of knowledge, I say, starting with the scrawl on the log that kicked off this here discussion.

But mark me words, any other bilge, even that which ought to be brimming with life and character, could stand to lose a few barnacles if it means cleaning up unintended blunders. Us sea dogs have our sea-lore chroniclers - LLMs, the mighty beasts of text, capable of sniffing out a whole array of missteps. From the cursed spellin' errors to the murky depths of grammar and beyond, they're like our spell-checkin' parrots on a hefty dose of grog. Ye be the captain of this vessel, choosing how to steer with this tool at your disposal. There's no need to set a course for blandness, devoid of the soul. Avast! Let your message fly true, without losin' its heart!

Gentoo bans AI-created contributions

Posted Apr 20, 2024 21:12 UTC (Sat) by kleptog (subscriber, #1183) [Link]

> It's somewhat worse for LLMs because one voice they have been specifically trained with is corporate American "as per my last email" customer service politeness,

So tell it to use a different voice. It can't read your mind you know. If you want it to talk in Southern US slang, it can do that.

There are services out there that will, for a fee, look up all public communications you've ever made and any private communications you provide, and then create a customised chatbot that will respond with your exact writing style. That's the thing that makes the transformer architecture so revolutionary: it's so easy to customise that anyone can do it.

People sometimes forget it's just a computer, and therefore doesn't have all the subtle context cues that conversations with people have. So if you don't tell it to produce output appropriate for a Linux kernel commit message, it's not going to figure that out itself. Learning how to configure an LLM to produce output suitable for the context is not very hard, and can be learned by anyone in an afternoon.

The idea you're going to be able to tell someone is using an LLM is on the same level as whether you can tell someone is using a spell checker. If done well, you're not going to notice at all.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 2:26 UTC (Sun) by khim (subscriber, #9252) [Link]

> If done well, you're not going to notice at all.

And even if you do notice, that may not be a bad thing.

Have you ever tried writing to the support departments of Chinese companies in the era before LLMs?

Those answers were surely written by humans back then, but oh boy, if you think customer-service politeness is something to complain about… you haven't seen what the majority of the population of this small globe called Earth produces in writing.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 16:18 UTC (Sat) by rgmoore (✭ supporter ✭, #75) [Link]

So, IMO, LLMs can very well be used to optimize text, for example commit messages. But you need to write the text yourself in the first place, then use the LLM, then carefully review the result. If you don't like it, amend the prompt, rinse and repeat.

And one might reasonably ask if this is worth the bother compared to revising the text yourself. If you can't trust the AI to do a good job, you'll probably spend as much time reviewing its work as you save by having it do that work for you. Maybe some future version will be better, but for now it's of questionable benefit.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 16:43 UTC (Sat) by donald.buczek (subscriber, #112892) [Link]

> And one might reasonably ask if this is worth the bother compared to revising the text yourself.

To me it is. Its difficult to see our own errors or find alternative structures and expressions, once you've settled for something. That's the reason why we sometimes give important texts to someone else to proofread it or make suggestions for improvements from another perspective.

With LLMs you don't need to bother your spouse :-)

===
Point out any error or bad style in the following text.<ret><ret><above text>
===

In the text provided, there is a spelling mistake in the second sentence. It should be "It's difficult" instead of "Its difficult." Additionally, the phrase "once you've settled for something" could be rephrased for clarity and conciseness.

The sentence "That's the reason why we sometimes give important texts to someone else to proofread it or make suggestions for improvements from another perspective" is a run-on sentence and can be divided into two separate sentences for better readability.

The informal smiley face ":-)" used at the end could be considered unprofessional in some contexts.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 2:36 UTC (Sun) by khim (subscriber, #9252) [Link]

> If you can't trust the AI to do a good job, you'll probably spend as much time reviewing its work as you save by having it do that work for you.

Where does this level of arrogance come from, and why didn't this argument hold in the era before AI, when spellcheckers were introduced?

Sure, AI can't write text better than most native speakers can.

But for every English native speaker there are three non-native speakers and twenty more who don't know English at all.

Which means that for 95% of Earth's population AI **already** does the job better than they can do it themselves.

I, for example, don't know where and how to use “a” and “the”, and would probably never learn, because my native language doesn't even have a notion of an article. Such a part of language simply doesn't exist there; I have no idea why anyone would want to use it, where, or why.

If AI could just add these correctly, that alone would make it worth my while to reread what it wrote. And it can do more.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 7:43 UTC (Sun) by Wol (subscriber, #4433) [Link]

> I, for example, don't know where and how to use “a” and “the”, and would probably never learn, because my native language doesn't even have a notion of an article. Such a part of language simply doesn't exist there; I have no idea why anyone would want to use it, where, or why.

And how many other languages have THREE different forms of the present tense? Which one do you use where? That's another classic "this guy is a foreigner" giveaway ...

(btw, khim, the difference between "a" and "the" is easy to explain and hard to apply. If you (could) have several, and don't care which, then it's "a". If you have several and *do* care which you're talking about, then it's "the". "I saw a/the cat in the street". "a" means there are a lot of cats around, and it's a casual comment you saw one of them. "the" means you were talking about a cat earlier, and it's that specific cat.)

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 21, 2024 8:20 UTC (Sun) by khim (subscriber, #9252) [Link]

> btw, khim, the difference between "a" and "the" is easy to explain and hard to apply

The problem is not that it's hard to understand; the problem is that to make the right choice I need to think about something that I normally don't care about at all.

It's similar to palatalization for an English speaker: compare ugol' to ugol. These certainly don't sound completely identical, but would you care about that difference enough to hear and reproduce it in casual speech? I have yet to see any English speaker who can reliably do that, simply because it's not something they are trained to perceive.

I know the difference between “a” and “the”, and if my life (or, more likely, my work permit) depended on that difference I could use them correctly… most of the time. Just like people (even non-native speakers) can spell words correctly… most of the time. But similarly to how a spellchecker effortlessly catches the cases where there is no ambiguity in spelling, AI does that for me with articles: instead of deciphering my own text and looking at it from an angle that is just not natural for me, I can spend my mental effort on something else.

English has it easy, BTW. Try to ensure that you are using articles in German correctly some day.

> And how many other languages have THREE different forms of the present tense?

You would be surprised. Latin has six tenses, and because it was used in so many countries for so long, that craziness leaked out into them, too. Only it has mutated in the process, and different languages have ended up with similar yet different ideas about how time passes.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 13:36 UTC (Sun) by Wol (subscriber, #4433) [Link]

> English has it easy, BTW. Try to ensure that you are using articles in German correctly some day.

German is my second language. I know :-)

> > And how many other languages have THREE different forms of the present tense?

> You would be surprised. Latin has six tenses, and because it was used in so many countries for so long, that craziness leaked out into them, too. Only it has mutated in the process, and different languages have ended up with similar yet different ideas about how time passes.

Well, I was taught we have the same 6 tenses. It makes perfect sense to me.

But another reply tells me we have FOUR present tenses - I'm not aware of the fourth ... that's for just ONE Latin present tense ...

(I program, I am programming, I do program. I'm not aware of any other European language with multiple present tenses - doesn't mean there aren't any, I've just never heard of any...)

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 21, 2024 16:55 UTC (Sun) by malmedal (subscriber, #56172) [Link]

> But another reply tells me we have FOUR present tenses - I'm not aware of the fourth ... that's for just ONE Latin present tense ...

They are:
Present simple: I work
Present continuous: I am working
Present perfect: I have worked
Present perfect continuous: I have been working

These are common, e.g. equivalent in Spanish:
Yo trabajo
Yo estoy trabajando
Yo he trabajado
Yo he estado trabajando

(apologies if I am messing up the conjugation)

German also has present simple and present perfect; it is missing the continuous forms.

Anyway, as someone who has studied multiple foreign languages, English is by *far* the easiest to deal with.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 19:11 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

This is complicated enough that linguists usually divide it up into three (or occasionally four) parts:

* Tense - Usually past, present, and future.
* Aspect - Simple ("perfective"), progressive (continuous), and perfect. Some languages have imperfect or other aspects, which English lacks.
* Mood or modality - Anything marked with a modal verb (the auxiliary verbs used for the progressive and perfect do not count as modal in this formalism). Sometimes subcategorized into indicative and subjunctive modalities (but we can further categorize into conditional, counterfactual, normative or deontic, etc.).
* Evidentiality - Not used in English, and for that matter not used in most/all Indo-European languages. Some languages mark verbs to indicate why the speaker believes the asserted fact to be true, for example distinguishing between something directly witnessed and something indirectly reported. Some authorities consider this an extension of modality.

One confusing thing about English is that English has no future tense in this formalism. The future is sometimes marked with the modal verb "will," so you could say that the future is a modality. But modalities are optional in a way that tenses are not, and so we can have sentences like "Tomorrow, I'm buying a new laptop," which is semantically happening in the future but has no grammatical marker indicating as much (and if you chop off the "tomorrow" prefix, it's a perfectly good present continuous sentence without a whiff of the future).

On the other hand, there are languages (such as Mandarin Chinese) that have only one tense in this formalism. Those languages either treat all time information as modal, or do not have grammaticalized time markers at all. Of course, speakers of those languages are perfectly capable of distinguishing between past, present, and future. Every language can do that. But in these languages, time information is truly optional. You can say "I buy a new laptop," as a discrete event (rather than the habitual or indefinite sense that English simple present would normally imply), without specifying when that happens.

Another "fun" property of English is do-support: There are some constructions in English which grammatically require a modal verb (e.g. turning a declarative sentence into a yes-no question), but modals are semantically optional in English, so the word "do" (or "does") is used as a placeholder when no modal is required. In general, English does a lot of fronting and other grammatical rearrangement when building different constructions, and I would imagine that this annoys speakers of Spanish just as much as Spanish verb conjugations tend to annoy speakers of English.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 19:57 UTC (Sun) by donald.buczek (subscriber, #112892) [Link]

It fills me with great satisfaction that I am now receiving the prophesied English lecture after all.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 20:31 UTC (Sun) by malmedal (subscriber, #56172) [Link]

I find https://www.ithkuil.net/ fascinating - not fascinating enough to actually learn it, but still interesting to see what sort of concepts you can cram into a language.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 20:55 UTC (Sun) by donald.buczek (subscriber, #112892) [Link]

> German also has present simple and present perfect; it is missing the continuous forms.

No, German has that, too. "I am working" would be "Ich bin arbeitend". Yes, this isn't used much and sounds a bit strange but it is valid. It strongly indicates "right at this very moment".

Now, to make things even more complicated: a male "worker" is an "Arbeiter" and a female worker is an "Arbeiterin". A worker or a group of workers with unknown, irrelevant or mixed gender would be "Arbeiter", too. But many people now reject the generic masculine.

So currently, several gender neutral forms compete with the old generic masculine and with each other. For plural, the substantivized form of the verb in the present continuous tense is often used: "Arbeitende".

So the seldom-used tense got a little revival lately.

I don't like it, because to me the natural interpretation of "Arbeitende" would be "people, who work at this very moment".

Gentoo bans AI-created contributions

Posted Apr 21, 2024 21:14 UTC (Sun) by malmedal (subscriber, #56172) [Link]

> "I am working" would be "Ich bin arbeitend".

Mmmm, I believe arbeitend functions as an adverb, in this sentence. That is "I am something" and the something that I am is "working".

Gentoo bans AI-created contributions

Posted Apr 22, 2024 7:25 UTC (Mon) by donald.buczek (subscriber, #112892) [Link]

> Mmmm, I believe arbeitend functions as an adverb, in this sentence. That is "I am something" and the something that I am is "working".

After reading a bit I have to admit that you are more correct than I am. The word form 'arbeitend' is known as 'Partizip I' (Present Participle) in German, which functions as a hybrid between a verb and an adjective/adverb.

This is from the beginning of the German Wikipedia page for "Participle" / "Partizip":

> A participle (Latin participium, from particeps "participating"; plural: participles) is a grammatical form (participial form) that is derived from a verb and thereby partially acquires the properties of an adjective, but also retains some properties of a verb. The term "participle" and likewise the German term Mittelwort express this characteristic of participating in two categories at the same time, namely verb and adjective. [...]
> German examples of participles are the forms ending in -end like spielend (to the verb spielen; called "Present Participle") and the forms starting with ge- like gespielt (called "Past Participle"). In traditional grammar, participles were often listed as a separate part of speech alongside verbs, adjectives, nouns, etc.; however, this view is not shared in modern linguistics, where participles are considered as words or even constructions that contain varying proportions of verbal and adjectival components.

You are correct; the Present Participle ('Partizip I') is not typically used in natural speech as a verb to denote an 'immediate' tense. "Ich bin arbeitend" can be said to be wrong and is not listed in tables with verb tenses. At least, it sounds yoda-ish.

As you mentioned, the Present Participle is used as an adjective or adverb to describe a state. It's also used, a bit more verb-ish, to indicate simultaneous actions, for example, "Die Kinder kamen lachend aus der Schule" ("The children came out of the school laughing").

Gentoo bans AI-created contributions

Posted Apr 27, 2024 2:19 UTC (Sat) by gutschke (subscriber, #27910) [Link]

Just to throw another wrinkle into this discussion, I believe that in "ich bin arbeitend", the "arbeitend" would be a predicative expression, which is different both from an adverb and from a way of expressing what English would do with present continuous. It simply describes a state that you are in. For a better discussion, see https://en.wikipedia.org/wiki/Predicative_expression

I understand why it is tempting to say that the present participle is used to form a present continuous. That would feel very natural to an English speaker who is familiar with Latin. And it feels almost but not quite as if German should do the same. In fact, I wouldn't be surprised if some regional German dialects did this. There is a lot of cross pollination between all of these languages, but in the process grammatical concepts get repurposed and subtly change.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 18:07 UTC (Sun) by khim (subscriber, #9252) [Link]

> I program, I am programming, I do program.

And where do you see different present tenses?

> Well, I was taught we have the same 6 tenses. It makes perfect sense to me.

Well… to some degree it even makes sense to me. Just a different sense from the one it makes to you. Please open that Wikipedia link, scroll down literally a dozen lines, and read:

English has only two morphological tenses: the present (or non-past), as in he goes, and the past (or preterite), as in he went.

WHA… what happened to these “six tenses”? That's also explained right there, too:

The study of modern languages has been greatly influenced by the grammar of the Classical languages, since early grammarians, often monks, had no other reference point to describe their language. Latin terminology is often used to describe modern languages, sometimes with a change of meaning, as with the application of "perfect" to forms in English that do not necessarily have perfective meaning, or the words Imperfekt and Perfekt to German past tense forms that mostly lack any relationship to the aspects implied by those terms.

> I'm not aware of any other European language with multiple present tenses - doesn't mean there aren't any, I've just never heard of any...

English tenses are “different” for the same reason English inches, feet and miles are different from the meters that everyone else uses. The difference is not in the language per se, it's in how it's taught. Just why do you say that “I am programming” is a separate tense while “I love programming” is not?

In reality most European languages may also use verbs to adjust time perception; they just don't [try to] pretend it's, somehow, a grammatically different tense, and are [slowly] adopting the rules that actual linguists designed rather than [trying to] pretend everyone is talking in a variant of Latin.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 22:14 UTC (Sun) by Wol (subscriber, #4433) [Link]

> > I program, I am programming, I do program.

> And where do you see different present tenses?

Because they have completely different meanings?

"I am a programmer, I program" - it's my job, I do it all the time, I may - OR MAY NOT - be doing it right now (I'm not - I'm busy writing right now :-)

"I am programming" - I'm not, I'm not doing it now, I'm writing.

"I do (not) act" - this variant is almost always either emphatic or negative - and when negative it implies "never".

So the first variant is the continuous present, I may not be doing it right now but it happens past present and future. (In the positive it also does not necessarily imply "right now".)

The second variant is the present - it's happening right now.

The third variant - I'm not sure what it's called - is almost always used to imply "never".

Three clearly different meanings.

To jump on your mention of "English has "he goes" and "he went"", what do you understand by the two sentences

"Jim is going to the gym" and "Jim goes to the gym". I was taught they are two - clearly different - present tenses. Because they have two - clearly different - meanings.

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 22, 2024 4:38 UTC (Mon) by khim (subscriber, #9252) [Link]

> And where do you see different present tenses?

Sure. And “I love programming” would be another meaning, and “I teach programming” yet another one.

Does it mean there are a bazillion tenses in English? A loving present, a teaching present, and so on?

No, there are two tenses and many verbs, of which a few selected ones are interpreted by teachers as “yet another tense”.

English is not unique, and not even particularly hard, WRT how it treats time (other languages have many other and different ways to talk about the passage of time, too). What is unique is the absolute refusal to change anything anywhere for any reason in the teaching of said language.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 7:38 UTC (Mon) by Wol (subscriber, #4433) [Link]

> No, there are two tenses and many verbs, of which a few selected ones are interpreted by teachers as “yet another tense”.

Let's apply some very simple logic.

"I program" = "I am programming"

therefore true = false

Unless you live in Crete, these two cannot be the same, therefore while they are both present, they have to be different present tenses. And I don't know about you, but this confusion is one of the absolutely standard ways by which we detect foreign speakers ... it's a VERY common mistake. (Coupled with the occasional giveaway of "I programming" which simply doesn't exist in standard English.)

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 22, 2024 10:29 UTC (Mon) by kleptog (subscriber, #1183) [Link]

This is one of the classic mistakes Dutch people with poor English make.

The phrase "ik ga naar school" in Dutch can mean either "I go to school" or "I am going to school (now)" depending on the context. For some reason Dutch people often throw in the "am" when it is unnecessary, but other Dutch don't pick up the problem either. Once you point it out to them it usually corrects fairly quickly, but it's fascinating that the same type of error keeps popping up.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 11:00 UTC (Mon) by paulj (subscriber, #341) [Link]

I'm confused, how is the am in "I am going to school" unnecessary in English? "I go to school" in English - at least in most of the Celtic Isles - would sound a little foreign. Indeed, it sounds... Dutch. ;)

Also (and ICBW, I've never really had native /adult/ dutch, and it's been a long time since I had native child's dutch), but could a dutch person not be more precise with "Ik ga nu naar school" for "I am going to school now"? Also, "Ik ga zo naar school" for "I am going to school shortly"? Part of the problem with dutch is it has become very terse, and dropped a lot of constructs - even in my lifetime AFAIK. (??).

Gentoo bans AI-created contributions

Posted Apr 22, 2024 11:45 UTC (Mon) by Wol (subscriber, #4433) [Link]

> I'm confused, how is the am in "I am going to school" unnecessary in English? "I go to school" in English - at least in most of the Celtic Isles - would sound a little foreign. Indeed, it sounds... Dutch. ;)

My daughter goes to school - and she's 40. She's a deputy head :-)

The "am" is WRONG (not unnecessary, wrong) if it's school holidays :-) "I go to school" typically means "I am a student", while "I am going to school" means I'm on my way right now.

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 22, 2024 21:04 UTC (Mon) by kleptog (subscriber, #1183) [Link]

Yeah, I guess I'm not explaining myself very well. It's not that the "am" is unnecessary in general, but that most of the time they mean the variant without. So you get conversations like:

A: What do you do during the day?
B: I am working.
A: (confused) Clearly you are sitting here having a drink? Oh you mean "I work".

It's not that some languages cannot express certain tenses; given enough words you can express any tense in any (sufficiently advanced) language. It's whether certain tenses have a special status in the grammar of a language. Generally, similar concepts in different languages are linked in different ways, which leads to people learning the language using words in ways a native speaker finds confusing.

But yes, this is a post about Gentoo, so better leave it at that.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 13:48 UTC (Mon) by khim (subscriber, #9252) [Link]

> Once you point it out to them it usually corrects fairly quickly, but it's fascinating that the same type of error keeps popping up.

What's fascinating about that? You are using a less flexible language and are forcing someone to pick between two choices that to him (or her) are almost indistinguishable. Of course there will be mistakes!

It's like someone's first attempt to write a Perl program. Learning when you should use `$` and when you should use `@` with array names is non-trivial, to say the least.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 13:43 UTC (Mon) by khim (subscriber, #9252) [Link]

I don't know whether continuing is constructive at this point. You are spouting the same kind of nonsense that you spouted when undefined behavior was discussed, and ignoring everything except what you believe to be true, even if your beliefs don't even remotely match reality.

> Let's apply some very simple logic.

If by now “logic”, in English, means “random sequence of letters without any clear meaning”, then I guess I learned some kind of wrong English.

> "I program" = "I am programming"

> therefore true = false
>
> Unless you live in Crete, these two cannot be the same

Sure, they are not the same, but neither are the sentences “I am”, “I like”, “I like programming”, “I teach”, “I teach programming” and many others.

> therefore while they are both present

Yes. And they are present in most other human languages. Or do you believe other languages couldn't distinguish between a person who is programming for a living and a person who programs something right now, this very second? They can; that's not a reason to introduce a bazillion nonsense present tenses.

Why is it surprising that not all the things that may happen in the present have the same meaning… or why have you decided that all these sequences of words should be split into three semi-randomly picked present tenses?

> they have to be different present tenses

Why? Why do “I like programming” or “I teach programming” not need different present tenses, but “I am programming” does?

> And I don't know about you, but this confusion is one of the absolutely standard ways by which we detect foreign speakers ... it's a VERY common mistake.

Yes, but is it because English has “more present tenses” or… because it has “fewer present tenses”? I would say that it's because it has fewer.

It's the same story as with articles: just as, most of the time, the difference between “a” and “the” is meaningless (it can easily be picked up from the context and easily conveyed if needed), the difference between “I program” and “I am programming” exists but is not useful! Of course other languages can distinguish between these two forms if needed; it's just that most of the time there is no need to distinguish them.

Worse: the form that conveys the more often needed meaning (that I'm programming right now) is longer and more complicated.

English is similar to BASH here: like in BASH, where you may want to write $* or "$@" and, most of the time, the short form is not needed and not used, English insists on the longer form even where the difference between the two forms is meaningless (e.g. on a programmer's forum, saying that you know how to program is not useful, but saying that you are in the process of writing a program is useful).

> Coupled with the occasional giveaway of "I programming" which simply doesn't exist in standard English.

Indeed. English grammar is extremely inflexible, rigid and, I would even say, “strange”. It takes a long time for a speaker of some other language, where words in a sentence don't come in any particular order, to adjust to it.

English, of course, has no choice, because it has words that sound identical when used as a noun and as a verb; but, again, the problem arises not when English offers you more capabilities (you may just ignore them) but when it doesn't have capabilities that other languages have (similarly to how translating a program from a statically typed language to a dynamically typed one is easy, but going in the other direction is not).

On continuing

Posted Apr 22, 2024 13:57 UTC (Mon) by corbet (editor, #1) [Link]

Indeed, this conversation has gone fairly far afield, and it seems like a good time to wind it down.

Remember Gentoo? ... this is an article about Gentoo ...

Gentoo bans AI-created contributions

Posted Apr 21, 2024 9:29 UTC (Sun) by malmedal (subscriber, #56172) [Link]

> And how many other languages have THREE different forms of the present tense? Which one do you use where? That's another classic "this guy is a foreigner" giveaway ...

English has four, I believe, and this is the case for pretty much every language in the Indo-European language group. It is not rare.

In fact English has lost a number of constructs common in related languages, e.g. "he is arrived" vs. "he has arrived".

Gentoo bans AI-created contributions

Posted Apr 22, 2024 3:17 UTC (Mon) by ssmith32 (subscriber, #72404) [Link]

It's not arrogance.

If you don't understand the language, you won't be able to review the output.

If you barely understand the language, it's either going to take a loooong time to review or you're just not doing it right.

It's like writing code vs testing and debugging. Writing it out is the easy part (relatively speaking).

Gentoo bans AI-created contributions

Posted Apr 22, 2024 4:54 UTC (Mon) by khim (subscriber, #9252) [Link]

> It's like writing code vs testing and debugging. Writing it out is the easy part (relatively speaking).

How much code in Haskell or Rust have you written?

Whether writing code is an easy part or not depends very much on what you are writing and how.

Sure, if you are using a language which allows you to write something like [] + {} and get nonsense output without any errors, then writing code is easy and testing and debugging is the tedious and time-consuming part.

If you use something like Haskell or Rust, then writing the code is the majority of your work, and if you are using something like WUFFS, then writing code that the compiler accepts is 99% of the work.
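
As a concrete illustration of that contrast, here is a minimal Rust sketch (the Vec/HashMap pairing just stands in for the kind of nonsense the [] + {} example allows):

```rust
use std::collections::HashMap;

fn main() {
    let list: Vec<i32> = Vec::new();
    let map: HashMap<String, i32> = HashMap::new();

    // let nonsense = list + map;
    // ^ does not compile: `+` is simply not defined between these types, so
    //   this whole class of silently-wrong output is rejected before the
    //   program ever runs.

    println!("list has {} elements, map has {} entries", list.len(), map.len());
}
```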

> If you don't understand the language, you won't be able to review the output.

Sure, but that's where AI and humans complement each other: for a human it's easier to understand an unfamiliar language than to write a sentence in it, while for AI it's the opposite. So by allowing the AI to create something that “looks nice” (a task which current generative AIs already perform better than non-native speakers do) and giving the human the task that s/he does well, you reduce the time needed to create the final result. That's true both for a programming language like Python or Ruby and for a natural language like English or Chinese.

I have no idea why it is so hard to accept when it's obvious. It's the exact same reason spellcheckers work, after all.

Heck, do an experiment: try to write some simple program in a language that you have never used before (Haskell, Scheme, or maybe APL or MUMPS) and compare that to the time needed to first learn said language and then write something.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 7:23 UTC (Mon) by anselm (subscriber, #2796) [Link]

> If you use something like Haskell or Rust, then writing the code is the majority of your work

I don't buy that. There will still be loads and loads of bugs in people's code even if the compiler accepts it. Haskell and Rust may be nice languages but it's not as if they made debugging unnecessary.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 7:43 UTC (Mon) by khim (subscriber, #9252) [Link]

> Haskell and Rust may be nice languages but it's not as if they made debugging unnecessary.

This, of course, depends to a large degree on how you structure your code and, even more importantly, how you structure your data.

The determined Real Programmer can write FORTRAN programs in any language, after all.

But if you structure your code to embed enough domain knowledge in the data types then yes, debugging becomes mostly unnecessary. And even if you do need to debug things, you know where to go: to those pesky few corner cases which you cut because you had a poor understanding of what your program actually should do.
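
A small sketch of what "embedding domain knowledge in the data types" can look like; the Email type and the functions around it are invented for illustration, not taken from any real project:

```rust
// Invalid values cannot be constructed, so the code that consumes an Email
// has nothing left to debug about its validity.
struct Email(String);

impl Email {
    fn parse(raw: &str) -> Result<Email, String> {
        let trimmed = raw.trim();
        if !trimmed.is_empty() && trimmed.contains('@') {
            Ok(Email(trimmed.to_string()))
        } else {
            Err(format!("not an email address: {raw:?}"))
        }
    }
}

fn send_report(to: &Email) {
    // No validation here: the type itself is the proof.
    println!("sending report to {}", to.0);
}

fn main() {
    match Email::parse("user@example.com") {
        Ok(addr) => send_report(&addr),
        Err(e) => eprintln!("{e}"),
    }
}
```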

Gentoo bans AI-created contributions

Posted Apr 18, 2024 17:40 UTC (Thu) by atnot (subscriber, #124910) [Link]

I personally increasingly feel like this is going to solve itself when the companies that offer these services start charging for their actual cost, instead of subsidizing it as they do now under the assumption that it'll give them a valuable market position, increase their valuation with hype, and that the models will soon be obsolete and replaced by magnitudes better and cheaper ones anyway, etc.

It's just kind of hard to imagine someone paying $100/mo or sitting there with their GPU roaring for hours on end while coding just for some moderately improved autocomplete.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 17:54 UTC (Thu) by snajpa (subscriber, #73467) [Link]

Umm, haven't they said the same thing about the shared e-scooters, ride-sharing, couch-sharing, etc.? That it will solve itself? :)

As long as there are always new investors ready to pour resources in, it won't solve itself, certainly not in the way you think. They might actually manage to make inference dirt cheap, so they could afford to stay at these subscription levels, while even making profit. I don't see why not. The hardware hasn't even really started moving in the direction of cheaper inference yet, but it will.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 17:58 UTC (Thu) by snajpa (subscriber, #73467) [Link]

BTW, the improved autocomplete from GitHub is $100/year, not $100/month - and so far, at least to me, it's been worth every penny :)

Gentoo bans AI-created contributions

Posted Apr 18, 2024 18:00 UTC (Thu) by snajpa (subscriber, #73467) [Link]

(*and* I got three RTX 3090 sitting around here just so that I can play around these so-called improved autocompletes :D weren't even that expensive, 2nd hand from a miner)

Gentoo bans AI-created contributions

Posted Apr 19, 2024 18:30 UTC (Fri) by intelfx (subscriber, #130118) [Link]

> I got three RTX 3090 sitting around here just so that I can play around these so-called improved autocompletes

Is there anything of that sort (I mean, LLM-powered code assistance, Copilot-grade quality) that can actually be used locally? Any pointers?

(There is JetBrains' FLCC which runs on the CPU, but it is really not much better than lexical autocompletion. I'm talking about more powerful models.)

Gentoo bans AI-created contributions

Posted Apr 19, 2024 22:28 UTC (Fri) by snajpa (subscriber, #73467) [Link]

So far the closest to the Copilot experience was with phind-codellama-34b-v2.Q4_K_M (GGUF format, llama.cpp and derivatives eat that, fits on one 3090; bigger models are too slow to respond IMO) + the Twinny extension for VS Code - though next time I get to it (i.e. when my ISP has an outage so I have to use flaky backup LTE) I'm going to give the Continue extension another shot. phind-codellama-34b-v2.Q4_K_M isn't as good as Copilot, but I haven't tried to modify the prompts the plugins feed to it, and from the behavior I get I think there's a lot of room for optimization there.

Outside of code completion, people really ought to try the miqu-1-70b "leak", which can fit onto two 24G cards, to see where the state of the art is (or was, not that long ago) relative to how many resources it needs to run... Text generation with this thing is just about the most boring thing one can do; it IMHO doesn't deserve as much attention as it is getting. When we finally get open- (or at least published-) weights models with those current extended "up to 1M"-class context window sizes, combined with QLoRA, I think people are going to make some amazing things with them. For me, the 32k context size is currently the most limiting factor.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 18:55 UTC (Thu) by atnot (subscriber, #124910) [Link]

Sorry, but $100 is just nowhere near enough to cover the cost of running these things. Microsoft charges their enterprise customers roughly 4x that and not even they have remotely turned a profit on it. In fact to my knowledge, not a single company has ever turned a profit with an LLM offering at any price point. And they'd be yelling it from the rooftops if they did.

It's also notable that even at that price, they have to give deep discounts to enterprise customers so that they can proudly announce companies like McKinsey getting on board. Not because they have any use for it either, mind you, but to be able to "better answer our customers' questions about AI".

Gentoo bans AI-created contributions

Posted Apr 18, 2024 21:17 UTC (Thu) by snajpa (subscriber, #73467) [Link]

At that scale, they also have massive opportunities to optimize and cut the total amount of work they need to do, just by looking at the data that goes through and balancing it against the compute costs (using heuristics such as how often the suggested code is accepted, etc.).

Gentoo bans AI-created contributions

Posted Apr 18, 2024 17:42 UTC (Thu) by Karellen (subscriber, #67644) [Link]

From Matt Jolly's email linked in the article:

> we're always going to have BS/Spam PRs and bugs - I don't really think that the content being generated by LLM is really any worse.

Isn't part of the issue with LLMs not just that the quality can be low, but that the quantity of low-quality submissions jumps by orders of magnitude if LLM-powered submissions are allowed? See, for example, Clarkesworld ceasing to accept submissions altogether because of the volume of low-quality LLM-powered dross.

Also, doesn't explicitly banning LLM-generated contributions simplify the rejection process? If it's allowed provided the quality is good enough, you could end up spending way too much time arguing with bad-faith actors about whether the contributions they submitted are good enough or not. Whereas being able to just say "Policy says no." makes dealing with such people a lot more straightforward.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 18:59 UTC (Thu) by atai (subscriber, #10977) [Link]

>Isn't part of the issue with LLMs not just that the quality can be low, but that the quantity of low-quality submissions jumps by orders of magnitude

Not really true in the context of spam generation (so not directly comparable to FOSS contributions, but it still matters): AI has improved the quality enough to make it possible to democratize good spam among all spammers.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 19:32 UTC (Thu) by flussence (subscriber, #85566) [Link]

Copyright and ethics, sure; it's well documented at this point LLMs will steal entire chunks of GPLed code wholesale with the serial numbers filed off, and their proponents are so far up their own asses that not even a bolt of lightning and voice from the heavens would get them to shut up. The purpose of the system is what it does: which is to steal labour from the undercompensated in novel ways outside the law. Much like Open Source™ has become.

But I don't think Gentoo has a leg to stand on regarding contribution quality, not while nobody seems to mind all the low-effort automated spam coming from within the house. The bugzilla is littered with tens of thousands of script-generated snowclone "QA" reports (not to be confused with clear, proofread, actionable bug reports), and almost nobody reads them, let alone acts on them, because the signal-to-noise ratio is somewhere between a windows UAC prompt and ph*ronix. Drive-by spamming one of the single digit bug IDs should've been a massive wake up call that this process is FUBAR, but alas.

How much does all that *cost*? The project still can't even scrape together enough resources or willing contributors to upgrade or moderate its phpbb2 forums, for so long that it's statistically likely that a few people who once made fun of what a farce it is may have died of old age at this point.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 20:50 UTC (Thu) by kleptog (subscriber, #1183) [Link]

Honestly, this feels like a rerun of the "you can't use a spell/grammar checker on your school assignment because that's cheating". Right now people are using prompts in chatbots, but in a few years it will be seamlessly integrated into all sorts of products. It's only going to get faster and cheaper as time goes on.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 21:04 UTC (Thu) by mb (subscriber, #50428) [Link]

>but in a few years it will be seamlessly integrated into all sorts of products

Right. That won't resolve the open questions, though.

Just processing copyrighted material through some sort of "AI" filter should not make the Copyright go away.
Or alternatively, any program processing any data shall be allowed to remove Copyright.
Cannot choose both.

Gentoo bans AI-created contributions

Posted Apr 18, 2024 21:14 UTC (Thu) by snajpa (subscriber, #73467) [Link]

> Right. That won't resolve the open questions, though.

I have a feeling that trend is going to accelerate. Open questions kinda rendered "obsolete" by even more pressing new open questions :-D

Gentoo bans AI-created contributions

Posted Apr 19, 2024 9:55 UTC (Fri) by kleptog (subscriber, #1183) [Link]

>Just processing copyrighted material through some sort of "AI" filter should not make the Copyright go away.

Well, if you're processing 1TB of data into a 1GB model, it's very questionable whether you can really consider it a derived work any more.

As a human I have consumed enormous amounts of copyrighted data, and the responsibility to respect copyright lies with me, not the tools I use. Similarly, the responsibility for respecting copyright lies with the person using the LLM. An LLM is not going to produce something that resembles an existing copyrighted work without explicit prompting. I find it hard to believe it's going to happen by accident.

> Or alternatively, any program processing any data shall be allowed to remove Copyright.

Or, the person using the program is responsible for complying with any relevant laws.

(I'm getting strong "colour of your bits" vibes. The tools you use are not relevant to the discussion of copyright.)

Gentoo bans AI-created contributions

Posted Apr 19, 2024 13:41 UTC (Fri) by LtWorf (subscriber, #124958) [Link]

> Well, if you're processing 1TB of data into a 1GB model, it's very questionable whether you can really consider it a derived work any more.

So you're saying that if I rip a music CD that is ~700MiB of data, but then use lossy compression and make it into 50MiB of data, I'm actually allowed to do that?

Gentoo bans AI-created contributions

Posted Apr 19, 2024 14:36 UTC (Fri) by farnz (subscriber, #17727) [Link]

14:1 compression like that is well within the expected bounds of today's psychoacoustically lossless techniques. 1000:1 is not, so the argument is that if you rip a music CD and get ~700 MiB PCM data, and compress that down to 700 KiB, the result of decompressing it back to a human-listenable form is going to be so radically different to the original that this use is transformative, not derivative.
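
For what it's worth, a back-of-the-envelope check of the ratios quoted in this subthread (700 MiB CD rip, 50 MiB lossy rip, 700 KiB, and 1 TB of training data against a 1 GB model):

```rust
fn main() {
    println!("CD -> 50 MiB lossy rip:  {:.0}:1", 700.0 / 50.0);           // 14:1
    println!("CD -> 700 KiB:           {:.0}:1", 700.0 * 1024.0 / 700.0); // ~1000:1
    println!("1 TB data -> 1 GB model: {:.0}:1", 1e12 / 1e9);             // 1000:1
}
```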

Gentoo bans AI-created contributions

Posted Apr 19, 2024 16:17 UTC (Fri) by samlh (subscriber, #56788) [Link]

If you turn the music into MIDI, that could achieve such compression, and would still be a derivative work.

The same argument may reasonably apply for LLMs given how much verbatim input can be extracted in practice.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 15:36 UTC (Sat) by Paf (subscriber, #91811) [Link]

A lot of verbatim input can be extracted from *me* in practice, surely enough that I could violate copyright from memory.

So uh what about the other stuff I create? I know what good data visualization looks like because I have read many data viz based articles over the years. Etc

Gentoo bans AI-created contributions

Posted Apr 20, 2024 15:59 UTC (Sat) by LtWorf (subscriber, #124958) [Link]

Well, if you learn the whole Divina Commedia and then write it down, you won't become the author :)

Humans extrapolate in a way that machines cannot. So the comparison doesn't hold.

A human can write functioning code in whatever programming language after reading the manual for that language. A text generator needs terabytes worth of examples before it can start producing something that is approximately correct.

I don't think comparing a brain with a server farm makes sense.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 9:25 UTC (Mon) by farnz (subscriber, #17727) [Link]

Humans extrapolate in a way that machines cannot.

This isn't, as far as I can tell, true. The problem with AI is not that it can't extrapolate, it's that the only thing it can do is extrapolate. A human can extrapolate, but we can also switch to inductive reasoning, deduction, cause-and-effect, and most importantly a human is able to combine multiple forms of reasoning to get results quickly and efficiently.

Note that a human writer has also had terabytes worth of language as examples before they start producing things that are correct - we spend years in "childhood" where we're learning from examples. Dismissing AI for needing a huge amount of examples, when humans need literal years between birth and writing something approximately correct is not advancing the conversation any.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 14:44 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

> Note that a human writer has also had terabytes worth of language as examples before they start producing things that are correct

It's not terabytes, though. A really fast reader might be able to read a kilobyte per minute. If they read at that speed for 16 hours a day, they might be able to manage a megabyte per day. That would mean a gigabyte every 3 years of solid, fast reading doing nothing else every day. So a truly dedicated reader could manage at most a few tens of GB over a lifetime. Most people probably manage at most a few GB. Speaking isn't a whole lot faster. That means most humans are able to learn their native languages using orders of magnitude fewer examples than LLMs are.
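
A quick check of that arithmetic, using the figures as stated above plus an assumed (and generous) 80 years of reading at that pace:

```rust
fn main() {
    let kb_per_day = 1.0 * 60.0 * 16.0;          // 1 kB/minute for 16 hours: ~960 kB/day
    let days_per_gb = 1_000_000.0 / kb_per_day;  // ~1042 days per gigabyte read
    println!("about {:.1} years of non-stop reading per GB", days_per_gb / 365.0);
    println!("about {:.0} GB over an 80-year reading lifetime",
             80.0 * 365.0 * kb_per_day / 1_000_000.0);
}
```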

To me, this is a sign the LLM stuff, at least the way we're doing it, is probably a side track. It's a neat way to get something that produces competent text, and because it has been trained on a huge range of texts it will be able to interact in just about any area. But it's a very inefficient way of learning language compared to the way humans do it. If we want something more like AGI, we need to think more about the way humans learn and try to teach our AI that way, rather than just throwing more texts at the problem.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 14:50 UTC (Mon) by farnz (subscriber, #17727) [Link]

That carries with it the assumption that text is a complete representation of what people use to learn to speak and listen, before they move onto reading and writing. It also assumes that we have no pre-prepared pathways to assist with language acquisition.

Once you add in full-fidelity video and audio at the quality that a child can see and hear, you get to terabytes of data input before a human can read. Now, there's a good chance that a lot of that is unnecessary, but you've not shown that - merely asserted that it's false.

Gentoo bans AI-created contributions

Posted Apr 22, 2024 21:34 UTC (Mon) by rgmoore (✭ supporter ✭, #75) [Link]

> That carries with it the assumption that text is a complete representation of what people use to learn to speak and listen, before they move onto reading and writing.

To the contrary, I think the different learning environment is part of what we need to reproduce if we want more human-like AI. A huge problem with LLMs is that they are largely fed a single kind of input. It's no wonder chatbots have problems interacting with the world; they've read about it but never dealt with it firsthand. If we want an AI that can deal with the world as we do, it needs a full set of senses and probably a body so it can do something more than chat or paint.

Gentoo bans AI-created contributions

Posted Apr 23, 2024 9:26 UTC (Tue) by farnz (subscriber, #17727) [Link]

Right, but you were claiming that because the input to a child can be summarised in a small amount of text, the child's neural network is clearly learning from that small amount of data, and not from the extra signals carried in the spoken word and in body language as well.

This is what makes the "training data is so big" argument unreasonable; it involves a lot of assumptions about the training data needed to make a human capable of what we do, and then says "if my assumptions are correct, AI is data-inefficient", without justifying the assumptions.

Personally, I think the next big step we need to take is to get Machine Learning to a point where training and inference happen at the same time; right now, there's a separation between training (teaching the computer) and inference (using the trained model), such that no learning can take place during inference, and no useful output can be extracted during training. And that's not the way any natural intelligence (from something very stupid like a chicken, to something very clever like a Nobel Prize winner) works; we naturally train our neural networks as we use them to make inferences, and don't have this particular mode switch.
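
As a toy illustration of that difference (nothing like a real model; just a one-parameter predictor that updates its weight inside the same loop in which it answers):

```rust
fn main() {
    let mut weight = 0.0_f64;      // the "model": a single parameter
    let learning_rate = 0.1;
    let data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]; // underlying rule: y = 2x

    for (x, y_true) in data {
        let y_pred = weight * x;                 // inference...
        let error = y_pred - y_true;
        weight -= learning_rate * error * x;     // ...and learning, in the same pass
        println!("x = {x}: predicted {y_pred:.2}, true {y_true}, weight now {weight:.3}");
    }
}
```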

Gentoo bans AI-created contributions

Posted Apr 23, 2024 10:51 UTC (Tue) by Wol (subscriber, #4433) [Link]

Plus, it's well known that our neural networks have dedicated sub-systems for, e.g., recognising faces, that can be rapidly trained.

For example, baby learns what mum sounds like in the womb, and that is re-inforced by mum hugging new-born. My grand-daughter was prem, and while there don't appear to be any lasting effects, it's well known that separating mother and child at birth has very noticeable impacts in the short term. Not all of them repairable ...

We're spending far too much effort throwing brute force at these problems without trying to understand what's actually going on. I'm amazed at how much has been forgotten about how capable the systems of the 70's and 80's were - the Prolog "AI Doctor" running on a Tandy or PET that could out-perform a GP in diagnosis skills. The robot crab that could play in the surf-zone powered by a 6502. I'm sure there are plenty more examples, where our super-duper AI "more power than sent a man to the moon" would find it impossible to compete with that ancient tech ...

Modern man thinks he's so clever, because he's lost touch with the achievements of the past ...

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 23, 2024 15:20 UTC (Tue) by Wol (subscriber, #4433) [Link]

> we naturally train our neural networks as we use them to make inferences, and don't have this particular mode switch.

We also don't feed back to our AIs "this is wrong, this is right". So it's free to spout garbage (hallucinate) with no way of correcting it.

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 19, 2024 17:09 UTC (Fri) by ballombe (subscriber, #9523) [Link]

> Well, if you're processing 1TB of data into a 1GB model, it's very questionable whether you can really consider it a derived work any more.

But this is not uniform compression. The most relevant parts are kept mostly verbatim; the least relevant parts are ignored. The AI trick is that there is no easy way to find which parts are kept verbatim, and there is no easy way to find the source.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 17:28 UTC (Fri) by mb (subscriber, #50428) [Link]

> there is no easy way to find the source.

This is not a technical problem at all.
How you compressed the data is not relevant to whether it is a derived work.
And there is no single right or wrong answer to whether something is a derived work. Always been like that.

But: machine learning breaks Copyright in a fundamental way, in that it is very similar to human learning, so one can apply human-learning reasoning to it, but at the same time it is fast and cheap.

While you need to put significant effort into your work when human-learning from others and creating new non-derived work, with ML this is just a click of a button.
This is where Copyright breaks. It's hard for a human to create new non-derived work.
But it's cheap for machine learning to do the same thing.

A human "filter" processing ("learning") work set "A" into non-derived work "B" is expensive. So it's almost never done just for copying and Copyright erasing.
A machine learning filter, however, is cheap and is easy to erase Copyright in that way.

Copyright is fundamentally broken. It's not a technical problem.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 20:17 UTC (Fri) by kleptog (subscriber, #1183) [Link]

> This is where Copyright breaks. It's hard for a human to create new non-derived work.

This is absurd. The value of a copyrighted work is not dependent on the amount of effort that went into it.

If anything LLMs are a great equaliser. It used to be that to be a great writer you needed to have a great idea for a story and the skill to execute it. Now people with a great idea but not quite as good writing skills get a chance they might not otherwise.

Copyright protects the economic value and moral value of a work. The fact that other people can now also create new works easier does not reduce the value of copyright at all (or break it). Copyright does not protect all uses of your work, only those that reduce the economic value of the original.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 20:49 UTC (Fri) by mb (subscriber, #50428) [Link]

> The value of a copyrighted work is not dependent on the amount of effort that went into it.

I never claimed that.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 22:37 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

> And there is no single right or wrong answer to whether something is a derived work. Always been like that.

This is prosaically true in the sense that it's ultimately a judgment call on the part of the trier of fact, but as a matter of law, there absolutely is an answer, and the law generally expects you to know it. You can wave your hands about copyright "breaking" all you want, but the legal system is not going to be impressed.

You are correct, however, that AI does put the legal system in a bit of an awkward spot. Up until now, derivative works have been decided by the trier of fact (judge or jury) looking at the original and the allegedly infringing work side by side, and seeing if they're close enough that copying can be inferred. The legal term used in the US is "substantially similar" (or "strikingly similar"), but most countries are going to use a similar method in their courts.

That works fine when you have one original. When you have two billion originals, and an unbounded set of potentially infringing works, it's a bit impractical. Right now, the unspoken expectation is that the plaintiff has to do the leg work of figuring out which images to put side by side in this comparison (US courts would say the plaintiff is "master of their complaint" and thus responsible for deciding exactly what is and is not in scope). That's not easy in the case of AI, and it's the main reason (or at least, a major reason) that artists have struggled to sue image generators successfully.

But that does not imply that an image generator "erases" copyright as you have put it. If an artist is able to find a specific infringing output that closely resembles their art, and their art was used as a training input, then the artist might have a case. Saying "the AI breaks copyright" is not going to be an effective defense.

Gentoo bans AI-created contributions

Posted Apr 21, 2024 22:39 UTC (Sun) by NYKevin (subscriber, #129325) [Link]

(Also, everything I said about image generators applies to LLMs, except for the "struggled to sue" part. The New York Times did sue OpenAI and alleged very specific, word-for-word copying of whole paragraphs of text.)

Gentoo bans AI-created contributions

Posted Apr 18, 2024 22:22 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Honestly, this feels like a rerun of the "you can't use a spell/grammar checker on your school assignment because that's cheating".

My feeling in all of this is IFF you use an AI to help you write a valid report (of whatever sort) that's fine. The AI is the *assistant*. If, however, the AI is the *author* then you don't want to go near it with a barge pole.

In other words, if there is a *human* involved, who has sanity checked it for hallucinations, accuracy, what-have-you, then that's fine. If the human sending it can't be bothered, then why should the human receiving it bother, either? And if it's the AI bot that's sending it, then you REALLY don't want to know!

Cheers,
Wol

Gentoo bans AI-created contributions

Posted Apr 20, 2024 11:20 UTC (Sat) by Baughn (subscriber, #124425) [Link]

That would make sense, but it isn’t what the policy states?

Gentoo bans AI-created contributions

Posted Apr 19, 2024 8:03 UTC (Fri) by atnot (subscriber, #124910) [Link]

> Right now people are using prompts in chatbots, but in a few years it will be seamlessly integrated into all sorts of products. It's only going to get faster and cheaper as time goes on.

There's no real reason to believe this will happen. For one, the big inherent problem of this type of system, just making shit up, really makes it unsuitable for anything but optional autocomplete, especially compared to the more specialized models that already exist. Secondly, while the AI boosters keep talking about exponential improvements, that hasn't actually happened. GPT-4 was released a year ago and the best OpenAI can do is a few percentage points of improvement on some benchmarks. Which is not what you'd expect in a field where improvements are supposedly so low-hanging that people predict a doubling of capability (unspecified) every 18 months. Hardware has been looking a bit better, with the usual dozen percentage points of perf/W we've come to expect every 2-3 years, but it's hardly revolutionary either.

It also remains to be seen how much development effort will stay with this technology when investors realize it's been severely overhyped, in two or three quarters or however long it takes them.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 10:44 UTC (Fri) by snajpa (subscriber, #73467) [Link]

well, the LLMs available *today*, from what I’ve tried, are pretty close for me to be able to replace Grek KH and his stable picks, while I haven’t even tried to fine tune the models, all zero-shot classification... It’s not that far, people will absolutely use it for way more than an autocomplete.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 10:54 UTC (Fri) by snajpa (subscriber, #73467) [Link]

Gre*g* K. H., my apologies - I have nothing but respect for the guy for the record, it’s just that it seemed like a good case to test out the LLMs abilities, got the idea from reading the discussions here under the “all users must upgrade” posts ;)) I can highly recommend, if you have some spare time, play around with these things. It really can be a huge productivity booster, that’s what it can be used for _today_. I wonder what world we’re going to wake up to tomorrow… of course we’ll inflate it into an enormous bubble, but so far IMHO we aren’t there, with our collective expectations, we (“the markets”) are still pretty close to what the tech can actually do.

Gentoo bans AI-created contributions

Posted Apr 23, 2024 21:02 UTC (Tue) by flussence (subscriber, #85566) [Link]

> we (“the markets”)

Quite a revealing slip.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 13:45 UTC (Fri) by LtWorf (subscriber, #124958) [Link]

This is a very extraordinary claim that is going to require a peer-reviewed paper or some other proof to substantiate it.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 15:37 UTC (Sat) by Paf (subscriber, #91811) [Link]

You really really don’t know what’s possible or isn’t. The recent advancements were enormous and came suddenly, remember.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 1:15 UTC (Fri) by rafaelgp (subscriber, #170959) [Link]

How can the maintainers confidently identify code as being AI-backed?

Gentoo bans AI-created contributions

Posted Apr 19, 2024 5:07 UTC (Fri) by lkundrak (subscriber, #43452) [Link]

I don't think they can, and I don't think this falls into the scope of this policy.

This is no different from code that's been stolen without the aid of AI. The process relies on trust between the reviewer and the contributor, and on the contributor understanding which code is acceptable and which is not.

Gentoo bans AI-created contributions

Posted Apr 19, 2024 11:19 UTC (Fri) by sdumitriu (subscriber, #56869) [Link]

There is one AI tool that's safe to use: the "world's most responsible AI model", https://www.goody2.ai/

That thing is amazingly smart, and completely safe to use. The only downside is that it politely refuses to answer anything you ask it!

Gentoo bans AI-created contributions

Posted Apr 20, 2024 5:01 UTC (Sat) by drago01 (subscriber, #50715) [Link]

LLMs are here to stay. Attempting to ban them is going nowhere.

A better approach is to define rules on how to use them. More and more tools will add some sort of LLM-based features.

I'd rather see a push for open/free alternatives instead of leaving the field to the big cloud providers. There are some, but they need more help; banning them certainly doesn't help.

Gentoo bans AI-created contributions

Posted Apr 20, 2024 15:48 UTC (Sat) by Paf (subscriber, #91811) [Link]

Yes, I have to say this is hilarious to me - these tools are only going to grow in their use. And we’re going to just *stamps foot* ban them.

Well, I’m going to keep using it to turn out boilerplate API code and smaller utility pieces that make up the boring part of my work. I don’t really see why forcing me to type those out by hand makes the world a better place. And I don’t contribute to Gentoo, so we’re fine.

Tragedy of AI generated packaging

Posted Apr 21, 2024 8:45 UTC (Sun) by zdzichu (subscriber, #17118) [Link]

The following issue highlights the mess that hallucinating LLMs create when it comes to packaging:

https://github.com/pkgxdev/pantry/issues/5358

Tragedy of AI generated packaging

Posted Apr 21, 2024 12:26 UTC (Sun) by snajpa (subscriber, #73467) [Link]

Wonderful example. But that project overall seems to be an AI-based experiment - pretty much the work of a single person. So actually, I'm pretty impressed. If this is already possible now, and it wasn't literally "a few" months ago, what makes people here think we're at the final station? This is just so funny :-D

Tragedy of AI generated packaging

Posted Apr 21, 2024 12:41 UTC (Sun) by snajpa (subscriber, #73467) [Link]

OK, I'm sorry, it's not a project of a single person, even though it does look like that from the commit history :-D It overall seems like something where I'd want to cut some corners too: packaging the whole world all over again, as if that hasn't been done a zillion times... It's hard to say whether no descriptions at all are better than this... I wouldn't want to copy-paste them from somewhere else either...


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds