Monorepo for Tangled (tangled.org)

AI Contribution Guidelines #394

Open · opened by alexia.starlightnet.work

From what I could tell, Tangled currently lacks guidelines outlining whether or not Pulls assisted or created by Generative AI are accepted, and if so, to what extent.

It would be beneficial if this information could be added to the Contribution Guidelines or to a separate file on the matter.

Our position is that we hold all external pull requests to the same standard of review, whether they are AI-assisted or not. Ultimately, the patch owner is responsible for understanding and defending the code they submit, regardless of how it was produced.

We don’t currently distinguish between AI-generated and non-AI-generated contributions, but perhaps adding a short note to the contribution guidelines to clarify this expectation makes sense.

Perhaps something like:

Tangled does not differentiate between AI-assisted and non-AI-assisted contributions. All pull requests are reviewed under the same standards. Contributors are expected to fully understand and be able to justify the code they submit, regardless of how it was authored.

I think requiring contributors to mark contributions as AI-assisted would probably be a step in the right direction.

I have concerns about licensing, though: these tools are known to regularly generate code snippets from projects under a different license*, and that might get Tangled (or any project) into trouble in the future depending on how some global rulings on the matter go. So, perhaps on the basis of that uncertainty, these contributions should probably be denied?

* as well as licensed works in general, not just code

Our position is that we hold all external pull requests to the same standard of review

fwiw 100% support this stance (as an oss maintainer who is routinely frustrated by ai slop PRs). "AI-assisted" is already quite hazy as a binary characteristic for a diff. as LLMs are more fully integrated into dev cycles, it seems it'd be complex / mostly irrelevant for contributors to describe the sources of LLM influence on the eventual diff

also, for the data point: we tried this for a bit and the most inconsiderate offenders failed to self-report their AI use

If someone fails to adhere to the contribution guidelines, I or another maintainer rewrite their contribution, and their future contributions are denied until further notice. If a point about AI were dropped from the guidelines because it went unenforced, the rest of the contribution guidelines may as well be useless too.

Besides, I don't think the exact tool must be disclosed, but rather that an AI tool was used in general.

I don't think the exact tool must be disclosed

this is why i suggested that “AI-assisted” was hazy as a binary characteristic. if you google something and ai mode shows up, is it ai assisted? where is the line?

I personally think the line is where there is a choice. Google's AI overviews cannot officially be disabled, not without just going to a different search tab entirely.

However, things like GitHub Copilot are tools one can choose to use, or not.

The human details of "did they do it on purpose or not" can be handled on a per-pull/issue basis, because some contributors may not even be aware of the reasons one might want to avoid such tools.

Because this wasn't outlined in the main issue body, can a statement be made regarding the team's thoughts on the legality and/or ethics of using AI to write code?

I am not asking for, nor expecting, a full breakdown of the situation, but a statement on whether the Tangled team has licensing concerns or ethical issues regarding these tools. It would be much appreciated; thanks in advance.

You can take a note from how other projects handle this. Bevy Engine has taken a position of no AI contributions for various reasons: https://bevy.org/learn/contribute/policies/ai/

For things like issues, using AI to help write an issue in English because it is not a language you are comfortable with is fine, but using it to create issues wholesale without testing the bug yourself is not. The same goes for code contributions, both to avoid overwhelming maintainers and to avoid code attribution/licensing issues.

Bevy has had a fair number of slop issues and PRs thrown at them, so they decided to nip that in the bud by explicitly saying no to AI-generated contributions. There's still wiggle room to use AI when contributing, but the onus is on the author to do due diligence and not to submit work that hasn't been modified or iterated on by human hands (to avoid the licensing issues described).

Personally, I'd make it clear that AI-generated code is less likely to be accepted, for a lot of the same reasons Bevy has outlined. At the end of the day, liability is a thing, and as an OSS project, you don't want to open yourself up to more liability than you can handle.

When it comes to accepting or not accepting AI contributions in general, the matter seems pretty straightforward at this point.

If a project accepts them, then they have to deal with the extra work this will create when it's used poorly.

If a project doesn't accept them, then they have to deal with the extra work when they're used poorly anyway (though maybe somewhat less, thanks to honest participants). Furthermore, the project that doesn't accept them is going to have regular arguments about what is or isn't AI-assisted, and community members will spend much of their energy pointing fingers at each other.

This would seem to be an uneven contest, except that there are also lots of people who actively enjoy arguing about AI and pointing fingers, and a "not accepted" policy on a project gives them plenty of opportunities to entertain that vocation.

Just my 2c - I'm only a minor contributor and will happily respect whatever the team decides.

We (as in, @wafrn.net) have disallowed AI contributions wholesale, following in the footsteps of https://elementary.io.

We do not get into arguments about what is or isn't AI-assisted at all. We have a label for potentially AI-assisted PRs or issues according to our own judgement. If a contributor admits to using it, or we feel the risk is too high, we reject the contribution and/or close the issue.

No one wants to put time into trying to contribute AI-generated code if you create an environment that is actively hostile to doing so.

And, also, if we don't enforce this guideline, what use are any guidelines at all? We may as well let contributors do anything, merge anything, and have no quality control to begin with, if we don't stick to our values and the guidelines we set.

That sounds like a fair old burden on the maintainers but if they're feeling up for it, more power to them.

I'll take your last paragraph as hyperbole... I assume we agree that guidelines around commit message formatting, requiring that there be no linter errors, or that code be organised into the correct locations are all effective and enforceable. If you want to write a guideline around something that cannot be determined by looking at the diff, then my commentary above applies.

The last part is, in part, hyperbole, but I genuinely do believe that "people will use AI anyways" is not a reason not to have policies around it. The contribution guidelines are all about how and how not to contribute, and if we add a clause about AI and it isn't respected, we should take action, not let it slide because it is "inevitable", as I've heard so many people say on this topic.


Participants 5
AT URI
at://did:plc:ysfpz7426pwdvjylsmcvmzzj/sh.tangled.repo.issue/3mdldyujezk22