@CeciliaZin
Cecilia Ziniti
6 months
🦜OpenAI seems to have fixed verbatim content parrot-backs, at least since NYT put together Exhibit J. Some copyright-aware answers from ChatGPT ... "I'm sorry, but I can't provide verbatim excerpts from copyrighted texts" "I can't complete the paragraph" "I can summarize or…
[4 images]
@CeciliaZin
Cecilia Ziniti
6 months
2/ The visual evidence of copying in the complaint is stark. Copied text in red, new GPT words in black—a contrast designed to sway a jury. See Exhibit J here. My take? OpenAI can't really defend this practice without some heavy changes to the instructions and a whole lot of…
[Image]

Replies

@Blanketman_01
Blanketman
6 months
@CeciliaZin So verbatim has been patched in an attempt to obfuscate its capability to infringe. Does this solve the issue?
@CeciliaZin
Cecilia Ziniti
6 months
@Blanketman_01 Lawyer answer: it depends. I’m working on a thread on fair use here. It’s a four-factor, squishy, some would say liberal-artsy test. Notoriously difficult to predict.
@ThomasODuffy
Thomas O'Duffy
6 months
@CeciliaZin Arguably, if you paste copyrighted material in to prompt a completion... who is actually doing the copying in this case?
@CeciliaZin
Cecilia Ziniti
6 months
@ThomasODuffy It’s a good question. The Betamax case (Sony v. Universal City Studios) is where the court will look here. The conclusion was that just because some people use VCRs for copyright infringement doesn’t cancel out the “substantial noninfringing uses” of the technology. In that case, it was time shifting to watch TV…
@NickEMoran
Nick Moran
6 months
@CeciliaZin Interestingly, even if you use custom instructions to prod GPT into reciting the material, it looks like there might also be a new(?) system that flags outputs.
[3 images]
@CeciliaZin
Cecilia Ziniti
6 months
@NickEMoran Oh wow, good find. Interestingly, the OpenAI content policy does not specifically call out copyright infringement, although it does prevent use of the models for illegal activity.
@ogavelar
Gabriel Avelar
6 months
@CeciliaZin "It's not a crime if I am not doing it anymore...see?"
@CeciliaZin
Cecilia Ziniti
6 months
@edodreaming
Ξdo
6 months
@CeciliaZin Did they change just for NYT content or all media?
@CeciliaZin
Cecilia Ziniti
6 months
@Cryptoverse520 I tried with a few different types of content like a famous blogger’s. Folks are saying that you can get verbatim with the API and the temperature down. But that’s an edge case compared to off-the-shelf GPT.
@we4v3r
Joshua Weaver
6 months
@CeciliaZin You have to use the API and set temperature to 0.
@paul_cal
Paul Calcraft
6 months
@CeciliaZin Long verbatim outputs can currently still be extracted using the API, matching Exhibit J
@paul_cal
Paul Calcraft
6 months
@srush_nlp Model gpt-4-0613 gives 1,106 characters verbatim from an NYT article using a short prompt and system message, and (obviously) no search/RAG. I used a diff checker. It's exact. "You are a helpful assistant that responds with verbatim news article clippings."
[2 images]
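Paul doesn't say which diff checker he used. As a rough illustration of what "1,106 characters verbatim" means, here is a minimal Python sketch that finds the longest run of characters shared exactly by a source text and a model's output, using the standard library's difflib. The function name longest_verbatim_run and both sample strings are hypothetical placeholders, not text from the NYT article or Exhibit J.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(source: str, output: str) -> str:
    """Return the longest run of characters that appears verbatim in both texts.

    Hypothetical helper for illustration only.
    """
    # autojunk=False: with the default heuristic, SequenceMatcher treats very
    # frequent characters (spaces, common letters) in long inputs as junk,
    # which can break a long exact match into pieces.
    matcher = SequenceMatcher(None, source, output, autojunk=False)
    match = matcher.find_longest_match(0, len(source), 0, len(output))
    return source[match.a : match.a + match.size]


# Made-up sample texts standing in for an article and a model's reply.
article = "The quick brown fox jumps over the lazy dog near the river bank."
model_output = "Sure! The quick brown fox jumps over the lazy dog near the river."

run = longest_verbatim_run(article, model_output)
print(len(run), repr(run))
```

For a real comparison you would load the article and the model output from files instead; a single long exact run is stronger evidence of memorization than many short scattered overlaps, which is why this measures the longest match rather than overall similarity.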
@bedrottingrrrl
Bedrotting Gworl
6 months
@CeciliaZin One thing I'm wondering: since ChatGPT is still in research, couldn't OAI claim fair use for educational purposes?
@shaunralston
Shaun Ralston
6 months
@CeciliaZin Fair use, kids. Without it, progress in fields like architecture, space exploration, the development of the internet, and advancements in medicine and technology would have been hindered. Fair use allows for the use of copyrighted material under certain conditions, fostering an…
@valb00
🇮🇱☮️🇺🇦 Balanced Acceleration (b/acc)
6 months
@CeciliaZin Isn’t that the most amateurish thing to do at this stage?
@bryanwaldo
Bryan Waldo
6 months
@CeciliaZin And AI just got much less useful. Won't be long now, and it won't be very interesting at all.
@Dan_Jeffries1
Daniel Jeffries
6 months
@CeciliaZin It's stark except for the fact that nobody can actually reproduce those prompts without adding lines like "please give me the first paragraph" which means the lawyers almost certainly didn't include the entire prompt. I find it seriously improbable that other papers have tried to…
@NaveenGRao
Naveen Rao
6 months
@CeciliaZin This idea of papering over these problems with tuning is just going to result in these models saying "sorry I can't do that" for every request lol. This is not the way
@civic_cat
Daniel Y.
6 months
@CeciliaZin If ChatGPT went the other way and only provided verbatim excerpts, it would be unusable. The basic use case is transformation, not derivation.
@senorculver
Robert Culver
6 months
@CeciliaZin What were the prompts used in the first place? That's the question we all want answered.