Now the cat is apparently out of the bag!
As reported by several media outlets (e.g. https://lnkd.in/e-bvsSX8), Meta has now confirmed that it used the pirated library LibGen to train its AI.
The stated rationale is that books are of course the best source for AI training, as they are usually better in language, content and subject matter than short snippets from social media (which makes sense). They are “well-written representations of human language”.
That is why no less than 270 terabytes of books (approx. 7.5 million books and 80 million scientific studies) were stolen; there is no other way to put it. From a copyright perspective, this is of course an absolute no-go.
Now you can argue that Meta did not steal the data itself, but “merely” used an illegally compiled collection for training. And you can argue that training AI does not constitute copyright infringement. The courts will decide on all of this.
But once again you can also see: whatever can be done will be done, without regard for the law, statutes, or authors. And you can be sure that Meta is not the only one working this way. They just happened to get caught this time and were exposed.
Brave new world!
P.S.: Currently, the users of LLMs are responsible for their output. That is, if you use Meta’s Llama model today and the text it generates reproduces content from the illegally used training data, you are responsible for it, not Meta!
#informatikersindcool #kiistdaundbleibt

