Talk:Internet Archive

From Wikipedia, the free encyclopedia

Userbox

I have created a userbox to help spread the word that Internet Archive is a useful website archiving service. {{User Internet Archive}}

clarify lede

The main subject of this article seems to be the 501(c)(3) nonprofit organization that runs the Internet Archive digital library and other products/services. The first two paragraphs of the article focus on both at once. Would it make sense to slightly rephrase the lede so that the organization is clearly distinguished from the products/services it produces? -- Oa01 (talk) 11:37, 16 August 2023 (UTC)

Status

Does anybody have some news about the current status since it lost the lawsuit? Mr.Lovecraft (talk) 09:31, 5 September 2023 (UTC)

Thank you for that question.
As I covered at the Internet Archive's annual update at this year's Wikimania gathering, here is a blog post we have shared: https://blog.archive.org/2023/08/17/what-the-hachette-v-internet-archive-decision-means-for-our-library/ Markjgraham hmb (talk) 18:33, 5 September 2023 (UTC)
Yeah, I've read that already... But when I tried to borrow a book last Monday, it worked for at least one hour. So I was wondering whether it was just a technical error or indeed a consequence of that lawsuit... Mr.Lovecraft (talk) 09:29, 6 September 2023 (UTC)
I believe that the lawsuit was just about the controlled digital lending of books and not everything that the Archive does. The Wayback Machine doesn't seem affected. Many of its collections don't seem affected. I have seen some speculation online that defeat could bankrupt the Archive, but the Internet Archive is not very open about its finances and it's hard to find a robust source on that. I don't see a reason to think that the lawsuit is any more a threat to its existence than the bad reputation that it has gained for hosting terrorist and neo-Nazi videos/books. Epa101 (talk) 21:30, 4 November 2023 (UTC)

I was "borrowing heavily" today for several bibliographies I'm working on. As of the past hour (approx 1:30 pm EDT) all searches are resulting in the "Borrow Unavailable" message. Meanwhile, no related news could be found on the web. Has the Archive's lending library shut down? Allreet (talk) 18:02, 10 September 2023 (UTC)

Hi,
The Internet Archive's library has not been shut down.
Please read https://blog.archive.org/2023/08/17/what-the-hachette-v-internet-archive-decision-means-for-our-library/
You say you were "borrowing heavily". You may have hit a borrowing limit. We have always had limits, just like nearly every library.
Please feel free to email info@archive.org (or me directly) with specifics if you want to explore this further.
- Mark Graham mark@archive.org Markjgraham hmb (talk) 18:50, 10 September 2023 (UTC)
@Allreet – Please note: Talk pages are for improving articles, and not for general discussions about the article subject. Please use WP:Reference desk for generalized inquiries and discussion. Thank you. -- dsprc [talk] 07:34, 12 September 2023 (UTC)
Understood. Thanks Allreet (talk) 12:48, 12 September 2023 (UTC)

Original research?

Some of the referencing here seems unusual for Wikipedia. The sections on the number of pages archived by year and the languages of the books archived are based on searches. We wouldn't reference YouTube with a search of YouTube on a certain date. With websites that constantly change, the numbers get outdated immediately. If a book said that X million articles on the Archive are in French, that would be fine to cite; using a number found through a search of archive.org on a certain date seems like original research to me. Epa101 (talk) 21:17, 4 November 2023 (UTC)

Having had no response, I'm going to remove the sections that I believe constitute original research. Epa101 (talk) 13:24, 12 November 2023 (UTC)
Responding late, but I agree that's appropriate removal. Masem (t) 14:18, 12 November 2023 (UTC)
@Epa101: Those numbers were not based on searches. They were based on the text on archive.org's front page as it appeared on previous dates. It was no different than citing any other archived source from the past. Particularly, the YouTube comparison was inapt to this application of the Wayback Machine because you cannot use YouTube's search to find out how many videos YouTube had on a previous day. Neither could you use the Internet Archive's item search to determine this information. This is more like citing old issues of a newspaper that announced its circulation to establish a timeline. I think presenting this information in tables was a bit garish, but they were not OR and removing them on that basis was inappropriate. lethargilistic (talk) 01:20, 16 November 2023 (UTC)
@Lethargilistic Hello. I've had another look. I see your point on the first table, with the archived pages in billions. It took me a while to find the right number in each case. However, I also deleted the tables by language and century scanned, and those figures are not on the front page: they only come up through searches and the numbers presented are outdated now. Epa101 (talk) 08:42, 16 November 2023 (UTC)
@Epa101 Oh, somehow I didn't see the removal of the language/century ones. That's definitely OR. But the first table ought to come back, IMO. lethargilistic (talk) 11:29, 16 November 2023 (UTC)
@Lethargilistic Yes, I'm happy to agree with you on that. I'll reinstate them after work. Epa101 (talk) 15:53, 16 November 2023 (UTC)

NPOV: too positive?

I have noticed that the tone of this article tends to be very positive about the Internet Archive. It might well have done a lot of good, but it also has a reputation for hosting terrorist and neo-Nazi material. (It really doesn't take a lot of searching to find on there if you want to try!) I've noticed edits recently have been taking away academic sources about these problems whilst leaving in positive articles from non-academic sources. I'm concerned that the neutrality of the article might be compromised by this. Epa101 (talk) 23:47, 25 December 2023 (UTC)

I agree that well-cited articles about terrorist and neo-Nazi material on IA should be included in this article. I notice that this MEMRI article [1], which is cited by Boucher & Young, contains a screenshot of an Internet Archive search on "holocaust" that is drastically different from what I get when I do the same search (nearly three years later): I get no Holocaust denial or Nazi material, as far as I can tell, in the first many results (screenshot available on request!). Perhaps the archive has changed its methods or policies? I also note this article [2], which mentions the archive taking down some offensive content. DoctorMatt (talk) 03:06, 26 December 2023 (UTC)
They might have changed a little bit since that article was written, but we have to go with what is written in reliable sources and not with original research. I don't want to offend anyone too much with some of the hateful content found on there, but, for just two examples (one British and one American), look at the search results for David Duke and for National Front. Epa101 (talk) 17:12, 26 December 2023 (UTC)
P.S. I would also note that Jason Scott, who is a senior person at the Internet Archive, has a history of vandalising Wikipedia: see here! I am concerned that he or some of his mates might be monitoring this page. Epa101 (talk) 17:21, 26 December 2023 (UTC)
This is completely fatuous logic. Archive.org, Archive.is, Ghostarchive.org etc blindly capture everything on the internet that is accessible. Just like Google, Bing etc index everything blindly. So what next? Ban Google and Bing?
And we can do without the "if you can't kick the ball, kick the man" personal attacks. 𝕁𝕄𝔽 (talk) 17:34, 26 December 2023 (UTC)
Erm, we don't allow Google and Bing as references on Wikipedia. I've not made any personal attacks. This is how we identify bad-faith edits on Wikipedia. Someone is taking off negative coverage here, even if it's backed up by reliable sources, and leaving all the positive coverage on there. That's the truth of it. Epa101 (talk) 17:55, 26 December 2023 (UTC)
You are still missing the point. The citation is to the original source document. The place where it resides is essentially irrelevant apart from demonstrating provenance. If the original URL is no longer available, we give the archive.org/archive.is/ghostarchive backup URL. We are not citing the archive, we are citing the document. [Using the term document loosely, could be a web page, an image, a video or a journal article.] Any clearer? --𝕁𝕄𝔽 (talk) 22:19, 26 December 2023 (UTC)
No. I am not talking about using the Internet Archive for citations (e.g. the Wayback Machine). I am talking about the Wikipedia article on the Internet Archive. The reason why you might find me missing the point is that you're on the wrong page for the discussion that you're trying to have. This is the Talk page for the article. I'm saying that the article is overwhelmingly positive and criticisms of the Internet Archive are being taken off. You mentioned Google in an earlier message; we have a page called Criticism of Google and rightly so. In contrast, there is little criticism in this article on the Internet Archive. I have been trying to restore an academic citation that says that the Internet Archive hosts fascist resources that are not allowed on other platforms. The text is very defensive in saying that the Archive takes sources that were mainstream at the time. No! The article looks at the works of William L. Pierce and Harold Covington, and they were not mainstream when they were released. I maintain that there is an NPOV problem here. Epa101 (talk) 11:24, 27 December 2023 (UTC)
I would just add that generating a distorted slant against the IA based on cherry-picked statements from research about the way Nazis use it is not automatically shielded from NPOV just because the article doesn't have enough information on what you want to talk about yet. The situation earlier was a good example. The article you wanted to use was appropriate for citing a description of the process that it was actually about—particularly linking to scans of public domain material that IA keeps for a separate mission, just like (the article mentions) other projects including Project Gutenberg and Google Books. It was not appropriate to quote one out-of-context sentence to say "the Archive is important to Nazis" with the clear implication that IA approves of that or invites it. Balancing the truth of the article with "it's in an RS I found" can be contentious, but this was clearly over the line for me. The thing is, I'm sure you could write a paragraph about this Nazi use pattern based on the cited source and the sources it cites, but you've got to do it without making it clear in the article that you think the IA is an active participant in it or somehow a silent supporter of Nazis, which is not true on its face. lethargilistic (talk) 11:39, 27 December 2023 (UTC)
As to "they were mainstream at the time," I wrote it when I was out and in a hurry. I recognize that that part could read to you as defensive of the Archive, but I was trying to describe why the Nazis were linking to it without approving or disapproving of the content. I agree that I did a bad job on that sentence, though. I'm sure you could find a better way to write the thing you want to write overall. lethargilistic (talk) 11:51, 27 December 2023 (UTC)
I don't think that the Internet Archive is an active participant in Nazism or a silent supporter of Nazism. It already says in the article that the Archive is used by the Islamic State and similar groups, which doesn't seem to be disputed. It would be a strange ideology to back both Nazism and the Islamic State. It might be a simple case that they have got their priorities wrong, but that is not an excuse for sharing books that are illegal in some countries. From reading the article through again, I don't think that it's unfair to take from the article that the Internet Archive is popularly used amongst neo-Nazis. (As an aside, I once clicked on a video made by the USA after their liberation of Germany and the comments underneath were horrible.) I'll think about how to rewrite the sentence with the citation. Epa101 (talk) 13:05, 28 December 2023 (UTC)
Yes, that reads well to me. I have tweaked the opening sentence of the section too, to introduce the concern/criticism more clearly.
No, I don't believe I'm on the wrong talk page. My concern is expressed well in the article, so I'll just copy it: amidst discussion about whether such documents should be preserved by archivists or not. --𝕁𝕄𝔽 (talk) 16:50, 30 December 2023 (UTC)
Here is a news article about the challenges of archiving controversial materials. [3] It is an honorable practice to "share books that are illegal in some countries". In repressive countries, it is required for freedom of thought to survive; see for example Samizdat. In free countries, it is a way to make materials available for study by both those who may agree with them, and those who may disagree and want to learn why they disagree. Long-term archives should not follow short-term fads by tossing out materials that a majority disagrees with or even wants to actively suppress. It is often true that tiny minority points of view become the accepted wisdom of future generations. This is true whether the topic is (a few easy examples): physics, racism, religion, the status of women, the germ theory of disease, or the origins of species. Without access to the original records of those ideas, it is impossible for historians to trace their development and spread. Gnuish (talk) 20:52, 30 December 2023 (UTC)
Is the Colchester Collection, specifically, notable enough for a mention in the actual article? If yes, how well does effectively publicizing that specific list in a Wikipedia page balance with the encyclopedic value that naming one adds? lethargilistic (talk) 23:02, 31 December 2023 (UTC)

Number of employees?

On 2024-02-04T10:36:30 User:2001:ee0:4bca:fd50:b50c:773b:1a40:16ba "Updated the statistics", replacing

  • "Internet Archive – Full text of "Full Filing" for fiscal year ending Dec. 2019". May 9, 2013. Archived from the original on October 30, 2021. Retrieved October 30, 2021 – via ProPublica Nonprofit Explorer.

with

I'm concerned that the new 990 no longer matches the number of employees in this article: Part I, "5. Total number of individuals employed in calendar year 2019 (Part V, line 2a)", says 169, but the same field on the 990 for 2022 says 0.

That number changed to 0 in the 990, but the number of employees is still listed at 169. The number 169 does not make sense to me, which is why I haven't changed the number in the infobox. However, something is not right here.

Might someone care to suggest what to do about this? Thanks, DavidMCEddy (talk) 00:23, 5 February 2024 (UTC)

One possibility is that the 990 from 2022 that lists 0 employees was later amended with more accurate numbers, and Nonprofit Explorer doesn't always show amended returns. My suggestion is to revert to an earlier 990 until this is figured out. Regards, Orange Suede Sofa (talk) 00:59, 5 February 2024 (UTC)

Archive.org down?

Not resolving for me. Anyone else experiencing the same? Tuvalkin (talk) 04:59, 28 March 2024 (UTC)

Of course, AI did it:

this is our second blast of abusive traffic from an AWS customer today apparently from an AI company harvesting Internet Archive texts at an extreme rate

Tuvalkin (talk) 05:02, 28 March 2024 (UTC)