User talk:David Gerard/1.0

Article selection

Article selection should take into account the historical popularity of articles -- perhaps based on server logs. This needs to be done orthogonally to the rating system, since an article that is in fairly high demand might actually be pretty crappy. --Ilya 03:49, 27 Jul 2004 (UTC)

Oh, well spotted! Yep - David Gerard 19:24, 27 Jul 2004 (UTC)

Would it be possible to find some schools in developing countries that do have internet access and track what articles their students access for a couple of months? We might even be able to identify gaps in content by looking at their failed searches.--Payo1 10:21, 23 Feb 2005 (UTC)
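A rough sketch of how the two ideas above might be tallied from logs: article popularity is counted on its own and the rating is only reported next to it, never used as a filter, and failed searches are counted separately as a hint at content gaps. The tab-separated log format, the sample titles, and the ratings are all made up for illustration; real server logs would need their own parser.

```python
from collections import Counter

# Hypothetical log format (an assumption): "<kind>\t<term>\t<status>"
#   kind   = "view" for an article request, "search" for a search query
#   status = "hit" if something was found, "miss" otherwise
SAMPLE_LOG = """\
view\tMalaria\thit
view\tMalaria\thit
view\tGoat\thit
search\tcassava blight\tmiss
search\tcassava blight\tmiss
search\tMalaria\thit
"""

def summarise(log_text):
    """Return (article view counts, failed search counts)."""
    views, failed = Counter(), Counter()
    for line in log_text.splitlines():
        kind, term, status = line.split("\t")
        if kind == "view":
            views[term] += 1
        elif kind == "search" and status == "miss":
            failed[term] += 1
    return views, failed

def shortlist(views, ratings, min_views):
    """Popular articles, with the rating shown alongside but not used to filter."""
    return [(title, count, ratings.get(title))
            for title, count in views.most_common()
            if count >= min_views]

views, failed = summarise(SAMPLE_LOG)
ratings = {"Malaria": "poor", "Goat": "good"}  # made-up ratings
print(shortlist(views, ratings, min_views=2))  # [('Malaria', 2, 'poor')]
print(failed.most_common(1))                   # [('cassava blight', 2)]
```

Here the in-demand but poorly rated article still makes the shortlist, which is the point of keeping the two measures orthogonal.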

Rating system and fact checking

Hi, can anyone explain to me what the rating system is for? Is this just to have a preselection of articles? And another question: does the rating system have anything to do with making Wikipedia reliable? By reliable I mean that we ensure that each sentence of an article has been checked at least twice for factual correctness. -- mkrohn 17:31, 23 Jul 2004 (UTC)

In the form it's been discussed, rather than a mark of factual verification, it seems more to be a system for seeing that articles aren't obviously crap or below suitable quality: "As I understand it, our central goal is to prevent someone from opening their WikiReader and seeing "GOATSE GOATSE ALL DAY LONG / GOATSE GOATSE THE GOATSE SONG". As a method of verifying spelling and structural quality---which, I think we can agree, is definitely needed---a trust metric would make the Wiki production-ready." (m:Article validation).

That is, something to bring rubbish articles up to okay, rather than fact-checking verification. Though we would probably need that too ... perhaps a rating factor? - David Gerard 19:27, 23 Jul 2004 (UTC)

(and I should probably pepper that whole thing with references) - David Gerard 19:27, 23 Jul 2004 (UTC)

Edit war hotspots

One possible way to deal with "edit war hotspots" might be to simply flag them as such at this point, & hand the task of dealing with them to an editor who will have the task of either selecting one version to appear in the print version, or finding a way to merge the 2 POVs.

Yes, this violates Jimbo's ideal of not forking the Print Wikipedia from the Net version, but no matter how we approach this the Print version will end up being technically a fork from the Net version -- it will be a collection of snapshots of articles that never appeared together at one time. However, it does keep the two arguably close in content & format. -- llywrch 19:56, 23 Jul 2004 (UTC)

Mmmm. Selection is going to be the most angstful part of the paper 1.0. People have been picturing this for years - I predict many will get SERIOUSLY pissed off, no matter what the decision - David Gerard 20:07, 23 Jul 2004 (UTC)

"Angstful" (if I understand the word correctly) would only describe part of it. If the idea of getting Wikipedia onto paper didn't have a deadline, we could allow the various factions to fight it out, & hopefully come to some kind of agreement in the end; perhaps by using this deadline as a club, we might get the parties to come to an agreement -- or once they see what a 3rd party does, it may force them to a compromise after the fact.

In any case, I consider it would be a bad idea to allow the 1.0 to miss its date with the printer because some hotheads still want to argue over some minor point in an article. -- llywrch 07:20, 24 Jul 2004 (UTC)

Oh, definitely. I mean having entire topic areas dropped from the final cut - David Gerard 09:22, 24 Jul 2004 (UTC)

Africa

Are we talking about the English WP? If not, what will we do about the fact that most people in Africa do not speak English? Mark Richards 21:52, 23 Jul 2004 (UTC)

I'm quoting Jimbo's dream for it. There's a reference in the links - David Gerard 00:24, 24 Jul 2004 (UTC)

I like this idea, and I'm certain that while most don't speak English, we can help all those that can. Also, if we develop Wikipedia 1.0 generally enough, we'll create a model for how to bring any language Wikipedia swiftly from wiki to print -- Wikipedia on paper shouldn't be a one-time thing, and after this first big push to printing, we hopefully won't have to do the same work again next time. ✏ Sverdrup 14:18, 28 Oct 2004 (UTC)

Since this is more than anything else a technical and logistic problem, finding the solution on English WP will naturally extend to solutions for other languages. Note here that I am somewhat dissatisfied with the choice of the term 1.0. We really shouldn't be pushing for any kind of "milestone" or "completion", but rather a means at any moment of doing "selections" at will. Think accessibility, reliability, and reputation rather than completion. Tom Haws 16:36, Mar 1, 2005 (UTC)

Good job, David

This is probably the best mission statement on Wikipedia 1.0. Sadly we are still doing this planning and not working according to the plan, but perhaps we can start working soon -- when Validation is here.

Some thoughts on the Validation, though: We should have flags not only for suitability and the current version's content, but also a flag to set for "We need this article in print", so that we have an indicator of lousy articles we need to improve. However, I don't think we should have too high hopes for validation; susning.nu had a rating system that didn't give very detailed info; fringe articles, satirical articles and such were given all the wrong ratings, etc. Now it seems validation will be a more serious approach, and better hidden from view for passers-by, making more sincere ratings possible. ✏ Sverdrup 14:18, 28 Oct 2004 (UTC)

Disregard the bad english above; I'm simply too stupid and tired to express myself clearly. I was saying: "good job, david" (I got that one right) and "Validation our way will probably work, but check for lousy articles that we need". ✏ Sverdrup 14:20, 28 Oct 2004 (UTC)

Online Wikipedia > Printed version

The texts selected will need a different treatment for the printed version to ensure consistency of layout and appearance. How many editors have book editing experience? I do, as it is my day job, so I volunteer myself to help, but how do we find any others? Apwoolrich 14:28, 16 Feb 2005 (UTC)

As a follow-up to the above, we also need to be sure we can convert text existing in MediaWiki to a format a printer's typesetting gear can read. We (the publishing outfit I am involved with) insist on all texts being submitted in RTF format, which allows the author some control of the appearance of his work with bold, italics etc, but which causes the page make-up program no grief. Texts submitted in MS Word come with bastard codes (automatic footnoting, bullet points etc) which foul up the page format when flowed in, and cause much work to remove the codes manually. This is a very important point and I urge that conversations be started with the MediaWiki boffins ASAP to make sure we have no problems later on. The other option would be to scan paper printouts into a PC for editing in RTF, which is not something to be recommended, since it is very time-consuming and effectively doubles the editing work. Apwoolrich 16:26, 16 Feb 2005 (UTC)

It would almost certainly be something along the lines of wikitext->XML->TeX. You can be sure the resulting typography will be beautiful! Also, the publisher we were originally talking to is used to TeX - David Gerard 12:09, 23 Feb 2005 (UTC)
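To give a feel for the last step of such a pipeline, here is a toy sketch that turns MediaWiki bold and italic quote markup into LaTeX-style commands and escapes the characters TeX treats specially. It is only an illustration under stated assumptions: it skips the intermediate XML stage entirely, handles no other wikitext (links, templates, tables), and the function name `wikitext_to_tex` is made up here.

```python
import re

# Characters TeX treats specially, mapped to their escaped forms.
TEX_ESCAPES = {
    "\\": r"\textbackslash{}", "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

def wikitext_to_tex(text):
    """Convert bold/italic wikitext to LaTeX commands; other text is escaped."""
    # Escape in a single pass so braces inserted by an escape are not re-escaped.
    text = re.sub(r"[\\&%$#_{}~^]", lambda m: TEX_ESCAPES[m.group(0)], text)
    # Bold ('''...''') has to be handled before italic (''...'').
    text = re.sub(r"'''(.+?)'''", lambda m: "\\textbf{" + m.group(1) + "}", text)
    text = re.sub(r"''(.+?)''", lambda m: "\\textit{" + m.group(1) + "}", text)
    return text

print(wikitext_to_tex("'''Bob''' earns 50% & ''more''"))
# prints: \textbf{Bob} earns 50\% \& \textit{more}
```

Because the author's emphasis survives as explicit commands and nothing else leaks through unescaped, this is roughly the property the RTF requirement above is after.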

I'm working with a non-profit foundation that intends to use Wikipedia as a starting point to publish its own "source book". This encyclopedia project is headed up by Dr. Frank Kaufmann (a director of IIFWP, see http://www.iifwp.org/about/directors.php)

I've discussed it a bit on the mailing list and will continue to do so. -- Uncle Ed (talk) 19:19, Mar 18, 2005 (UTC)

Interesting

Very interesting and well thought-out. I very much like the way you describe 0.7, 0.8 and 0.9 and how to get there -- I think it's by far the most sensible way of actually ever reaching 1.0. I'm glad that you're also providing a filter to assess the effects of the demographics of our user base. -- mark ✎ 22:53, 26 May 2005 (UTC)

Trust Metrics

lkcl29oct2005: http://advogato.org - expanded out and greatly improved, such that 1) it uses SQL, and 2) instead of the Ford-Fulkerson maxflow algorithm (a vertically-scanning algorithm) you use a horizontally-scanning algorithm, which is much better as it can be "stopped" and also it would lend itself better to making SQL queries:

example:

1) you let _anyone_ create "tags" on articles:

       "i believe this article should be included on the DVD";
       "i believe this article is 'GOOD'".

2) the SECOND tier is the "trust metric" one:

       "i trust person X to make correct assessments about the
        'GOOD'ness of articles";

       "i trust person X regarding their opinions on what
       should go on the DVD".

then, you do a database join of the "tagging" against the "trust metrics" to get a list of nodes that you need to analyse:

       -- one seed, one tag, and one cert linking the seed to the tagger
       INSERT INTO initial_seeds (username, specialisation)
              VALUES ('admin1', 'article is good');
       INSERT INTO tags (username, opinion, article_name)
              VALUES ('fred', 'good', 'bob the builder');
       INSERT INTO certs (from_username, to_username, specialisation)
              VALUES ('admin1', 'fred', 'article is good');

       -- tags on the article made by people a seed has certified directly
       SELECT tags.username, tags.opinion, tags.article_name,
              certs.from_username, certs.to_username, certs.level,
              initial_seeds.username, initial_seeds.specialisation
       FROM tags, certs, initial_seeds
       WHERE tags.username = certs.to_username AND
             initial_seeds.username = certs.from_username AND
             initial_seeds.specialisation = certs.specialisation AND
             tags.article_name = 'bob the builder';

that will get you a set of people connected to the "first level seeds". you then make a SECOND set of queries for the usernames that all those people are connected to (assuming you haven't already got their names in your list; yes, you could make a temporary table and make the query exclude anyone already in it).

you repeat this process five to seven times, depending on how "deep" you want to go. eventually you will run out of people or you hit the 7th query.
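the repeated querying described above can be sketched roughly as follows, using Python's sqlite3 module and an in-memory database. this is an assumption-laden illustration, not advogato's actual code: it only does a depth-limited, level-by-level reachability walk from the seeds (no flow capacities), the table layout follows the example tables above minus the "level" column, and the function name `trusted_users` and the extra users (wilma, mallory, mallory2) are made up.

```python
import sqlite3

def trusted_users(db, specialisation, max_depth=7):
    """Expand outward from the seeds, one level per query, up to max_depth."""
    seen = {row[0] for row in db.execute(
        "SELECT username FROM initial_seeds WHERE specialisation = ?",
        (specialisation,))}
    frontier = set(seen)
    for _ in range(max_depth):
        if not frontier:
            break  # ran out of people before reaching the depth limit
        marks = ",".join("?" * len(frontier))
        rows = db.execute(
            "SELECT DISTINCT to_username FROM certs "
            f"WHERE specialisation = ? AND from_username IN ({marks})",
            (specialisation, *frontier))
        # keep only people not already in the list
        frontier = {row[0] for row in rows} - seen
        seen |= frontier
    return seen

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE initial_seeds (username TEXT, specialisation TEXT);
CREATE TABLE certs (from_username TEXT, to_username TEXT, specialisation TEXT);
CREATE TABLE tags (username TEXT, opinion TEXT, article_name TEXT);
INSERT INTO initial_seeds VALUES ('admin1', 'article is good');
INSERT INTO certs VALUES ('admin1', 'fred', 'article is good');
INSERT INTO certs VALUES ('fred', 'wilma', 'article is good');
INSERT INTO certs VALUES ('mallory', 'mallory2', 'article is good');
INSERT INTO tags VALUES ('wilma', 'good', 'bob the builder');
INSERT INTO tags VALUES ('mallory2', 'good', 'bob the builder');
""")

trusted = trusted_users(db, "article is good")
# only count tags made by people reachable from a seed
counted = [user for (user,) in db.execute(
    "SELECT username FROM tags WHERE article_name = ?", ("bob the builder",))
    if user in trusted]
print(sorted(trusted))  # ['admin1', 'fred', 'wilma']
print(counted)          # ['wilma']
```

mallory2's tag is ignored because nobody reachable from a seed has certified mallory or mallory2, and lowering max_depth is how you "stop" the walk early.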