Tagpedia: a Semantic Reference to Describe and Search for Web Resources
Social Web and Knowledge Management Workshop at the 17th World Wide Web Conference (WWW), 2008
Nowadays the Web is a growing collection of an enormous amount of content, and finding better ways to locate and organize the available data is becoming a fundamental issue in dealing with information overload. Keyword-based Web searches are currently the preferred means of seeking content related to a specific topic. Search engines and collaborative tagging systems make it possible to search for information by associating descriptive keywords with Web resources. All of them suffer from inconsistency and a consequent reduction in the recall and precision of searches, due to polysemy, synonymy and, more generally, all the different lexical forms that can be used to refer to a particular meaning. A possible way to address or at least reduce these problems is to introduce semantics to characterize the contents of Web resources: each resource is described by one or more concepts instead of simple and often ambiguous keywords. To support this task, the availability of a global semantic reference resource is fundamental. On the basis of our past experience with the semantic tagging of Web resources and the SemKey Project, we are developing Tagpedia, a general-domain "encyclopedia" of tags, semantically structured for generating semantic descriptions of contents over the Web, created by mining Wikipedia. In this paper, starting from an analysis of the weak points of non-semantic keyword-based Web searches, we introduce our idea of semantic characterization of Web resources and describe the structure and organization of Tagpedia. We introduce our first realization of Tagpedia and suggest possible improvements that can be carried out in order to exploit its full potential.
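The contrast between keyword-based and concept-based description can be illustrated with a minimal sketch. It is not Tagpedia's implementation: the resource names, the `concept:` identifiers, and the helper functions are all hypothetical, chosen only to show how synonymy lowers recall and polysemy lowers precision when resources are indexed by raw keywords.

```python
from collections import defaultdict


def build_index(annotations):
    """Invert {resource: [labels]} into {label: set of resources}."""
    index = defaultdict(set)
    for resource, labels in annotations.items():
        for label in labels:
            index[label].add(resource)
    return index


# Hypothetical resources annotated with free-text keywords.
keyword_tags = {
    "page_a": ["car"],
    "page_b": ["automobile"],  # synonym of "car"
    "page_c": ["jaguar"],      # about the animal
    "page_d": ["jaguar"],      # about the car maker
}

# The same resources annotated with made-up concept identifiers.
concept_tags = {
    "page_a": ["concept:car"],
    "page_b": ["concept:car"],
    "page_c": ["concept:jaguar_animal"],
    "page_d": ["concept:jaguar_car"],
}

keyword_index = build_index(keyword_tags)
concept_index = build_index(concept_tags)


def search(index, label):
    """Return the sorted list of resources annotated with the given label."""
    return sorted(index.get(label, set()))
```

Under these assumptions, `search(keyword_index, "car")` misses `page_b` (synonymy) and `search(keyword_index, "jaguar")` mixes two meanings (polysemy), while the concept index returns both car pages for `concept:car` and only the animal page for `concept:jaguar_animal`.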
Wikipedia and the Utopia of Openness: How Wikipedia Becomes Less Open to Improve its Quality
by Joris Pekel
Final paper for the Digital Methods Course of the MA New Media at the University of Amsterdam
Wikipedia has become an enormous source of information in the last decade. Because of its ubiquitous presence on the internet and the speed at which it is updated, it has become more than a reference. It becomes 'a first rough draft of history'. In this study the changing politics of openness are analyzed. By looking at both small articles and one extremely popular one, the role of openness and transparency within Wikipedia is discussed. In this study I point out that in order to improve the quality of Wikipedia, it is sometimes necessary to limit the amount of openness, which is not a problem as long as the process remains completely transparent. At the same time, more transparency is needed to improve the smaller articles, which are often created by a single person.
The Wikimedia Public Policy Initiative: Classroom Exercises as Introductions to Peer Production
by Elif Ozkaya
Co-authored with Cliff Lampe, Jonathan Obar, Paul Zube and Alcides Velazquez
Where Does the Anti-SOPA Movement Go Next?
This piece suggests that the remarkable surge of activism against the Stop Online Piracy Act (SOPA) represents a major shift in the longer-term debate about copyright in the US. An anti-copyright movement that began among a small group of academics and activists in the 1990s now commands a broad, if shallow, base of public support for the first time. The piece draws on the work of pioneering copyright skeptics such as Lawrence Lessig and James Boyle, who drew an analogy between anti-copyright politics and the environmentalist movement to urge a reframing of the public domain as an issue of broad public concern. It shows how such ideas have increasingly resonated with a wider audience in the struggle over property rights and sharing on the Internet.
Reviewing the author-function in the Age of Wikipedia
by Amit Ray
Originality, Imitation, and Plagiarism: Teaching Writing in the Digital Age
Caroline Eisner and Martha Vicinus, Editors
Permalink: http://hdl.handle.net/2027/spo.5653382.0001.001
Published: Ann Arbor: University of Michigan Press, 2008.
Identifying and grounding descriptions of places
Published at the GIR Workshop @ SIGIR 2006
In this paper we test the following hypothesis: given a piece of text describing an object or concept, our combined disambiguation method can determine whether it is a place and ground it to a Getty Thesaurus of Geographical Names unique identifier with significantly more accuracy than naïve methods. We demonstrate a carefully engineered rule-based place name disambiguation system and give Wikipedia as a worked example with hand-generated ground truth and benchmark tests. This paper outlines our plans to apply the co-occurrence models generated with Wikipedia to solve the problem of disambiguating place names in text using supervised learning techniques.
Classifying Tags using Open Content Resources
Published at WSDM 2009
Tagging has emerged as a popular means to annotate on-line objects such as bookmarks, photos and videos. Tags vary in semantic meaning and can describe different aspects of a media object. Tags describe the content of the media as well as locations, dates, people and other associated meta-data. Being able to automatically classify tags into semantic categories allows us to understand better the way users annotate media objects and to build tools for viewing and browsing the media objects. In this paper we present a generic method for classifying tags using third-party open content resources, such as Wikipedia and the Open Directory. Our method uses structural patterns that can be extracted from resource meta-data. We describe the implementation of our method on Wikipedia using WordNet categories as our classification schema and ground truth. Two structural patterns found in Wikipedia are used for training and classification: categories and templates. We apply our system to classifying Flickr tags. Compared to a WordNet baseline, our method increases the coverage of the Flickr vocabulary by 115%. We can classify many important entities that are not covered by WordNet, such as London Eye, Big Island, Ronaldinho, geocaching and wii.
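The idea of learning a tag's semantic class from the categories attached to its Wikipedia article can be sketched as a simple vote over category names. This is only an illustration under stated assumptions, not the method evaluated in the paper: the training examples, the class labels `artifact` and `person`, and the functions `train` and `classify` are hypothetical, and templates are left out entirely.

```python
from collections import Counter

# Hypothetical training data: category names of articles whose class is known.
TRAINING = [
    (["Landmarks in London", "Ferris wheels"], "artifact"),
    (["Bridges in London", "Landmarks in London"], "artifact"),
    (["Brazilian footballers", "Living people"], "person"),
    (["English footballers", "Living people"], "person"),
]


def train(examples):
    """Count, for each category name, how often it co-occurs with each class."""
    votes = {}
    for categories, label in examples:
        for category in categories:
            votes.setdefault(category, Counter())[label] += 1
    return votes


def classify(categories, votes):
    """Sum the votes of all known categories; return None if none is known."""
    total = Counter()
    for category in categories:
        total.update(votes.get(category, Counter()))
    if not total:
        return None
    return total.most_common(1)[0][0]


model = train(TRAINING)
```

With this toy model, an unseen article carrying the categories `Living people` and `Spanish footballers` is assigned the class `person` because one of its categories was seen in training, while an article with only unknown categories is left unclassified.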
View of the world according to Wikipedia: Are we all little Steinbergs?
Journal of Computational Science, 2011
Saul Steinberg's most famous cartoon "View of the world from 9th Avenue" depicts the world as seen by self-absorbed New Yorkers. By analysing Wikipedias in a range of different languages, we find that this particular fish-eye world view is ubiquitous and inherently part of human nature. By measuring the skew in the distribution of locations in different languages we can confirm the validity of plausible quantitative models. These models demonstrate convincingly that people all have similar world views: "We are all little Steinbergs." Our Steinberg hypothesis allows the world view of specific people to be more accurately modelled; this will allow greater understanding of a person’s discourse, either by someone else or automatically by a computer.
The complementarity of evolutionary and conventionalist approaches. Application to the analysis of the routines that make the "ancillary" parts of "open-source" communities viable: the case of the WordPress community (2003-2008).
A paper presented at the large conference and congress of the Association Française d'Economie Politique.
Wikipedia: a new editorial model?
Wikipedia is part of a digital landscape with improbable contours, and it raises questions because it remains resistant to analysis at the editorial, social, and economic levels. Somewhere between an encyclopedic product and a collaborative project, it sketches a different editorial model, one that goes well beyond a stable, validated set of linked knowledge.
First, Wikipedia offers for observation a space where knowledge is built in real time, where it can be grasped "concretely" in its eminently social dynamics. Second, it prompts reflection on the notions of author, authority, and credibility, echoing a rapidly growing free culture, and it calls for the exercise of critical action in resonance with emerging forms of citizen participation. Finally, it relies on collective intelligence and belongs to a digital continuum that abolishes the boundaries between reader and author, between amateur and expert, and, through its multilingualism, between minority and dominant cultures.
Understanding Wikipedia means identifying the points of rupture and continuity with traditional publishing and the hybrid dynamics it sets in motion. It also means grasping its influence on today's editorial production, in its competing or alternative forms. Finally, it means asking more broadly, with Camille Roth [ROT 07], about its viability and about the complex synergies that arise between participants and content within the wiki space.
Free and collaborative reference publishing: the case of Wikipedia
The year 2005 was particularly rich in debates and controversies about the free encyclopedia Wikipedia. While the attention of the mainstream media and of the traditional mediators of knowledge remains mostly focused on cases of vandalism and on problems of reliability and quality, usage keeps growing. The total number of articles across all versions rose from 1,400,000 to 3,400,000 within one year. This exponential growth in content is accompanied by a strong increase in traffic, placing the encyclopedia among the 25 most visited sites in the world according to the Alexa ranking.
At the same time, while research on blogs and social software enjoys a certain audience, the scientific community still takes little interest in wikis and only marginal interest in Wikipedia, particularly in France. The first editions of the Wikimania (August 2005) and Wikisym (October 2005) conferences suggest that the groundwork has been laid for analyzing the stakes and workings of this unprecedented editorial phenomenon. Pooling of effort also takes place through the Wikimedia Research Network, an association that brings together researchers working on Wikipedia or on other Wikimedia Foundation projects.
Despite these initiatives, research remains scarce and, above all, not very visible. Does Wikipedia renew the encyclopedic genre by overturning our representations? In other words, is it a reliable and therefore legitimate reference product, or should it rather be regarded as a collaborative project built around an encyclopedic pretext? Are participants driven by activism with anarchist leanings, or do they find in Wikipedia a new ground of expression for diverse forms of engagement? These are some of the questions we attempt to answer.
After an introductory first part on the genesis of Wikipedia and the criticisms usually levelled at it, we examine the nature of this new content and the way it is constructed, before turning to the roles and motivations of contributors. We conclude by considering educational uses and the processes of acculturation at work.
"What an Un-wiki Way of Doing Things": Wikipedia’s Multilingual Policy and Metalinguistic Practice
The article has been published in the Journal of Language and Politics, 10(4), 2011: http://www.benjamins.com/#catalog/journals/jlp
Wikipedia defines itself as “the biggest multilingual free-content encyclopedia on the internet”, thus featuring an explicit language policy in its mission statement. Bearing in mind that the site has become the most popular source of encyclopaedic information online, its significance for public encounters with multilingualism should not be underestimated. This article offers a critical and multimodal discourse analytical approach to Wikipedia's explicit and implicit multilingual policies and practices. I examine, under “explicit metalinguistic practice” (Woolard 1998), public disclaimers and exemplary user practice and talk on the “Multilingual Coordination” entry. Under “implicit metapragmatics”, I shall offer a multimodal analysis of Wikipedia's multilingualism-oriented interface design; the corporate logo and its paratextual meta-commentary on a number of linguistic and journalistic websites; and a code-critical reading of Wikipedia's “Babel” user language templates. My observations are discussed against the backdrop of postcolonialist theories on the role of English as lingua franca of the information age.

