Il text mining nelle applicazioni per la sicurezza (Text mining in security applications)
published in “Analisi Difesa”, monthly magazine of politics and military analysis, no. 33 (April 2003)
Observation of some recent criminal events (attributable to terrorism or, more generally, to crime) reveals a "technological escalation" in the conception, planning, and execution of criminal acts: the use that has on some occasions been made of conventional Internet services (web, email, newsgroups, forums, chat, etc.) suggests a profound transformation of organizational and operational models within criminal groups.
Sentiment Analysis amidst Ambiguities in YouTube Comments on Yoruba Language (Nollywood) Movies
In Proceedings of the 21st international World Wide Web Conference (WWW2012), April 16 - April 20, 2012, Lyon, France
Nollywood is the second largest movie industry in the world in terms of annual movie production. A dominant share of the movies are in the Yoruba language, which is spoken by over 20 million people across the globe. The number of Yoruba-language movies uploaded to YouTube, and of their corresponding comments, is growing exponentially. However, YouTube comments made by native speakers on Yoruba movies combine English, Yoruba, and other commonly used “pidgin” Yoruba words. Since Yoruba is still a resource-constrained language, existing sentiment or subjectivity analysis algorithms perform poorly on YouTube comments about Yoruba-language movies, because of the ambiguities of this resource-constrained language. In this work, we present an automatic sentiment analysis algorithm for YouTube comments on Yoruba-language movies. The algorithm uses the SentiWordNet thesaurus and a lexicon of commonly used Yoruba sentiment words and phrases. In terms of precision-recall, the algorithm outperforms a state-of-the-art sentiment analysis technique by up to 20%.
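As a rough illustration of the lexicon-based scoring this abstract describes, the sketch below sums word polarities from two small dictionaries. The entries, scores, and function name are hypothetical stand-ins chosen for the example; they are not SentiWordNet values and not the authors' Yoruba lexicon or algorithm.

```python
import re

# Toy polarity scores standing in for SentiWordNet (assumed values).
ENGLISH_LEXICON = {"love": 1.0, "nice": 0.5, "boring": -0.5, "bad": -1.0}
# Toy supplementary lexicon; entries and scores are illustrative assumptions.
YORUBA_LEXICON = {"dara": 1.0, "buru": -1.0}


def comment_polarity(comment: str) -> str:
    """Label a comment by summing the polarity of every known token."""
    tokens = re.findall(r"\w+", comment.lower())
    score = sum(
        ENGLISH_LEXICON.get(t, 0.0) + YORUBA_LEXICON.get(t, 0.0) for t in tokens
    )
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A mixed-language comment is scored by whichever lexicon recognizes each token, which is the reason for keeping a supplementary lexicon next to the English one.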
Performance and Trends in Recent Opinion Retrieval Techniques
Sylvester Olubolu Orimaye, Saadat M. Alhashmi and Siew Eu-Gene
Faculty of Information Technology, Monash University
email: {sylvester.orimaye, alhashmi, siew.eu-gene}@monash.edu
The Knowledge Engineering Review (in press), Cambridge University Press.
This paper presents trends and performance of opinion retrieval techniques proposed within the last eight years. We identify major techniques in opinion retrieval and group them into four popular categories. We describe the state-of-the-art techniques for each category and emphasize their performance and limitations. We then summarize with a performance comparison table for the techniques on different datasets. Finally, we highlight possible future research directions that can help solve existing challenges in opinion retrieval.
Ingeniería del Conocimiento y del Producto (Knowledge and Product Engineering)
Talk given at the course Ingeniería del Conocimiento y del Producto (Knowledge and Product Engineering), organized by AINVEX and Techné Research Group at the Faculty of Sciences of the University of Granada, on 12 and 14 April 2012.
Course link:
http://secretariageneral.ugr.es/pages/tablon/*/noticias-canal-ugr/2012
Link to the Techné page:
http://www.ugr.es/~tep028/eventos/curso_IC_IP_2012/curso_IC_IP_2012.ph
Link to the AINVEX page:
http://ainvex.blogspot.com.es/2012/04/curso-ingenieria-del-conocimient
The course “Ingeniería del Conocimiento y del Producto” aims to introduce the concept of Technology Watch (Vigilancia Tecnológica), a Knowledge Engineering tool, for developing innovative commercial products as a promising source of employment for new generations of graduates and engineers. It is organized by the research group “Techné. Ingeniería del Conocimiento y del Producto” and the Asociación de Investigadores Extranjeros (AINVEX), with funding from the Vice-Rectorate for Students and the collaboration of the Faculty of Sciences of the University of Granada.
According to Rafael Bailón Moreno, lecturer in the Department of Chemical Engineering and lead researcher of “Techné. Ingeniería del Conocimiento y del Producto”, “To be competitive, modern industry needs to bring increasingly innovative and competitive products to market. The pace is so fast, and involves so many commercial, scientific, technological, sociological, and other factors, that company decision-makers, as well as the technicians and scientists who develop the products, must be thoroughly informed and have reliable, precise knowledge of the latest innovations in commercial products”.
Scientific and technological information, drawn from the thousands or even millions of published documents such as journal articles, patents, technology reports, popular press coverage, advertisements, and so on, is so vast that no one can take it in simply by reading all of these documents. The process of keeping track of it is called Technology Watch and, to be effective, relies on so-called Knowledge Systems, which are based on co-word analysis (análisis de palabras asociadas). Knowledge systems can automatically read the textual content of thousands upon thousands of documents and turn this information into positive knowledge, so that people who must make decisions can do so on a reasoned basis. These Knowledge Engineering techniques are very useful in many scientific and business settings, job searching for example, and are vitally important in developing innovative, competitive commercial products.
In the course “Ingeniería del Conocimiento y del Producto”, students will get a hands-on introduction to Scientific and Technology Watch techniques, which are based on actor-network theory and use co-word analysis to extract positive knowledge from thousands of documents at once. In a first stage, they will be shown how to use these techniques, which include knowledge mapping, for intelligent job searching. This block includes practical sessions with the Redes 2005 software and access to various job portals.
Students will make a cosmetic cream designed in advance to techno-scientific requirements
In a second block, students will be introduced to Product Engineering by way of Technology Watch. According to actor-network theory, scientific and technological elements are just as important as sociological elements and user habits in the development of commercial products. The practical part will include searches for patents and trademarks at the Oficina Española de Patentes y Marcas (Spanish Patent and Trademark Office), along with exercises in interpreting the labels of products on the market (especially the information on product composition).
Because detergents, hygiene products, cosmetics, food, and pharmaceuticals are the industrial fields where the product matters most, students will make, in the laboratory, a cosmetic cream designed in advance to meet techno-scientific requirements as well as social customs and usage habits. The product will meet the same requirements as any product on the market, and each student will take home their own jar of cream and can use it exactly like one bought in a shop.
A pragmatic approach to Alceste analysis
Kalampalikis, N. & Moscovici, S. (2005). Une approche pragmatique de l’analyse Alceste, Cahiers Internationaux de Psychologie Sociale, 66, 15-24.
A semantic approach is very often the only one used to interpret the results of automatic discourse analysis. In this article we focus on one such software package, Alceste, which is often used by researchers working in the field of social representations. We argue that a pragmatic approach to communication and language is essential from both a theoretical and a practical point of view. Using an empirical illustration, we highlight the possibility of bringing out indirect traces of communication in the vocabulary, thanks to the concept of the temperature of information, and thereby show how to obtain pragmatic indications alongside the semantic indications produced by this software.
L'apport de la méthode Alceste dans l'analyse des représentations sociales (The contribution of the Alceste method to the analysis of social representations)
Kalampalikis, N. (2003). L’apport de la méthode Alceste dans l’étude des représentations sociales. In J.-C. Abric (Ed.), Méthodes d’étude des représentations sociales. Paris, Editions Erès, pp. 147-163.
Nowcasting Events from the Social Web with Statistical Learning
Co-authored with Nello Cristianini. Accepted for publication in ACM TIST.
We present a general methodology for inferring the occurrence and magnitude of an event or phenomenon by exploring the rich amount of unstructured textual information on the social part of the web. Using geo-tagged user posts from the microblogging service Twitter as input data, we investigate two case studies. The first is a benchmark problem, where actual levels of rainfall at a given location and time are inferred from the content of tweets. The second is a real-life task, where we infer regional influenza-like illness rates in an effort to detect an emerging epidemic disease in a timely manner. Our analysis builds on a statistical learning framework that performs sparse learning via the bootstrapped version of LASSO to select a consistent subset of textual features from a large number of candidates. In both case studies, the selected features show close semantic correlation with the target topics, and inference, conducted by regression, achieves significant performance, especially given the short length (approximately one year) of Twitter's data time series.
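The feature-selection idea mentioned in this abstract, running LASSO on bootstrap resamples and keeping only the features that are selected consistently, can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the coordinate-descent solver, the function names, and every parameter value (`alpha`, `n_boot`, `min_share`) are assumptions made for the example, and the columns of `X` are assumed to be centered so that no intercept is needed.

```python
import random


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xw||^2 + alpha*||w||_1, no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha) / norm
            delta = new_w - w[j]
            if delta:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_w
    return w


def bootstrapped_lasso(X, y, alpha, n_boot=30, min_share=1.0, seed=0):
    """Indices of features with non-zero weight in at least min_share of resamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w = lasso([X[i] for i in idx], [y[i] for i in idx], alpha)
        for j, wj in enumerate(w):
            if wj != 0.0:
                counts[j] += 1
    return [j for j in range(p) if counts[j] >= min_share * n_boot]
```

Requiring a feature to survive every resample (`min_share=1.0`) is the strict variant; lowering the share trades consistency for a larger selected set.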
The DEFT Text-mining Challenge
Published in BULAG n. 35, 2011
Text mining applies information-extracting algorithms to large natural language text collections. The DEFT text-mining challenges have been an opportunity to demonstrate the diversity of techniques in this field, and have yielded high-quality corpora of written French text. The paper states possible definitions of text mining, along with its particular meaning within DEFT. Each campaign has been organized along definite steps, giving rise to specific problems, among them the adjustment of the measuring scales associated with opinion documents. Last, we examine matters of opinion meaning (present in the 2007 and 2009 campaigns) and the question of subjectivity in texts and its processing by statistical or symbolic methods.
A System to Filter Unwanted Messages from OSN User Walls
Marco Vanetti, Elisabetta Binaghi, Elena Ferrari, Barbara Carminati and Moreno Carullo.
To be published in "IEEE Transactions on Knowledge and Data Engineering (TKDE)".
One fundamental issue in today's On-line Social Networks (OSNs) is giving users the ability to control the messages posted on their own private space so that unwanted content is not displayed. Up to now, OSNs have provided little support for this requirement. To fill the gap, in this paper we propose a system that gives OSN users direct control over the messages posted on their walls. This is achieved through a flexible rule-based system, which allows users to customize the filtering criteria applied to their walls, and a machine-learning-based soft classifier that automatically labels messages in support of content-based filtering.
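The combination described in this abstract, user-defined rules applied on top of soft class scores, might look roughly like the sketch below. Everything here is hypothetical: the keyword lists stand in for a trained classifier, and the class labels, `FilteringRule` fields, and thresholds are illustrative assumptions rather than the system proposed in the paper.

```python
from dataclasses import dataclass

# Hypothetical keyword lists standing in for a trained soft classifier.
CLASS_KEYWORDS = {
    "vulgar": {"damn", "crap"},
    "spam": {"free", "winner", "click"},
}


def soft_classify(message: str) -> dict:
    """Return a membership score in [0, 1] for each class (toy heuristic)."""
    tokens = [t.strip(".,!?") for t in message.lower().split()]
    if not tokens:
        return {label: 0.0 for label in CLASS_KEYWORDS}
    return {
        label: sum(t in words for t in tokens) / len(tokens)
        for label, words in CLASS_KEYWORDS.items()
    }


@dataclass
class FilteringRule:
    label: str        # class the rule targets
    threshold: float  # block when the membership score reaches this value


def is_blocked(message: str, rules: list) -> bool:
    """A message is blocked as soon as any of the wall owner's rules fires."""
    scores = soft_classify(message)
    return any(scores.get(rule.label, 0.0) >= rule.threshold for rule in rules)
```

Because the classifier returns graded scores rather than hard labels, each wall owner can pick a stricter or looser threshold per class without retraining anything.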
Studying Group Behaviors: A tutorial on text and network analysis methods
IEEE Signal Processing Magazine, Co-Authored with Chris Magee
Many important technical and policy decisions are made by small groups, especially by deliberative committees of technical experts. Such committees are charged with fairly combining information from multiple perspectives to reach a decision that one person could not make alone. Committees are social entities and are therefore affected by any number of mechanisms recorded in the social sciences. Our challenge is to determine which of these mechanisms are likely to be encountered in the deliberative process and to evaluate how they might impact upon decision outcomes. In particular, we examine the role of committee deliberations on the U.S. Food and Drug Administration's (FDA's) advisory panels.
Reconceptualising searching and screening: How new technologies might change the way that we identify studies
Poster presentation at the 2011 Cochrane Colloquium.
Suggested citation:
Thomas J, & O'Mara AJ. (2011, Oct). Reconceptualising searching and screening: How new technologies might change the way that we identify studies. Presented at the 19th Cochrane Colloquium, 19-22 October 2011, Madrid, Spain.
Background
Typical reviews deal with the ‘information explosion’ by narrowing their search for studies (e.g., applying search filters). Relevant evidence can be missed through this approach. Current methods to minimise the risk of missing relevant studies involve searching broadly and screening potentially tens of thousands of records, which is not always practical. Resource-efficient approaches that maximise sensitivity are needed.
Objectives
To evaluate whether new technologies allow us to search broadly without increasing the screening workload through semi-automated screening approaches. Specifically, we evaluate two types of text mining: a support vector machine using active learning (Wallace et al., 2010) and TerMine term clustering.
Methods
Text mining techniques were employed in an ongoing review to prioritise records for screening and to classify the records automatically as includes or excludes. Screening prioritisation was assessed by comparison with a ‘baseline inclusion rate’ and through the novel application of power calculations. Classification was assessed through the stability of the classifier and the calculation of performance metrics (precision, recall, F-values).
Results
Screening prioritisation worked when sufficient information was provided to the text mining tool; in the ongoing review, only 25% of all records were screened manually to identify the expected total number of includes. Classification reduced the manual screening required in all reviews evaluated, although it worked better for some datasets than others.
Conclusions
Systematic reviews need to develop ways of handling the growing amount of evidence available. Text mining is a promising approach that shifts the emphasis of identification from the searching stage to screening. Reconceptualising searching permits broad searches to be conducted and allows reviewers to be more precise in estimating the number of potentially missing relevant studies than can be achieved by narrowing the search process. Areas for further development are suggested.
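The classification metrics named in the Methods above (precision, recall, F-values) can be computed from include/exclude decisions as in this short sketch. The function name and the boolean encoding of decisions are assumptions made for the example; no figures from the poster are reproduced.

```python
def screening_metrics(predicted, actual, beta=1.0):
    """Precision, recall and F-beta for include (True) / exclude (False) labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)      # correctly included
    fp = sum(1 for p, a in pairs if p and not a)  # wrongly included
    fn = sum(1 for p, a in pairs if not p and a)  # relevant but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta**2 * precision + recall
    f_score = (1 + beta**2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

In screening, a `beta` above 1 weights recall more heavily, which fits the stated concern about missing relevant studies.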
Viva la nano-revolución! A semantic analysis of Spanish national press
Forthcoming in the journal 'Science Communication'
This study analyses nanotechnology’s anchoring and codification in the Spanish national press to determine the thematic contexts in which this technology has been discussed. Latent semantic analysis was applied to identify themes based on semantic clusters and their longitudinal evolution. This analysis was carried out on a corpus of more than 600 articles from the most prominent Spanish national newspapers and includes articles from 1997 to 2009. Findings indicate an overall positive coverage and dominant thematic clusters related to national policies, economic development and business opportunities. Surprisingly, controversies on nanotechnology are present in the early years of coverage but have become marginal over time, in contradiction with a general trend that emerged from previous studies on media representations of new technologies.
Two-layered Blogger identification model integrating profile and instance-based methods
by Amr Ahmed
Haytham Mohtasseb and Amr Ahmed.
Published in "Journal of Knowledge and Information Systems", Springer. April 2011.
DOI: 10.1007/s10115-011-0398-0
This paper introduces a two-layered framework that improves the result of authorship identification within larger sample numbers of bloggers as compared with earlier work. Previous studies are mainly divided into two categories: profile-based and instance-based methods. Each of these approaches has its advantages and limitations. The two-layered framework presented here integrates the two previous approaches and presents a new solution to a key problem in authorship identification, namely the drop in accuracy experienced as the number of authors increases. The paper begins by illustrating the regular instance-based core model and the investigated features. It then introduces a new psycholinguistic profile representation of authors, presents similarity grouping extraction over profiles, and applies blogger identification utilizing the two-layered approach. The results confirm the improvement introduced by the proposed two-layered approach against our regular classifier, as well as a selected baseline, for an extended number of users.
Language Independent System for Document Context Extraction
by Michal Novák
PŘIBIL, Jiří, KINCL, Tomáš, BÍNA, Vladislav, NOVÁK, Michal. Language Independent System for Document Context Extraction. San Francisco 19.10.2011 – 21.10.2011. In: Proceedings of the World Congress on Engineering and Computer Science 2011. San Francisco : International Association of Engineers, 2011, s. 51–55. ISBN 978-988-18210-9-6.
Nowadays everyone, especially managers and business people, is exposed to ever-present information pollution. This is why business intelligence tools are of great importance; nevertheless, current methods can hardly cope with large, unstructured text sources such as the World Wide Web, which is becoming more and more important. To address this, we have to find and verify sufficiently reliable methods for automatically extracting the main context of a document, i.e., a multidimensional structured characterization representing the document's main topic. To cope with multilingual sources, we have to develop approaches that do not depend on the language of the source and do not need any additional language-dependent tools (such as thesauri). In our conception, the context is dynamic: the classification of a document depends not only on the document in question but also on the corpus, so expanding the corpus can change a document's classification.
International Higher Education Online Marketing: A Cross-Cultural Study of GLOBE Clusters
by Michal Novák
KINCL, Tomáš, NOVÁK, Michal, ŠTRACH, Pavel. International Higher Education Online Marketing: A Cross-Cultural Study of GLOBE Clusters. Milan 14.11.2011 – 15.11.2011. In: Proceedings of The 17th International Business Information Management Association Conference (Creating Global Competitive Economies: A 360-degree Approach). Milan, Italy : International Business Information Management Association (IBIMA), s. 1335–1343. ISBN 978-0-9821489-6-6.
Higher education is a dynamic global industry with a highly competitive and developed market. Universities strive to communicate their international programs to impress prospective students interested in studying abroad. International education is provided in a local environment and influenced by local culture and characteristics; however, it might be perceived as a global product that satisfies the needs of students worldwide, attains consistent positioning, and refers to similar values in all markets. This study addresses the question of whether the communicated characteristics of international programs differ among universities from various cultures. The websites of seventy universities from different GLOBE cultural clusters are analyzed using data-mining methods. The analysis suggests that marketing communications to international students do not rest on cultural grounds, as there are only marginal differences between international program communications across the world. The major difference among the prime international higher education providers was found between the GLOBE Anglo universities and the rest of the world.
Free Text In User Reviews: Their Role In Recommender Systems
by Maria Terzi
3rd Workshop on Recommender Systems and the Social Web, Held in conjunction with ACM RecSys’11 on 23rd October in Chicago, IL, USA
As short free-text user-generated reviews become ubiquitous on the social web, opportunities emerge for new approaches to recommender systems that can harness users' reviews in open text form. In this paper we present a first experiment towards the development of a hybrid recommender system that calculates users' similarity based on the content of their reviews. We apply this approach to the movie domain and evaluate the performance of LSA, a state-of-the-art similarity measure, at estimating the similarity of users' reviews. Our initial investigation indicates that users' similarity is not well reflected in traditional score-based recommender systems, which rely solely on users' ratings. We argue that short free-text reviews can be used as a complementary and effective information source. However, we also find that LSA underperforms when measuring the similarity of short, informal user-generated reviews. We therefore argue that further research is needed to develop similarity measures better suited to noisy short text.
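To make the notion of review-to-review similarity concrete, the sketch below computes plain bag-of-words cosine similarity between two short reviews. Note that this is a deliberately simpler stand-in and not LSA: there is no dimensionality reduction, and the tokenizer and function names are assumptions made for the example.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens in a review."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

With short reviews, two texts that express the same opinion in different words share few or no tokens and score close to zero, which illustrates why similarity on short, noisy text is difficult.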
NLP and Text Mining: An Application Proposal for Reader Comments in Online Journals
(2009). INFORMing the Globe. San Diego: INFORMS.
Discovery and analysis of email-driven business processes
by Marco Stuit
Marco Stuit, Hans Wortmann, Nick Szirbik, Jan Roodenburg
Information Systems, 37(2), pages 142-168, 2012.

