Organization Mining using Online Social Networks
by Michael Fire
Co-authored with Rami Puzis and Yuval Elovici
Mature, developed social networking services are among the greatest assets today's organizations may have. However, they are also a non-negligible threat to organizational confidentiality. Members expose many details about their organizations on social networking websites, along with personal information. In this paper we analyze several commercial organizations by mining data their employees exposed on Facebook, LinkedIn, and other publicly available sources. Using a web crawler designed for this purpose, we extract a network of informal social relationships among the employees of a given target organization. Our results show that, by applying centrality analysis and machine learning techniques to the structure of the informal relationship network, it is possible to identify leadership roles within the organization. It is also possible to gain valuable, non-trivial insights into the organization's structure by clustering this network and gathering publicly available information on the employees within each cluster. Organizations wishing to conceal their structure, the location and specialization of branches, the identity of leaders, etc. must enforce strict policies controlling the use of social media by their employees.
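To illustrate the kind of centrality analysis this abstract mentions, here is a minimal sketch that computes normalized degree centrality on a small friendship graph. The abstract does not say which centrality measures the authors used, so degree centrality is an assumption, and the function name `degree_centrality` and the employee names are hypothetical.

```python
def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph.

    `edges` is an iterable of (u, v) pairs. Self-loops are ignored.
    Returns a dict mapping each node to degree / (n - 1).
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop says nothing about ties to other people
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical informal-relationship network among four employees.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
scores = degree_centrality(edges)
most_central = max(scores, key=scores.get)
```

In this toy network "ann" is connected to everyone else and so ranks highest. A score like this could serve as one input feature for a classifier of leadership roles, which is only a guess at how such a pipeline might be assembled.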
Large Social Networks can be Targeted for Viral Marketing with Small Seed Sets
Paulo Shakarian, Damon Paulo, IEEE/ACM ASONAM 2012
In a "tipping" model, each node in a social network, representing an individual, adopts a behavior if a certain number of its incoming neighbors previously adopted it. A key problem for viral marketers is to determine an initial "seed" set in a network such that, if the seed nodes are given the behavior, the entire network adopts it. Here we introduce a method for quickly finding seed sets that scales to very large networks. Our approach finds a set of nodes that guarantees spreading to the entire network under the tipping model. After experimentally evaluating 31 real-world networks, we found that our approach often finds seed sets that are several orders of magnitude smaller than the population size. Our approach also scales well: on a Friendster social network consisting of 5.6 million nodes and 28 million edges, we found a seed set in under 3.6 hours. We also find that highly clustered local neighborhoods and dense network-wide community structure together suppress the ability of a trend to spread under the tipping model.
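The tipping rule described above can be sketched as a simple fixed-point simulation. This is only the forward spreading process, not the authors' seed-finding method, which the abstract does not detail. The function name `spread`, the data layout (a dict of incoming neighbors and a dict of per-node thresholds), and the toy graph are assumptions made for illustration.

```python
def spread(in_neighbors, thresholds, seeds):
    """Simulate a tipping model until no further node adopts.

    in_neighbors: dict mapping each node to a list of its incoming neighbors.
    thresholds:   dict mapping each node to the number of active incoming
                  neighbors it needs before it adopts.
    seeds:        iterable of initially active nodes.
    Returns the set of all nodes that are active at the end.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, preds in in_neighbors.items():
            if node in active:
                continue
            active_preds = sum(1 for p in preds if p in active)
            if active_preds >= thresholds[node]:
                active.add(node)
                changed = True
    return active


# Toy directed network: a -> b, a -> c, b -> c, c -> d.
in_neighbors = {"a": [], "b": ["a"], "c": ["a", "b"], "d": ["c"]}
thresholds = {"a": 1, "b": 1, "c": 2, "d": 1}

full_spread = spread(in_neighbors, thresholds, {"a"})
no_spread = spread(in_neighbors, thresholds, {"d"})
```

Seeding "a" tips "b", then "c" (which needs both "a" and "b"), then "d", so the whole network adopts. Seeding "d" tips nobody. Because adoption is never undone, the order in which nodes are checked does not change the final active set.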
Privacy, Online Advertising and Marketing Techniques: The Paradoxical Disappearance of the User
by Vian Bakir
Co-written with Andy McStay. Published in Ethical Space: the International Journal of Communication Ethics, 3(1), (2006)
This paper critiques the literatures on new media and cybercultures, online privacy regulation, and online marketing techniques to explore their effacement of users.
A Composite Strategy for the Legal and Ethical Use of Data Mining
by Brett Landry
International Journal of Management, Knowledge and Learning, 1(1), 27–43
Dinah Payne
University of New Orleans, USA
Brett J. L. Landry
University of Dallas, USA
An increasingly popular business practice, data mining provides for the extraction of information from existing data to identify trends such as consumer purchasing practices, and it can foster greater efficiency in companies' marketing efforts. There are corresponding costs associated with data mining as well. The most difficult issue surrounding data mining is that of individual privacy rights and the costs associated with the potential alteration of 'traditional' privacy rights. This paper reviews basic definitional information on data mining and provides a strategy for companies' successful, meaningful, and ethical use of data mining for meaningful knowledge generation.
Keywords: management, learning, knowledge, data mining, text mining, privacy, ethics, strategy
Data mining in publishing: a nice feature or a necessity?
Originally published as: Fernandez-Steeger, T.M., Zander, F., Callsen, S., Steinberg, S., Brauns, N. (2004): Data mining in publishing - a nice feature or a necessity? In: Zanasi, A. et al. (Eds.): Data Mining V, 253-262.
The two distribution channels in the publishing business, retail and subscription, compete against each other but at the same time complement each other. The following three examples show the importance and potential of data mining in this area of tension. The first example presents a model using Markov chains that explains how these channels are related. The second example shows how to estimate the mean lifetime of subscriptions and subsequently the maximum IPO. The last example explains how to create test groups for market research by employing supervised clustering and points out the advantages of this method.
As You Can See: Applying Visual Collaborative Filtering to Works of Art
published in Digital Humanities Quarterly
Art-historically relevant visual knowledge can be deconstructed, and the resulting components of this visual knowledge (visual discernments) lend themselves to being socially negotiated. Individual visual experts (like connoisseurs) do not share some grand and indivisible cognitive cataloguing system; they are attentive to piecemeal visual discernments and the patterns in which these occur in reality. Conventional scholarly communication lacks sophisticated tools for discussing perceptual patterns. This paper not only proposes a theoretical model of visual knowledge accumulation, but also describes a practical implementation, Art.Similarities, which is designed as a prototype of such a sophisticated tool. Using a custom-made interface, it records visual behavior: the non-verbally expressed visual similarity judgments of distributed individuals. Users can be assigned to groups according to the qualities of their judgments. These qualities may be distilled from emerging similarity patterns. The implications of individual judgments in different user groups may vary considerably. Emerging patterns can be assessed by both human analysis and statistical procedures. Most studies on art evaluation are attentive to either the characteristics of works or the characteristics of observers. In this study both are consistently treated as interdependent entities.
From Customer Lifetime and Customer Value to Modern Customer Service: Developing Customer Value Models and Optimizing Customer Retention
Zander, F., Brauns, N., Fernandez-Steeger, T.M. (2005): Von Kundenhaltbarkeit und Kundenwert zum modernen Kundenservice - Entwicklung von Kundenwertmodellen und Optimierung der Kundenbindung. In: Rödel, E., Bödeker, R-H. (Eds.): SAS: Verbindung von Theorie und Praxis, 9. Konf. d. SAS-Anwender in Forschung und Entwicklung (KSFE) 2005. 3-4 Mar. 2005, Berlin, Aachen, 391-406.
Customer retention and the value of a customer are becoming increasingly important. Customer value serves not only as a measure of current profitability but also defines the framework for customer-specific measures in sales, marketing, and service within the subscription business of the Bauer Verlagsgruppe.
Customer lifetime serves as the key parameter for determining future customer value. Different methods for forecasting subscription duration are compared, and the most important parameters and lifetimes for new and existing customers are analyzed. It is further shown that two years of data are sufficient to produce reliable lifetime forecasts with a time horizon of 10 to 30 years. To this end, a distinction is made between macro and micro models. Macro models can be applied successfully to steer new-customer acquisition. When evaluating customer retention measures, it makes more sense to fall back on micro models so that customer-specific characteristics are taken into account more strongly. The macro and micro levels are integrated in SAS in a two-stage approach. Even exactly forecast lifetimes permit only an estimate of future customer value. Depending on the estimation formula, errors in customer value of up to 20% are possible. With reliable methods it is nonetheless possible to determine customer value more precisely and keep the error below the 5% threshold.
Based on customer values, subscribers can be classified into different groups (ABCD), for example via a scoring model. Examples show how customer value groups can be used to optimize customer retention and also to adapt customer service. Because profitability is also influenced by other factors such as cross-selling and up-selling potential, solutions are shown for integrating this potential into the classification.
Practical Applications of Data Mining
Fernandez-Steeger, T.M., Zander, F. (2003): Praktische Anwendungen aus dem Data Mining. 7. Konf. F. SAS-Anwender in Forschung und Entwicklung (KSFE) 2003. 20.-21. Feb. 2003, Potsdam, Heidelberg.
Data mining methods are used at VKG Verlagvertriebs KG in the three core areas of subscription, logistics, and retail for various tasks in planning, control, and quality management. Various SAS tools are employed.
Methods are presented here for inferring customers' cancellation and complaint probabilities from customer analyses. Although the two problems initially appear similar, their modeling and solution approaches differ. One essential difference, for example, is that complaint modeling has no target value. Furthermore, a more significant and therefore model-relevant interaction between customer and company occurs here, which increases the complexity of the model.
The special preprocessing requirements needed to convert heterogeneous data sets and missing values systematically and meaningfully are discussed briefly. The main focus, however, is on the model requirements and the difficulties of modeling both problems with logistic regression and neural networks. Particular attention is paid to the suitability of step functions for evaluating the target variable, and these are contrasted with a group model.
Finally, the benefit of such models and their use as simulation and control tools in publishing distribution practice, e.g. in "rolling subscription planning" (sliding-window technique), is presented, along with their integration into the existing information (OLAP) and controlling systems.
The role of Emotional Stability in Twitter Conversations
by Fabio Celli
Celli, F., Rossi, L. (2012) The role of Emotional Stability in Twitter Conversations. In Proceedings of Workshop on Semantic Analysis in Social Media, in conjunction with EACL 2012, Avignon.
Data without Boundaries - second call for proposals
by TARKI Social Research Institute
Support for transnational access to official microdata
Concurrent Activity Recognition for Clinical Work
In 2012 IEEE World Congress on Computational Intelligence
We present an approach to learning to recognize concurrent activities based on multiple data streams. One example is recognition of concurrent activities in hospital operating rooms based on multiple wearable and embedded sensors. This problem differs from standard time series classification in that there is no natural single target dimension, as multiple activities are performed at the same time. Hence, most existing approaches fail. The key innovations that allow us to tackle this problem are (1) learning to recognize base activities from raw sensor data, (2) creating artificial joint activities from base activities using frequent pattern mining, and (3) handling temporal dependency using virtual evidence boosting.
Towards automatic retrieval of idioms in French newspaper corpora
Degand, Liesbeth; Bestgen, Yves (2003). Towards automatic retrieval of idioms in French newspaper corpora. In: Literary and Linguistic Computing: the journal of digital scholarship in the humanities (2003), p. 249-259. doi: 10.1093/llc/18.3.249
Effective Hypertensive Treatment using Data mining in Saudi Arabia
by Eng. Mohammad Khubeb Siddiqui
Journal of Clinical Monitoring and Computing, DOI: 10.1007/s10877-010-92602, Vol. 24, No. 5, Springer Netherlands, October 2010.
In the present investigation, the data sets of the Non-Communicable Diseases (NCD) risk factors standard report of Saudi Arabia 2005, produced in collaboration with the World Health Organisation, were used for regression analysis with data mining techniques, leading to a prediction of which treatment contributes more to improvement in hypertension. The Oracle Data Mining (ODM) tool was used for the analysis. Data sets for different age groups of male patients receiving blood pressure treatment for hypertension under different modes were studied. The age range is 15 to 64 years. Data mining is an appropriate and sufficiently sensitive method for analyzing which mode of treatment is more effective for which age group. This was predicted using Oracle Data Miner.
