Human Systems Management 22 (2003) 157–168
IOS Press

Data mining: Consumer privacy, ethical policy, and systems development practices

Christina Cary, H. Joseph Wen* and Pruthikrai Mahatanankoon
School of Information Technology, College of Applied Science and Technology, Illinois State University, Normal, IL 61790-5150, USA

Abstract. The growing application of data mining to boost corporate profits is raising many ethical concerns, especially with regard to privacy. The volume and type of personal information that is accessible to corporations these days is far greater than in the past. This causes many consumers to be greatly concerned about potential violations of their privacy by current data collection and data mining techniques and practices. The purpose of this study is to identify the ethical issues associated with data mining and the potential risks to a corporation that is believed to be operating in an unethical manner. The paper reviews the relevant ethical policies and proposes ten data mining systems development practices that can be incorporated into a software development lifecycle to prevent these risks from materializing.

Keywords: Data mining, consumer privacy, ethical policy, software development

Christina Cary is a graduate student in Applied Computer Science at School of Information Technology at Illinois State University. Her research interests include systems integration, Net technology, software development and Web services design.

H. Joseph Wen is an associate professor of Information Systems at School of Information Technology at Illinois State University. He holds a PhD from Virginia Commonwealth University. He has published over 90 papers in academic refereed journals, book chapters, encyclopedias and national conference proceedings. Dr. Wen has received over six million dollars research grants from various State and Federal funding sources. His areas of expertise are Internet research, electronic commerce (EC), transportation information systems, and software development. He has also worked as a senior developer and project manager for various software development contracts since 1988.

Pruthikrai Mahatanankoon is an Assistant Professor of Information Systems at School of Information Technology at Illinois State University. He holds a Bachelor’s degree in Computer Engineering from King Mongkut’s University of Technology Thonburi, Thailand, a MS in Management Information Systems, and a MS in Computer Science from Fairleigh Dickinson University. He received a PhD in Management Information Systems from the Claremont Graduate University. He has published articles in Encyclopedia of Information Systems, DSI proceedings, and other academic book chapters. His current research interests focus upon Internet technology usage and abuse in the workplace, mobile commerce, web services, quantitative research methods, and virtual workplace and virtual organizations.

* Corresponding author: H. Joseph Wen, School of Information Technology, College of Applied Science and Technology, Illinois State University, Normal, IL 61790-5150, USA. Tel.: +1 309 438 7756; E-mail: hjwen@ilstu.edu.

0167-2533/03/$8.00 © 2003 – IOS Press. All rights reserved

1. Introduction

With the ever-increasing availability of personal information in electronic form, many new uses and potential misuses of such data have become possible and easy to implement with technologies such as data warehousing and data mining. This raises many ethical issues concerning the violation of consumers’ privacy and ownership of their personal data. These ethical issues and risks need to be identified and addressed by businesses whenever they attempt new applications of data mining technology.
The practice of data mining attempts to extract even more information from existing data by finding a correlation or trend in the data. It is also called knowledge discovery by some because data miners do not know specifically what they are looking for before they find it. They are seeking to discover new insights from the data in their databases. The most popular use of data mining for businesses today is as a tool for identifying consumers to target in direct marketing campaigns. The potential benefits of data mining to business can be huge. By targeting only consumers who have been identified as being more receptive to a particular product or service, businesses can save money on marketing while increasing their customer base substantially. This is merely the most obvious use of data mining. There are many other ways that businesses can use data mining to boost or protect profits. For example, insurance or lending companies might use data mining to classify customers according to the potential risk that they pose, or manufacturing companies might use data mining on their manufacturing data to identify inefficiencies or new opportunities in their processes.

Most companies already have vast amounts of electronic data just sitting in databases or data warehouses or can easily buy customer data from a variety of sources. This makes data mining seem like an attractive and inexpensive proposition. However, the potential costs of data mining can be substantial and unexpected. These costs can be both tangible and intangible and result from consumer opposition to data mining practices that may be seen as unethical. If a business’s use of consumer data is viewed as violating consumer privacy or trust, then the resulting costs can be loss of customers, falling stock prices, and projects that must be cancelled in the face of public opinion. This study focuses on the ethical concerns and issues raised by the mining of consumer data and proposes ten systems development practices for incorporating an ethical risk management strategy into the development of applications that gather or use personal information for data mining related activities.

2. Corporate data mining and data collection

Before delving into the costs and risks of data mining, let us first review how beneficial data mining really is to a business seeking to improve customer relations or target new customers. Some companies have received dramatic results from data mining. AT&T Wireless was able to increase its subscriber base by 20% in less than a year when it contracted with a data-mining company to identify customers that would likely be interested in AT&T’s new flat-fee wireless service [7]. This is typical of the type of results seen in data mining success stories. However, the success of data mining depends on the quality of the original data and on the ability of data miners to distinguish meaningful correlations from misleading patterns in the data. According to Matthews, hundreds of companies were already using data mining by 1997 in an effort to boost profits. However, a survey of these companies by an IT consulting firm found that almost three-quarters of them had not realized any substantial benefit, and those that had did not receive nearly the return that they had expected. Nonetheless, visions of returns like those gained by AT&T continue to drive companies toward this new technology, making data mining itself an $8 billion industry [7].

There are many ethical risks involved in data mining at all stages of the process, from when the original data is collected to when the insights gained from data mining are put to use. In fact, one of the areas of greatest ethical concern to the public is the initial gathering of consumer data before any mining is done. In the past, most consumer data in a business’s databases was transactional. Information was stored about voluntary transactions that the customer chose to make with the company. This data was necessary for the transaction and the customer was fully aware of its collection. However, businesses today are gaining increasingly more personal information without the average consumer being aware of the collection or transfer of this data. Much of this is done with the help of the Internet in several ways. The most ethically questionable has been referred to as “dataveillance” [14]. This is the electronic monitoring of people’s actions or communications over the Internet and is done without the average individual ever being aware of it. The type of data collected in this manner can show things such as a person’s interests, beliefs, associations, purchasing habits, what kind of advertisements they respond to, what kind of discussions they engage in, what kind of people they talk to, and what type of lifestyle they lead. This is done in several ways. Some companies use cookies to track individuals. For example, one such company, DoubleClick, places a cookie on an individual’s computer when he first visits a DoubleClick-affiliated site. This cookie contains an identification number. Whenever the individual visits another DoubleClick-sponsored site or clicks on a DoubleClick-affiliated ad, that information is entered into DoubleClick’s database using the identification number to build a profile of that individual. Other companies or web sites use other methods to record customer clicks or use web server logs to track individuals using the IP address of their computer, since information such as the IP address and the type of software the individual is using is sent to the web server whenever a request for a web page is made. Internet search engines can also be used to find information about people, as many things such as posts to message boards or newsgroups are often archived on the Internet.

This type of tracking and information gathering is perfectly legal at the moment. The Internet is treated as a public sphere and these companies claim that they are doing nothing wrong. However, ethical issues arise because of the secretive or deceptive way in which the practices are carried out. Companies like DoubleClick claim that their data gathering is beneficial to the consumer because it helps tailor marketing to their interests and reduces the amount of irrelevant marketing and communications that they receive. This claim seems to be contradicted by the way in which they conduct their data gathering. The DoubleClick system is designed to be transparent to the user, and the user is not given the option of opting in or asked for his consent [9]. If the user manages to see the fine print or the DoubleClick name written discreetly on a site, he can choose to opt out by disabling the use of cookies through his browser software. That is, if he knows how. The majority of Internet users are still confused and uneducated about these types of technologies that are part of the Internet.

Another very popular information gathering technique on the Internet is that of the registration form. Part of the Internet’s mass appeal has always been the idea of getting information or services for free, and when an individual browses the Internet he is indeed courted by a wide variety of businesses offering him information or services to draw him into their sites. However, web sites are costly to maintain and Internet users can come and go with the click of a button. In order to ensure a return on their investments, businesses need to establish a more permanent relationship with the users who visit their sites. Therefore they ask the user to fill out a registration form, ostensibly for the purpose of providing better service to him. However, the questions on these forms go well beyond the realm of relevance to usage of the site. They seek to determine things such as income level, lifestyle, education, interests, and purchasing habits. This information can then be used by the company to target the individual with its own marketing campaign or be sold to third parties to be used for such purposes. These forms are entirely voluntary, yet usually the user is denied services or inconvenienced if he does not answer these questions. Also, he is usually not given the option of opting in or out when it comes to future uses of the data that he provides.

3. Ethical theory and data mining policy

Several ethical theories can be used to guide data collection and data mining techniques and practices. A policy that arises from data mining and customer privacy includes relationships with and responsibility toward customers. While consumers are generally concerned with making tradeoffs between privacy and individual security, control of personal information, and mere convenience, companies on occasion make tradeoffs between customers’ privacy and the company’s profitability, so that in most cases it is difficult to determine a definite solution to data mining practices.

What makes a company’s data mining policy overly complex is that there are many integrated stakeholders that will influence any ethical data mining decision, and prudent stakeholder analysis is required to incorporate such a decision. Stakeholder analysis includes all people that could be affected by such a policy, such as managers, employees, customers, stockholders, competitors, etc. Therefore, the ability of any company to effectively implement a data mining policy depends on its situation, which involves specific and unique relationships with various stakeholders, both internal and external to the company. So as not to make the policy too complicated, our recommended data mining policies are based only on the macro relationships that exist between companies and customers; these simplified policies allow managers to find distinctive practices that match their companies’ data mining applications and needs.

Incorporating existing ethical theories into data mining policies requires several assumptions. Ethical theories assume that companies have free choices and rational judgment to “do the right thing” by making various tradeoffs between achieving benefits for customers and situations in general, and reaping advantages of data mining applications.

The development of a data mining policy is based on three distinct ethical theories: utilitarian, deontological, and natural rights. Utilitarian policy is derived from a utilitarian view, which examines the consequence(s) of an action. An action might decrease utility (i.e., individual needs and happiness) for some customers and increase it for others. Therefore, data mining practices are deemed appropriate if they increase the overall utility of customers. For example, DoubleClick claims that its data gathering is beneficial to the consumer because it helps tailor marketing to their interests. In contrast, deontological policy does not examine the consequence(s) of data mining practices, but advocates the duty and responsibilities that companies have to their customers. Companies need to be truthful about their actions, and no matter how good the consequences of their actions may be, some actions are always wrong. For example, when customers are not given the opportunity to opt out of companies’ data mining practices, or when the companies are mining the data for a secondary use without any consumer’s consent, deontological policy is being violated. The third view focuses upon the basic natural rights (i.e., the rights to life, liberty, property, privacy) of consumers, and companies must honor those rights. Since the accessibility of personal information for data mining purposes may cause many consumers to be greatly concerned about potential violations of their privacy, the natural rights policy will lessen those consumers’ concerns by focusing on specific issues regarding individual privacy.

Obviously, each policy has its virtues and drawbacks. Table 1 summarizes the ultimate goal of each policy and its drawbacks. Management is advised to conduct a thorough stakeholder analysis before implementing such a policy. Furthermore, these policies can be arranged to arrive at a specific outcome, ranging from the most flexible policy (utilitarian) to the most stringent policy (natural rights).

Table 1
Three Aspects of Data Mining Policy

Data mining policy: Utilitarian Policy
Goal of policy: Maximize consumers’ utility by implementing data mining practices that optimize consequences for all the affected consumers
Drawbacks:
• Difficult to determine all the consequences (total utility) of data mining practices
• Consumer utility is based on personal subjective views, which vary from one person to another; therefore, any particular data mining outcome might decrease utility for some group of people and increase it for others
• Difficult to identify which consumer groups will reap the most benefits

Data mining policy: Deontological Policy
Goal of policy: Minimize the use of data mining techniques and practices by focusing on the contractual obligations and responsibilities that companies have for their consumers
Drawbacks:
• Inflexible and rigid for companies to utilize data mining technology to its true potential
• Less responsive to consumer and market demand
• Consumers’ consent is required if the data mining practices go beyond what is outlined in the policy
• Difficult to identify consumer groups to whom companies have the highest obligations when it comes to data mining practices

Data mining policy: Natural Rights Policy
Goal of policy: The use of any data mining techniques or practices is prohibited if they violate consumers’ rights to privacy
Drawbacks:
• Clear distinction must be communicated to consumers between their fundamental rights (i.e., liberty, privacy, etc.) and their contractual rights (i.e., business agreement of personal information usage) to avoid consumer dissatisfaction
• Nearly eliminates secondary use of data mining information
• Consumers’ consent is required for any new data mining algorithm to be utilized

4. Privacy, ownership, and consent

Three issues come into play here: privacy, ownership, and consent. Many consumers feel that their privacy is violated by these information-gathering practices. The data-gathering companies claim the information they are gathering is a public good gathered in a
public sphere and that therefore privacy is not being violated [9]. They assume that the user consents to the use of the information gathered when the user voluntarily uses services that are monitored or fills out a form. They do not make an effort to inform the user of the future uses of his data or to provide him with a means of opting out of the practice. In the case of registration forms, many businesses do provide a privacy policy for the user to read before submitting the form. However, these privacy policies can be difficult to understand and may not be consistently followed by the organization. Also, the user doesn’t usually have much choice but to agree if he wants to use the services provided by the web site. The argument that these businesses do have the consent of the consumers is fairly weak. It is notable that these companies try very hard to avoid a situation that would require the consumer to explicitly opt in because it is unlikely that most of them would do so [9]. The issue of ownership brings a sharp contradiction in the argument of business to light. They believe that privacy is not violated because the information that they are gathering is a public good. Yet they then turn around and treat it as a private good when they claim ownership of the data and attempt to sell it [12].

Another concern is in the type of data being collected. Some types of personal information are seen as being more sensitive than others. What complicates this issue is that sensitivity level varies according to the individual [16]. One consumer may view income level as very sensitive and private but not care who knows their marital status. Another consumer may have reasons to view their marital status as very sensitive information that might affect their employability or some other potential opportunity while not caring who knows their income level. This makes it difficult for a company to gauge customer reaction to new uses of their data. Aside from the information collected publicly from sources such as the Internet, much information is also bought from seemingly more private sources. This information can include credit history, financial information, employment history, and possibly some medical information [8]. Many consumers would be surprised to know that these types of information are routinely bought.

We have seen that there are already serious ethical concerns with just the collection and distribution of personal information in its raw form. These concerns and problems are then exacerbated by the practice of data mining. The purpose of data mining is to discover new insights and new uses for the information that companies already have. This makes it nearly impossible to allow the consumer to have the right of giving informed consent for each use of his data. Much of the data being mined is historical and it is not clear how new data being collected will eventually be used [16]. For this reason, even if the previous arguments of business regarding informed consent hold up with traditional uses of the data, they fall apart when that data is mined. Another issue with data mining is that even if the individual only provides information that he is comfortable sharing, the inferences drawn from that data can reveal information that he does not want revealed and which may be harmful to him if it falls into the wrong hands. Most individuals would not even realize that it is possible to reveal this more sensitive information through inference, and some would view it as an unauthorized appropriation [6].

A hypothetical example will make this issue clearer. Suppose a bank asks its customers to fill out a routine survey of interests and lifestyle factors when they first open an account in order to tailor its services to each customer. Later that data is mined and several correlations are found that cause the bank to define a group of people that it classifies as a bad credit risk. Then a customer who falls into this group is denied a loan because the interests and lifestyle answers he gave on a survey match the profile of this new group, of which he isn’t even aware. Or, less drastically, maybe a customer simply isn’t given the choice of participating in some new opportunity or service because he doesn’t match another profile constructed through data mining.

Here, individuals are being targeted for or denied services on the basis of questionable inferences drawn from questionable data without their ever being aware of any potential bias. Why refer to the inferences and data as questionable? Data mining is not carried out with scientific rigor. The quality or randomness of the original data is not strictly verified and therefore the significance of inferences drawn from the data must be in question [7]. However, even if the data is of high quality and is a good random sample, the inferences might still not be significant. This is because there may be other confounding factors, not readily apparent, that cause a correlation to be found. An amusing illustration of an erroneous correlation given by Mathews [7] is that of the relationship between stork populations and the birth rates of certain European countries. The countries with bigger stork populations have more babies. So, either storks really do deliver babies or there is a confounding factor such as land area that is causing this insignificant correlation.

The question of data quality also raises the question of an individual’s control over his data. The methods used to collect the data are not perfect, and even if the data is correct when it is gathered, personal data is often very time-sensitive and expires rapidly. If individuals do not have the power to review and edit the data that a company stores about them, then they may be judged on the basis of inaccurate data. This is the basis for parts of the Fair Credit Reporting Act that give consumers the right to request copies of, and demand corrections to, any credit reports kept about them by credit reporting agencies. Allowing consumers to view the information about them that is used for data mining would make the process both more ethical and more accurate. Organizations like EPIC (the Electronic Privacy Information Center) believe that this type of Fair Reporting legislation should be extended to cover more types of businesses that store customer profiles [5]. However, for most companies this would be highly impractical to implement.

One final issue is the security of the data collected and mined. If companies are going to gather and infer sensitive data about individuals, they must make a reasonable effort to protect it from unauthorized access and from unethical use by employees or outside agents. This must be done on two levels. First, the database itself must be protected from access by unauthorized users. Second, authorized users must be granted varying levels of access to the data [10]. According to Radcliffe, “the key to passing all forms of regulatory muster is defining ‘personally identifiable information’ and then limiting access to that information on a need to know basis” [11]. This is necessary for obeying some privacy laws as well as for meeting the privacy expectations of individuals. Restricting access to certain objects or fields in a database to protect more sensitive data can be difficult to do effectively, especially with data mining applications. Even if a user can only see part of the data, he can often infer the values of more sensitive fields [16].

What are the potential costs of these risks to a corporation? Many companies have underestimated the strength of consumer opposition to their practices and been hurt by public outrage. The DoubleClick corporation was forced by public outrage to abort a large deal to combine its Internet tracking profiles with another company’s database of names and addresses in order to make the profiles identifiable with a person [9]. Another company, Lotus Corp., had to abandon a potentially very lucrative product called Households when public outrage caused its stock to fall. Households was designed to provide small businesses with the ability to do targeted direct marketing by searching for customers that matched certain profiles according to data provided by Equifax, a credit reporting agency. Lotus developed the product believing that it would be a very successful and significant part of its product offering. The public outcry it received at the product’s announcement came as a complete surprise, and the company was forced to scrap a significant development effort [13]. Research and ethical considerations during the initial planning of this project might have prevented its failure.

Aside from public opinion, government penalties for violating privacy regulations can be substantial. These types of regulations and laws vary widely by jurisdiction, so it can be difficult for a large and distributed corporation to ensure compliance with all of them [8]. Some penalties can be incurred on a per-record basis, which would add up to large sums of money if data was routinely handled in an improper manner.

There are many different viewpoints that have to be considered when addressing issues of privacy and ethical data handling. First, it will be useful to review a philosophical examination of privacy and its benefits to the individual and society. Then the current views of government, corporations, and the general public will be explored.

The two main theories of privacy are the restricted access and the control theories. The restricted access theory defines privacy as an individual’s ability to restrict access to his personal information [15]. On the other hand, the control theory defines privacy as the individual’s ability to exercise control over his information [15]. Neither theory is all-encompassing. Is an individual’s information no longer considered to be private if he loses the ability to restrict access or to exercise control, or should its privacy still be protected?

Regardless of which definition of privacy one agrees with, the next question is whether or not it is a public and social good that should be protected. The general consensus is that privacy can be both beneficial and detrimental to the individual and society. Some level of privacy needs to be ensured for all individuals in order to maintain a healthy society. Privacy allows individuals to have freedom of thought and freedom to develop themselves and pursue their interests without fear of retribution [9]. This allows them to make a richer and more diverse contribution to society and public discourse. However, a society could not function with total privacy. A certain limit to privacy is needed to facilitate commerce and communication between individuals. An extreme view of privacy would impede the government’s ability to enforce laws and regulations and make it difficult for businesses to promote economic growth [9].

“In the United States there is no general right of privacy” [8]. The constitution does not address the issue and the courts do not recognize a general right to privacy. Statutes that grant a legal right to privacy usually result from the response of the lawmakers to a particular issue. Because of this, privacy laws and statutes tend to be narrowly defined to apply to very specific situations [8]. They are also often inconsistent and limited to a particular government jurisdiction. This makes it difficult to assess the legality of a course of action.

Less privacy helps corporations to know and reach their customers better. They seek to justify loss of consumer privacy by making a utilitarian argument that the customer’s economic benefit offsets the loss of privacy. By giving up some privacy and giving corporations the means to target only the consumers that are most likely to be interested, consumers are helping to lower the cost of the products that they buy. As Kevin O’Connor, CEO of DoubleClick, said in his company’s defense, “The less people spend on marketing, the cheaper products are” [9]. It is also argued that the consumer benefits by receiving fewer irrelevant solicitations.

The average American consumer is not convinced by claims that his economic benefit will outweigh his loss of privacy [9]. He is very concerned about the collection and use of his personal information and its distribution or sale to third parties. Unless explicit consent is given for each use of his data, he will feel that his privacy has been violated and that the company using it has been deceitful in its practices.

5. Ten data mining systems development practices

The power and sensitivity of public opinion in this area dictate that corporations act to self-regulate their practices related to the handling and use of personal data. Following existing laws and regulations alone will not be enough to protect a corporation from the risk of damage from a negative public perception of its practices. This study will first discuss ten data mining systems development practices for preventing these risks from materializing. Then a guideline for incorporating these practices into a software development life cycle will be presented.

5.1. Consider the expectations of the customers

Before a project is undertaken that involves the use of personal data, steps should be taken to assess the customers’ expectations of the company’s behavior. Regardless of whether or not the company feels that it is acting in an ethical manner, it needs to predict how the customers will perceive its actions [8]. It does not really matter how right the company believes itself to be if the customer does not agree.

5.2. Develop a customer-oriented privacy policy

Whenever customer data is to be collected, the customer should be presented with a privacy policy. A survey of web users found that “86% of respondents believe that participation in information-for-benefits programs is a matter of individual privacy choice” [2]. In the same survey, 82% of respondents indicated that having a privacy policy would affect their decision to participate. This policy should not be written with the purpose of protecting the company from legal action. It should be written with the goal of adequately informing the customer so that he can provide real consent to present and future uses of his data. To accomplish this goal, clear and precise language should be used to inform the customer as specifically as possible of how his data will be handled and used. The customer should also be provided with information about alternative options or avenues of recourse if he does not wish to consent to the practices outlined in the privacy policy.

5.3. Follow the spirit of the privacy policy internally

Simply having a good privacy policy to show the customers is not enough. In order to maintain a relationship of honesty and trust, the spirit of the privacy policy must be followed in a consistent manner throughout the organization [8]. Blatantly violating the terms of the privacy policy will alienate customers and may leave the company vulnerable to legal action. Following the letter of the privacy policy but finding ways to circumvent its spirit will not put the company at risk of legal action but does increase the risk of loss of customer trust.

5.4. Research and understand all laws that may have jurisdiction over sensitive data

Because of the wide variation in privacy laws, it is necessary to carefully identify all jurisdictions that may claim authority over the company’s actions. Then the relevant legislation and precedents for each jurisdiction should be reviewed to ascertain that none of them would be violated by a proposed new practice [8].

5.5. Stay current on new developments and public opinion

Ethics are based on commonly accepted values of a society. Therefore it is necessary to monitor shifts or new developments in these values. The best way to stay in touch with current opinion is to review publica-

5.7. Give customers more control over their data

If customers feel that they have control over their data then they will feel that they retain a greater level of privacy. This can be done in any of several ways. First, the customer can be given some control over what information to provide in the first place. Allow the customer to opt in or out of answering questions that are not absolutely necessary for any transactions that the customer is participating in. Practicing full disclosure and honesty when gathering the information is essential to gaining true informed consent from the customer and giving him a sense of control. Second, the customer can be given the ability to review and correct his data. This will protect the customer from being evaluated on the basis of incorrect data as well as protect the company from making bad business decisions based on invalid inferences caused by incorrect data.
Third, the tions of consumer groups as well as industry advisory customer can be given the chance to rate the sensitiv- groups. It is also important to follow new cases in the ity of each piece of data to address the fact that indi- news as they appear, to see what issues are raising a lot viduals “are not equally protective of all values in their of controversy. records” [1]. Then data that is rated above a particular sensitivity level can be excluded from data mining op- erations. Fourth, allow the customer to choose which 5.6. Control access to data warehouses uses of his data will be allowed or disallowed. A recent survey found that “over 80% of web consumers did not Protect customers by protecting their data. A com- want web sites to resell their personal information and pany’s responsibility to its customers does not stop yet 72% would be willing to give demographic infor- with its own intentional uses of their data. In order mation as long they were made aware of how it would to retain customer trust and satisfaction, the data must be used” [12]. This shows how much an effort to make also be protected from unauthorized uses. As stated the customer feel informed and in control can alleviate above, this requires security on two levels. First the their feelings of privacy violation. database or data warehouse itself must be physically protected from unauthorized access using network se- 5.8. Evaluate the quality of source data curity measures, a secure operating system, a secure physical environment and user authentication mea- Performing data mining on inaccurate data will lead sures [10]. Second, access to the data itself should be to inaccurate inferences and conclusions. Since the restricted to users who need that particular data for au- data for data mining operations is often pulled from a thorized activities. In most cases this requires some variety of internal and/or external sources it is often in- type of Multi Level Security model (MLS) [16]. 
An consistent and has varying levels of accuracy. Also, the MLS model seeks to protect sensitive data by carefully nature of personal information is that it tends to expire defining which data fields or objects are accessible by rapidly. When basing business decisions on the results each user or user class. Implementing MLS effectively of data mining, companies need to evaluate the qual- can be complicated by the possibility of users inferring ity of those results by first evaluating the quality of the the values of restricted data from the values of accessi- underlying data. ble data. This is especially true when users have access to data mining tools which are made to do just that. 5.9. Develop a corporate code of conduct This is a good justification for limiting access to data mining applications to a small group of developers and This should establish standards for acceptable prac- users. tices and treatment of customers. C. Cary et al. / Data mining: Consumer privacy, ethical policy, and systems development practices 165 5.10. Perform an ethical review for each new use stage would be a section of the requirements specifi- of data cation that lists requirements that were added as a re- sult of this research in order to address the ethical con- The previous practices are more general measures to cerns. These requirements are to be specially identi- maintain consistent and ethical conduct in all handling fied in a separate, high-priority section of the specifi- of customer data. This practice applies more specifi- cation so that they are not carelessly re-prioritized or cally to new development projects or applications of cut at a later stage of development as the project suffers data mining. Before development on a new project is from feature-creep or time and cost pressures. With begun, it should pass through a formal ethical review each of these requirements should be listed a brief de- process. 
The purpose of this review is to identify any scription of the concern that each is meant to address ethical concerns that the project might raise and to de- or a reference to the concern in the ethical summary termine what potential risks there are from either neg- document. At the end of the requirements stage, an- ative public opinion or legal action. The project should other review should be conducted. This review should be evaluated against information gathered in Practices verify that the special requirements do adequately ad- 1, 4 and 5, and the company’s privacy policy as well dress the original ethical concerns. It should also eval- as its corporate code of conduct. In addition to these uate the requirements specification to identify new eth- sources, further guidance can be sought by consult- ical concerns that may be raised by any expansion of ing codes or recommendations published by industry scope or addition of unforeseen requirements. If any groups such as the Association for Computing Machin- new concerns are found then the process of research- ery’s code of ethics (www.acm.org). If any risks or po- ing and formulating requirements to address these con- tential conflicts are identified then a plan for mitigat- cerns should be repeated. ing these risks should be developed. If the risks can not The responsibility of the design phase is to see that be effectively mitigated then the project will probably all of the special ethical requirements are incorporated need to be abandoned or rethought. into the product design. Additional considerations dur- These practices should be incorporated into the data ing this stage will be mainly concerned with the de- mining systems development lifecycle of all projects sign of the user interface for applications that collect that will potentially manipulate customer data as well data from customers and security roles for applications as being part of the standard operations of the corpora- that mine such data. 
User interfaces for customer appli- tion. Figure 1 shows a set of recommendations for in- cations should be designed to meet standards of hon- corporating these practices into a typical software de- esty and clarity as prescribed by the corporate code of velopment lifecycle (SDLC). conduct. Back office applications that mine or manip- During the project initiation and planning stage ulate customer data need to be designed with adequate where projects are evaluated at the enterprise level, and complex security to protect the customer data from care should be taken to identify those projects that have unauthorized use. The design of these features should potential for ethical risk. Projects that are identified be subjected to an ethical review at the end of the de- at this stage would then be required to follow formal sign phase. No additional deliverables will exist for procedures for identifying and managing ethical risks this stage. The purpose of the review is simply to ap- throughout their lifecycles. prove the final design. Once such a project moves into its planning stage, The build phase should proceed normally with no a preliminary ethical review should be conducted as additional steps. The deliverables of this phase will be described under Practice 10. The added deliverable of reviewed during the testing phase. During testing, a this stage would be a summary report from the ethical special set of test cases should be applied to verify that review that lists ethical concerns that were identified the final product does meet the ethical requirements and steps that will be taken to mitigate them. and conform to company standards. Non-conformance During the requirements stage, customer expec- should be treated as a critical defect and fixed before tations, applicable laws, and current public opinion release. should be researched with regard to each concern listed After release of the product, a post-mortem of the in the ethical summary report. 
The best way to handle project should include a review of the effectiveness this would be to consult with actual customers or con- of the application of ethical risk management based sumer groups as well as a corporate lawyer that spe- on customer acceptance and satisfaction with the final cializes in these issues. The added deliverable of this product. 166 C. Cary et al. / Data mining: Consumer privacy, ethical policy, and systems development practices Fig. 1. Software development life cycle with embedded ethical requirements. 6. Lotus households software development project agency, Equifax. The Households software on the CD- ROM would allow the user to search for individuals The failed Households project that was cited earlier that matched certain criteria and develop a mailing list. will now be revisited to show how the project could The user would then purchase the right to use the mail- have benefited from following an ethical risk man- ing list from Lotus who would supply the user with a agement lifecycle. The purpose of Lotus’ Households key to decrypt the data. The CD-ROM did not contain project was to provide small businesses with an inex- raw customer data. Instead it contained data that had pensive way of gaining targeted mailing lists to market been mined from Equifax’s credit records. The result their products and services. The final product was to be of the data mining was that each individual was given a CD-ROM containing data on individuals that was ob- values for certain categories or criteria on which the tained through a partnership with the credit-reporting user could then search. C. Cary et al. / Data mining: Consumer privacy, ethical policy, and systems development practices 167 Lotus believed that their product would provide a to give the individuals some feeling of being in con- great service and expected a favorable response from trol. For example, a letter could have been sent to each the public. 
By their standards they had acted to ade- individual announcing the intentions of the project and quately protect the privacy of individuals by encrypt- giving the individual the opportunity to opt-out when it ing the data on the disks, preventing searches by name, was still feasible to do so. Another option would have disallowing the distribution of purchased mailing lists, been to send surveys to the individuals to get their in- and limiting the types of data that was included. The put on how the product should be designed. public was not informed of the details of the House- If Lotus had followed a lifecycle like the one rec- holds project until an announcement was made shortly ommended in this study, they would have been made before its scheduled release. Lotus was shocked by the aware of the potential ethical problems early in the de- immediate public outcry and tried to take steps to reas- velopment. Ethical reviews at critical stages of devel- sure the public of the safety of their information. They opment that followed the suggested systems develop- offered to remove information from the disks for any- ment practices would have identified the problems and one that called and requested to be removed. However, solutions before it was too late. because the product was already being implemented and the number of people calling in was in the thou- sands, this proved to be unfeasible. Lotus was even- 7. Conclusion tually forced to cut its losses and abandon the prod- uct [13]. Today’s technological environment, which allows The Households product was a sound concept that for the increasingly rapid and efficient sharing and col- fulfilled a business need. Its failure could have been lection of information across the globe, provides new prevented. Practices 1, 6, and 7 would have helped opportunities and new risks for corporations seeking to prevent the public’s negative reaction. Practice 1 is to utilize customer data in new ways. 
The ease of data to consider the expectations of the customers. Lotus’ manipulation through data mining makes it difficult to shock at the public’s reaction shows that this was Lo- avoid unethical behavior when working with customer tus’ biggest point of failure. Had they made an effort data. The public reaction to perceived misuses is then at the beginning of the project to gauge customer reac- made even more powerful by communication channels tion they would not have had to throw away a finished such as the Internet that provide a forum for consumers product. This is why a comprehensive ethical review to express their discontent and influence the opinions during the planning and requirements stages is neces- of a great many others. In this environment, negative sary. The worst case scenario is that they simply would public opinion can have immediate and disastrous fi- have abandoned the project before wasting money on nancial repercussions for businesses that are perceived development. The best case scenario is that they would to be acting in an unethical manner. have found a way to address the public’s concerns and Implementing ethical data mining systems devel- design a product that was more acceptable. opment practices such as the ones recommended in Practice 6 states that access to customer data should this study will benefit corporations by helping them to be strictly controlled and protected to maintain cus- build and maintain trusting relationships with their cus- tomer trust. In this case the public felt that Lotus had tomers. At the same time it will prevent real economic not taken adequate precautions to ensure that the data losses resulting from negative public reaction or legal on the disks was not misused or improperly accessed. action in response to unethical practices. The product design should have been reviewed before implementation to ensure that the data was protected well enough to make the public feel reasonably secure. 
References Finally, Lotus presented this product to the public as something that was already decided and completed. [1] D. Agrawal and C.C. Aggarwal, On the design and quantifica- tion of privacy-preserving data mining algorithms, ACM PODS The individuals whose data was being used were never (2001), 247–255. given the opportunity to opt-in or opt-out, review their [2] R. Agrawal and R. Srikant, Privacy-preserving data mining, in: data, or to provide input on how the data was to be Proceedings of the ACM SIGMOD, 2000, pp. 439–450. used. This violates Practice 7. Giving each individual [3] P. Bradley, J. Gehrke, R. Ramakrishnan and R. Srikant, Scaling real control over his data may not have been feasible; mining algorithms to large databases, Communications of the however some small measures could have been taken ACM 45(8) (2002), 38–43. 168 C. Cary et al. / Data mining: Consumer privacy, ethical policy, and systems development practices [4] P. Brey, Disclosive computer ethics, Computers and Society [12] E. Rose, Balancing Internet marketing needs with consumer (2000), 10–16. concerns: A property rights framework, Computers and Society [5] J. DiSabatino, Unregulated databases hold personal data, Com- (2001), 17–21. puterworld 36(4) (2002), 7. [13] R.A. Spinello, Chapter 5: Privacy and information access, in: [6] J.S. Fulda, Solution to a philosophical question concerning data Case Studies in Information Technology Ethics, Prentice Hall, mining, Computers and Society (1999), 6–7. Upper Saddle River, New Jersey, 2003, pp. 104–111. [7] R. Matthews, If chocolate then champagne, World Link 13(2) [14] H.T. Tavani, Privacy online, Computers and Society (1999), (2000), 12–13. 11–19. [8] J. Montãna, Data mining: A slippery slope, Information Man- [15] H.T. Tavani and J.H. Moor, Privacy protection, control of infor- agement Journal 35(4) (2001), 50–54. mation, and privacy-enhancing technologies, Computers and [9] J. Morse and S. 
Morse, Teaching temperance to the cookie Society (2001), 6–11. monster: Ethical challenges to data mining and direct market- [16] K. Wahlstrom and J.F. Roddick, On the impact of knowledge ing, Business and Society Review 107(1) (2002), 76–97. discovery and data mining, in: Conferences in Research and [10] T. Priebe and G. Pernul, Towards OLAP security design – sur- Practice of Information Technology, 2001, pp. 1–12. vey and research issues, ACM DOLAP (2000), 33–40. [11] D. Radcliff, Guarding the data warehouse gate, Computerworld 35(4) (2001), 44–45.