Human Systems Management 22 (2003) 157–168 157
IOS Press
Data mining: Consumer privacy, ethical
policy, and systems development practices
Christina Cary, H. Joseph Wen ∗ and Pruthikrai Mahatanankoon
School of Information Technology, College of Applied Science and Technology, Illinois State University, Normal,
IL 61790-5150, USA
Abstract. The growing application of data mining to boost corporate profits is raising many ethical concerns, especially with
regard to privacy. The volume and type of personal information accessible to corporations today is far greater than in the past.
As a result, many consumers are greatly concerned about potential violations of their privacy by current data collection and data
mining techniques and practices. The purpose of this study is to identify the ethical issues associated with data mining and the
potential risks to a corporation that is believed to be operating in an unethical manner. The paper reviews the relevant ethical
policies and proposes ten data mining systems development practices that can be incorporated into a software development
lifecycle to prevent these risks from materializing.
Keywords: Data mining, consumer privacy, ethical policy, software development
Christina Cary is a graduate student in Applied Computer Science at the School of Information Technology at Illinois State University. Her research interests include systems integration, Net technology, software development and Web services design.

H. Joseph Wen is an associate professor of Information Systems at the School of Information Technology at Illinois State University. He holds a PhD from Virginia Commonwealth University. He has published over 90 papers in academic refereed journals, book chapters, encyclopedias and national conference proceedings. Dr. Wen has received over six million dollars in research grants from various State and Federal funding sources. His areas of expertise are Internet research, electronic commerce (EC), transportation information systems, and software development. He has also worked as a senior developer and project manager for various software development contracts since 1988.

Pruthikrai Mahatanankoon is an Assistant Professor of Information Systems at the School of Information Technology at Illinois State University. He holds a Bachelor’s degree in Computer Engineering from King Mongkut’s University of Technology Thonburi, Thailand, an MS in Management Information Systems, and an MS in Computer Science from Fairleigh Dickinson University. He received a PhD in Management Information Systems from Claremont Graduate University. He has published articles in the Encyclopedia of Information Systems, DSI proceedings, and other academic book chapters. His current research interests focus upon Internet technology usage and abuse in the workplace, mobile commerce, web services, quantitative research methods, and virtual workplaces and virtual organizations.

* Corresponding author: H. Joseph Wen, School of Information Technology, College of Applied Science and Technology, Illinois State University, Normal, IL 61790-5150, USA. Tel.: +1 309 438 7756; E-mail: hjwen@ilstu.edu.

1. Introduction

With the ever-increasing availability of personal information in electronic form, many new uses and potential misuses of such data have become possible and easy to implement with technologies such as data warehousing and data mining. This raises many ethical issues concerning the violation of consumers’ privacy and the ownership of their personal data. These ethical issues and risks need to be identified and addressed by businesses whenever they attempt new applications of data mining technology.
0167-2533/03/$8.00 2003 – IOS Press. All rights reserved
158 C. Cary et al. / Data mining: Consumer privacy, ethical policy, and systems development practices
The practice of data mining attempts to extract even more information from existing data by finding a correlation or trend in the data. It is also called knowledge discovery by some, because data miners do not know specifically what they are looking for before they find it. They are seeking to discover new insights from the data in their databases. The most popular use of data mining for businesses today is as a tool for identifying consumers to target in direct marketing campaigns. The potential benefits of data mining to business can be huge. By targeting only consumers who have been identified as being more receptive to a particular product or service, businesses can save money on marketing while increasing their customer base substantially. This is merely the most obvious use of data mining. There are many other ways that businesses can use data mining to boost or protect profits. For example, insurance or lending companies might use data mining to classify customers according to the potential risk that they pose, or manufacturing companies might use data mining on their manufacturing data to identify inefficiencies or new opportunities in their processes.

Most companies already have vast amounts of electronic data just sitting in databases or data warehouses, or can easily buy customer data from a variety of sources. This makes data mining seem like an attractive and inexpensive proposition. However, the potential costs of data mining can be substantial and unexpected. These costs can be both tangible and intangible, and result from consumer opposition to data mining practices that may be seen as unethical. If a business’s use of consumer data is viewed as violating consumer privacy or trust, the resulting costs can include loss of customers, falling stock prices, and projects that must be cancelled in the face of public opinion. This study focuses on the ethical concerns and issues raised by the mining of consumer data and proposes ten systems development practices for incorporating an ethical risk management strategy into the development of applications that gather or use personal information for data mining related activities.

2. Corporate data mining and data collection

Before delving into the costs and risks of data mining, let us first review how beneficial data mining really is to a business seeking to improve customer relations or target new customers. Some companies have received dramatic results from data mining. AT&T Wireless was able to increase its subscriber base by 20% in less than a year when it contracted with a data-mining company to identify customers that would likely be interested in AT&T’s new flat-fee wireless service [7]. This is typical of the type of results seen in data mining success stories. However, the success of data mining depends on the quality of the original data and on the ability of data miners to distinguish meaningful correlations from misleading patterns in the data. According to Matthews, hundreds of companies were already using data mining by 1997 in an effort to boost profits. However, a survey of these companies by an IT consulting firm found that almost three-quarters of them had not realized any substantial benefit, and those that had did not receive near the amount of return that they had expected. Nonetheless, visions of returns like those gained by AT&T continue to drive companies toward this new technology, making data mining itself an $8 billion industry [7].

There are many ethical risks involved in data mining at all stages of the process, from when the original data is collected to when the insights gained from data mining are put to use. In fact, one of the areas of greatest ethical concern to the public is the initial gathering of consumer data before any mining is done. In the past, most consumer data in a business’s databases was transactional. Information was stored about voluntary transactions that the customer chose to make with the company. This data was necessary for the transaction, and the customer was fully aware of its collection. However, businesses today are gaining increasingly more personal information without the average consumer being aware of the collection or transfer of this data. Much of this is done with the help of the Internet, in several ways. The most ethically questionable has been referred to as “dataveillance” [14]. This is the electronic monitoring of people’s actions or communications over the Internet, done without the average individual ever being aware of it. The type of data collected in this manner can show things such as a person’s interests, beliefs, associations, purchasing habits, what kind of advertisements they respond to, what kind of discussions they engage in, what kind of people they talk to, and what type of lifestyle they lead. This is done in several ways. Some companies use cookies to track individuals. For example, one such company, DoubleClick, places a cookie on an individual’s computer when he first visits a DoubleClick-affiliated site. This cookie contains an identification number. Whenever the individual visits another DoubleClick-sponsored site or clicks on a DoubleClick-affiliated ad, that information is entered into DoubleClick’s database using the identification number to build a profile of that individual. Other companies or web sites use other methods to record customer clicks, or use web server logs to track individuals by the IP address of their computer, since information such as the IP address and the type of software the individual is using is sent to the web server whenever a request for a web page is made. Internet search engines can also be used to find information about people, as many things such as posts to message boards or newsgroups are often archived on the Internet.

This type of tracking and information gathering is perfectly legal at the moment. The Internet is treated as a public sphere, and these companies claim that they are doing nothing wrong. However, ethical issues arise because of the secretive or deceptive way in which the practices are carried out. Companies like DoubleClick claim that their data gathering is beneficial to the consumer because it helps tailor marketing to their interests and reduces the amount of irrelevant marketing and communications that they receive. This claim seems to be contradicted by the way in which they conduct their data-gathering. The DoubleClick system is designed to be transparent to the user, and the user is not given the option of opting in or asked for his consent [9]. If the user manages to see the fine print or the DoubleClick name written discreetly on a site, he can choose to opt out by disabling the use of cookies through his browser software. That is, if he knows how. The majority of Internet users are still confused and uneducated about these types of technologies that are part of the Internet.

Another very popular information gathering technique on the Internet is the registration form. Part of the Internet’s mass appeal has always been the idea of getting information or services for free, and when an individual browses the Internet he is indeed courted by a wide variety of businesses offering him information or services to draw him into their sites. However, web sites are costly to maintain, and Internet users can come and go with the click of a button. In order to ensure a return on their investments, businesses need to establish a more permanent relationship with the users who visit their sites. Therefore they ask the user to fill out a registration form, ostensibly for the purpose of providing better service to him. However, the questions on these forms go well beyond the realm of relevance to usage of the site. They seek to determine things such as income level, lifestyle, education, interests, and purchasing habits. This information can then be used by the company to target the individual with its own marketing campaign, or be sold to third parties to be used for such purposes. These forms are entirely voluntary, yet usually the user is denied services or inconvenienced if he does not answer these questions. Also, he is usually not given the option of opting in or out when it comes to future uses of the data that he provides.

3. Ethical theory and data mining policy

Several ethical theories can be used to guide data collection and data mining techniques and practices. A policy that addresses data mining and customer privacy must take into account the company’s relationships with and responsibilities toward its customers. While consumers are generally concerned with making tradeoffs between privacy and individual security, control of personal information, and mere convenience, companies on occasion trade customers’ privacy for their own profitability. In most such cases it is difficult to determine a definitive resolution to conflicts over data mining practices.

What makes a company’s data mining policy complex is that many interrelated stakeholders will influence any ethical data mining decision, and prudent stakeholder analysis is required to inform such decisions. Stakeholder analysis includes all people that could be affected by such a policy, such as managers, employees, customers, stockholders, competitors, etc. Therefore, the ability of any company to effectively implement a data mining policy depends on its situation, which involves specific and unique relationships with various stakeholders, both internal and external to the company. To keep the policies from becoming too complicated, our recommended data mining policies are based on the macro relationships that exist only between companies and customers; these simplified policies allow managers to find distinctive practices that match their companies’ data mining applications and needs.

Incorporating existing ethical theories into data mining policies requires several assumptions. Ethical theories assume that companies have free choice and the rational judgment to “do the right thing” by making various tradeoffs between achieving benefits for customers and situations in general, and reaping the advantages of data mining applications.

Development of a data mining policy can be based on three distinct ethical theories: utilitarian, deontological, and natural rights. Utilitarian policy is derived from a utilitarian view, which examines the consequence(s) of an action. An action might decrease utility (i.e., individual needs and happiness) for some customers and increase it for others. Therefore, data mining practices are deemed appropriate if they increase the overall utility of customers. For example, DoubleClick claims that its data gathering is beneficial to the consumer because it helps to customize marketing to their interests. On the contrary, deontological policy does not examine the consequence(s) of data mining practices, but advocates the duties and responsibilities that companies have to their customers. Companies need to be truthful about their actions, and no matter how good the consequences of their actions may be, some actions are always wrong. For example, when customers are not given the opportunity to opt out of companies’ data mining practices, or when companies mine the data for secondary use without any consumers’ consent, deontological policy is being violated. The third view focuses upon the basic natural rights (i.e., the rights to life, liberty, property, privacy) of consumers, and companies must honor those rights. Since personal information is accessible for data mining purposes, many consumers may be greatly concerned about potential violations of their privacy; the natural rights policy will lessen those consumers’ concerns by focusing on specific issues regarding individual privacy.

Obviously, each policy has its virtues and drawbacks. Table 1 summarizes the ultimate goal of each policy and its drawbacks. Management is advised to conduct a thorough stakeholder analysis before implementing such a policy. Furthermore, these policies can be combined to arrive at a specific outcome, ranging from the most flexible policy (utilitarian) to the most stringent policy (natural rights).

Table 1
Three aspects of data mining policy

Utilitarian Policy
  Goal: Maximize consumers’ utility by implementing data mining practices that optimize consequences for all the affected consumers.
  Drawbacks:
  • Difficult to determine all the consequences (total utility) of data mining practices
  • Consumer utility is based on personal subjective views, which vary from one person to another; therefore, any particular data mining outcome might decrease utility for some groups of people and increase it for others
  • Difficult to identify which consumer groups will reap the most benefits

Deontological Policy
  Goal: Minimize the use of data mining techniques and practices by focusing on the contractual obligations and responsibilities that companies have to their consumers.
  Drawbacks:
  • Inflexible and rigid for companies seeking to utilize data mining technology to its true potential
  • Less responsive to consumer and market demand
  • Consumers’ consent is required if the data mining practices go beyond what is outlined in the policy
  • Difficult to identify the consumer groups to whom companies have the highest obligations when it comes to data mining practices

Natural Rights Policy
  Goal: Prohibit the use of any data mining techniques or practices that violate consumers’ rights to privacy.
  Drawbacks:
  • A clear distinction must be communicated to consumers between their fundamental rights (i.e., liberty, privacy, etc.) and their contractual rights (i.e., business agreements on personal information usage) to avoid consumer dissatisfaction
  • Nearly eliminates secondary use of data mining information
  • Consumers’ consent is required for any new data mining algorithm to be utilized

4. Privacy, ownership, and consent

Three issues come into play here: privacy, ownership, and consent. Many consumers feel that their privacy is violated by these information-gathering practices. The data-gathering companies claim the information they are gathering is a public good gathered in a
public sphere and that therefore privacy is not being violated [9]. They assume that users consent to the use of the information gathered when they voluntarily use services that are monitored or fill out a form. They do not make an effort to inform the user of the future uses of his data or to provide him with a means of opting out of the practice. In the case of registration forms, many businesses do provide a privacy policy for the user to read before submitting the form. However, these privacy policies can be difficult to understand and may not be consistently followed by the organization. Also, the user doesn’t usually have much choice but to agree if he wants to use the services provided by the web site. The argument that these businesses do have the consent of the consumers is fairly weak. It is notable that these companies try very hard to avoid a situation that would require the consumer to explicitly opt in, because it is unlikely that most of them would do so [9]. The issue of ownership brings a sharp contradiction in the argument of business to light. They believe that privacy is not violated because the information that they are gathering is a public good. Yet they then turn around and treat it as a private good when they claim ownership of the data and attempt to sell it [12].

Another concern is the type of data being collected. Some types of personal information are seen as being more sensitive than others. What complicates this issue is that the sensitivity level varies according to the individual [16]. One consumer may view income level as very sensitive and private but not care who knows their marital status. Another consumer may have reasons to view their marital status as very sensitive information that might affect their employability or some other potential opportunity, while not caring who knows their income level. This makes it difficult for a company to gauge customer reaction to new uses of their data. Aside from the information collected publicly from sources such as the Internet, much information is also bought from more seemingly private sources. This information can include credit history, financial information, employment history, and possibly some medical information [8]. Many consumers would be surprised to know that these types of information are routinely bought.

We have seen that there are already serious ethical concerns with just the collection and distribution of personal information in its raw form. These concerns and problems are then exacerbated by the practice of data mining. The purpose of data mining is to discover new insights and new uses for the information that companies already have. This makes it nearly impossible to give the consumer the right of informed consent for each use of his data. Much of the data being mined is historical, and it is not clear how new data being collected will eventually be used [16]. For this reason, even if the previous arguments of business regarding informed consent hold up for traditional uses of the data, they fall apart when that data is mined. Another issue with data mining is that even if the individual only provides information that he is comfortable sharing, the inferences drawn from that data can reveal information that he does not want revealed and which may be harmful to him if it falls into the wrong hands. Most individuals would not even realize that it is possible to reveal this more sensitive information through inference, and some would view it as an unauthorized appropriation [6].

A hypothetical example will make this issue clearer. Suppose a bank asks its customers to fill out a routine survey of interests and lifestyle factors when they first open an account, in order to tailor its services to each customer. Later that data is mined, and several correlations are found that cause the bank to define a group of people that it classifies as a bad credit risk. Then a customer who falls into this group is denied a loan because the interests and lifestyle answers he gave on a survey match the profile of this new group, of which he isn’t even aware. Or, less drastically, maybe a customer simply isn’t given the choice of participating in some new opportunity or service because he doesn’t match another profile constructed through data mining.

Here, individuals are being targeted for or denied services on the basis of questionable inferences drawn from questionable data, without their ever being aware of any potential bias. Why refer to the inferences and data as questionable? Data mining is not carried out with scientific rigor. The quality or randomness of the original data is not strictly verified, and therefore the significance of inferences drawn from the data must be in question [7]. However, even if the data is of high quality and is a good random sample, the inferences might still not be significant. This is because there may be other confounding factors, not readily apparent, causing a correlation to be found. An amusing illustration of an erroneous correlation given by Matthews [7] is the relationship between stork populations and the birth rates of certain European countries. The countries with bigger stork populations have more babies. So either storks really do deliver babies, or there is a confounding factor, such as land area, that is causing this spurious correlation.
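Matthews’ stork illustration is easy to reproduce with a few lines of simulation. The numbers below are entirely fabricated for illustration, with land area playing the role of the hidden confounder: stork counts and birth counts each track land area, so they correlate strongly with each other even though neither causes the other, and the correlation vanishes once the confounder is controlled for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated data for 200 hypothetical regions.
# Both stork population and birth count scale with land area (the
# confounder), so they correlate with each other by construction.
n = 200
land_area = rng.uniform(10, 1000, n)                # hypothetical km^2
storks = 0.05 * land_area + rng.normal(0, 2, n)     # storks per region
births = 12.0 * land_area + rng.normal(0, 500, n)   # births per region

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

# The naive correlation between storks and births looks strong ...
naive = corr(storks, births)

# ... but the partial correlation, controlling for land area, is near
# zero: regress each variable on the confounder and correlate residuals.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

partial = corr(residuals(storks, land_area), residuals(births, land_area))

print(f"naive correlation:   {naive:.2f}")
print(f"partial correlation: {partial:.2f}")
```

Regressing out a suspected confounder and correlating the residuals is one simple check a data miner can apply before treating a discovered correlation as meaningful.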
The question of data quality also raises the question of an individual’s control over his data. The methods used to collect the data are not perfect, and even if the data is correct when it is gathered, personal data is often very time-sensitive and expires rapidly. If individuals do not have the power to review and edit the data that a company stores about them, then they may be judged on the basis of inaccurate data. This is the basis for the parts of the Fair Credit Reporting Act that give consumers the right to request copies of, and demand corrections to, any credit reports kept about them by credit reporting agencies. Allowing consumers to view the information about them that is used for data mining would make the process both more ethical and more accurate. Organizations like EPIC (the Electronic Privacy Information Center) believe that this type of fair reporting legislation should be extended to cover more types of businesses that store customer profiles [5]. However, for most companies this would be highly impractical to implement.

One final issue is the security of the data collected and mined. If companies are going to gather and infer sensitive data about individuals, they must make a reasonable effort to protect it from unauthorized access and from unethical use by employees or outside agents. This must be done on two levels. First, the database itself must be protected from access by unauthorized users. Second, authorized users must be granted varying levels of access to the data [10]. According to Radcliffe, “the key to passing all forms of regulatory muster is defining ‘personally identifiable information’ and then limiting access to that information on a need to know basis” [11]. This is necessary for obeying some privacy laws as well as for meeting the privacy expectations of individuals. Restricting access to certain objects or fields in a database to protect more sensitive data can be difficult to do effectively, especially with data mining applications. Even if a user can only see part of the data, he can often infer the values of more sensitive fields [16].

What are the potential costs of these risks to a corporation? Many companies have underestimated the strength of consumer opposition to their practices and been hurt by public outrage. The DoubleClick corporation was forced by public outrage to abort a large deal to combine its Internet tracking profiles with another company’s database of names and addresses in order to make the profiles identifiable with a person [9]. Another company, Lotus Corp., had to abandon a potentially very lucrative product called Households when public outrage caused their stock to fall. Households was designed to provide small businesses with the ability to do targeted direct marketing by searching for customers that matched certain profiles, according to data provided by Equifax, a credit reporting agency. Lotus developed the product believing that it would be a very successful and significant part of their product offering. The public outcry they received at its announcement came as a complete surprise, and they were forced to scrap a significant development effort [13]. Research and ethical consideration during the initial planning of this project might have prevented its failure.

Aside from public opinion, government penalties for violating privacy regulations can be substantial. These types of regulations and laws vary widely by jurisdiction, so it can be difficult for a large and distributed corporation to ensure compliance with all of them [8]. Some penalties can be incurred on a per-record basis, which would add up to large sums of money if data were routinely handled in an improper manner.

There are many different viewpoints that have to be considered when addressing issues of privacy and ethical data handling. First, it will be useful to review a philosophical examination of privacy and its benefits to the individual and society. Then the current views of government, corporations, and the general public will be explored.

The two main theories of privacy are the restricted access and the control theories. The restricted access theory defines privacy as an individual’s ability to restrict access to his personal information [15]. On the other hand, the control theory defines privacy as the individual’s ability to exercise control over his information [15]. Neither theory is all-encompassing. Is an individual’s information no longer considered to be private if he loses the ability to restrict access or to exercise control, or should its privacy still be protected?

Regardless of which definition of privacy one agrees with, the next question is whether or not it is a public and social good that should be protected. The general consensus is that privacy can be both beneficial and detrimental to the individual and society. Some level of privacy needs to be ensured for all individuals in order to maintain a healthy society. Privacy allows individuals to have freedom of thought and freedom to develop themselves and pursue their interests without fear of retribution [9]. This allows them to make a richer and more diverse contribution to society and public discourse. However, a society could not function with total privacy. A certain limit to privacy is needed to facilitate commerce and communication between individuals. An extreme view of privacy would impede the government’s ability to enforce laws and regulations and make it difficult for businesses to promote economic growth [9].

“In the United States there is no general right of privacy” [8]. The constitution does not address the issue, and the courts do not recognize a general right to privacy. Statutes that grant a legal right to privacy usually result from the response of lawmakers to a particular issue. Because of this, privacy laws and statutes tend to be narrowly defined to apply to very specific situations [8]. They are also often inconsistent and limited to a particular government jurisdiction. This makes it difficult to assess the legality of a course of action.

Less privacy helps corporations to know and reach their customers better. They seek to justify the loss of consumer privacy by making a utilitarian argument that the customer’s economic benefit offsets the loss of privacy. By giving up some privacy and giving corporations the means to target only the consumers that are most likely to be interested, consumers are helping to lower the cost of the products that they buy. As Kevin O’Connor, CEO of DoubleClick, said in his company’s defense, “The less people spend on marketing, the cheaper products are” [9]. It is also argued that the consumer benefits by receiving fewer irrelevant solicitations.

The average American consumer is not convinced by claims that his economic benefit will outweigh his loss of privacy [9]. He is very concerned about the collection and use of his personal information and its distribution or sale to third parties. Unless explicit consent is given for each use of his data, he will feel that his privacy has been violated and that the company using it has been deceitful in its practices.

5. Ten data mining systems development practices

The power and sensitivity of public opinion in this area dictate that corporations act to self-regulate their practices related to the handling and use of personal data. Following existing laws and regulations alone will not be enough to protect a corporation from the risk of damage from a negative public perception of their practices. This study will first discuss ten data mining systems development practices for preventing these risks from materializing. Then a guideline for incorporating these practices into a software develop-

5.1. Consider the expectations of the customers

Before a project is undertaken that involves the use of personal data, steps should be taken to assess the customers’ expectations of the company’s behavior. Regardless of whether or not the company feels that it is acting in an ethical manner, it needs to predict how the customers will perceive its actions [8]. It does not really matter how right the company believes itself to be if the customer does not agree.

5.2. Develop a customer-oriented privacy policy

Whenever customer data is to be collected, the customer should be presented with a privacy policy. A survey of web users found that “86% of respondents believe that participation in information-for-benefits programs is a matter of individual privacy choice” [2]. In the same survey, 82% of respondents indicated that having a privacy policy would affect their decision to participate. This policy should not be written with the purpose of protecting the company from legal action. It should be written with the goal of adequately informing the customer so that he can provide real consent to present and future uses of his data. To accomplish this goal, clear and precise language should be used to inform the customer as specifically as possible of how his data will be handled and used. The customer should also be provided with information about alternative options or avenues of recourse if he does not wish to consent to the practices outlined in the privacy policy.

5.3. Follow the spirit of the privacy policy internally

Simply having a good privacy policy to show the customers is not enough. In order to maintain a relationship of honesty and trust, the spirit of the privacy policy must be followed in a consistent manner throughout the organization [8]. Blatantly violating the terms of the privacy policy will alienate customers and may leave the company vulnerable to legal action. Following the letter of the privacy policy but finding ways to circumvent the spirit of the privacy policy will not put the company at risk for legal action but does in-
ment life cycle will be presented. crease the risk of loss of customer trust.
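The per-use consent that Practices 2 and 3 call for can be tracked mechanically: record which named uses the customer has explicitly opted into, and refuse any use that was never granted. A minimal sketch, assuming a simple in-memory record; the class, method, and use names are invented for illustration and are not from the paper:

```python
# Sketch: track a customer's explicit consent for each named use of his
# data, so that any use lacking explicit consent can be refused.
# All names here are illustrative assumptions.

class ConsentRecord:
    def __init__(self, customer_id):
        self.customer_id = customer_id
        self.permitted_uses = set()

    def grant(self, use):
        """Customer explicitly opts in to one named use, e.g. 'marketing'."""
        self.permitted_uses.add(use)

    def revoke(self, use):
        """Customer withdraws consent for a previously granted use."""
        self.permitted_uses.discard(use)

    def allows(self, use):
        """Only uses the customer explicitly consented to are allowed."""
        return use in self.permitted_uses


record = ConsentRecord("cust-001")
record.grant("order_fulfilment")
record.grant("marketing")
record.revoke("marketing")   # consent withdrawn later
```

The point of the sketch is the default: a use absent from the record is disallowed, matching the paper's position that consent must be explicit for each use.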
5.4. Research and understand all laws that may have jurisdiction over sensitive data

Because of the wide variation in privacy laws, it is necessary to carefully identify all jurisdictions that may claim authority over the company's actions. Then the relevant legislation and precedents for each jurisdiction should be reviewed to ascertain that none of them would be violated by a proposed new practice [8].

5.5. Stay current on new developments and public opinion

Ethics are based on the commonly accepted values of a society. Therefore it is necessary to monitor shifts or new developments in these values. The best way to stay in touch with current opinion is to review the publications of consumer groups as well as industry advisory groups. It is also important to follow new cases in the news as they appear, to see which issues are raising the most controversy.

5.6. Control access to data warehouses

Protect customers by protecting their data. A company's responsibility to its customers does not stop with its own intentional uses of their data. In order to retain customer trust and satisfaction, the data must also be protected from unauthorized uses. As stated above, this requires security on two levels. First, the database or data warehouse itself must be physically protected from unauthorized access using network security measures, a secure operating system, a secure physical environment, and user authentication measures [10]. Second, access to the data itself should be restricted to users who need that particular data for authorized activities. In most cases this requires some type of multilevel security (MLS) model [16]. An MLS model seeks to protect sensitive data by carefully defining which data fields or objects are accessible by each user or user class. Implementing MLS effectively can be complicated by the possibility of users inferring the values of restricted data from the values of accessible data. This is especially true when users have access to data mining tools, which are made to do just that. This is a good justification for limiting access to data mining applications to a small group of developers and users.

5.7. Give customers more control over their data

If customers feel that they have control over their data, then they will feel that they retain a greater level of privacy. This can be done in any of several ways. First, the customer can be given some control over what information to provide in the first place. Allow the customer to opt in or out of answering questions that are not absolutely necessary for the transactions in which he is participating. Practicing full disclosure and honesty when gathering the information is essential to gaining true informed consent from the customer and giving him a sense of control. Second, the customer can be given the ability to review and correct his data. This will protect the customer from being evaluated on the basis of incorrect data, and it will protect the company from making bad business decisions based on invalid inferences caused by incorrect data. Third, the customer can be given the chance to rate the sensitivity of each piece of data, to address the fact that individuals "are not equally protective of all values in their records" [1]. Then data that is rated above a particular sensitivity level can be excluded from data mining operations. Fourth, allow the customer to choose which uses of his data will be allowed or disallowed. A recent survey found that "over 80% of web consumers did not want web sites to resell their personal information and yet 72% would be willing to give demographic information as long they were made aware of how it would be used" [12]. This shows how much an effort to make the customer feel informed and in control can alleviate feelings of privacy violation.

5.8. Evaluate the quality of source data

Performing data mining on inaccurate data will lead to inaccurate inferences and conclusions. Since the data for data mining operations is often pulled from a variety of internal and/or external sources, it is often inconsistent and has varying levels of accuracy. Also, personal information by its nature tends to expire rapidly. When basing business decisions on the results of data mining, companies need to evaluate the quality of those results by first evaluating the quality of the underlying data.

5.9. Develop a corporate code of conduct

A corporate code of conduct should establish standards for acceptable practices and for the treatment of customers.
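The field-level restriction that the MLS model of Practice 6 imposes can be pictured as a filter applied before any record reaches a data mining tool. A rough sketch; the user classes, field names, and sample values are invented for illustration, not taken from the paper or from any particular MLS product:

```python
# Sketch of MLS-style field-level access control: each user class may
# see only the fields cleared for it, so records are filtered before
# they ever reach a data mining tool. Roles and fields are hypothetical.

PERMITTED_FIELDS = {
    "marketing_analyst": {"zip_code", "age_band"},
    "fraud_investigator": {"zip_code", "age_band", "income_band"},
}

def filter_record(record, role):
    """Return only the fields the given user class is cleared to see.

    An unknown role is cleared for nothing, so the default is denial.
    """
    allowed = PERMITTED_FIELDS.get(role, set())
    return {field: value for field, value in record.items() if field in allowed}


customer = {"name": "J. Doe", "zip_code": "61790",
            "age_band": "30-39", "income_band": "high"}

analyst_view = filter_record(customer, "marketing_analyst")
```

Note that such a filter does not by itself prevent the inference problem described above, where mining the accessible fields reveals the restricted ones; that is the argument for also limiting who may run the mining tools at all.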
5.10. Perform an ethical review for each new use of data

The previous practices are general measures to maintain consistent and ethical conduct in all handling of customer data. This practice applies more specifically to new development projects or applications of data mining. Before development on a new project begins, it should pass through a formal ethical review process. The purpose of this review is to identify any ethical concerns that the project might raise and to determine what potential risks there are from either negative public opinion or legal action. The project should be evaluated against the information gathered in Practices 1, 4 and 5, the company's privacy policy, and its corporate code of conduct. In addition to these sources, further guidance can be sought by consulting codes or recommendations published by industry groups, such as the Association for Computing Machinery's code of ethics (www.acm.org). If any risks or potential conflicts are identified, then a plan for mitigating these risks should be developed. If the risks cannot be effectively mitigated, then the project will probably need to be abandoned or rethought.

These practices should be incorporated into the data mining systems development lifecycle of all projects that will potentially manipulate customer data, as well as being part of the standard operations of the corporation. Figure 1 shows a set of recommendations for incorporating these practices into a typical software development lifecycle (SDLC).

During the project initiation and planning stage, where projects are evaluated at the enterprise level, care should be taken to identify those projects that have potential for ethical risk. Projects that are identified at this stage would then be required to follow formal procedures for identifying and managing ethical risks throughout their lifecycles.

Once such a project moves into its planning stage, a preliminary ethical review should be conducted as described under Practice 10. The added deliverable of this stage would be a summary report from the ethical review that lists the ethical concerns that were identified and the steps that will be taken to mitigate them.

During the requirements stage, customer expectations, applicable laws, and current public opinion should be researched with regard to each concern listed in the ethical summary report. The best way to handle this would be to consult with actual customers or consumer groups as well as a corporate lawyer who specializes in these issues. The added deliverable of this stage would be a section of the requirements specification that lists the requirements added as a result of this research in order to address the ethical concerns. These requirements are to be specially identified in a separate, high-priority section of the specification so that they are not carelessly re-prioritized or cut at a later stage of development as the project suffers from feature creep or time and cost pressures. With each of these requirements should be listed a brief description of the concern that it is meant to address, or a reference to the concern in the ethical summary document. At the end of the requirements stage, another review should be conducted. This review should verify that the special requirements adequately address the original ethical concerns. It should also evaluate the requirements specification to identify new ethical concerns that may be raised by any expansion of scope or addition of unforeseen requirements. If any new concerns are found, then the process of researching and formulating requirements to address them should be repeated.

The responsibility of the design phase is to see that all of the special ethical requirements are incorporated into the product design. Additional considerations during this stage will mainly concern the design of the user interface for applications that collect data from customers and the security roles for applications that mine such data. User interfaces for customer applications should be designed to meet standards of honesty and clarity as prescribed by the corporate code of conduct. Back-office applications that mine or manipulate customer data need to be designed with adequate security to protect the customer data from unauthorized use. The design of these features should be subjected to an ethical review at the end of the design phase. No additional deliverables exist for this stage; the purpose of the review is simply to approve the final design.

The build phase should proceed normally with no additional steps. The deliverables of this phase will be reviewed during the testing phase. During testing, a special set of test cases should be applied to verify that the final product meets the ethical requirements and conforms to company standards. Non-conformance should be treated as a critical defect and fixed before release.

After release of the product, a post-mortem of the project should include a review of the effectiveness of the ethical risk management, based on customer acceptance of and satisfaction with the final product.
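The traceability this lifecycle demands — every ethical requirement tied to a documented concern and verified by a passing test before release — can be checked mechanically as a release gate. A rough sketch under assumed conventions; the record structure and field names (`concern_ref`, `test_passed`, the ETH/EC identifiers) are invented for illustration:

```python
# Sketch of a release gate for specially identified ethical requirements:
# each one must reference a concern from the ethical summary report and
# must have passed its verification test case. Non-conformance is treated
# as a critical defect, i.e. a release blocker. Field names are assumptions.

def release_blockers(requirements):
    """Return the ids of ethical requirements that would block release."""
    blockers = []
    for req in requirements:
        if not req.get("concern_ref"):           # no link back to an ethical concern
            blockers.append(req["id"])
        elif not req.get("test_passed", False):  # failed or missing verification
            blockers.append(req["id"])
    return blockers


reqs = [
    {"id": "ETH-1", "concern_ref": "EC-01", "test_passed": True},
    {"id": "ETH-2", "concern_ref": "EC-02", "test_passed": False},
    {"id": "ETH-3", "concern_ref": None,    "test_passed": True},
]
```

Keeping the check this simple reflects the text's intent: an ethical requirement without a traceable concern, or without a passing test, is a critical defect rather than something to be re-prioritized away.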
Fig. 1. Software development life cycle with embedded ethical requirements.
6. Lotus Households software development project

The failed Households project that was cited earlier will now be revisited to show how the project could have benefited from following an ethical risk management lifecycle. The purpose of Lotus' Households project was to provide small businesses with an inexpensive way of obtaining targeted mailing lists to market their products and services. The final product was to be a CD-ROM containing data on individuals, obtained through a partnership with the credit-reporting agency Equifax. The Households software on the CD-ROM would allow the user to search for individuals who matched certain criteria and develop a mailing list. The user would then purchase the right to use the mailing list from Lotus, who would supply the user with a key to decrypt the data. The CD-ROM did not contain raw customer data. Instead, it contained data that had been mined from Equifax's credit records. The result of the data mining was that each individual was given values for certain categories or criteria on which the user could then search.
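The design just described — mined category values attached to each individual, searched by criteria to build a mailing list, with name searches disallowed — can be sketched in miniature. This is a hypothetical illustration only; the category names, values, and records are invented, not taken from the actual Households product:

```python
# Hypothetical sketch of the kind of criteria search described for the
# Households product: each individual carries mined category values, and
# the user builds a mailing list by matching criteria against them.
# All category names and data below are invented for illustration.

individuals = [
    {"id": 101, "income_band": "high", "shopper_type": "mail_order"},
    {"id": 102, "income_band": "low",  "shopper_type": "mail_order"},
    {"id": 103, "income_band": "high", "shopper_type": "in_store"},
]

def build_mailing_list(records, criteria):
    """Return ids of individuals whose mined category values match every criterion."""
    return [r["id"] for r in records
            if all(r.get(field) == value for field, value in criteria.items())]


matches = build_mailing_list(individuals,
                             {"income_band": "high", "shopper_type": "mail_order"})
```

Note that only derived category values, never names, are searchable here, mirroring the safeguards Lotus believed were sufficient.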
Lotus believed that their product would provide a great service and expected a favorable response from the public. By their standards they had acted to adequately protect the privacy of individuals by encrypting the data on the disks, preventing searches by name, disallowing the distribution of purchased mailing lists, and limiting the types of data that were included. The public was not informed of the details of the Households project until an announcement was made shortly before its scheduled release. Lotus was shocked by the immediate public outcry and tried to take steps to reassure the public of the safety of their information. They offered to remove information from the disks for anyone who called and requested removal. However, because the product was already being implemented and the number of people calling in was in the thousands, this proved to be unfeasible. Lotus was eventually forced to cut its losses and abandon the product [13].

The Households product was a sound concept that fulfilled a business need. Its failure could have been prevented. Practices 1, 6, and 7 would have helped to prevent the public's negative reaction. Practice 1 is to consider the expectations of the customers. Lotus' shock at the public's reaction shows that this was Lotus' biggest point of failure. Had they made an effort at the beginning of the project to gauge customer reaction, they would not have had to throw away a finished product. This is why a comprehensive ethical review during the planning and requirements stages is necessary. The worst-case scenario is that they simply would have abandoned the project before wasting money on development. The best-case scenario is that they would have found a way to address the public's concerns and design a product that was more acceptable.

Practice 6 states that access to customer data should be strictly controlled and protected to maintain customer trust. In this case the public felt that Lotus had not taken adequate precautions to ensure that the data on the disks was not misused or improperly accessed. The product design should have been reviewed before implementation to ensure that the data was protected well enough to make the public feel reasonably secure.

Finally, Lotus presented this product to the public as something that was already decided and completed. The individuals whose data was being used were never given the opportunity to opt in or opt out, to review their data, or to provide input on how the data was to be used. This violates Practice 7. Giving each individual real control over his data may not have been feasible; however, some small measures could have been taken to give the individuals some feeling of being in control. For example, a letter could have been sent to each individual announcing the intentions of the project and giving the individual the opportunity to opt out while it was still feasible to do so. Another option would have been to send surveys to the individuals to get their input on how the product should be designed.

If Lotus had followed a lifecycle like the one recommended in this study, they would have been made aware of the potential ethical problems early in the development. Ethical reviews at critical stages of development, following the suggested systems development practices, would have identified the problems and solutions before it was too late.

7. Conclusion

Today's technological environment, which allows for the increasingly rapid and efficient sharing and collection of information across the globe, provides new opportunities and new risks for corporations seeking to utilize customer data in new ways. The ease of data manipulation through data mining makes it difficult to avoid unethical behavior when working with customer data. The public reaction to perceived misuses is then made even more powerful by communication channels such as the Internet, which provide a forum for consumers to express their discontent and influence the opinions of a great many others. In this environment, negative public opinion can have immediate and disastrous financial repercussions for businesses that are perceived to be acting in an unethical manner.

Implementing ethical data mining systems development practices such as the ones recommended in this study will benefit corporations by helping them build and maintain trusting relationships with their customers. At the same time, it will prevent real economic losses resulting from negative public reaction or legal action in response to unethical practices.

References

[1] D. Agrawal and C.C. Aggarwal, On the design and quantification of privacy-preserving data mining algorithms, ACM PODS (2001), 247–255.
[2] R. Agrawal and R. Srikant, Privacy-preserving data mining, in: Proceedings of the ACM SIGMOD, 2000, pp. 439–450.
[3] P. Bradley, J. Gehrke, R. Ramakrishnan and R. Srikant, Scaling mining algorithms to large databases, Communications of the ACM 45(8) (2002), 38–43.
[4] P. Brey, Disclosive computer ethics, Computers and Society (2000), 10–16.
[5] J. DiSabatino, Unregulated databases hold personal data, Computerworld 36(4) (2002), 7.
[6] J.S. Fulda, Solution to a philosophical question concerning data mining, Computers and Society (1999), 6–7.
[7] R. Matthews, If chocolate then champagne, World Link 13(2) (2000), 12–13.
[8] J. Montaña, Data mining: A slippery slope, Information Management Journal 35(4) (2001), 50–54.
[9] J. Morse and S. Morse, Teaching temperance to the cookie monster: Ethical challenges to data mining and direct marketing, Business and Society Review 107(1) (2002), 76–97.
[10] T. Priebe and G. Pernul, Towards OLAP security design – survey and research issues, ACM DOLAP (2000), 33–40.
[11] D. Radcliff, Guarding the data warehouse gate, Computerworld 35(4) (2001), 44–45.
[12] E. Rose, Balancing Internet marketing needs with consumer concerns: A property rights framework, Computers and Society (2001), 17–21.
[13] R.A. Spinello, Chapter 5: Privacy and information access, in: Case Studies in Information Technology Ethics, Prentice Hall, Upper Saddle River, New Jersey, 2003, pp. 104–111.
[14] H.T. Tavani, Privacy online, Computers and Society (1999), 11–19.
[15] H.T. Tavani and J.H. Moor, Privacy protection, control of information, and privacy-enhancing technologies, Computers and Society (2001), 6–11.
[16] K. Wahlstrom and J.F. Roddick, On the impact of knowledge discovery and data mining, in: Conferences in Research and Practice of Information Technology, 2001, pp. 1–12.