Design and Development of Privacy Preserving Techniques for Data Stream Mining by Pareshkumar Mahendrabhai Solanki
- TT000070 SOL
Current library | Collection | Call number | Status | Barcode |
---|---|---|---|---|
NIMA Knowledge Centre | Reference | TT000070 SOL | Not For Loan | TT000070 |
NIMA Knowledge Centre | Reference | TT000070 SOL | Not For Loan | TT000070-1 |
NIMA Knowledge Centre | Reference | TT000070 SOL | Not For Loan | TT000070-2 |
Guided by: Dr. Sanjay Garg. With Synopsis and CD. 11EXTPHDE58
ABSTRACT:
Data mining is the field concerned with extracting information from large datasets,
with diverse application areas such as healthcare, banking and finance, telecommunications,
shopping records, personal data and so on. These applications frequently
produce huge volumes of data, stored both statically and dynamically across the
available network. The mined results can take the form of clusters, patterns,
rules and classifications. Sharing such data has been shown to be advantageous
for data mining applications. However, these datasets frequently contain individually
identifiable information, and consequently releasing them may result in
privacy breaches. Preserving privacy while releasing data is a fundamental research
area in data security, and a major issue when releasing individual-specific sensitive
information. Efficient preservation of the data proprietor’s privacy is a crucial
issue when publishing data for analysis. To the best of our knowledge,
a dataset is an essential asset for industry, which examines it in order to make
decisions. To share data while preserving privacy, the data proprietor
must arrive at a solution that accomplishes the dual goal of privacy
preservation and accuracy of the data mining task, mostly clustering and classification.
Data mining can be valuable in many applications, but with insufficient
protection the data may be misused for other purposes. It is essential to prevent
disclosure not only of individuals’ confidential information but also of critical
knowledge. Generally, data proprietors do not consider it safe to publish datasets for
mining purposes, because they worry that releasing data may compromise
an individual’s private information. Perturbing and anonymizing datasets before
release overcomes this fear, as it safeguards the secrecy of personal information.
But protecting personal information while achieving mining results close
to those obtained with the original datasets poses great challenges.

The proposed research work seeks solutions to this growing concern. Several algorithms have
been proposed that take the characteristics of the dataset into account and either perturb
sensitive attribute values, or keep sensitive attribute values unchanged and
anonymize quasi-identifier values. The data perturbation and anonymization
algorithms proposed so far have focused mainly on static data, and very
few address data streams. A heuristic-based data perturbation has been proposed
in which privacy is maximized through tuple values computed for each instance
and a user-defined sensitive drift, with minimum information loss. The proposed
algorithm has been evaluated in terms of information gain and the privacy achieved.
Many datasets contain multiple sensitive attributes, so both perturbation
and anonymization are needed to preserve privacy. Based on this concern,
the research work also carries out a detailed analysis of data anonymization
alternatives and proposes a heuristic-based, PRIVACYearn-based multi-iterative
k-anonymization and perturbation approach for data streams. This approach also aims
to find the best-fit generalization, which leads to minimum loss of information
and better protection of individuals’ privacy. Finally, we have proposed
heuristic-based geometric data perturbation for data streams. The developed algorithms
for data perturbation and anonymization have been tested on a wide range of
standard datasets with frequently used mining algorithms such as K-Means clustering
and Naive Bayes classification.
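
As an illustration only, the following minimal sketch shows why geometric perturbation can keep distance-based mining such as K-Means usable. It assumes the form commonly described in the literature (rotation, translation and optional additive noise) on two numeric attributes; it is not the heuristic algorithm developed in the thesis, and the function names, sample records and parameter values are hypothetical.

```python
import math
import random


def geometric_perturb(points, angle, shift, noise_sd=0.0, rng=None):
    """Rotate each 2-D point by `angle`, translate it by `shift`,
    and add Gaussian noise with standard deviation `noise_sd`."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for x, y in points:
        rx = cos_a * x - sin_a * y + shift[0] + rng.gauss(0.0, noise_sd)
        ry = sin_a * x + cos_a * y + shift[1] + rng.gauss(0.0, noise_sd)
        out.append((rx, ry))
    return out


def pairwise_distances(points):
    """Euclidean distance between every unordered pair of points."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


# Hypothetical records with two numeric attributes each.
records = [(1.0, 2.0), (4.0, 6.0), (0.0, -3.0)]

# With noise_sd=0.0 the transformation is a pure rotation plus translation.
masked = geometric_perturb(records, angle=math.pi / 3, shift=(10.0, -5.0))
```

With the noise term set to zero, the released values differ from the originals while every pairwise Euclidean distance is unchanged, so a distance-based clustering would group the masked records exactly as it groups the originals. A non-zero `noise_sd` trades some of that accuracy for additional protection.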