Friday 23 October 2015

Privacy-Preserving Multi-Keyword Search in Information Networks

Abstract
In emerging information networks, it is crucially important to provide efficient search on distributed documents while preserving their owners’ privacy, for which a privacy-preserving index (PPI) presents a possible solution. An understudied problem for PPI techniques is how to provide differentiated privacy preservation in the presence of multi-keyword document search. The differentiation is necessary because terms and phrases bear innate differences in their semantic meanings.
In this paper we present ε-MPPI, the first work to provide distributed document search with quantitatively differentiated privacy preservation. In the design of ε-MPPI, we identified a suite of challenging problems and proposed novel solutions. For one, we formulated the quantitative privacy computation as an optimization problem that strikes a balance between privacy preservation and search efficiency. We also addressed the challenging problem of secure ε-MPPI construction in a multi-domain information network that lacks mutual trust between domains. Towards a secure ε-MPPI construction with practically acceptable performance, we proposed to optimize the performance of secure multi-party computations by making novel use of secret sharing. We implemented the ε-MPPI construction protocol in a functioning prototype. We conducted extensive experiments on a real-world dataset to evaluate the prototype’s effectiveness and efficiency.
Aim
The aim is to provide differentiated privacy preservation in the presence of multi-keyword document search.
Scope
To implement ε-MPPI, a new PPI abstraction that can quantitatively control the privacy leakage for multi-keyword document search.
Existing system
Secure Indexing on Untrusted Servers
The existing system performs data indexing in P2P networks. These P2P indices are built on top of, and distributed across, Distributed Hash Tables (DHTs).
Privacy Definitions for Anonymization
Publishing public-use data about individuals without revealing sensitive information has received a lot of research attention in the last decade. Various privacy definitions have been proposed and gained popularity, including k-anonymity, l-diversity, and differential privacy. In particular, in a k-anonymized dataset, each record is indistinguishable from at least k−1 other records. This idea is applied in the PPI setting: most existing PPIs use the grouping notion to make servers k-anonymized in the public-use PPI. We propose a non-grouping ε-MPPI, which shows promise for better quality of privacy preservation. ε-MPPI uses a new privacy definition, ε-PHRASE-PRIVACY, to specifically address privacy in multi-term document searches. The privacy definition most relevant to our ε-PHRASE-PRIVACY degree is r-confidentiality, which also addresses the privacy preservation of a PPI system for public use. However, r-confidentiality does not specifically consider the case of multi-term phrases.
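The k-anonymity property mentioned above can be illustrated with a small check. This is a minimal sketch of the general definition only, not part of ε-MPPI; the function name, the toy table, and the choice of quasi-identifier columns ("zip", "age") are all assumptions made for illustration.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records (i.e. each record is
    indistinguishable from at least k-1 others on those columns)."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized toy table.
records = [
    {"zip": "303**", "age": "20-29", "condition": "flu"},
    {"zip": "303**", "age": "20-29", "condition": "asthma"},
    {"zip": "303**", "age": "30-39", "condition": "flu"},
    {"zip": "303**", "age": "30-39", "condition": "diabetes"},
]

print(is_k_anonymous(records, ["zip", "age"], 2))  # True: each group has 2 records
print(is_k_anonymous(records, ["zip", "age"], 3))  # False: no group has 3 records
```

In a grouping-based PPI the "records" would correspond to servers placed in groups of at least k, which is the notion the non-grouping approach moves away from.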
Disadvantages
·      Existing work focuses on single-term phrase protection.
·      In the age of cloud computing, data users, while enjoying a multitude of benefits from the cloud (e.g. cost effectiveness and data availability), are simultaneously reluctant or even resistant to using the cloud, because they lose control of their data.
Proposed System
Although ε-MPPI currently assumes a centralized entity for index serving, it is straightforward to extend its architecture to a P2P network: ε-MPPI can be served as a P2P index if a DHT structure is imposed on the information network, which achieves better load balancing and scalability. This project proposes ε-MPPI for multi-term phrase publication with quantitative privacy control in emerging information networks. We propose several practical approaches for the secure construction of an ε-MPPI system in an environment without mutual trust, while still providing multi-term privacy. For practical performance of secure computations, we propose an MPC-reduction technique based on the efficient use of secret sharing schemes. We also discovered a common-term vulnerability and proposed a term-mixing solution. Through both simulation-based and real experiments,
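To give a feel for why secret sharing helps when domains do not trust each other, the following is a minimal sketch of generic additive secret sharing used to compute a sum. It is not the authors' MPC-reduction protocol. It is simulated in a single process and assumes parties that follow the protocol honestly. The modulus `PRIME` and the names `make_shares` and `secure_sum` are hypothetical.

```python
import secrets

# Assumed modulus for share arithmetic; any prime larger than the true sum works.
PRIME = 2**61 - 1


def make_shares(value, n_parties):
    """Split `value` into n additive shares that sum to `value` mod PRIME.
    Any n-1 of the shares look uniformly random on their own."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Simulate n parties summing their inputs without revealing them.
    Party i sends its j-th share to party j; party j adds up what it
    receives; only these partial sums are combined into the total."""
    n = len(private_values)
    all_shares = [make_shares(v, n) for v in private_values]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return sum(partial_sums) % PRIME


# Hypothetical input: each provider holds 1 if it possesses a term, else 0.
possession = [1, 0, 1]
print(secure_sum(possession))  # 2 providers possess the term
```

Only additions and random number generation are needed here, which is why pushing work into the sharing step is much cheaper than running a general-purpose multi-party computation over all inputs.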
Advantages
Compared with existing work on secure data serving in the cloud, the PPI scheme is unique in that:
1) Data is stored in plain text (i.e. without encryption) on the PPI server, which makes efficient and scalable data serving with rich functionality possible. Without using encryption, PPI preserves user privacy by adding noise to obscure the sensitive ground-truth information.
2) Only coarse-grained information (e.g. the possession of a searched phrase by an owner) is stored on the PPI server, while the original private content is still maintained and protected on the personal servers, under user-specified access control rules.
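The two points above can be sketched as a toy index that stores only "owner possibly has phrase" entries, padded with decoy owners, followed by a second step in which each candidate owner applies its own access rule. This is an illustrative sketch under stated assumptions, not the ε-MPPI construction: the `false_positive_rate` parameter, the function names, and the sample owners are hypothetical, and the sketch does not reproduce how ε-MPPI decides the amount of noise.

```python
import random


def build_noisy_index(true_owners, all_owners, false_positive_rate, rng):
    """Publish, per phrase, the true owners plus randomly chosen decoys.
    True owners are never dropped, so search has no false negatives."""
    index = {}
    for phrase, owners in true_owners.items():
        published = set(owners)
        for owner in all_owners:
            if owner not in owners and rng.random() < false_positive_rate:
                published.add(owner)  # decoy entry that hides the real owners
        index[phrase] = published
    return index


def search(index, phrase, personal_stores, access_rules, searcher):
    """Step 1: ask the index for candidate owners.
    Step 2: contact each candidate, which enforces its own access rule."""
    results = {}
    for owner in sorted(index.get(phrase, ())):
        if searcher not in access_rules.get(owner, set()):
            continue  # this owner's rule denies the searcher
        docs = personal_stores.get(owner, {}).get(phrase, [])
        if docs:  # decoy owners have nothing to return
            results[owner] = docs
    return results


# Hypothetical personal servers: the private content never leaves them.
personal_stores = {
    "owner_a": {"term x": ["doc1"]},
    "owner_b": {},
    "owner_c": {"term x": ["doc7"]},
}
access_rules = {"owner_a": {"alice"}, "owner_b": {"alice"}, "owner_c": set()}
true_owners = {"term x": {"owner_a", "owner_c"}}

index = build_noisy_index(
    true_owners, list(personal_stores), 0.5, random.Random(0)
)
print(sorted(index["term x"]))  # candidate owners, possibly including a decoy
print(search(index, "term x", personal_stores, access_rules, "alice"))
```

The trade-off is visible in `search`: every decoy in the index costs the searcher one extra contact, which is the balance between privacy and search efficiency that the abstract describes.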
System Architecture
The PPI System

 

SYSTEM CONFIGURATION


Hardware Requirements
  • Speed        -  1.1 GHz
  • Processor    -  Pentium IV
  • RAM          -  512 MB (min)
  • Hard Disk    -  40 GB
  • Keyboard     -  Standard Windows Keyboard
  • Mouse        -  Two or Three Button Mouse
  • Monitor      -  LCD/LED
Software Requirements
  • Operating System  -  Windows 7
  • Front End         -  ASP.NET and C#
  • Database          -  MSSQL
  • Tool              -  Microsoft Visual Studio

References
Yuzhe Tang and Ling Liu, “Privacy-Preserving Multi-Keyword Search in Information Networks”, IEEE Transactions on Knowledge and Data Engineering, Volume: PP, Issue: 99, March 2015.
