Abstract The advancement of information technologies has enabled various organizations (e.g., census agencies, hospitals) to collect large volumes of sensitive personal data (e.g., census data, medical records). Data in its original form, however, typically contains sensitive information about individuals, and publishing such data violates individual privacy and poses potential privacy risks. This is a major concern when data is shared or published among many parties for research and data analysis: the sensitive information of data owners must be protected. To deal with these privacy issues, data must be anonymized so that no sensitive information about individuals can be disclosed from the published data, while data distortion is minimized to ensure the usefulness of the data in practice. A large number of data publishing models and methods have been proposed to protect personal privacy and security, notably k-anonymity, ℓ-diversity, and t-closeness. The k-anonymity privacy requirement for publishing microdata requires that each equivalence class (i.e., a set of records that are indistinguishable from each other with respect to certain “identifying” attributes) contains at least k records. Several researchers have since recognized that k-anonymity cannot prevent attribute disclosure. The method of ℓ-diversity has been proposed to address this; ℓ-diversity requires that each equivalence class contain at least ℓ well-represented values for each sensitive attribute. However, a major drawback of these techniques is that they cannot prevent similarity attacks on data privacy, because they do not consider the semantic relations between the sensitive attribute values. This thesis presents an extensive study of this problem, focusing primarily on notions of anonymity defined with respect to individual identity or with respect to the value of a sensitive attribute.
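The definitions above can be made concrete with a small sketch. The following illustrative Python snippet (the function and attribute names are the author's assumptions, not taken from the thesis) checks k-anonymity and the simplest, "distinct" form of ℓ-diversity on a toy generalized table, and shows the gap the abstract describes: a table can satisfy k-anonymity while still failing ℓ-diversity because one equivalence class carries only a single sensitive value.

```python
def equivalence_classes(records, quasi_ids):
    """Group records by their (generalized) quasi-identifier values."""
    classes = {}
    for rec in records:
        key = tuple(rec[a] for a in quasi_ids)
        classes.setdefault(key, []).append(rec)
    return classes

def is_k_anonymous(records, quasi_ids, k):
    """k-anonymity: every equivalence class holds at least k records."""
    return all(len(c) >= k
               for c in equivalence_classes(records, quasi_ids).values())

def is_l_diverse(records, quasi_ids, sensitive, l):
    """Distinct ℓ-diversity: every equivalence class contains at least
    l distinct values of the sensitive attribute."""
    return all(len({r[sensitive] for r in c}) >= l
               for c in equivalence_classes(records, quasi_ids).values())

# Toy generalized table (hypothetical data): zip and age are
# quasi-identifiers, disease is the sensitive attribute.
table = [
    {"zip": "476**", "age": "2*", "disease": "Flu"},
    {"zip": "476**", "age": "2*", "disease": "Cancer"},
    {"zip": "479**", "age": "3*", "disease": "Flu"},
    {"zip": "479**", "age": "3*", "disease": "Flu"},
]

print(is_k_anonymous(table, ["zip", "age"], 2))           # True
print(is_l_diverse(table, ["zip", "age"], "disease", 2))  # False
```

The table is 2-anonymous (both classes hold two records), yet the second class exposes "Flu" to anyone who links a victim to it, which is exactly the attribute-disclosure weakness that motivates ℓ-diversity and, in turn, the semantic approach of this thesis.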
In this thesis, a semantic anonymization approach is proposed. The approach is based on domain-specific semantic rules together with rules supplied by the data owner to overcome similarity attacks. It caps the belief of an adversary inferring a sensitive value in a published data set at the level of an inference based on the relationship between sensitive values. The semantic meaning is that when an adversary sees a record in a published data set, s/he has lower confidence that the record belongs to a victim than that it does not. Finally, the performance of the traditional model and the semantic anonymization model is evaluated by measuring information loss, utility metrics, and the achieved privacy level. In these situations, the data distributor is often faced with a quandary: on one hand, it is important to protect the anonymity and personal information of individuals; on the other hand, it is also important to preserve the utility of the data for research. The simulation results of the proposed model show a significant enhancement in privacy level, but the data utility is decreased.
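The information-loss side of this trade-off can be quantified in several standard ways. As one illustration (the thesis's own metrics are not specified here), the well-known discernibility metric charges each record a penalty equal to the size of its equivalence class, so more generalization means larger classes and a higher cost. The sketch below assumes the same toy table shape as above:

```python
def discernibility_cost(records, quasi_ids):
    """Discernibility metric: cost = sum over equivalence classes of
    |EC|**2, since each of the |EC| records is indistinguishable from
    |EC| records. Higher cost means more information loss."""
    classes = {}
    for rec in records:
        key = tuple(rec[a] for a in quasi_ids)
        classes.setdefault(key, []).append(rec)
    return sum(len(c) ** 2 for c in classes.values())

# Hypothetical example: two classes of two records each cost 2**2 + 2**2,
# while fully suppressing the quasi-identifiers merges all four records
# into one class costing 4**2 -- more privacy, less utility.
partly_generalized = [
    {"zip": "476**", "age": "2*"}, {"zip": "476**", "age": "2*"},
    {"zip": "479**", "age": "3*"}, {"zip": "479**", "age": "3*"},
]
fully_suppressed = [{"zip": "*****", "age": "**"}] * 4

print(discernibility_cost(partly_generalized, ["zip", "age"]))  # 8
print(discernibility_cost(fully_suppressed, ["zip", "age"]))    # 16
```

The jump from 8 to 16 mirrors the abstract's conclusion: pushing the privacy level up (larger, safer equivalence classes) necessarily drives measured utility down.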