User attributes affect community privacy (i.e., the privacy of a group of people who share common properties or attributes) in data publishing, because some attributes may expose multiple users' identities and their associated sensitive information when the published data are analyzed. Attributes such as gender, age, and race may allow an adversary to group users into communities based on their values and subsequently launch a sensitive-information inference attack. As a result, private information about a specific community of users can be explicitly disclosed even from privacy-preserved published data. Each attribute impacts community privacy differently, and some types of attributes are highly susceptible: the more susceptible an attribute, the more easily it enables unique identification of multiple users and inference of their sensitive information, so its presence in published data increases community privacy risks. Most existing privacy models ignore the impact of susceptible attributes on community privacy and focus mainly on preserving individual privacy in the released data. This paper presents a novel data anonymization algorithm that significantly improves users' community privacy without sacrificing the utility guarantees of the anonymized published data. The proposed algorithm quantifies the susceptibility of each attribute in the users' dataset in order to preserve community privacy effectively.

Datasets also need to be constructed and transformed correctly. In this research, we compare creating a COCO dataset manually with creating a synthetic COCO dataset. Creating datasets of chemistry apparatus is not as difficult as creating datasets of human subjects: the apparatus has a specific shape and form, so the dataset could remain limited in size. We therefore create a dataset both manually and synthetically; the synthetic approach yields additional data by compositing objects onto different backgrounds.
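As a rough illustration of the compositing idea (a minimal sketch, not our actual pipeline; the function and category names are hypothetical), one synthetic sample per placement can be recorded directly in COCO-style annotation form:

```python
import random

def synth_coco(object_size, backgrounds, n_per_bg, seed=None):
    """Minimal sketch of synthetic COCO-style dataset generation
    (illustrative only): place one object of fixed size at a random
    position on each background and record a bounding-box annotation."""
    rng = random.Random(seed)
    ow, oh = object_size
    images, annotations = [], []
    for bw, bh in backgrounds:
        for _ in range(n_per_bg):
            img_id = len(images)
            x = rng.randint(0, bw - ow)
            y = rng.randint(0, bh - oh)
            images.append({"id": img_id, "width": bw, "height": bh})
            annotations.append({
                "id": img_id,                # one annotation per image here
                "image_id": img_id,
                "category_id": 1,
                "bbox": [x, y, ow, oh],      # COCO format: [x, y, width, height]
                "area": ow * oh,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations,
            "categories": [{"id": 1, "name": "beaker"}]}  # placeholder class name

coco = synth_coco((40, 60), [(640, 480), (800, 600)], n_per_bg=3, seed=0)
```

In a real pipeline the object would be an image crop pasted onto the background, but the annotation structure produced would be the same.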
To apply machine learning, we first need to build a model, and the model is created through a process called training. The goal of training is to develop an accurate model that answers the questions of interest, and training a model requires collecting a dataset. The quality and quantity of the gathered data determine how good the predictive model can be; helping the model understand datasets the way humans do is one of the important steps of machine learning. In the following work, we present and validate an approach for populating a graph topology with synthetic data that approximates an online social network. The empirical tests confirm that our approach generates a dataset that is both diverse and a good fit to the target requirements, with realistic modeling of noise and of community structure. A good match is obtained between the generated data and the target profiles and distributions, competitive with other state-of-the-art methods. The data generator is also highly configurable, with a rich set of control parameters for different "similarity/diversity" levels.
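The similarity/diversity trade-off can be illustrated with a minimal sketch (an assumption for illustration, not the actual generator's parameter set): attribute values are drawn from a target distribution, and a single `diversity` knob mixes in uniform noise, so 0.0 reproduces the target exactly in expectation and 1.0 samples uniformly:

```python
import random

def make_generator(target_dist, diversity=0.0, seed=None):
    """Hypothetical attribute-value generator: sample from a target
    distribution blended with uniform noise. `diversity` in [0, 1]
    controls the blend (0.0 = follow target, 1.0 = fully uniform)."""
    rng = random.Random(seed)
    values = list(target_dist)
    weights = [(1 - diversity) * target_dist[v] + diversity / len(values)
               for v in values]

    def sample(n):
        return rng.choices(values, weights=weights, k=n)

    return sample

# Skewed target distribution over three attribute values.
gen = make_generator({"A": 0.7, "B": 0.2, "C": 0.1}, diversity=0.2, seed=42)
sample = gen(1000)
```

Higher `diversity` flattens the sampled frequencies toward uniform while preserving the ranking of the target distribution.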
Two of the difficulties facing data analysts of online social networks are (1) the limited public availability of data and (2) respecting the privacy of the users. One possible solution to both of these problems is to use synthetically generated data. However, this presents a series of challenges related to generating a realistic dataset in terms of topology, attribute values, communities, data distributions, correlations, and so on.
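To make the topology challenge concrete, a common baseline for generating a graph with planted communities is a stochastic block model; the following is a minimal stdlib sketch of that standard technique (an illustration, not the approach validated in this work):

```python
import random

def sbm(sizes, p_in, p_out, seed=None):
    """Minimal stochastic block model: nodes within the same community
    connect with probability p_in, nodes in different communities with
    probability p_out. Returns the community label list and edge list."""
    rng = random.Random(seed)
    community = []
    for c, size in enumerate(sizes):
        community += [c] * size
    n = len(community)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if community[i] == community[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return community, edges

# Three communities of 20 nodes; dense inside, sparse across.
community, edges = sbm([20, 20, 20], p_in=0.3, p_out=0.02, seed=1)
```

A realistic social-network generator must go well beyond this baseline, e.g. matching degree distributions and correlating node attributes with community membership.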