Retrieving Encrypted Query from Encrypted Database Depending on Symmetric Encrypted Cipher System Method

More and more data becomes available in databases every day. The growing amount of data makes it harder to process and retrieve the required data. Security is one of the significant challenges that people face all over the world in every aspect of their lives, and databases are vulnerable to attack from internal and external threats. One security measure is data encryption/decryption: data transmitted over communication lines can be protected by encrypting it so that only an authorized person can decrypt it. Retrieval from a big encrypted database is still a major problem. The proposed system presents a new method for retrieving data from an encrypted database, an encrypted compressed database, or encrypted dynamic clusters. The retrieved data represent the answers to the user query. In this research, retrieval from a big encrypted database is performed by matching the cipher query with the encrypted database. The proposed system uses a clustering technique to build blocks of data according to the encrypted user query (entries or requirements). The comparison covers the retrieval time required for matching a plain query with plain data and a cipher query with cipher data. In a traditional system, retrieval is done by decrypting the entire database, or part of it, to find the data that match the user query, which consumes a large amount of time. The work of this paper


Introduction
Big Data is concerned with large-volume, complex, growing data sets from multiple independent sources. Data storage and data collection have become more complex. Big Data is now expanding quickly in all science and engineering domains, including the physical, biological and biomedical sciences. The most essential challenge for Big Data applications is to search the large amount of data and extract useful information or knowledge for future work [1]. Data retrieval encompasses extracting the desired data from a database. The two primary forms of retrieved data are reports and queries. To retrieve the desired data, the user supplies a set of criteria through a query. The ability to query and retrieve data based on user-defined criteria is a necessary feature of the data storage and retrieval subsystem [2].
The retrieved data may be stored in a file, printed, or viewed on the screen. In traditional database management systems, information retrieval is often performed using keywords contained within the fields of each record [3]. For faster retrieval, compression methods were used to compress the database files, and the dynamic clustering method was used to build clusters dynamically. These clusters contain the data required by the query, so retrieval becomes much faster than retrieval from the original and compressed files.
The retrieval methods cover retrieval from plain files and retrieval from encrypted files.
Cryptography has always been an interesting research area. Data security is a primary concern on public networks. Encryption and decryption are the cryptographic processes that provide secrecy of data over the network. In the real world many organizations work on large databases over a public network, so security is of prime concern. Encryption can be an effective way of protecting information and is widely used for data security in many applications [4]. Data compression seeks to reduce the number of bits used to store or transmit data. It encompasses a wide variety of software and hardware compression techniques, which can be so unlike one another that they have little in common except that they compress data [5]. Data compression methods are divided into two types: 1) lossless compression and 2) lossy compression [6]. In this paper we used lossless data compression techniques. In lossless data compression, the data are preserved without losing any information. In this paper and k-means with maximum gain ratio was mentioned in my paper [6]. This paper presents the design and implementation of retrieval methods applied to plain and encrypted files. The paper is organized as follows. Section one gives the introduction, Section two presents data compression, Section three explains the major clustering techniques, Section four explains the major data retrieval methods, Section five covers data security, Section six explains the proposed work, Section seven presents the experiments and results, and Section eight offers the conclusion.

Data Compression
Data compression is essentially a technique for reducing the size of data. Several data compression techniques are available for efficient transmission and storage of data in less memory space [6]. during compression [8]. Lossy data compression accepts a certain loss of accuracy in exchange for greatly increased compression. Lossy compression proves effective when applied to graphics images and digitized voice [5]. In this paper the clustering technique

Clustering Data
Data mining, the extraction of hidden predictive information from large databases, is a powerful technology with great potential to help companies focus on the most important information in their data warehouses [9]. Clustering is a data mining technique that groups objects or data into clusters such that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Similarities and dissimilarities are measured on the attribute values that describe the objects. Clustering methods are used to organize and label data, for data compression and model construction, for outlier detection, etc.
A common approach of all clustering methods is to find cluster centers that represent each cluster. Given a similarity metric and an input vector, the cluster centers help determine which cluster is nearest or most similar. Many clustering methods have been developed, and they are categorized as partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods. Data sets can be numeric or categorical [10]. There are two main types of cluster analysis methods: nonhierarchical methods, which divide a data set of N items into M clusters, and hierarchical methods, which produce a nested data set in which pairs of items or clusters are successively linked. In the information retrieval (IR) field, cluster analysis has been used to create groups of documents with the goal of improving the efficiency and effectiveness of retrieval [11]. the object and the cluster is assigned to the cluster. This process continues until the criterion function is met.

K-means method
Algorithm 1 k-means [10]
Input: C: the number of clusters, and D: a data set containing m objects.
Output: A set of C clusters.
Begin:
1: Choose C objects randomly from the data set as the initial cluster centers;
End_until
Here, E is the total squared error of all the objects in the data set, xi is the vector of the i-th element of the data set, and mi is the mean value of cluster Ci (x and m are both multi-dimensional). K-means is the most important clustering technique that has been widely used in the field of IR. It groups data objects into k clusters [12].
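The k-means procedure outlined in Algorithm 1 can be illustrated with a minimal Python sketch. This is an assumption-laden illustration of standard k-means, not the modified variants used in this paper: the function names (`k_means`, `squared_distance`, `mean`), the `max_iter` and `seed` parameters, and the use of squared Euclidean distance are all hypothetical choices made for the example. The returned `error` corresponds to the total squared error E described above.

```python
import random


def squared_distance(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    # Component-wise mean of a non-empty list of vectors.
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def k_means(data, c, max_iter=100, seed=0):
    """Partition `data` (a list of tuples) into `c` clusters.

    Returns (centers, clusters, error), where `error` is the total squared
    distance between every object and the center of its cluster.
    """
    rng = random.Random(seed)
    # Step 1: choose c objects at random as the initial cluster centers.
    centers = rng.sample(data, c)
    for _ in range(max_iter):
        # Assign every object to its nearest center.
        clusters = [[] for _ in range(c)]
        for x in data:
            nearest = min(range(c), key=lambda i: squared_distance(x, centers[i]))
            clusters[nearest].append(x)
        # Replace each old mean with the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [mean(cl) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        # Stop when the mean values no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    error = sum(squared_distance(x, centers[i])
                for i, cl in enumerate(clusters) for x in cl)
    return centers, clusters, error


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, clusters, error = k_means(points, 2)
    print(centers, error)
```

The sketch keeps the previous center for an empty cluster, which is one of several possible conventions; the source does not state how that case is handled.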

Data retrieval
A database is an organized collection of data. More specifically, databases are electronic collections of information that allow data to be easily accessed, manipulated and updated.
In other words, a database is used by an organization as a means of storing, managing and retrieving information. Each item in a database is a record, and each record consists of a set of fields. The database is used to retrieve items in a list or a periodical database [13]. In databases, data retrieval is the process of obtaining and extracting data from a database based on a query provided by the user or application. It enables data to be fetched from a database in order to store it in a file, print it, view it on the screen, and/or use it within an application [14]. Information retrieval (IR) is finding items (usually documents) of an unstructured nature (usually text) that meet an information need from within large collections (usually stored on computers) [15]. The difference between information retrieval and data retrieval is summarized in the following table: [17]. An index for a file in a database system works in much the same way as the index in a textbook [18]. Keyword searching is the best method for searching for new terms (items), special words, jargon or slang. Phrase searching is a way to retrieve records containing specific phrases; a phrase search locates only records containing the specified (entered) words [3]. Keyword queries are easy and flexible because they do not require the database user to know details of the database schema. The goal of information retrieval is to identify the documents that best match user needs, while the goal of data retrieval is to identify the table records that best match user needs [19]. The figure below shows the Boolean operations. The number of keys used [21]: (1) If the sender and receiver use the same key, it is called symmetric-key (or single-key, or traditional) encryption.
(2) If the sender and receiver use different keys, it is called public-key encryption.
In this paper the first type (symmetric-key encryption) was used to encrypt and decrypt the data with a single secret key shared between the sender and the receiver.

Retrieving from Plain Database
Data retrieval in this case is conducted by matching a plain query with the plain database, compressed database, or dynamically built clusters. The user (client) enters a plain query, the matching process runs at the server against the plain database, and the results are returned to the client in plain form. The clusters are built dynamically from the user query using the dynamic clustering algorithm. The proposed system is illustrated in figure (3).
While not end of x do
5: Search for the required query using (keyword strategy; indexing strategy or phrase and Boolean operation strategy).
6: Return all the records that match the required query data.
7: End while.
8: Else if the query does not exist in the historical file then
9: While not end of the compressed file
10: Search for the required query in the compressed file using (keyword strategy; indexing strategy or phrase and Boolean operation strategy).
11: Open the compressed file and match the query data with the data in the compressed file.
12: Open a new file for saving the clusters extracted from matching the user query.
13: Save the user query in the historical file.
Because data transmitted over communication channels are vulnerable to attack by hackers, the data must be protected. Database security is the mechanism that protects the database against intentional or accidental threats. In this case the clusters are also created dynamically, but the encrypted database is used instead of the plain databases of the previous case. The clusters are built from the encrypted user queries (user entries or user requirements), in encrypted rather than plain form. The proposed system for the current case includes several stages, explained in the following steps:
1. Input the encrypted compressed database.
2. Return a file consisting of encrypted data.
3. Input the user query to the dynamic clustering algorithm.
4. Apply an encryption algorithm to the user query (using the same encryption algorithm that was used in step 5).
5. Analyzer: search for the encrypted user query to determine whether it exists in the historical file or not and

Return the results
Retrieval in this case includes three phases:
Phase 1: This phase runs at the client. The user enters the query, and the query is encrypted using an encryption algorithm with a symmetric key.
Phase 2: This phase runs at the server. It matches the encrypted query with the encrypted database and performs the retrieval.
Phase 3: This phase runs at the client and includes the decryption process. Only the retrieved data are decrypted; there is no need to decrypt the entire database.
The proposed system for retrieval from an encrypted database is illustrated in figure (4).
In figure (4) the files F1 and F2 represent the results of the compression operation. The F1 file includes the cluster items and is larger than the F2 file. The F2 file contains database
Output: answers: the required records that match the encrypted user query.
Begin:
1: Encrypt the user query by:
2: Using the addition method // The same method that was used to encrypt the database //
3: Add the secret shared key to encrypt the query // The key is shared between client and server and is the same key that was used to encrypt the database //
4: The output is the encrypted query; let it be EQ.
5: Open the historical file to check whether the encrypted query exists in it.
6: If the query exists in the historical file then
7: Fetch the path of the file that contains the data of EQ and open this file; let it be y.
8: While not end of y do
9: Search for the required EQ using (keyword strategy; indexing strategy or phrase and Boolean operation strategy).
10: Return all the records that match the required EQ data.
11: End while.
12: Else if the query does not exist in the historical file then
13: While not end of the encrypted compressed file; let it be EC
14: Search for the required EQ using (keyword strategy; indexing strategy or phrase and Boolean operation strategy).
15: Open the EC file and match the EQ data with the data in the EC file.
16: Open a new file for saving the clusters extracted from matching the user EQ.
17: Save the user EQ in the historical file.
18: End while
19: End If
20: For all returned records apply the decryption operation using the following steps:
21: Using the subtraction method // The counterpart of the addition method that was used for the encryption process //
22: Subtract the secret shared key to decrypt the retrieved records // The key is shared between client and server and is the same key that was used for the encryption process //
23: Return the results in plain form.
24: End for
25: Display the results to the user (client).
End.
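The three phases of Algorithm 3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the source specifies an addition method with a shared secret key applied to Unicode values, but the function names (`encrypt`, `decrypt`, `server_match`), the field-by-field encryption of records, the exact-match comparison, and the sample key and records are all hypothetical. The key is assumed to be small enough that shifted code points stay within the valid Unicode range.

```python
def encrypt(text, key):
    # Addition method: shift every Unicode code point forward by the shared key.
    # Assumes ord(ch) + key stays within the valid Unicode range.
    return "".join(chr(ord(ch) + key) for ch in text)


def decrypt(cipher, key):
    # Subtraction method: shift every code point back by the same shared key.
    return "".join(chr(ord(ch) - key) for ch in cipher)


def encrypt_record(record, key):
    # Encrypt each field of a record separately (an assumption of this sketch),
    # so that equal plaintext fields always yield equal ciphertext fields.
    return [encrypt(field, key) for field in record]


def server_match(encrypted_query, encrypted_db):
    # Phase 2 (server): compare cipher with cipher; nothing is decrypted here.
    return [record for record in encrypted_db if encrypted_query in record]


if __name__ == "__main__":
    shared_key = 7  # hypothetical secret key shared by client and server
    plain_db = [["Ali", "Baghdad"], ["Sara", "Basra"]]
    encrypted_db = [encrypt_record(r, shared_key) for r in plain_db]

    # Phase 1 (client): encrypt the query with the shared key.
    eq = encrypt("Basra", shared_key)
    # Phase 2 (server): match the encrypted query with the encrypted database.
    hits = server_match(eq, encrypted_db)
    # Phase 3 (client): decrypt only the returned records.
    print([[decrypt(f, shared_key) for f in r] for r in hits])
```

Matching works without decryption here because the cipher is deterministic: the same plaintext field and key always produce the same ciphertext, so equality of ciphertexts implies equality of plaintexts.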
In algorithm (3), steps 1 to 4 run at the client site and represent phase 1, steps 5 to 20 run at the server site and represent phase 2, and steps 21 to 25 will

Conclusion
Data retrieval, not encryption, is the main objective of this research. A simple encryption operation was therefore used to measure the performance of the retrieval method (matching an encrypted query with encrypted data). We are in the process of applying the proposed retrieval method together with one of the strong encryption algorithms. Most attacks occur by penetrating the communication channels carrying the outgoing records, so the database must be encrypted to protect it from attackers. In conventional systems, querying a large encrypted database takes a long time because the database must be decrypted as a whole or in part, and the recovered records are then encrypted and sent to the client. The proposed system improves the performance of the retrieval algorithm by reducing the time consumed. It matches the cipher query with the encrypted database and thus saves the time otherwise spent decrypting all or part of the database.
The proposed system therefore solves this problem by sending an encrypted query, performing an encrypted search, and returning encrypted results. The results show that the proposed system for retrieval from big encrypted data achieves good results in reducing retrieval time. The dynamic clustering algorithm improved data retrieval time in both cases: 1) retrieval from a plain database and 2) retrieval from an encrypted database.
the user. This would consume a very large amount of time. The work of this research allows users to query an encrypted database without decrypting the database; instead, it compares the encrypted query with the encrypted database and retrieves the results in encrypted form. The data retrieval process, not the encryption process, is considered the main objective of this research. Therefore, a simple encryption operation was used to measure the performance of the data retrieval method (by matching the encrypted query with the encrypted data).
Keywords: clustering, ICM, encryption, decryption, matching an encrypted query with an encrypted database.
results is used to build the dynamic clusters. Data mining is defined as the extraction of hidden information from large databases. It has great potential to help libraries and information centers focus on the most important information in their data warehouses. There are several data mining techniques: 1) classification, 2) clustering, 3) prediction (regression), 4) decision trees, 5) sequential patterns and 6) association rules [7]. In this paper the clustering technique was used to solve the problem of accessing big data. The modified k-means clustering method and its variants (k-means with medium probability and k-means with maximum gain ratio) were used to build clusters depending on special centers, and the dynamic clustering method was used to build small clusters depending on the user query. The ICM, modified k-means, k-means with medium probability
Data compression techniques take advantage of repeated series of data to provide potential cost savings associated with transmitting less data, reduce storage requirements, and reduce the probability of transmission errors. Data compression techniques are divided into two main classes: (i) lossless data compression and (ii) lossy data compression. In lossless data compression, the compression process is carried out without loss of data or information
(Improved K-means, K-means With Medium Probability and K-means With Maximum Gain Ratio) algorithms were used as lossless compression algorithms, and the results have been used to build the dynamic clusters in plain and cipher forms.
Centroid-based method: K-means is one of the most commonly used clustering techniques due to its simplicity and speed. The k-means method takes the input parameter k and partitions the data into k clusters by assigning each object to its closest cluster centroid (the mean value of the variables for all objects in that cluster), based on the distance measure used. The k-means method works as follows. First, k objects are selected at random; each object represents a cluster mean or center. Object which is most similar or close to cluster mean based on the distance between

2: Until there are no changes in the mean values
3: Use the estimated means to classify the m objects into k clusters based on the similarity measure (calculate the mean value of the objects for each cluster i and replace the old mean
Data retrieval typically requires writing and executing data retrieval or extraction commands or queries on a database, based on the query provided by the user. The retrieval process begins with the user entering a query. The query entered by the user can be a single word or a sentence [16]. Searching strategies include keyword and subject searching, Boolean operators, truncation, phrase searching, search limiting, and nesting [3]. Boolean searching is a method based on logic. Logical conditions return a Boolean result based on an expression supplied by the user. Most online databases and internet search engines are based on Boolean searches. The Boolean operators are AND, OR, and NOT (or AND NOT). Using AND narrows your search: it retrieves records that contain both of the search items or keywords that you specify. Using OR expands your search: it retrieves records that contain either of the search items (terms) or keywords that you specify, but not necessarily both. Using NOT narrows the search: it retrieves records that do not contain a search item (term) in your search
Ghassan H. Abdul-Majeed, Alaa Kadhim F., Rasha Subhi Ali. Vol: 13 No: 1, January 2017, p. 181. DOI: http://dx.doi.org/10.24237/djps.1301.103C P-ISSN: 2222-8373 E-ISSN: 2518-9255

Figure 1: Boolean Operation
The Boolean operations, keyword, indexing and phrase searching methods are used in this paper for data retrieval.
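The keyword, Boolean and phrase searching strategies named above can be illustrated with a short Python sketch. It is a simplified illustration, not the implementation used in this paper: records are assumed to be plain text strings split on whitespace, matching is assumed to be case-insensitive, and all function names and sample records are hypothetical.

```python
def tokens(record):
    # Split a record into a set of lower-cased words.
    return set(record.lower().split())


def keyword_search(records, term):
    # Keyword strategy: keep records that contain the term as a whole word.
    return [r for r in records if term.lower() in tokens(r)]


def boolean_and(records, a, b):
    # AND narrows the search: both terms must be present.
    return [r for r in records
            if a.lower() in tokens(r) and b.lower() in tokens(r)]


def boolean_or(records, a, b):
    # OR expands the search: either term (or both) may be present.
    return [r for r in records
            if a.lower() in tokens(r) or b.lower() in tokens(r)]


def boolean_not(records, a, b):
    # NOT (AND NOT) narrows the search: a must be present, b must be absent.
    return [r for r in records
            if a.lower() in tokens(r) and b.lower() not in tokens(r)]


def phrase_search(records, phrase):
    # Phrase strategy: the words must appear together, in the given order.
    return [r for r in records if phrase.lower() in r.lower()]


if __name__ == "__main__":
    records = [
        "data retrieval from encrypted database",
        "data compression methods",
        "clustering of encrypted records",
    ]
    print(boolean_and(records, "data", "encrypted"))
    print(boolean_or(records, "compression", "clustering"))
    print(boolean_not(records, "data", "encrypted"))
    print(phrase_search(records, "encrypted database"))
```

The indexing strategy is not covered by this sketch; an index would avoid scanning every record on each search.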

Figure 3: The structure of dynamic clustering method on plain database file

16: Return the results to the user (client).
Databases hold important information used by enterprises, companies, individuals, etc.
such as (names of columns, data types of columns, etc.) and also includes the cluster centers. If the encrypted query is not found in the historical file, the search process is carried out. It matches the encrypted query with the encrypted compressed database file and thereby generates new clusters dynamically. These clusters include the records related to the encrypted query; the encrypted query is added to the historical file and the encrypted records are sent to the client. The user at the client can then decrypt the received records. If the encrypted query is found in the historical file, the matching is performed directly between the encrypted query and the records in the dynamically generated clusters. These clusters are created using the dynamic clustering algorithm. The results are sent to the client in encrypted form. Decryption is done at the client site using the decryption algorithm with the shared key. The steps of retrieval from encrypted data are given in algorithm 3.
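The historical-file logic described above can be sketched in Python as a small server-side store. This is a hedged illustration only: the class name `DynamicClusterStore`, the in-memory dictionary standing in for the historical file and cluster files, the `full_scans` counter, and the exact-match comparison are assumptions of this sketch. Records and queries are treated as opaque encrypted strings, so the store never needs the key.

```python
class DynamicClusterStore:
    """Server-side store that builds clusters dynamically from encrypted queries."""

    def __init__(self, encrypted_records):
        # Each record is a sequence of encrypted field strings.
        self.encrypted_records = encrypted_records
        # Stands in for the historical file: encrypted query -> saved cluster.
        self.history = {}
        # Counts how often the whole encrypted database had to be scanned.
        self.full_scans = 0

    def retrieve(self, encrypted_query):
        if encrypted_query in self.history:
            # Query seen before: match directly against its saved cluster.
            cluster = self.history[encrypted_query]
            return [r for r in cluster if encrypted_query in r]
        # Query not in the historical file: scan the encrypted database,
        # build a new cluster from the matching records, and remember it.
        self.full_scans += 1
        cluster = [r for r in self.encrypted_records if encrypted_query in r]
        self.history[encrypted_query] = cluster
        return list(cluster)


if __name__ == "__main__":
    # The strings below are placeholders for already-encrypted values.
    store = DynamicClusterStore([("x1", "y1"), ("x2", "y1"), ("x3", "y2")])
    first = store.retrieve("y1")   # triggers a scan and builds a cluster
    second = store.retrieve("y1")  # answered from the saved cluster
    print(first == second, store.full_scans)
```

Repeated queries are answered from a much smaller cluster, which is the mechanism the paper credits for the shorter retrieval times from the dynamic clustering file.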

Figure 4: Scenario of an encrypted query on an encrypted database
client site and represent phase 3. In this research, Unicode conversion of the characters was used instead of the character code, because Unicode covers a wider range than the character code. The encryption of the query at the client site and the decryption of the returned results at the client site depend on the encryption method used. Searching an encrypted database normally consumes a lot of time because it requires decrypting either the whole database or some database columns; this problem was solved by matching cipher with cipher. The following figures show a comparison between these two methods.

Figure 5: The traditional searching method on encrypted database

Figure 7: Retrieval Time in Both Cases of Plain and Encrypted data
Figures (8, 9 and 10) show that the total time consumed for answering the query from the dynamic clustering results is much shorter than for retrieving from the original or compressed databases. This holds for both matching plain with plain data and matching cipher with cipher data. For matching plain with plain data, the results showed that: 1) the longest retrieval time from the original database file is 36.865 seconds and the shortest is 0.097 seconds, 2) the longest retrieval time from the compressed database file is 110.853 seconds and the shortest is 0.0001 seconds, and 3) the longest retrieval time from the dynamic clustering file is 2.620 seconds and the shortest is 0.0001 seconds. For matching cipher with cipher, the results showed that: 1) the longest retrieval time from the original database file is 40.291 seconds and the shortest is 0.102 seconds, 2) the longest retrieval time from the compressed database file is 104.821 seconds and the shortest is 0.001 seconds, and 3) the longest retrieval time from the dynamic clustering file is 2.618 seconds and the shortest is 0.0001 seconds. For matching cipher with cipher including decoding time, the results showed that: 1) the longest retrieval time from the original database file is 40.687 seconds and the shortest is 0.1021 seconds, 2) the longest retrieval time from the compressed database file is 107.47 seconds and the shortest is 0.0011 seconds, and 3) the longest retrieval time from the dynamic clustering file is 5.267 seconds and the shortest is 0.0002 seconds. The average time consumed for retrieving the data of 35 queries is shown below:
Table (4) The average retrieval time