Quantifying Data Currency's Impact on the Profit Made by Data Brokers in the Internet of Things Based Data Marketplace

Abstract—With the widespread use of the Internet of Things, a large volume of data is generated through the Internet. Businesses rely heavily on this data, especially personal data, to improve their services or create new revenue streams. Data brokers collect, analyze, and sell the data to those businesses (also known as data consumers), creating a personal data trading ecosystem. Many deem the current personal data marketplace untrustworthy, given the lack of transparency in the relationship between data providers and data brokers/consumers. Moreover, the personal data quality models used by existing trading models do not accurately reflect the characteristics of the real-world situation represented by the data. Creating a modified trading model that incorporates key data quality metrics in the data pricing process is therefore crucial, as high-quality data is more valuable to businesses. Outdated data is one of the most common data quality defects, leading to loss of potential customers, wasted budgets, and reduced customer satisfaction. The data trading model should also give data providers some degree of control while maximizing the data brokers' profits. This study provides a data trading model that includes the willingness to sell of data providers, the willingness to buy of data consumers, and a data quality model that incorporates data currency as a key metric. The urgency of such a data quality model comes from personal data being liable to temporal decline and from the need to build a trustworthy ecosystem. The study uses experimental quantitative research and existing personal datasets to show the impact of data currency on the profit of data brokers in the IoT-based data marketplace.


I. INTRODUCTION
The Internet of Things (IoT) has emerged as a disruptive technology that generates large volumes of data, especially personal data, drawing the attention of service providers. The IoT-based data marketplace is the go-to ecosystem for obtaining personal data for most organizations. In this marketplace, data brokers collect, analyze, and sell personal data (data about data providers) to data consumers (service providers) to help them improve their service quality or create new revenue streams.
A study by Elvy [1] shows that people are interested in valuing their privacy given proper incentives or benefits, a concept also known as the willingness-to-sell (WTS) of personal data: individuals are willing to share their data in exchange for incentives or improved service quality. A data trading model that satisfies all stakeholders' needs should include the willingness to sell of data providers, the willingness to buy of data consumers, and an accurate personal data quality assessment method. The different dimensions of data quality have been intensively discussed in much research, as they are crucial to the accuracy of decision-making systems. As Heinrich et al. [2] stated, making informed and effective decisions depends on the quality of the underlying data. A Gartner report indicates that the average financial loss from poor data quality amounts to $9.7 million per year [3], which underscores the importance of an optimal data quality approach when pricing datasets.
According to Cichy et al. [4], the three most important dimensions of personal data quality are completeness, accuracy, and currency. Completeness refers to the degree to which data are sufficient to describe their real-world counterpart, while accuracy describes the extent to which data are correct. Data currency, the metric of interest here, refers to the degree to which the data is current with the world it is modeling.
In our dynamic society, over 66% of the personal information stored in a database is outdated by the end of each year [5]. Outdated data has thus become one of the most common quality defects, leading to huge financial losses. Existing studies on personal data trading models do not include data currency as a determining factor of personal data quality. Building a data trading model that incorporates data currency alongside the two other dimensions is therefore an excellent opportunity to meet data consumers' needs while maximizing data brokers' profit. This study is designed to analyze the impact of data currency on the profit of data brokers in the IoT-based data market ecosystem.

II. RELATED WORKS
Many studies have been conducted to create a trustworthy data trading ecosystem. While some studies focus on the underlying technologies, others focus on the data consumption side of the spectrum, such as privacy and security challenges, data trading schemes, and data quality assessment. Prior studies have also focused on maximizing the profit of data brokers while meeting the needs of other data consumers, considering different data trading market structures such as oligopoly and strong competition.
This section provides a review of the literature and related works that form the foundation of this study, starting with a definition and the evolution of IoT.

A. Evolution of the Internet of Things
While the term Internet of Things (IoT) has been widely used, there is no standard definition or understanding of what it encompasses [6]. The concept of IoT was initially introduced in 1999 by Kevin Ashton, who referred to it as uniquely identifiable, interoperable connected objects using Radio-Frequency Identification (RFID). While the internet relies on data created by people, IoT is about data created by things [7]. The adoption of IoT devices is widespread across all industries, including health, military, finance, and food [8]. For instance, modern cars have autonomous features that let them interact with their surroundings in real time, and home appliances have embedded systems for remote control and improved decision-making. In the health industry, the growing disproportion between the number of medical professionals and the number of patients is being addressed through the large-scale adoption of IoT technologies [9]. A Gartner report anticipated that IoT adoption would reach 25 billion devices by 2020, a five-fold increase compared to 2015 [10]. That is why IoT has emerged as a disruptive technology for collecting large volumes of personal data.

B. Data-Driven Decision Making
Improvements in data collection, storage, and processing capabilities have created a new landscape of opportunities for firms over the past decade [11]. Whether to gain a competitive advantage or create new streams of revenue, data-driven decision making is crucial to the success of businesses in the modern world. Much research suggests that data-driven decisions are associated with better performance in different industries [12]. Other studies have analyzed the magnitude of this relationship and whether data-driven decision making can be a source of competitive advantage.
Building data-driven decision-making systems helps make decisions based on facts instead of biases; that is why business boards usually rely on them to make objective and informed decisions. Some of these decisions include, but are not limited to, driving profits and sales, establishing good management behavior, optimizing operations, and improving team performance [13]. These decision support systems are powered by Artificial Intelligence (AI) solutions. AI usually refers to the ability of a machine to learn, process, and perform human-like tasks. According to Daugherty et al. [14], AI-enabled systems have grown exponentially since the emergence of the Internet of Things [15]. According to a survey conducted by Gartner [16], AI is listed as the number one strategic technology for its ability to facilitate decision-making, reinvent business models and ecosystems, and reshape customer experience management.

C. IoT Security, Privacy, and Trust
IoT is perceived today as the most influential emerging technology. However, there is increasing concern about IoT privacy, security, and trust. Indeed, many of these privacy and security concerns have still not been formally raised and resolved despite the increasing number of connected devices. Existing studies show that some of the main roadblocks to IoT growth are its security and privacy concerns [17]. Debnath et al. [18] found that privacy and security concerns have impacted the intended use of IoT. Sicari et al. [19] stated that the security challenges in IoT are access control, privacy, policy enforcement, confidentiality, trust, secure middleware, mobile security, and authentication. Younan et al. [20] studied IoT practices and several challenges of IoT adoption, such as data management, data mining, and privacy and security, and provided recommendations to fully leverage these technologies. These research findings tend to support the general conclusion that it is challenging to anonymize and secure IoT, thus creating privacy concerns. Conversely, a survey conducted by the Ponemon Institute shows that IoT owners are willing to trade their personal data for some incentives [21].
Moreover, the concept of trust in technology is complicated to dissect, as no formal consensus exists in the information systems literature. Trust can be associated with source reputation and reliability [19]. According to Misura and Zagar [22], application trustworthiness can be evaluated quantitatively by the similarity between the behavior users expect from an application and its actual behavior; it describes how users feel while interacting with the IoT device or application. The lack of transparency in the collection and usage of personal data can create an untrustworthy ecosystem, affecting the adoption of IoT technologies. Because of this, trust management is essential in IoT to ensure accurate data fusion and mining, competent services with context-aware intelligence, and improved user privacy and data security.

D. IoT based Data Marketplace and Trading Models
Data is fast becoming the most valuable resource globally, outclassing oil and its substitutes. Today, more personal data means more revenue, better service quality, and cost-effective services. The role of a data broker is to facilitate the collection of personal data and make it accessible to data consumers after acquiring it from data providers. Although there is not enough transparency on how and why the data is collected, data producers or IoT device owners are willing to trade away their information in exchange for money or improved services, usually through a contract or a privacy and security agreement. However, many data producers believe that the data marketplace is not trustworthy because of the lack of transparency on how the data is used. A survey conducted by Benndorf et al. [24] shows that incentivized settings yielded a higher number of people willing to trade their personal data. According to Elvy [1], many data producers are willing to sell their personal data if this transparency issue is resolved.
Existing studies have mainly focused on the relationship between data brokers and data consumers, emphasizing the willingness-to-buy of data consumers, mainly because the marketplace is primarily controlled by data brokers and data consumers [24]. However, data providers can play a major role in facilitating transactions with far more transparency to build a trustworthy data ecosystem. Liang et al. [25] proposed different data marketplace structures, such as monopoly, strong competition, and oligopoly. Yu et al. [27] proposed mobile data trading to trade data quantities between mobile users based on demand and demand uncertainty. Zhao et al. [26] proposed a privacy-protecting data trading market based on blockchain technologies for better availability, with no single point of failure. Similarly, Al-Fagih et al. [28] proposed a data marketplace with a well-defined pricing scheme for the public sensing framework, considering quality of service and trust factors.
Additionally, Jang et al. [29] modeled a data market in the IoT environment, where data sources are varied and independent. Oh et al. [30] proposed a personal data trading model to help data traders maximize their profits, using the WTS and WTB of the data provider and data consumer, respectively. This trading model relies on a subscription-based market; they devised WTS and WTB formulas and derived the trading profit from the difference between the WTB-based revenue and the WTS-based cost.
Nakamato et al. [31] proposed frameworks that rely on the traditional centralized brokered approach, also known as the client-server model. In this approach, the server provides all the necessary functionality of the marketplace, such as storage, a search mechanism, and a device registry, and clients register to it to publish their devices or queries to facilitate matchmaking or trading. However, there are concerns with this approach: it is expensive to maintain, and it is challenging to create a versatile cloud environment that supports different IoT data formats. Gupta et al. [32] proposed a decentralized marketplace that takes into consideration the heterogeneity and variety of data generated by IoT devices. This decentralized model uses a peer-to-peer (P2P) communication model in which all computation and storage needs are distributed across multiple IoT devices; it is based on blockchain technology, with a focus on decentralization, immutability, and security. Bajoudah and Missier [33] conducted similar studies on the IoT blockchain data marketplace, proposing a smart contract between data consumers and brokers and a publish-subscribe architecture, respectively. Their proposed framework was limited in that it did not solve the problems of service discovery, reliability, and scalability.

E. Personal Data Quality Assessment
The accuracy of a prediction or decision system depends on data quality. It is essential to assess and analyze data quality to make informed and effective decisions. That is why a comprehensive analysis of big data quality standards and quality assessment methods is necessary for a highly data-driven market. Data quality refers to "the measure of the agreement between the data views presented by an information system and that same data in the real world" [13], [34]. Al-Salim et al. [36] broke data quality down into a multi-dimensional construct, including accuracy, completeness, consistency, and currency.
Each dimension provides a different perspective on data quality. Many researchers have developed metrics for a quantitative assessment of each dimension. For instance, Hinrichs [37] provided a data quality metric to assess the correctness of a data value w, as (1).

$$Q_{\text{corr.}}(w, w_m) = \frac{1}{d(w, w_m) + 1} \tag{1}$$
where w_m is the corresponding real-world value and d a domain-specific distance measure. Chen et al. [38] provide a data consistency assessment function (DCAF) to evaluate the consistency of a dataset. The function utilizes basic expansion equations based either on a generalized inverse of the shape information or on a polynomial approach. Hinrichs [37] also provides a data currency assessment model based on the mean attribute update frequency, which denotes how often the attribute values are updated on average within a specific period, and on the age of the attribute value (the time elapsed between the currency assessment and the acquisition or last update of the attribute value).
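Written out under the exponential-decay form this model is usually given (our reconstruction; the notation is assumed, and the same expression reappears later as (10)), the currency metric reads:

$$Q_{\text{curr.}}(w) = e^{-f_{\text{update}}(A)\,\cdot\,\text{age}(w,A)}$$

where $f_{\text{update}}(A)$ is the mean update frequency of attribute $A$ and $\text{age}(w, A)$ is the elapsed time since the value was acquired or last updated. The metric equals 1 for a freshly acquired value and decays toward 0 as the value ages.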
Therefore, the data quality metric models provide a framework for measuring each dimension, with each dimension represented by a single metric value. Decision-makers can use well-founded data quality metrics as indicators of the reliability of the underlying data and of their decision-making systems.

III. PROBLEM STATEMENT, HYPOTHESIS STATEMENT, AND RESEARCH QUESTIONS
Currently, most existing personal data trading models do not reflect our dynamic society, where personal data are subject to change at high frequencies, and it has become more challenging to meet data consumers' needs in terms of personal data quality. One of the main characteristics of the generated data is that it is liable to temporal decline [40]. That is why over 66% of collected personal data is classified as outdated every year, and some businesses cannot rely on it to support their decision-making systems [5]. A report shows that 75% of businesses have made bad decisions due to incorrect or outdated data, a major concern in today's competitive market. Personal data trading models should therefore incorporate data currency as a critical metric in their data quality assessment. A dataset that is current with the world it models should be valued higher than old (though not necessarily outdated) data.

A. Problem Statement
The problem is that, without considering data currency, many data-driven business decisions cannot serve business needs.

B. Hypothesis Statement
The data currency impacts the profit of data brokers when trading personal data in the IoT based data marketplace.

C. Research Question
How does data currency impact the profit of data brokers when trading personal data in the IoT based data marketplace?

IV. METHODOLOGY

A. Method
In this study, we attempt to establish a cause-effect relationship in which the value of an independent variable can influence or change a dependent variable. The independent variable is the data currency, a value between 0 and 1 representing the assessed age of the personal dataset (a value of 1 denoting a current dataset). The personal dataset used in this study was collected between 2010 and 2017; the data currency was therefore assessed over that timeframe, using 2017 as the current date. Other data quality metrics, such as data correlation and completeness, are derived from the dataset metadata. The dependent variable is the profit of data brokers after trading a specific dataset. This approach allows us to vary the data currency and observe the impact on the data broker's profit.
To analyze the impact of data currency on the profit of data brokers, we compare the outputs of two trading models and objectively determine the more profitable one for data brokers. The trading model used here is a variation of the model proposed by Oh et al. [30]; it incorporates the willingness-to-sell of data providers, the willingness-to-buy of data consumers, and a modified data quality model.
According to Oh et al. [30], the WTS is the limit selling price at which a data provider decides to sell a specific dataset, expressed as a function of data privacy and data cost. Because individuals value their data differently depending on the data type, the willingness-to-sell function is the cumulative distribution function of the limit selling price X_k of the k-th personal data type, W_k(x) = Pr{X_k <= x}, which is expressed as (2).

$$W_k(x) = \Pr\{X_k \le x\} = 1 - e^{-p_k x} \tag{2}$$
In this function, p_k is the privacy-awareness factor and c_k is the cost of the k-th data type at which the function is evaluated. Oh et al. [30] show that the willingness to sell increases for less private data and for more money offered; hence, the privacy-awareness factor influences data providers' willingness to sell.
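To make this calibration concrete, the following Python sketch (our illustration, not code from [30]; function and variable names are ours) computes the privacy-awareness factors from the survey costs used later in this study, by requiring that a provider sells at the average cost with probability 0.5:

```python
import math

def willingness_to_sell(price: float, privacy_factor: float) -> float:
    """Probability that a provider's limit selling price X_k <= price, per (2)."""
    return 1.0 - math.exp(-privacy_factor * price)

# Average data-type costs from the Ponemon survey [21] (see Section IV-B).
costs = {"credit_card": 20.8, "purchase_history": 17.8,
         "hobbies": 9.1, "location": 5.1}

# Calibrate p_k so that W_k(c_k) = 0.5, i.e. p_k = ln(2) / c_k.
privacy_factors = {k: math.log(2) / c for k, c in costs.items()}

for k, c in costs.items():
    p_k = privacy_factors[k]
    print(k, round(p_k, 4), round(willingness_to_sell(c, p_k), 2))
    # e.g. credit_card 0.0333 0.5
```

Under this assumption the printed factors reproduce the privacy vector reported later in the experiment section.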
The WTB is the limit buying price Y at which a data consumer decides to buy the personal dataset from the data broker; a data consumer will not buy the dataset at any price above that value. The willingness to buy decreases as the data price increases and increases with the data quality, and the willingness-to-buy function is given by (3).

$$B(y) = \Pr\{Y \ge y\} = e^{-y/q} \tag{3}$$
where y is the limit buying price and q is the personal data quality. Equation (3) shows that the willingness to buy decreases as the limit buying price increases and increases as the data quality factor increases.
Finally, the existing data quality models are based on data correlation and quantity. The quantity refers to the size and completeness of the data; the data correlation describes how well personal datasets can be combined to identify an individual through additional processing. This research adds a third personal data quality metric, data currency, to the existing model. The data currency refers to the degree to which the data is current with the world it models. It is a probabilistic variable that depends on the age of the data in years (a value of 1 denoting current data and a value of 0 denoting outdated data). The previous data quality model, q, is defined as a function of data correlation and completeness, as (4).

$$q = N\left[\sum_{i=1}^{K} W_i(c_i) + \sum_{i=1}^{K}\sum_{j>i} r_{ij}\, W_i(c_i)\, W_j(c_j)\right] \tag{4}$$
This is the simplified data quality function proposed in previous work [29], [41], up to second-order terms. In this function, N is the number of personal data providers, r_ij is the correlation between the i-th and j-th data types, and W_i is the willingness-to-sell function (2). The modified data quality function, q', is expressed as (5).

$$q' = N\left[\sum_{i=1}^{K} t_i\, W_i(c_i) + \sum_{i=1}^{K}\sum_{j>i} r_{ij}\, t_i t_j\, W_i(c_i)\, W_j(c_j)\right] \tag{5}$$
where t_i and t_j are the assessed currency values of the correlated data types.
While the data correlation is a randomly generated value between 0 and 1 with uniform distribution, the data currency value is derived using the probabilistic model proposed by Hinrichs [37] to assess personal data currency.
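A minimal sketch of how the two quality functions could be computed, assuming the reconstructed second-order forms (4) and (5) above; the helper names and the example currency values are ours:

```python
import math
import random

def wts(c: float, p: float) -> float:
    """Willingness-to-sell (2)."""
    return 1.0 - math.exp(-p * c)

def quality(costs, p, r, t=None, N=1000):
    """Data quality up to second-order correlation terms.
    t: per-type currency values in [0, 1]; t=None reproduces the
    time-insensitive model (4), i.e. all currencies equal to 1."""
    K = len(costs)
    if t is None:
        t = [1.0] * K
    w = [t[i] * wts(costs[i], p[i]) for i in range(K)]
    first_order = sum(w)
    # Only the upper triangle of the correlation matrix r is used.
    second_order = sum(r[i][j] * w[i] * w[j]
                       for i in range(K) for j in range(i + 1, K))
    return N * (first_order + second_order)

costs = [20.8, 17.8, 9.1, 5.1]
p = [math.log(2) / c for c in costs]          # calibrated privacy factors
random.seed(42)
r = [[random.random() for _ in range(4)] for _ in range(4)]  # correlations in [0, 1]
t = [math.exp(-0.5 * age) for age in (0, 1, 2, 3)]           # example currency values

print("timeless q :", quality(costs, p, r))
print("current  q':", quality(costs, p, r, t))
```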
The profit of a data broker is the difference between the revenue from selling data to consumers and the cost of buying data from providers. According to Oh et al. [30], the cost of buying all personal data types is shown in (6).

$$C_b = N \sum_{i=1}^{K} c_i\, W_i(c_i) \tag{6}$$
where N is the number of data providers and c_i is the cost of the i-th personal data type. Similarly, the revenue from selling all datasets to data consumers is (7).

$$R(y) = M\, y\, B(y) = M\, y\, e^{-y/q} \tag{7}$$
where M is the number of interested data consumers and y is the price of the dataset. Hence, the data profit function U is defined as (8).

$$U = R(y) - C_b \tag{8}$$
Oh et al. [30] also show that the optimal price maximizing the revenue function (7) equals the personal data quality value, which is obtained by setting the first-order derivative of the revenue function to zero; the revenue term in (8) can therefore be evaluated at the data quality value. Because the profit function is concave, the maximum profit follows from the first-order condition, as (9).

$$U^{*} = \frac{M\, q}{e} - C_b \tag{9}$$
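Putting (6) through (9) together, a short sketch of the profit computation under these reconstructed forms (the quality value q is a hypothetical input here, and the function name is ours):

```python
import math

def max_profit(q: float, costs, p, N: int = 1000, M: int = 200) -> float:
    # Buying cost C_b (6): pay c_k to the fraction of providers willing to sell.
    buying_cost = N * sum(c * (1.0 - math.exp(-pk * c)) for c, pk in zip(costs, p))
    # Revenue R(y) = M * y * exp(-y/q) peaks where dR/dy = 0, i.e. y* = q,
    # giving R(q) = M * q / e, so the maximum profit is (9).
    return M * q / math.e - buying_cost

# Example with the single "hobbies" data type; q is an assumed quality value.
print(max_profit(q=800.0, costs=[9.1], p=[math.log(2) / 9.1]))
```

The design choice worth noting is that the optimal price depends only on the quality value q, so any change to the currency weights in (5) propagates directly into the broker's revenue.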
The main difference between the two profit functions is the data currency factor added through the proposed quality model (5), which is critical in assessing dataset quality. As stated by Heinrich et al. [35], the time aspect is key in assessing personal data quality in a dynamic society. Comparing the outputs of the two trading models will help us understand the impact of data currency on the profit of data brokers in the IoT-based data marketplace.

B. Population and Sample
A publicly accessible dataset will be used to answer the research question. It is a short message service (SMS) text dataset containing personal data from mobile phones. In addition, the results of a survey on privacy awareness conducted by Trend Micro and the Ponemon Institute [21] will be used to evaluate the personal data trading model.
The SMS dataset is publicly provided by Kaggle, considered today the world's largest community of data scientists. The dataset contains over 42,000 messages collected between 2010 and 2017 from random IoT devices. About 32% of the data can be grouped into 12 different types, including but not limited to delivery, payment, reservation, and appointment. No null values were found in this dataset for the collection period. For this study, four personal data types will be considered: credit card data, location, purchase history, and bookings. Only a fixed random number of SMS texts of each type were selected; that number is consistent across the different types and corresponds to the number N of data providers. The same approach will be applied to find the total number M of data consumers. Since the number of data providers only affects the scale of the resulting values, there are no strict restrictions on these parameter values during the selection process.
To evaluate the data trading model, three key dimensions of data quality will be used in addition to the privacy awareness factors: data currency, data correlation, and data quantity. The data correlation will be a randomly generated number between 0 and 1 that defines how well two personal data types can be used to identify an individual. A maximum of N = 1000 data providers, M = 200 data consumers, and K = 4 data types will be used to evaluate the data size. The data currency will be assessed as a value between 0 and 1, where 1 means the data is not more than one year old; we use the data currency assessment model provided by Hinrichs [37] for its simplicity and accuracy in evaluating the currency of a dataset.
Finally, the privacy awareness factors and personal data costs will be retrieved from a survey conducted by Trend Micro and the Ponemon Institute [21]. The survey was designed to learn more about consumers' concerns regarding data privacy and security. In addition to collecting information about privacy and security concerns, the survey provides information about respondents' perceptions of the value of their personal information, such as health, browser settings, purchasing habits, locations, hobbies, and payments. The study had 1,903 respondents, aged 18 and above, from across the United States, Europe, and Japan.
Combining the two datasets to test the new pricing model will help us achieve our research objectives. The two datasets are still relevant to the current IoT based Data marketplace ecosystem. They both present a more realistic setting for our data trading model.

C. Analyzing The Impact of Data Currency on the Profit
The data analysis procedure entails building two separate functions: the total cost of buying data from data providers and the total cost of selling data to data consumers. The profit for data brokers is the difference between the two function values.
The total cost of buying the data from data providers is based on the willingness-to-sell (WTS) of data providers and the data quality. Oh et al. [30] expressed this function using the limit selling price of a specific personal data type. These selling prices are derived from the privacy awareness survey conducted by Trend Micro and the Ponemon Institute [21] and represent the minimum price at which a data provider decides to sell the specific data. Because people have different perceptions of privacy per data type, the WTS is defined as a cumulative distribution function (CDF). To evaluate the personal data trading model, four metrics are included in this exercise: size, currency, correlation, and privacy awareness factor. Data quantity, correlation, and privacy awareness factors are all constants derived from existing research, while data currency is the independent variable in this cause-effect relationship.
• Data quantity refers to the size and completeness of the data, as more data equates to better accuracy and higher chances of identifying an individual. For this research, a fixed size of N = 1000 data providers and M = 200 data consumers will be used to validate the maximum profit.
• Data currency refers to the degree to which the data is current with the world it models. It is expressed in years, taking 2017 as the reference year; in other words, a dataset collected in 2017 is less than one year old. This decision was made because the Kaggle SMS dataset contains messages from 2010 to 2017. Thus, a currency factor is assigned to each record. The data currency is a value between 0 and 1, based on Hinrichs' [37] formula shown in (10), indicating whether an attribute is still up to date (a computational sketch follows this list).

$$Q_{\text{curr.}}(w) = e^{-f_{\text{update}}(A)\,\cdot\,\text{age}(w,A)} \tag{10}$$

For this study, we assume that the mean attribute update frequency f_update, which denotes how often an attribute is updated on average over a two-year period, is 0.5 for all data types.
• Data correlation refers to the relationship between personal data types and how they can be linked to identify an individual; data with high correlation are more valuable to data consumers, as they offer a higher level of identifiability than data with low correlation. For this evaluation, the correlation between two personal data types will be randomly generated between 0 and 1.
• Privacy awareness factors are derived from the average prices for each personal data type reported in a real-world survey conducted by Trend Micro and the Ponemon Institute [21]. Four (K = 4) personal data types will be used in this research: payment details/credit cards ($20.8), purchase histories ($17.8), hobbies/tastes/preferences ($9.1), and physical location ($5.1). The derived privacy awareness factors based on the average cost of each type are p = {0.0333, 0.0389, 0.0762, 0.1359}, respectively.

Moreover, the total cost of selling the dataset to data consumers is derived from the willingness to buy (WTB) of data consumers, which is a CDF of the limit price at which data consumers will buy the dataset.
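The currency assessment referenced in the data currency bullet above can be sketched as follows, assuming the exponential form of (10) with the stated update frequency of 0.5 and 2017 as the reference year (constant and function names are ours):

```python
import math

UPDATE_FREQUENCY = 0.5   # assumed identical for all data types, per the study
REFERENCE_YEAR = 2017

def currency(collection_year: int) -> float:
    age = REFERENCE_YEAR - collection_year      # age of the attribute value, in years
    return math.exp(-UPDATE_FREQUENCY * age)    # 1.0 for current data, toward 0 as it ages

for year in range(2010, 2018):
    print(year, round(currency(year), 4))
```

The printed values are consistent with the pattern described for Table II: a currency of 1.0 for 2017 data and the lowest value, roughly 0.03, for 2010 data.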
The profit function, obtained by subtracting the total cost of buying from the total cost of selling, is then evaluated to find the maximum profit. This profit is the total revenue a data broker makes by selling personal datasets to data consumers, net of acquisition costs. According to Oh et al. [30], the optimal price that maximizes the revenue function equals the data quality value, as can be verified from the first-order derivative of the revenue function. The maximum profit can thus be deduced by subtracting the WTS-based buying cost from the revenue implied by the resulting data quality assessment.

V. EXPERIMENT AND RESULTS
The study used archived short message service (SMS) texts from mobile phones that are publicly available in the Kaggle machine learning repository. The dataset contains over 42,000 messages collected between 2010 and 2017 from random IoT devices. About 32% of the data can be grouped into 12 different types, including but not limited to delivery, payment, reservation, appointment, flight, bus, and cab. For this study, four personal data types were selected: payment details/credit cards, purchase histories (bookings), hobbies/tastes/preferences, and physical location. To evaluate the data trading model, the parameters M, N, p, c, and r are configured as follows:
• Because the number of data providers (N) only affects the scale of the resulting values, there are relatively no restrictions on its defined value. For this study, we set the maximum number of providers to N = 1000. Similarly, the maximum number of data consumers is set to M = 200, as it only affects the revenue from selling the data to consumers.
• The personal data correlation r between two different personal data types is a randomly generated number between 0 and 1 (i.e., r ∈ [0,1]).
• The cost of each personal data type k, c_k, is derived from a survey on privacy awareness conducted by Trend Micro and sponsored by the Ponemon Institute [21]. Based on that survey, the average cost of payment details is $20.8, physical location is $5.1, hobbies and preferences is $9.1, and purchase histories is $17.8.
• The privacy awareness factor p is based on the cost of each personal data type. It is derived from the willingness-to-sell function W_k(c_k), defined as (11).

$$W_k(c_k) = 1 - e^{-p_k c_k} \tag{11}$$
• Hence, setting the willingness-to-sell function equal to 0.5 and applying the logarithm, we can find the respective privacy awareness factor given the cost of each personal data type. Based on the cost of each personal data type, the privacy factor vector is p = {0.0333, 0.0762, 0.1359, 0.0389}, respectively, for credit card information, hobbies, physical location, and purchase histories.
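This calibration can be written out explicitly; the derivation below is ours and assumes the 0.5 threshold stated above:

$$1 - e^{-p_k c_k} = 0.5 \;\Longrightarrow\; p_k = \frac{\ln 2}{c_k}, \qquad \text{e.g. } p_{\text{credit card}} = \frac{0.6931}{20.8} \approx 0.0333.$$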
As observed from the privacy vector, the data becomes less private as the cost decreases. Finally, the data currency variable will be derived from the personal dataset, using the date of each message as the age of the data. Because the dataset is a collection of SMS texts from 2010 to 2017, 2017 will be used as the reference; hence, any data collected in 2017 is most likely up to date and has the highest data currency value. The assessed data currency value is deduced from Hinrichs' [37] formula for evaluating how up to date an attribute is.

A. Results
Four personal data types are used to evaluate data currency's impact on data brokers' profit in the IoT-based data marketplace. Because the data trading model includes terms up to second order, the correlation is defined as the degree to which two data types can be used to identify a data provider through some additional processing. It is a critical metric in measuring data quality, as personal data consumers can leverage correlated data to improve their decision-making systems. For this study, it is a random value generated between 0 and 1. Table I shows the matrix of correlation values between the different personal data types.

1) Personal data currency
Data currency ranges from 0 to 1, describing how current the personal dataset is for the task at hand. A value closer to 1 describes an up-to-date record, while a value closer to 0 describes an out-of-date record. Hinrichs [37] defines a metric to assess the currency of personal data, describing how current a dataset is with the world it models.
For this study, we assume that each personal data type is updated two times a year and has a mean attribute update frequency of 0.5. The age of each attribute value is determined automatically from the metadata, using the year 2017 as the reference for the current date. As shown in Table II, data collected in 2017 have a currency value of 1, as the age of the attribute value is 0. The data currency decreases as the age of the attribute value increases, approaching 0 for older data; that is why the year 2010 has the lowest data currency value in the collection period. The age of an attribute can be retrieved from the content of the messages falling into the four data types selected for this research.

2) Data broker profit with k = 1 data type

Using the SMS texts with one personal data type, hobbies, we can evaluate the impact of data currency on the data broker's profit as a factor of time. We select N = 1000 text messages from the public dataset and M = 200 data consumers in a subscription-based model. Table III shows the data broker profit, U'optimal and U'timeless, using a single personal data type collected between 2010 and 2017. The personal data type used here is hobbies, corresponding to any text messages containing information about activities or interests of the message owner. The cost of buying a single hobbies dataset from data providers is 4.894, and the maximum profit using the time-insensitive model is 51,959.23. The table shows that no hobbies records were collected between 2016 and 2017; that is why the profit of data brokers is not reported for that timeframe. Figure 1 compares the data profits of the two personal data trading models in the IoT-based data marketplace: U'optimal, which uses data currency as a key data quality metric, and U'timeless, which excludes data currency from the data quality evaluation.
Using the new data trading model, data brokers' profit changes as a function of time. Hence, old personal datasets generate less profit than up-to-date data, and the profit increases as the data becomes current with the world it models. On the other hand, the chart also shows that the profit of data brokers remains constant under the model that does not use data currency as a critical data quality metric.
3) Data currency impact with one data type and different ranges of data consumers

With a constant number of N = 1000 data providers in a subscription-based data trading market, we can analyze the impact of data currency on the profit of data brokers for different numbers of subscribed consumers. Table IV compares the proposed time-sensitive data trading model and the old one, using a dataset containing users' physical locations. The dataset includes messages of the delivery and pickup types collected between 2015 and 2017. Because the public dataset does not contain location-related data for 2016 and 2017, the profit listed in Table IV only reflects the potential gain from selling the users' physical locations in 2015. The cost of buying these datasets is a constant value of 2.55. We can also observe that the cost of selling data using the old model is about twice that of the time-sensitive model.
As shown in Fig. 2, the profit of data brokers across different ranges of data consumers is linear, of the form y = ax + b, where a and b are constants that can be derived from the graph:

U'optimal = 91.96 × (number of consumers) − 2548.5
U'timeless = 183.93 × (number of consumers) − 2549.89

We observe that as the number of data consumers grows, the profit gap grows even larger. The slope of the line from the old model is approximately twice that of the time-sensitive model when using data collected in 2015.
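This linearity is consistent with the reconstructed optimum (9): if the profit at the optimal price is $U^{*}(M) = (q/e)M - C_b$, then each line's slope is proportional to the corresponding quality value, so the slope ratio reflects the ratio of the timeless quality to the currency-weighted quality:

$$\frac{a_{\text{timeless}}}{a_{\text{optimal}}} = \frac{q}{q'} \approx \frac{183.93}{91.96} \approx 2.$$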

4) Data currency impact with multiple data type
Considering that the current data market is composed of large datasets containing different personal data types, analyzing the data brokers' profit in a similar context gives a more realistic picture of the impact of data currency. The four data types selected for these analyses are locations, hobbies, credit cards/payments, and bookings. The cost of selling these datasets using the time-insensitive model is 89.353, and the maximum profit is 457,366. Similarly, the cost of buying the data is 31.987 per data provider. As shown in Table V, the profit of data brokers using the time-sensitive model grows over the years. Fig. 3 shows a decreasing difference in gain between the two trading models as the data becomes more current with the world it models. Because there were no location, hobbies, credit card, or booking records between 2016 and 2017, the profit of data brokers was not evaluated for that timeframe. Also, the profit of data brokers using the time-insensitive model is constant, represented as a horizontal line.

5) Data currency impact with multiple data type and different ranges of data consumers
As the number of subscribed consumers grows, we can evaluate the profit of data brokers using multiple personal data types from the public dataset. This helps us understand the gain or loss at a larger scale. This exercise selects four personal data types: hobbies, locations, credit cards, and bookings. The data was collected between 2010 and 2017; however, because there were no records of those types between 2016 and 2017, the reported profit results from selling data collected in 2015. The cost of buying the four types of data from each data provider is 31.98744, while the costs of selling differ because of their time sensitivity. As shown in Table VI, the cost of selling using the time-sensitive model is about half the cost of selling using the old model. As a result, the data broker's profit U'optimal is about half the profit U'timeless made using the old model. Fig. 4 shows that the gap in profit between the two models grows more significant with the number of data consumers. Also, the profits of data brokers using both models are linear functions of the form y = ax + b, where a and b can be deduced from the values in Table VI. Solving for a and b results in (13) and (14).
Based on (13) and (14), we observe that the slope of the profit function using the time-sensitive trading model is about half that of the old model. With the larger slope, the rate of change of the old model is greater than that of the new model.

B. Summary
The study results show the impact of data currency on the profit of data brokers in the IoT-based data marketplace, answering the research question and testing the hypothesis. The findings show that data brokers can maximize their profits when trading large personal datasets using the proposed model; the proposed model generates larger revenues than the previous model when trading data liable to temporal decline. The analyses were conducted using personal datasets from the Kaggle database containing four types of SMS texts: hobbies, locations, credit cards, and bookings. The cost and definition of each data type are derived from a study conducted in 2015 by Trend Micro [21].
As shown in Table III, the profit of data brokers using the previous model with a single personal data type is constant over time, while the proposed model shows an increase in profit over time. The gap in gains between the two models was significant, ranging from 28,426.54 to 44,218.84 for data collected between 2010 and 2015, respectively; still, as the dataset became current, the difference in gain shrank drastically. Using the same dataset with different numbers of data consumers, the profit of data brokers shows a linear progression. Fig. 2 shows that the previous model has a steeper line, or greater rate of change, with a slope approximately twice that of the proposed model. This difference shows that, when data currency is incorporated in the data quality measurement, trading outdated or old personal data has a negative impact on data brokers' profit. Another critical finding is that the total profit a data trader can make selling the whole dataset is about 62% higher using the proposed model than using the previous model: the total profit after selling the dataset with the proposed model U'optimal(c) is 83,846.72, while the total profit using the previous model is 51,959.23, a difference in gain of 31,887.49. However, this holds only if data consumers are interested in historical data.
With a variety of personal data types from the public dataset, we observed a similar profit trend as a factor of data currency. In Table V, as the data becomes more current with the world it models, the total selling cost to data consumers increases. As a result, the maximum profit made by a data broker can only be reached when the assessed data currency value is 1; in other words, the time-sensitive model is equivalent to the old model only when the data is current. Table VI shows that the profit made using the time-sensitive and previous trading models with a growing number of data consumers follows a linear progression with a positive rate of change, respectively 815.59 and 1,631.18. Hence, the profit gap becomes more prominent as the number of data consumers increases. This difference in gain shows that incorporating data currency in the quality assessment of aging personal data has a significant impact on the profit of data brokers. However, the total profit after trading the whole dataset using the previous model is only about 58% of that using the proposed model: the total profit using the previous model is 457,366, while that using the proposed model is 782,370, a difference in gain of about 325,004. As a result, data brokers can maximize their profits with data types that meet the minimum update frequency, and trading stateless data leads to a constant profit that depends only on the number of subscribed consumers.

VI. CONCLUSION
The lack of transparency in the personal data market ecosystem is a growing concern among stakeholders. Data trading models for data brokers are designed to facilitate transactions between all parties involved, and the profit of data brokers can be deduced from the difference between the cost of selling data to consumers and buying data from providers. These transactions must be done in a transparent ecosystem to invigorate the participation of all stakeholders. That is why building a personal data trading model that takes into consideration the willingness to sell of data providers and the willingness to buy of data consumers with an emphasis on data quality is crucial to develop a trustworthy market.
Based on existing literature on subscription-based data markets, the proposed data trading model uses realistic WTS and WTB functions. It is designed with an improved data quality model to accurately assess dataset quality. This study focuses on the quality of conformance described by Helfert [42] as the level of correspondence between the attribute values stored in a database system and their respective real-world counterparts. This approach to assessing data quality is objective, as it eliminates the subjectivity of consumers' demands in specific business settings. Unlike existing studies that use only the correlation and completeness data quality metrics, this study adds a third dimension, data currency, to understand its impact on the profit of data brokers in the IoT-based data marketplace.
The analytic results of this quantitative experimental study demonstrate that the profit of data brokers increases as the personal dataset becomes current with its real-world counterparts. Data brokers can maximize their profits using a time-sensitive data quality model to eliminate outdated data and meet data consumers' quality needs. With an accurate data quality model, the data brokers' profit increases tremendously with the number of subscribed data consumers, because data consumers are interested in high-quality data to create new revenue streams. These observations demonstrate that it is crucial to consider personal data's liability to temporal decline in pricing processes. This paradigm is feasible in the current IoT-based data marketplace, characterized by the volume and velocity of its data, as there are various data currency assessment models adapted to business needs. Building a trustworthy data trading marketplace requires a model that incorporates the willingness to buy of data consumers, the willingness to sell of data providers, and an accurate data quality assessment model; data brokers will be able to maximize their profits in such a marketplace.
As recommendations for future research, a data trading model could be developed that accounts for multiple personal data stores and brokers. The data trading model could also include the additional costs of storing and processing the data for a more realistic profit margin. Moreover, future research can use a method more realistic than randomly generated values to evaluate the correlation between two personal data types.