Understanding Big Data: Data Calculus in the Digital Age
February 9, 2021 Long Chen

Materials

  • Transcript


Long Chen currently serves as the Director of Luohan Academy, an open research institute initiated by the Alibaba Group and launched by world-renowned social scientists. He also serves as the Executive Provost of the Hupan School of Entrepreneurship. Chen was previously a tenured professor at the Olin Business School, Washington University in St. Louis. After returning to China in 2010, Chen took the position of Associate Dean and Professor of Finance at the Cheung Kong Graduate School of Business (CKGSB).


In the third Luohan Academy Frontier Dialogue, Long Chen, on behalf of all the authors (including nine Luohan Community authors and seven Luohan In-house authors), gave a brief introduction to the report "Understanding Big Data: Data Calculus in the Digital Era".


Transcript

Steve Tadelis:

I'm going to transfer the baton over to Long Chen, who is going to talk in more detail about the report. Long, I'll give you a five-minute notice and a one-minute notice.


Long Chen:

Okay. So, it is really my great pleasure and honor to present on behalf of the author team of this report. The mission of Luohan Academy is to work with the best minds to tackle first-order issues on how to better embrace data technology. We hope that this report can stimulate more discussion on the important issue of data and privacy, today and later.


In both theory and practice, promoting data sharing has always been fundamental for human cooperation. In Douglass North's words, "The fundamental problem of cooperation is for individuals to obtain the knowledge of each other's preference and likely behavior." It has been a century-long practice to list publicly everybody's name, address, and telephone number in the yellow pages. In industries such as finance or healthcare, the so-called "know your customer" or "KYC" process is the prerequisite for obtaining satisfactory services.


In fact, sharing information is so important that, as Hayek pointed out, it is the economic problem of society, because only by doing so can we utilize knowledge that is not given to anyone in its totality. In the process of writing this report, we came to realize that even though information is not data, and data is not information, data economics today is pretty much the information economics of the digital era. In that regard, so many great minds have explored how to overcome the obstacles of information, including Mike Spence, Bengt Holmstrom, Chris Pissarides, Alvin Roth, Eric Maskin, and Patrick Bolton. We are so honored to have you here to continue the discussion and exploration of the economic nature of information. Today we will talk about the nature of data and how we should deal with it.


In our view, in order to understand some key issues in data economics, which we call data calculus, we have to take an integrated approach that combines an understanding of the value of data with some of the key issues, such as: how should we understand and protect personal privacy? How should we understand and implement ownership and the distribution of benefits from data? What is the relation between big data and market competition? While we try to understand the impact and nature of big data, we should probably also be inspired by big data. In the words of Ronald Coase, we do not want "blackboard economics" built on purely theoretical models. What we need is more empirical evidence. The inspiration most likely comes from the patterns, puzzles, and anomalies arising from the gathering of data, particularly when we need to understand new patterns to break out of existing habits of thought.


Let me first discuss a bit how we should understand people's privacy attitudes and their behavior. There is a global phenomenon called the privacy paradox, which says that in pretty much any country, whether in Europe, in the United States, or in China, it is no different: the majority of people will say that they have huge concerns about privacy. These claims are so strong that we are given the impression that privacy abuse must be very severe. But in the meantime, in those countries, and in other countries we do not list, people are very willing to share information for very little reward. This is very perplexing. One potential interpretation is that people have no choice. For example, if you do not want to share your personal information on Facebook, if you don't want to log in, then you are shut out.


This hypothesis motivates us to provide, in our report, one of the largest big data studies of personal-information-related choice behavior, using Alipay data. The idea is as follows. On Alipay's platform there are tens of thousands of so-called "mini programs." They are pop-up apps that provide all kinds of services: transportation, restaurant booking, travel planning, take-out delivery, et cetera. To get the service, you actually have to authorize them. You can opt in, and later you can opt out. It can be argued that because Alipay has more than a billion users, it would be hard, especially for Chinese users, to refuse to use it. However, for those mini programs, users do have a choice. The mini programs vary in their degree of necessity and in the sensitivity of the information they request.


Therefore, we have a very interesting angle from which to study how hundreds of millions of people make personal-information-related choices when they do have options. The first pattern we find is that, while there are some differences between males and females, males are more willing to share, higher-educated people are more willing to share, and younger people are more willing to share. But on average, about 75% of users, when asked whether they are willing to share some information to get the service, will opt in. Overall, the majority of people are willing to do that. And they rarely regret it. For example, after they opt in to use the service, only about 0.1% of them actually opt out per month. This number is very consistent.


If you look at the United States, Canada, and Europe, people likewise opt out only a tiny bit, which means they do not really regret their choices. Also, people are very willing to try new apps. In these graphs, the horizontal axis is the number of users, which to some extent stands for popularity. You can see that people are very willing to try new apps regardless of how many users those apps already have. But they are more likely to opt out later if it is a newer app.


That probably means the service is not good enough. Here we have two graphs, where the horizontal axis is either age or "digital age," that is, the length of people's digital experience. There are two patterns here. One pattern is that the more sensitive the information those pop-up apps ask for, the less willing people are to opt in, which makes a lot of sense. So people do care about their privacy.

The second pattern is that younger people are more enthusiastic about sharing, and older people are also quite willing to share, but people in the middle are more hesitant. It means that when people first gain more digital experience, they become more cautious; but as they accumulate more actual digital experience, they become more open to trying more digital services. That is the general pattern: people do embrace more data sharing in the long run once they are more digitally versatile. These patterns are consistent globally. There is a privacy index going back several decades, and it shows that less than a quarter of people are so-called "data fundamentalists," meaning they really refuse to share much data, while the majority of people are willing to share. This is very consistent with what we observed in China.


Actually, in a separate paper I wrote with Professor Wei Xiong and my colleagues at Luohan Academy and Princeton University, we found that it is precisely the people who express more concern about privacy who actually use more digital services. The title of that paper is "The Privacy Paradox and the Digital Demand." The point here is that people share their personal information because they want to get more services; they are very willing to do that, and in the meantime they are concerned about privacy. That is the big-picture message here.


This leads to my second point: there is a lot of value coming out of data, as Mike just mentioned. Let me first quickly mention several things we are all familiar with. One is that data exchange is essential for connecting supply and demand in the digital age. In the traditional economy, a local store usually only serves people within, let's say, 10 kilometers. But now, with e-commerce, the average distance between sellers and buyers is about 1,000 kilometers. The traditional so-called "gravity model" is broken, and that means many, many more opportunities. Data exchange is essential for this to happen.


Also, data sharing makes us smarter in the area of finance, as shown in papers by Professor Yi Huang, who is also online here, and his colleagues. With digitized information, a lot of SMEs are able to get financial support for the first time, without collateral.


Professor Holmstrom has also mentioned that information has become the new collateral in the digital age. There is also another paper by Jin and Sun, both of whom are from Harvard and also visiting scholars at Luohan Academy. They found that new startups nowadays learn quickly because of information sharing; as a result, their sales jump up relative to those that have not used such information from the platform.


Then what happens if no personal information is used in recommendations? Just now Mike brought up this topic. There is another paper by Professor Sun and his colleagues; he is also, of course, one of the authors of this report.


They found, and this is so interesting, that if we exclude the use of personal information, recommendations quickly concentrate on the top brands, as we experienced in the industrial age. Consumers also quickly find that this is not what they want, because it does not fit their personal needs. The click-through rate drops, and actual purchases drop by a whopping figure, close to 80%. So, as you can see, it hurts the consumers and it hurts the producers, especially the small brands.


Another point is that data sharing acts as a catalyst for building trust. As Steve said, information sharing is crucial for building trust. On a typical platform nowadays, every participant rates every product, every seller, and every step of the service. Together, they build up trust information, such that hundreds of millions of buyers and tens of millions of sellers can deal with each other as if they were in the same room, face-to-face.


To sum up, data helps connect us with each other, makes us smarter, and builds trust. You can see that all these uses of data help make Hayek's point more concrete.


After understanding where the value of data comes from, and how consumers actually make decisions about sharing personal information, we know that when we think about consumer benefits, it is not only the benefit of protecting privacy, but also the benefit of the services obtained by providing some personal information. So the best policy for dealing with personal data is not to lock it up, because you would really hurt consumers that way, but to let the information flow through market mechanisms while we try to find the best ways to protect privacy.


That leads us to the third question: what is a good way of protecting data security and privacy while, in the meantime, promoting information exchange? The basic idea here, summarized from practice in the report, is that it requires coupling privacy engineering with privacy-enhancing technology. Privacy engineering essentially means privacy by design: a user-oriented, protection-oriented principle that builds protection into the design and use of software and services. That is then coupled with privacy-enhancing technology.


Let me give you one example from Ant Group's practice. This is just one practice; not only Ant Group but many other companies are trying to manage data in this way. At the collection stage, data collection should be permission-based, and it should be minimized: you do not collect all the information. After the information is collected, and before anybody can use it, it should be desensitized. At the storage stage, it can also be encrypted, such that even if somebody steals it, they cannot use it. Only at that stage can people inside start to use it. When you exchange data with other parties, it is the desensitized, encrypted information that is exchanged.
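To make the stages just described more concrete, here is a minimal, illustrative sketch in Python. It is not Ant Group's actual implementation; the record fields, the keyed-hash pseudonymization, and the placeholder encryption step are all assumptions introduced purely for illustration.

```python
# Illustrative sketch only (not Ant Group's actual system). It walks through the
# stages described above: minimized collection, desensitization before use,
# and encryption at rest, using standard-library primitives as stand-ins.
import hashlib
import hmac
import json
import secrets

# Hypothetical raw record a mini program might see at the collection stage.
raw_record = {"user_id": "u-12345", "phone": "+86-138-0000-0000", "order_total": 58.0}

# 1) Collection: keep only the fields the service actually needs (minimization).
ALLOWED_FIELDS = {"user_id", "order_total"}
collected = {k: v for k, v in raw_record.items() if k in ALLOWED_FIELDS}

# 2) Desensitization: replace the direct identifier with a keyed pseudonym,
#    so internal users work on data that cannot be trivially traced back.
PSEUDONYM_KEY = secrets.token_bytes(32)  # would be held by a separate custodian
def pseudonymize(value: str) -> str:
    return hmac.new(PSEUDONYM_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

desensitized = dict(collected, user_id=pseudonymize(collected["user_id"]))

# 3) Storage/exchange: encrypt before writing or sharing, so a stolen copy is
#    unusable. A real system would use an audited cipher (e.g., AES-GCM); the
#    one-time random pad below only marks where encryption would happen.
plaintext = json.dumps(desensitized).encode()
pad = secrets.token_bytes(len(plaintext))
ciphertext = bytes(a ^ b for a, b in zip(plaintext, pad))

print(desensitized)                    # what analysts inside would see
print(ciphertext.hex()[:32] + "...")   # what sits in storage or is exchanged
```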


One good example of privacy-enhancing technology is called multiparty computation. Here, more and more, when data is actually exchanged, it is exchanged in a third-party environment, combined with techniques such as zero-knowledge proofs, so that you share enough information to find your solution, but you cannot trace back to the original data. That is the point.
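Since multiparty computation can sound abstract, here is a minimal sketch of the additive secret-sharing idea that underlies many such protocols. The party names, values, and modulus are hypothetical and purely illustrative; real deployments use hardened protocols and libraries rather than this toy.

```python
# Toy additive secret sharing: three parties learn only the SUM of their
# private values; no single party (or aggregator) ever sees another's input.
import secrets

MODULUS = 2**61 - 1  # a large modulus chosen arbitrarily for this sketch

def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n random shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares

# Hypothetical private inputs held by three different institutions.
private_inputs = {"party_a": 120, "party_b": 75, "party_c": 300}
n = len(private_inputs)

# Each party splits its value and distributes one share to every party.
all_shares = {name: share(v, n) for name, v in private_inputs.items()}

# Each party sums the shares it holds; each partial sum alone is just noise.
partial_sums = [sum(all_shares[name][i] for name in all_shares) % MODULUS
                for i in range(n)]

# Only combining all partial sums reveals the joint answer.
joint_sum = sum(partial_sums) % MODULUS
print(joint_sum)  # 495, computed without exposing any individual input
```

The same idea extends to richer computations, such as joint statistics or risk scores, while each original dataset stays with its owner.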


There is a good logic here: if you combine privacy engineering with privacy-enhancing technology, then it is possible that privacy and security risks do not necessarily increase as more data is shared. We have seen similar things happen historically in the food industry and in the airline industry. Nowadays we eat a lot of food, but we are not worried about being poisoned, even though it is provided by private companies. We take a lot of flights, but we are not as worried about a crash, because of technology and certain principles.


We have already talked about how we should understand privacy issues, and about some of the value of data. Now we probably need an integrated framework to understand data and privacy. Here, we have been trying to say that data exchange is a fundamental driver of economic activity, innovation, and benefits, so we should promote data flow. In the meantime, we should protect the rights of the data subjects.

The next question is: how do you do that? We think we have to build this on the nature of data and information. There are a couple of essential features of data. One is non-rivalry, which means data is not like oil; it is more like fire that can be passed on. It also means that there could be an unlimited number of owners, and data can be used without being used up.


This brings up interesting issues. Who produces data? In our view, data producers are not necessarily the data subjects. For example, I am talking right now. For the information that I am talking, I am the data subject: it relates to me, but that piece of information is actually being produced by all the participants today. You produce it by using your eyes and your ears. If you do not believe me, close your eyes: you will have a different set of information. You are producing it using your own organs. We have many ways to obtain that piece of information. The point here is that the data producers are not necessarily the data subjects.


Steve Tadelis:

Long, I'm giving you a five-minute warning.


Long Chen:

Got it. I'm getting there. As the data subject, I actually have no incentive and pay no cost to produce this information. I am just talking; I am not trying to produce the data. That is my point: there is a difference between the data subject and the data producers. Finally, we believe another thing we should pay attention to is the use case, because the value of data, and how data is produced, is a continuous process within a particular use case, that is, within real economic activities.


The second feature of data is that it is non-separable: the data subject and the data producer are not separable. If we put these things together, it implies that we probably should not grant sole ownership of data to the data subjects. At the least, there could be multiple versions of ownership. The point of protecting privacy is more about how to protect privacy when data is being used.


Finally, we should have a market-driven mechanism that can decide how to distribute the benefits. Everybody is already a beneficiary of data sharing. I do not have enough time, but that is the rough framework of how we think about data. We found that, in reality, the evolving principles of privacy protection are consistent with the data calculus framework.


Now, we all know there are the so-called Fair Information Practices, or FIPs. They are the foundation for later laws and regulations on privacy and personal data governance, including the OECD guidelines, the GDPR, and the CCPA. The key point here is that, though they differ in some aspects, none of them tries to lock up data or assign sole ownership. They aim to promote secure and privacy-protected data flows.


Let me give you some quotes. As the OECD pointed out, its guidelines try to harmonize privacy legislation and, while upholding such human rights, to prevent interruptions in the international flow of data. The FTC has also pointed out that it is trying to maximize the benefits of big data while mitigating its risks.


Here is a quote from Peter Winn, the Chief Privacy Officer and Director of the Office of Privacy and Civil Liberties at the U.S. Department of Justice. As he pointed out, trust is fundamental to the efficacy of any institution, and people can be blinded by a forced choice that online privacy governance must either be Leviathan-style state social control with an absolute sovereign or a free-market system based solely on private property rights.


That seems to be consistent with what we are trying to say. The best way to protect privacy is not to grant sole ownership and restrict data, because that does not fit the nature of data; rather, you have to protect how and when data is being used.


Now, the final part is the relation between data and market competition. One question people frequently ask these days is to what extent big data is causing a winner-takes-all phenomenon. We believe this has to be taken case by case. Let me give you some evidence from China, in a couple of areas Alibaba is most familiar with.


One is e-commerce. Let me give you one example. There is a company called Pinduoduo. In the past four to five years, it has grown from nothing to a company whose market cap, as I checked this morning, is $240 billion. That is e-commerce. My point here is that in the e-commerce market, the players are still growing in a healthy way, but the market shares of the top players are becoming more and more dispersed.


The situation is similar in mobile payments, which is something Ant Group is very familiar with. You can see that Ant, as the pioneer in the early days, held almost 80% of the market share, but now it is down to lower than 50%; the market is getting more diverse. We can see similar patterns in the advertising industry. More players, like TikTok, are coming up, while the most famous of the old big data leaders, Baidu, is clearly going down; the company is falling behind.


In the United States, and all over the world, we can see that it is not only in China that market share is spreading. There is another trend: firms rise fast and also decline very fast. The number of years a company stays in the S&P 500 is clearly declining. If we look at the companies that, back in the year 2000, claimed to be big data leaders, a lot of them are gone now. More than half of them have gone, and very few can still take the lead.


Why is that the case? Because, as we all know, data is only one part of the production inputs, one part of the business model. Firms are still competing through their businesses; business models matter. There is also a big gap between data and usable information, as Bengt Holmstrom or Thomas Sargent will probably tell you. The transformation from data to business insight takes a lot of skill. Also, the value of data declines very fast. In the words of Professor Catherine Tucker, for a resource to provide a firm with a great lasting advantage in the market, it must be inimitable, rare, valuable, and sustainable. Data is not like this. Also, network effects are double-edged.


Steve Tadelis:

Long, I'm giving you your one-minute warning.


Long Chen:

Sure. Thanks. I appreciate it. Also, we find that, both online and offline, prices in China and in the United States are actually converging. This is another good signal that it is a very competitive market. Another point: does big data lead to discrimination against, and harm to, consumers? There could be instances of that, but there is a lot of evidence against it; there is very little academic evidence suggesting this is the mainstream trend.


I think there is one good reason for that: for the first time, producers actually know who their customers are. It is really in their interest to stay with consumers for a long time; ripping them off at every opportunity is not to their benefit. Finally, we know that in all the industries being reshaped by digital technology, there is a lot of innovation. That is the key.


On digital platforms, like Alibaba's platform or Apple's, we can see a lot of innovation coming up. Let me summarize. In this report we try to explore the data calculus of the digital age and to answer some of the key questions. In particular, we try to understand people's privacy attitudes and behavior, and where the value of data comes from. We try to answer how to protect data security and privacy while promoting information exchange.


We try to provide an integrated framework for understanding data and privacy. We also say something about the logic of data governance, and about data and market competition. Finally, there are a couple of ways to get the full report: you can either scan the QR code here or go to our website to download it. Thank you very much. This is just a start, and we hope it can stimulate, and be useful for, further discussions. Thank you.


Steve Tadelis:

Thank you very much, Long. I'm sure the participants here, and we have almost 80 people, which is really exciting, have all been very stimulated, as have I, by the information you provided. Now we're going to jump into four separate short panels. Each one will have two participants who were involved in writing the document, and they will share their thoughts and wisdom.


For more information, please visit Luohan Academy's YouTube channel: Luohan Academy

