Contact Tracing Technologies: Methods and Trade-offs

Apr 29, 2020

By Alex Berke & Kent Larson

This post was an early draft of our white paper, now available via the MIT Media Lab. Read here.

Many organizations are working on technology for contact tracing, and the landscape is changing rapidly. This is an overview of existing contact tracing technologies, along with different methods and trade-offs to consider when building new ones.

Governments around the world are considering the deployment of contact tracing technologies to help contain the spread of COVID-19 and mitigate its economic impacts.

Combined with increased testing, effective contact tracing offers the opportunity to improve policy decisions by providing information to help safely re-open the economy and only intervene when new outbreaks are detected. In particular, governments and communities may use contact tracing technology to:

Target quarantines to mitigate the economic impacts of stay-at-home orders
Understand transmission trends
Better use limited testing resources
Conduct targeted serology testing in order to expedite the return of a workforce
Improve traditional labor-intensive contact tracing efforts

However, it is not yet known whether contact tracing technologies will have their desired impact. They will need to be widely adopted and accurate in order to be effective, and they will need to provide enough information about their users to health authorities or governments in order to guide future policy decisions. These challenges raise both technical issues and societal issues, as deploying effective contact tracing technologies may jeopardize individual privacy rights and freedoms.

There are different ways these technologies can be designed, but each design decision leads to different trade-offs between their potential accuracy, adoption, usefulness, and privacy risks. This blogpost lays out these differences and trade-offs, and raises questions that should be addressed by the people who will build these technologies and the communities who will use them, collectively. The public should understand how these systems function, and what their alternatives are, because their widespread adoption may drastically change how we move or work in public and the degree to which we can live in privacy.

We will provide an overview of existing contact tracing technologies and explain how they work, and how future alternatives might work instead, so that their trade-offs can be better evaluated. In particular, we will cover differences in how data is sourced and used to detect contacts, whether the flow of information is centralized or decentralized, how COVID-positive cases are reported, how exposure risk is assessed, and how that assessment impacts the system’s users. We will discuss how these differences lead to trade-offs between accuracy, adoption, usefulness, and privacy.

We want to leave you with information as well as questions as to whether any of these technologies can be useful enough or worth their trade-offs for potential benefits that cannot yet be measured.

Background: Contact tracing & technology

Contact tracing is a longstanding public health strategy for reducing the spread of infectious disease by identifying people who may have been exposed. Traditionally this involves asking infected people where they have recently been and whom they may have come in contact with, and then following up with those contacts. But this process can be labor-intensive and people may not even recall all of their recent whereabouts, or know the people they came in contact with.

Technology can help make this process both more efficient and more accurate, and help scale up existing human-driven contact tracing initiatives. For example, data collected by mobile phones can aid a patient’s memory and increase the accuracy and speed of traditional contact tracing interviews. Mobile applications can also be used to connect people with resources for getting tested or provide guidance for quarantine or other means to reduce the spread of infection. This overview focuses on systems that go beyond assisting human-driven contact tracing interviews, such as technologies that automatically identify and notify individuals who may have been exposed and assess risk.

Recent technology efforts for contact tracing

There have been significant recent efforts to use technology to scale up the process of contact tracing in order to control the outbreak of COVID-19, including projects coordinated by governments, open source communities, and private companies. Notable examples have already been deployed by governments in Asia.

contact tracing apps in Asian countries — (Left) In South Korea, there are multiple apps and websites publishing maps and detailed timelines for infected patients’ travel histories. (Center) In China, the AliPay Health Code app has colored QR codes where the colors indicate how long a user should quarantine and scanning the QR code is necessary to travel around or enter many places. (Right) In Singapore, the TraceTogether app detects points of contact between users by using the peer-to-peer exchange of messages over Bluetooth.

South Korea has effectively traced the travel routes and contacts for infected patients by using a range of data sources, such as in-person interviews, GPS data from cell phones, and credit card transactions, all of which they have legal access to. There are multiple websites and smartphone apps that publish this data with timelines and maps, including granular details such as which bus someone took, when and where they got on and off, or whether they were wearing face masks. The government also broadcasts information to nearby citizens with emergency alerts whenever new cases are discovered in their districts. This provides a basic way to inform people of their risk of exposure, based on whether they may have crossed paths with those infected.

In China, the “Alipay Health Code” system involves a mobile app that creates colored QR codes for each user, where a color of red, yellow or green indicates the user’s exposure risk level determined by the system, and dictates their quarantine. Exactly how exposure risk is determined is not public, but involves combining people’s personal information with their recent travel histories and current location. Using this app, or systems like it, has become a de facto requirement in hundreds of cities across China, where scanning a green QR code is required to enter many buildings, or travel, or return to work.

In Singapore, the Ministry of Health and Government Technology Agency launched the TraceTogether mobile app for more targeted contact tracing. Instead of using geographic location data to detect whether people were in the same place at the same time, the system uses Bluetooth signals to detect whether two app users came into proximity of one another. The app is voluntary and its limited use raises the question of whether opt-in systems will be effective or whether there will need to be ways to incentivize their use.

In all of the systems described above, the collection and management of users’ data is centralized, allowing the governing authority to more effectively act upon it.

Ongoing projects and research

There are many more independent projects and proposals that use data in similar ways to those systems already implemented by governments, but with more privacy-preserving and decentralized technology designs. They are designed to limit the centralized collection of people’s private data in order to limit its potential abuse. Many of them focus on the use of Bluetooth technology because it can provide ways to precisely detect whether app users come into contact with each other without exposing other sensitive information that location histories can reveal. Notable projects and academic proposals include CoEpi, Covid-Watch, DP-3T, and PACT (Private Automated Contact Tracing) from MIT.

Shared protocols

These projects were designed with the intention of sharing protocols (e.g. TCNCoalition), so that even as they develop separate mobile applications, information about infection status can be shared across users of the different systems. How a user’s infection exposure risk level is assessed and shared can then be tailored to each project’s specific objectives.

From independent projects and research to adoption

The ideas from these independent projects and proposals have made their way into a new framework jointly provided by Apple and Google (announced April 10, 2020) and will likely be adopted by Western countries developing their own new contact tracing systems. However, the extent to which Western countries will maintain user privacy in the ways originally intended by these projects’ authors is not yet known.

Apple and Google’s new framework provides a software layer that interfaces with Bluetooth, on top of which software developers working on behalf of public health authorities can build apps. The framework is designed around providing security and privacy for users, and its draft Bluetooth and cryptography specifications follow the suggestions of recent privacy-preserving research proposals, such as PACT and DP-3T (the appendix describes how these work). So far, the companies’ intentions also seem consistent with the desire to protect users’ data from centralized collection by governments, which may have already put them at odds with French and British health authorities, which had plans for more centralized contact tracing systems.

Using Bluetooth to create effective contact tracing systems was previously difficult due to compatibility issues between Google Android and Apple iOS devices, as well as iOS limitations on the continuous broadcasting of Bluetooth signals (due to privacy considerations). Needless to say, Apple and Google’s new framework changes this. It provides an interface to more easily use Bluetooth for contact tracing, and also improves interoperability between Android and iOS devices.

There is now good reason to believe that Bluetooth will become the preferred method to detect whether two individuals came in close enough contact to transmit the disease, and will be used in combination with location history. We’ll explain why in more detail shortly.

Implementation differences and trade-offs

Contact tracing technologies that have already been created, and their alternatives that could be created in the future, have a range of differences in terms of the following:

How data is used to detect contacts
How trust and the flow of information are managed
How positive cases are reported
How exposure risk is assessed and how it impacts users

And these differences in methods used lead to trade-offs between:

Adoption
Accuracy
Usefulness to public health authorities and decision makers
Usefulness to individuals
Privacy

We’ll discuss each of these. But first note that when we consider privacy, we should specify whose privacy is protected, and from whom privacy is being protected. In the case of contact tracing apps, there are users who are infected and then share their data, and there are other users with whom they may have come in contact. We can then consider three different notions of privacy for users: (1) privacy from authorities administering the system or app, (2) privacy from potential contacts, and (3) privacy from anyone else. This ‘anyone else’ might include snoopers trying to find out information about individuals, or it might involve companies increasing their existing collection of user data to help them better target ads, or for other means of private profit.

All of the existing projects and research proposals assume that users with a positive COVID-19 test result must give up some privacy when reporting their data. However, the amount of privacy they give up, and to whom, varies depending on implementation.

To further protect user privacy, any contact tracing system should only collect data relevant to the current problem, and data should be deleted after a predefined period that experts consider medically relevant. We state this here because these data minimization and storage limitation measures are not specific to any of the alternative technology designs or methods we discuss; they are simply standard good practice as well as part of GDPR compliance.

How data is used to detect contacts: Location data versus Bluetooth co-locations

There are different forms of data that can be useful for contact tracing: location data such as timestamped GPS coordinates, or data collected via Bluetooth signals. There are also different ways this data can serve contact tracing, such as through the creation of maps and aggregate statistics, or detecting whether two people have come into contact. In what follows we will explain these different data sources and the ways to use them, as well as their trade-offs.

How data is collected and what makes it useful

People’s location histories are commonly collected and recorded by applications installed on their mobile devices in the form of timestamped GPS coordinates. Co-location data collected via Bluetooth is different. Devices that broadcast and receive messages over low-range Bluetooth signals can exchange data peer-to-peer when they come in close enough proximity to one another. This provides information about whether people were co-located rather than their geographic locations. We will see how this can be useful for privacy-preserving contact tracing.

Mobile phone, GPS location icon, Bluetooth icon — *Data collected from mobile devices, such as user locations or messages exchanged over Bluetooth, can be used to serve contact tracing efforts in multiple ways.*

This data can be used to scale up contact tracing efforts in multiple ways. One way that GPS data can be used, but that Bluetooth data cannot, is the creation of maps and timelines, or aggregate statistics, about when and where people went before they were diagnosed as infected. This can be useful to public health agencies, and making this information public can help inform other people of their exposure risk. This approach is used by South Korea, with their release of detailed timelines of infected people’s whereabouts. However, the detailed information they publish risks exposing private information about the infected people they report on, and risks the stigmatization of the businesses or communities that these people visited. This data could instead be more safely anonymized and aggregated, but there is a trade-off: the data is more informative when it is more detailed, but safer for use from a privacy perspective when it is more aggregated and less detailed. However, even in aggregated form, GPS location data can be useful for the creation of heatmaps and for statistical analysis in order to better understand geographic transmission flow and trends of disease outbreaks. Alternative uses of this aggregate data keep it encrypted, or add noise (e.g. differential privacy), in order to better preserve privacy while using artificial intelligence tools to predict, rather than track, future disease outbreaks and hotspots. These promising use cases may be addressed in a future blogpost.
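To make the idea of adding noise to aggregate data concrete, here is a minimal sketch that bins location points into coarse grid cells and adds Laplace noise to each cell's count before release. The cell size, the epsilon value, and every function name are assumptions chosen for illustration; none of the systems discussed here is known to use this code.

```python
import math
import random
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a coordinate to a coarse grid cell (0.01 degrees is roughly 1 km of latitude)."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def laplace_noise(scale, rng):
    """Sample zero-centered Laplace noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_heatmap(points, epsilon=1.0, rng=None):
    """Count points per cell, then add Laplace(1/epsilon) noise to each count.

    Assumes each person contributes at most one point, so each count has sensitivity 1.
    """
    if rng is None:
        rng = random.Random()
    counts = Counter(grid_cell(lat, lon) for lat, lon in points)
    return {cell: n + laplace_noise(1 / epsilon, rng) for cell, n in counts.items()}
```

A smaller epsilon means more noise and more privacy, at the cost of a less informative heatmap, which mirrors the detail-versus-safety trade-off described above. Note that this sketch only perturbs cells that already contain at least one point; a release that aims for a formal differential privacy guarantee would also need to account for empty cells.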

Another use case for both GPS location data and Bluetooth co-location data is more targeted person-to-person contact tracing.

Trade-offs: Accuracy, usefulness, and privacy

Location data approach

GPS data can be used to estimate an individual’s disease exposure risk by detecting whether their mobile device reported a location near an infected person’s at about the same time, and for how long they were in that place together.

*With GPS data, points of contact are detected based on whether users’ devices reported locations in the same place and time.*
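The comparison described above can be sketched as follows: a user's location pings are checked against an infected person's pings, and a ping counts as a point of contact when it falls within a distance threshold and a time window. The `Ping` type, the 10 meter threshold, and the 5 minute window are hypothetical values for illustration, not parameters taken from any deployed system.

```python
import math
from dataclasses import dataclass


@dataclass
class Ping:
    t: float    # Unix timestamp in seconds
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in meters."""
    r = 6_371_000  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def contact_pings(user, infected, max_dist_m=10.0, max_gap_s=300.0):
    """Return the user's pings that are close to any infected ping in both space and time."""
    return [
        u for u in user
        if any(
            abs(u.t - i.t) <= max_gap_s
            and haversine_m(u.lat, u.lon, i.lat, i.lon) <= max_dist_m
            for i in infected
        )
    ]
```

The number of matching pings gives a rough measure of how long two people were in the same place. In practice the distance threshold would need to be larger than the GPS error, which is one reason this approach struggles indoors and in dense urban areas.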

An issue with using GPS data in this way is that it suffers from limited accuracy in dense urban areas or indoors, and lacks context on which room or floor of a building someone was in, making it less useful for detecting whether people came in contact. However, GPS accuracy can be somewhat improved when combined with data logged by Wi-Fi routers.

Another issue with using GPS data is privacy, as location histories can reveal private and sensitive information about people. This is the case even when data is anonymized, because statistical methods can be used to reconstruct location histories and re-identify people. Redaction can help mitigate this risk. For example, systems like SafePaths allow health providers or users to retroactively redact their location histories before sharing them. Apps could also allow users to proactively set places and times when their data will not be recorded at all.

Bluetooth co-location data approach

Some of the issues with GPS location data can be resolved by using Bluetooth co-location data for the more targeted person-to-person contact tracing. (This approach is used by TraceTogether, CoEpi, Covid-Watch, PACT, DP-3T, the Apple/Google framework, and more.) Applications installed on users’ mobile devices use the Bluetooth Low Energy (BLE) protocol to broadcast IDs and listen for IDs broadcast from other devices. Each app records information about its broadcast IDs and received IDs. Since the Bluetooth signals are low range, the apps can only exchange IDs when devices come in close proximity of one another, serving as a good proxy for whether users came in close enough contact to transmit disease. Users who later report a positive infection status can share information about the IDs their app broadcast or received (depending on the implementation). Exposure risk for other users can then be assessed by whether their app exchanged IDs with infected users’ apps.

mobile devices exchange messages over Bluetooth peer-to-peer — *With the Bluetooth approach, devices exchange IDs peer-to-peer when they are near each other. Points of contact are then detected based on whether a device received another device’s ID.*

One of the benefits of using Bluetooth for more targeted contact tracing is that it can allow a system to better preserve user privacy. While location data can expose people’s private information, the approach with Bluetooth mitigates privacy risks by detecting when people come into contact without using location data. The app that broadcasts IDs from users’ devices can generate the IDs in a way that makes them look random, and change them often, making it much more difficult to track people between the places they go (although it will still be technically possible).
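One way an app could generate IDs that look random and change often is to derive each ID from a secret key and the current time interval using a keyed hash. The sketch below is an illustration of that general idea only. The 15 minute rotation period, the 16 byte ID length, and the function names are assumptions, and this is not the key schedule specified by DP-3T, PACT, or the Apple/Google framework.

```python
import hashlib
import hmac
import secrets

ROTATION_SECONDS = 15 * 60  # assumed rotation period; real systems choose their own


def new_daily_key():
    """Generate a fresh random secret that stays on the device unless the user reports."""
    return secrets.token_bytes(32)


def broadcast_id(daily_key, unix_time):
    """Derive the ID to broadcast during the rotation interval that contains unix_time."""
    interval = int(unix_time) // ROTATION_SECONDS
    digest = hmac.new(daily_key, interval.to_bytes(8, "big"), hashlib.sha256).digest()
    return digest[:16]  # truncated to keep the broadcast payload small
```

Without the key, successive IDs cannot be linked to one another, which is what makes tracking a device between places difficult. With the key, anyone can recompute the full sequence of IDs, so a design like this would let an infected user report a single key instead of every ID their app broadcast.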

Another benefit of the Bluetooth approach is that it can overcome the accuracy issues of GPS. The Bluetooth signal is low-range and degrades when passing through the walls and floors of a building, enabling a more precise way to detect whether two people were in a shared space. A measure of signal strength can also be used as a proxy for how closely two people came into contact, and this measure can be used to better assess exposure risk.
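As a rough illustration of how signal strength might be turned into a distance estimate, the sketch below applies the log-distance path-loss model, a common approximation for radio signals. The reference strength at one meter, the path-loss exponent, and the two meter threshold are all assumed values; real readings vary with the device, its orientation, and obstacles such as bodies or walls, so any estimate like this is noisy.

```python
def estimate_distance_m(rssi_dbm, rssi_at_1m_dbm=-59.0, path_loss_exponent=2.0):
    """Estimate distance from received signal strength with the log-distance model.

    Both defaults are illustrative assumptions; they differ by device and environment.
    """
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10 * path_loss_exponent))


def is_close_contact(rssi_dbm, threshold_m=2.0):
    """Flag a reading as close contact if the estimated distance is within the threshold."""
    return estimate_distance_m(rssi_dbm) <= threshold_m
```

A weaker (more negative) signal maps to a larger estimated distance, so an app could weight stronger readings more heavily when assessing exposure risk.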

Yet by itself, the data from Bluetooth may not be accurate or useful enough, as the information it lacks about the locations where contact occurred can provide important context for risk assessment. For example, whether contact occurred with an infected user while in a closed setting like a restaurant, where many people may touch surfaces, versus outdoors should imply different levels of exposure risk.

There is another trade-off between privacy and usefulness to consider for Bluetooth-based systems. Bluetooth can only detect when people were in the same place at the same time, and may miss when people shared common spaces at slightly different times. In these cases disease may transmit across commonly touched surfaces (fomites), such as grocery check-out counters. GPS data does not have this issue because when comparing time and location to detect points of contact, the comparison can account for time ranges. Bluetooth beacons that act as signal repeaters can be installed at common locations to help resolve this issue for the apps that use Bluetooth. These beacons could repeat the signals broadcast by app users that came near these beacons for a limited time period, so that the next app user that comes near the beacon also receives the signal. However, associating Bluetooth beacons with dedicated locations, and having these beacons listen to users’ broadcast signals, then degrades the central privacy feature for using Bluetooth: signals received about co-location are not associated with locations. If the beacons store information about the signals received, this information could be used to later learn where someone was, and who else was there with them.

This potential use of beacons should raise new privacy questions. Even though Bluetooth-based systems may better preserve location privacy in the present, the act of building these systems can change that in the near future. Bluetooth beacons are already used by retailers in stores to track customers’ behaviors to better sell their products. We can imagine a future where beacons like these are as ubiquitous in our environments as the collection of GPS location data from our mobile devices is today. Building contact tracing technologies that cause our devices to constantly broadcast Bluetooth signals may bring about this future more quickly. In other words, by building Bluetooth-based contact tracing systems intended to preserve privacy, we may just create more precise ways to track people.

Hybrid approach

The most accurate and useful contact tracing systems will likely use Bluetooth co-location data in combination with GPS location histories. Bluetooth co-location data can be used for the more precise detection of contacts, while GPS location histories can provide data for aggregated statistics and heatmaps.

Mobile phone, Bluetooth, GPS location — *The most accurate and useful contact tracing systems will likely use Bluetooth data in combination with GPS location histories.*

Location data can also improve the contact detection done via Bluetooth. For example, when an app using Bluetooth exchanges IDs with another app, it might also record metadata, such as the time and location where those IDs were broadcast or received. If those IDs are later shared by an infected person in order to indicate exposure risk to their contacts, a system can then connect those IDs to the time and place an app stored them. This can provide useful context about where a user was at these points of contact in order to better assess exposure risk.

However, while this may make an app more useful, it also re-introduces the privacy issues associated with location data, as co-locations are then reconnected to locations. An app could mitigate this risk for its user by only storing locations locally and never sharing them, so that only the app’s user would see where it came into contact with infected people. However, this does little to preserve the privacy for the infected users who shared their data, as their locations will then be shared with their contacts who could then identify them.

Additional data sources, such as credit card transactions, transit pass records, or CCTV footage (all of which have been used in South Korea’s contact tracing efforts), can also be useful for improving the quality of information used for contact tracing. However, each of these data sources also presents trade-offs between the added accuracy and usefulness it provides, and privacy.

Trade-offs: Adoption

An important issue to consider for Bluetooth-based systems is adoption. Systems that rely on using Bluetooth to detect contacts will require the mass adoption of a new mobile app before they can be useful, while this is not necessarily the case for systems using GPS locations.

Users of Bluetooth-based contact tracing apps need to exchange enough data via the apps, in advance of infected users reporting their data, in order for their contacts to be detected. In addition, a substantial portion of the population needs to consistently use the system in order for it to provide enough useful information to the people who do use it. Even in Singapore, where there is a government app (TraceTogether), less than 20% of people have downloaded it (at time of writing). If only 20% of people use an app, and an encounter can only be detected when both people involved use it, the system can only hope to detect about 4% (0.2 × 0.2) of the encounters between people. Needless to say, too many points of contact with infected people will go undetected for this app to have a meaningful impact.
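The arithmetic above generalizes to other adoption rates. The short sketch below assumes that app use is independent between the two people in any encounter, which is a simplification, and uses a hypothetical function name.

```python
def detectable_share(adoption_rate):
    """Share of encounters in which both people use the app, assuming independent adoption."""
    return adoption_rate ** 2


for rate in (0.2, 0.4, 0.6, 0.8):
    print(f"{rate:.0%} adoption -> {detectable_share(rate):.0%} of encounters detectable")
```

Under this assumption, even 60% adoption would capture only about 36% of encounters, which shows how steeply usefulness falls off when adoption is low.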

On the other hand, location data is already collected from mobile devices by a variety of apps and companies and it can be used even before the mass adoption of new apps. For example, users can export their Google location histories, or companies might share, or be compelled to share, the location data they have been amassing.

Countries such as China, and now Israel, have decided to make use of data already collected from people’s devices, rather than allowing users to opt in to their surveillance. This data provides the opportunity to scale contact tracing efforts as well as enforce quarantines by monitoring whether people stay at home. Using location data in this way can make a system useful with the immediacy needed to effectively stem the rate of further infections, but may forfeit the privacy and rights of the citizens who did not explicitly consent to being tracked by the system.

How trust and the flow of information are managed: Centralized versus decentralized

The contact tracing technology systems used by South Korea, China (Alipay Health Code), and Singapore (TraceTogether) are centralized: a single entity collects location, co-location, or other data from all users, whether or not they have tested positive. These entities also control the flow and use of this information. For example, China’s system can use its knowledge of the location histories from all of its users to find similarities and determine which users were more likely exposed to infected users.

Similarly, Singapore’s TraceTogether app, which uses Bluetooth to exchange IDs, keeps a database linking the IDs that users broadcast to users’ identities and phone numbers. When users are diagnosed as infected, they are required to then share the IDs that their app received from other users with TraceTogether’s central server. These received IDs are the IDs broadcast by their contacts. Authorities then connect these IDs back to the information they store about users in order to learn who these exposed users are and reach out to them through their phone numbers.
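The centralized flow described above can be sketched in a few lines: the server holds the mapping from broadcast IDs to phone numbers, so when an infected user uploads the IDs their app received, the authority can look up whom to contact. The class and method names are hypothetical, and this is a simplified model of the description in this post rather than TraceTogether's actual implementation.

```python
class CentralServer:
    """Hypothetical central authority that can link broadcast IDs to phone numbers."""

    def __init__(self):
        self._owner_of = {}  # broadcast ID -> phone number of the user who broadcast it

    def register_id(self, broadcast_id, phone_number):
        """Record which user a broadcast ID belongs to."""
        self._owner_of[broadcast_id] = phone_number

    def report_infection(self, received_ids):
        """Take the IDs an infected user's app received; return the phone numbers to call."""
        return sorted({self._owner_of[i] for i in received_ids if i in self._owner_of})
```

The lookup table is what makes the design centralized: whoever holds it can identify every exposed contact, and no user has privacy from that party.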

centralized vs decentralized — *With centralized contact tracing systems, all users share their data to a central authority (left). With the decentralized approach, only infected users need to share their data (right).*

Decentralized systems, such as those proposed by CoEpi, Covid-Watch, PACT, and now enabled by Apple and Google’s framework, work slightly differently. When users are diagnosed as infected, they (optionally) share their data. This data may even be shared with a central database. What makes the system decentralized is that other users can then download or query this data without sharing their own information. The data returned can then be used to assess their exposure risk locally within their app.
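By contrast, the local matching step in a decentralized design can be sketched as a comparison that runs entirely on the user's device. This assumes an implementation in which infected users publish the IDs their apps broadcast; the function name and the shape of the log (a mapping from received ID to the times it was heard) are illustrative assumptions.

```python
def local_exposure_check(received_log, published_ids):
    """Return the entries of this device's log whose IDs were later reported as infected.

    received_log maps each received ID to the list of timestamps at which it was heard.
    Nothing about this device's log is sent anywhere; only the published list is downloaded.
    """
    published = set(published_ids)
    return {rid: times for rid, times in received_log.items() if rid in published}
```

Because the comparison happens locally, the party hosting the published list learns nothing about who was exposed. The flip side is that every user can see the published IDs, which is why some proposals add protections such as private set intersection.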

Trade-offs: Privacy and usefulness

The centralized and decentralized system designs differ in terms of whose privacy is preserved, and from whom. In each case, infected users give up some privacy when reporting their data, but in the case of the centralized system design, they need only share this data with the authority managing the system. This can work well for users if they trust the authority managing the system because by collecting data from all users, the system can do the work of finding points of contact while protecting users’ privacy from others. But no users have privacy from that authority. This authority may be a government, or an organization, or a company. The authority may then have the opportunity to act on its knowledge of contacts, not only to notify people of their exposure risk, but possibly ensure that exposed contacts quarantine or limit their travel, as in the case of China. This amount of information and level of control afforded to governments may make the system most useful for them and their citizens, and be desirable. Or it may be of concern. In places like the US, concerns about forfeiting privacy and control to a central authority may even stymie adoption of the system, making it less effective.

In the case of a decentralized system, users can query the system to find whether they had points of contact with infected users without sharing their own information. This makes it difficult for an authority to gain an overall view of which users, or how many users, came into contact with infected users. A decentralized approach can increase privacy for most users, but at the potential cost of privacy for the infected users who shared their data and whose data is then free to access. Some systems use additional privacy protection measures, such as mix networks and private set intersection protocols, to limit the amount of information other users can learn about infected users’ data, and the amount of information other users can expose by querying the system. However, these additional privacy measures add implementation complexity.

In general, the decentralized approach provides users with more privacy and autonomy, and authorities with less information and control.

How positive cases are reported

Many systems are designed to only use information from positive test results submitted by trusted health agencies to inform exposure risk. This process can be distributed in a secure way. For example, Covid-Watch and PACT propose using the concept of “permission numbers”. With this scheme, each testing authority generates a list of permission numbers that are distributed to health providers authorized to diagnose individuals. Each permission number is “use once”: it is used to authorize the upload of information from one diagnosed individual. And permission numbers are generated in a way to make them nearly impossible to guess, keeping the system secure from unauthorized data uploads.
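The permission number scheme described above can be sketched as a small registry that issues unguessable, single-use tokens. The class name, the token length, and the in-memory storage are assumptions for illustration; the Covid-Watch and PACT proposals describe the concept rather than this code.

```python
import secrets


class PermissionNumbers:
    """Hypothetical registry of single-use upload authorizations."""

    def __init__(self):
        self._unused = set()

    def issue(self, count):
        """Generate unguessable numbers for a testing authority to give health providers."""
        batch = [secrets.token_hex(16) for _ in range(count)]
        self._unused.update(batch)
        return batch

    def redeem(self, number):
        """Authorize one upload; the number is removed so it cannot be used again."""
        if number in self._unused:
            self._unused.remove(number)
            return True
        return False
```

Each token carries 128 bits of randomness, so guessing a valid one is impractical, and removing a token on first use is what enforces the "use once" property.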

Other systems may allow users to self-report symptoms.

Trade-offs: Usefulness

Including self-reported data can enable a system to more quickly scale its collection of data, without the bottlenecks of hospital visits and limited access to certified test results. This could allow a system to be more useful, and more quickly, to more users. Or this option could degrade the quality and integrity of the system’s data, as people may misdiagnose their own illness, or share low quality or false data, either unintentionally or intentionally. This could harm the accuracy of the system as well as users’ trust in the system, making it less useful.

reporting through authorized health agencies vs self-reporting — *A system may only allow test results reported from authorized health agencies (left), or allow self-reporting (right), or both.*

There is a middle ground, and systems that allow self-reported data can interoperate with systems that only use data shared by authorized healthcare providers. For example, Covid-Watch and CoEpi plan to use the same protocol to allow data sharing across their system users but CoEpi allows self-reporting while Covid-Watch does not. Metadata can be connected to data points indicating whether they came from a self-report or a trusted health provider. Different applications can then choose to treat this data differently. For example, an app could ignore self-reported data or use it as a weaker indicator in its assessment of exposure risk than data submitted by health providers. A system that successfully leverages self-reported data in combination with official test results could be most useful.
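As a sketch of how an app might treat the two kinds of reports differently, the example below tags each report with its source and discounts self-reported cases when summing exposure. The `Report` type, the use of contact minutes as the underlying measure, and the default weight of 0.3 are all hypothetical choices, not values proposed by Covid-Watch or CoEpi.

```python
from dataclasses import dataclass


@dataclass
class Report:
    contact_minutes: float
    verified: bool  # True if submitted by an authorized health provider


def exposure_score(reports, self_report_weight=0.3):
    """Sum contact minutes, discounting self-reported cases by an assumed weight."""
    return sum(
        r.contact_minutes * (1.0 if r.verified else self_report_weight)
        for r in reports
    )
```

Setting `self_report_weight` to 0 reproduces an app that ignores self-reported data entirely, while a value between 0 and 1 treats it as a weaker indicator.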

How exposure risk is assessed and how it impacts users

Systems can differ in how they assess exposure risk and present this information in apps for their users, or how they use risk assessments to limit the mobility of their users.

For example, China’s AliPay Health Code app shows users their assessed risk levels with color codes. The colors and associated QR codes are used to limit and further track the mobility of the app’s users. This makes the app very useful to the government in managing the health crisis, but it limits the freedom and privacy of citizens. Another issue is transparency. The color code is not very informative to users, who are not told how their risk level was determined, and it raises an issue of fairness because they cannot contest it.

Other apps could tell a user the estimated number of times they came in contact with infected people, or their estimated total contact duration, and evaluate exposure risk based on these numbers in a more transparent way. Apps might even show a user when or where contact with infected people occurred. As previously described, this can make the app more informative for users, but it can also present privacy issues by potentially exposing the identities of infected people to their contacts.
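A transparent assessment of this kind could be as simple as the sketch below, which reports the numbers behind the risk level alongside the level itself. The function name and the 15 minute threshold are assumptions made for illustration, not values from any of the apps discussed here.

```python
def summarize_exposure(contact_minutes, high_risk_minutes=15):
    """Summarize contacts with diagnosed users.

    contact_minutes: one estimated duration per detected contact.
    high_risk_minutes: hypothetical threshold, chosen only for this example.
    """
    count = len(contact_minutes)
    total = sum(contact_minutes)
    if count == 0:
        level = "none"
    elif total >= high_risk_minutes:
        level = "high"
    else:
        level = "low"
    # Returning the inputs with the level lets a user see why it was assigned.
    return {"contacts": count, "total_minutes": total, "level": level}


summary = summarize_exposure([5, 12])
no_contact = summarize_exposure([])
```

Because the count and total duration are shown next to the level, a user can see how the result was reached and question it if it looks wrong.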

Other systems may incorporate more personalized information and AI into their risk calculation. For example, the MILA group developing a contact tracing system for the Canadian government plans to use machine learning and symptoms reported by users in combination with contact information in order to more intelligently estimate personalized risk levels for users.

Contact tracing apps can assess risk, present information to users, and impact users’ lives in a variety of ways. For example, they might show when and where points of contact with diagnosed individuals occurred (left). Or they might provide estimated measures of exposure risk (center). Or they might provide QR codes that users must scan to enter certain places or travel (right).

All contact tracing systems will have limited accuracy, due to the limitations of technology and the complexity of human interactions, and they should be careful in handling potential false positives or false negatives. Reporting false positives can be harmful for users who might then go to a hospital to seek a test, or who are wrongfully directed to quarantine. Similarly, false negatives will also be an issue. These can occur when points of contact are missed either because people do not consistently carry a mobile device or use an app, or because the system is not sensitive enough. If apps give users a false sense of security when false negatives occur, users may then expose themselves or others to risk.

There is then a trade-off between providing users with sufficiently detailed information to demonstrate a level of confidence needed to make recommendations for testing or quarantine, versus providing less precise indicators of exposure risk to hedge against wrongfully reporting information.

Risks and questions that go beyond contact tracing

For any contact tracing technologies, we have to wonder how useful they can really be, and how to even measure whether they are working. We also have to wonder what their deployments will mean for privacy and freedom in both the immediate and distant futures. Even the research proposals that use Bluetooth protocols and decentralized designs to best protect users’ privacy from central authorities create new ways for people to be tracked. Can the potential benefits of contact tracing technologies be worth their trade-offs?

Even if the technology for these contact tracing systems could work with high accuracy, we must question whether they could provide a solution to the current epidemic. If there are far more asymptomatic cases than confirmed cases, is tracing only those who test positive even useful? The most accurate systems will likely require adoption of a new app to use Bluetooth. Researchers have estimated that a majority of the population would need to use such an app for it to be useful, but only about 1 in 5 people in Singapore use their government’s TraceTogether app, which is voluntary. Can we expect enough people to opt in to such a system, or will governments need to enforce or otherwise incentivize its use?
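A back-of-the-envelope calculation shows why adoption matters so much. A Bluetooth contact can only be recorded when both people are running the app. If we assume, as a simplification, that app users are spread evenly through the population and meet others at random, the share of contacts that can be detected is roughly the adoption rate squared. The function name below is hypothetical and the model ignores real-world clustering of users.

```python
def both_have_app(adoption: float) -> float:
    """Chance that both people in a random encounter run the app,
    assuming adoption is independent between the two people."""
    return adoption * adoption


one_in_five = both_have_app(0.20)   # adoption level reported for Singapore
majority = both_have_app(0.60)      # a hypothetical 60% adoption level

print(f"{one_in_five:.0%} of contacts detectable at 20% adoption")
print(f"{majority:.0%} of contacts detectable at 60% adoption")
```

Under these assumptions, 1 in 5 adoption covers only about 4% of contacts, and even a clear majority of users leaves most contacts undetected.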

Moreover, these technologies can only be useful if the people they notify about potential exposure risk are able to get tested, get treatment, or self-isolate. Will these options be made available and affordable for enough of the population?

We presented you with alternative methods and trade-offs to consider when building contact tracing technologies, but ultimately how these systems are built and used will rest on the consideration of societal questions about privacy and freedom and access to health services.

If we do not think about these questions as a society and intentionally design the technologies and policies to address them, the decision might be made for us. Consider the use of QR codes by China’s AliPay Health Code app. We can imagine a future where presenting an app that shows a low exposure risk or a clean bill of health becomes necessary to board a train or airplane, or enter a building or place of work. Then even systems that were designed as opt-in may become effectively required.

Even without these questions resolved, the development of contact tracing technologies is underway, and their development may go beyond their initial use cases for COVID-19.

Technology systems used to detect individuals at risk of infection in order to target quarantines, testing, and treatment can also be used to detect individuals who likely developed immunity. Individuals with sustained exposure risk but who never developed symptoms themselves, as well as those who recovered from the illness, may have developed immunity. This knowledge can be used to help governments issue “immunity cards” or “immunity passports” in order to control a safe re-entry of a workforce and help their economies continue functioning. Safely allowing people to work again in this way can be extremely useful to governments and vulnerable communities most impacted by this pandemic, but once again presents trade-offs. In the same ways that contact tracing technologies can limit personal freedoms through dictated quarantines, immunity cards could further limit freedoms in how we work or move, and exacerbate the inequities already present in our societies.

These technologies are being built for our present health emergency, but as we weigh their different methods and trade-offs, we must consider that what we build can last beyond a time of crisis.

Who we are: The City Science group at the MIT Media Lab strives to enable more livable, equitable, and resilient communities. We propose that transnational problems such as climate change and public health are best addressed in cities, one community at a time. In addition to our main research themes, we also strive to understand current scenarios during the COVID-19 health crisis and offer solutions and findings.

Alex Berke is a PhD candidate in the City Science group. She is a creative computer scientist, civic hacker, and technology architect, with degrees in mathematics and computer science.

Kent Larson is the Director of the City Science group. His research focuses on developing urban interventions with a focus on mobility solutions, live/work spaces, and data-enabled tools for decision making. He also leads the City Science Network, an international group of living labs creating new strategies for future communities.

Appendix

In what follows we explain the concepts behind the more privacy-preserving Bluetooth protocols, such as those from PACT, DP-3T, and Apple/Google, to help you understand how they work, but also why they are imperfect. They are designed to help keep the identities of diagnosed users who share their data secret, as well as help protect them from being tracked across the different places they went.

These protocols use similar ideas but differ in their specifics and terminology. We’ll make this explanation more consistent with the Apple/Google framework, since it will likely be used in the apps soon to be built.

At a high level:

Users’ devices broadcast random-looking IDs via Bluetooth Low Energy signals. They also receive the IDs broadcast by other devices and record these IDs along with the time at which they were received. Call these IDs “Rolling Proximity Identifiers”, or RPIs for short.

When a user is diagnosed as infected, they share information about the RPIs they recently broadcast (from whatever period is determined to be medically relevant) to a “diagnosis server”. Other users’ apps can then periodically check if any of the RPIs they recently received match against the diagnosis server’s data. A match indicates they came into contact with someone who was later diagnosed and their app can notify them of their exposure risk.

However, diagnosed users do not simply upload their broadcast RPIs to the “diagnosis server”. Instead they upload the parameters that generated each RPI.

In computer science and cryptography, one-way and pseudorandom functions (PRFs) are commonly used to hide secrets. Given such a function and its input, it is easy to compute the output. But it is considered computationally infeasible to reverse the function and find its input given its output. A PRF is used to generate the RPIs. Each RPI broadcast by a user’s device is the output of a PRF that uses a key that only the device knows, k, and time, t, as inputs.

RPI ← PRF(k, t)

Each RPI is broadcast by a device for only a temporary amount of time, after which point a new RPI is computed using the new current time as input, and then broadcast instead. Changing the RPI in this way makes it more difficult to track devices or re-identify people who anonymously shared their data to the diagnosis server (but as we’ll see, this is still possible).
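The sketch below illustrates RPI ← PRF(k, t) using HMAC-SHA256 as a stand-in PRF. This is an assumption made for illustration and is not the construction from the Apple/Google, PACT, or DP-3T specifications. The 10 minute interval, the 16 byte output length, and the function names are likewise hypothetical.

```python
import hashlib
import hmac
import secrets

INTERVAL_SECONDS = 600  # assumed rotation period of 10 minutes


def interval_number(unix_time: float) -> int:
    # The time input t is the index of the current interval, so every
    # moment within one interval yields the same RPI.
    return int(unix_time) // INTERVAL_SECONDS


def rpi(key: bytes, t: int) -> bytes:
    # HMAC-SHA256 as a stand-in PRF, truncated to 16 bytes for broadcast.
    return hmac.new(key, t.to_bytes(8, "big"), hashlib.sha256).digest()[:16]


k = secrets.token_bytes(32)          # secret key known only to the device
t = interval_number(1_588_118_400)   # an arbitrary example timestamp

current = rpi(k, t)
recomputed = rpi(k, t)               # same (k, t) always gives the same RPI
next_interval = rpi(k, t + 1)        # the next interval gives an unrelated-looking RPI
```

Anyone holding (k, t) can recompute the RPI, while someone who only overhears the RPI cannot feasibly recover k or predict the next one.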

What a diagnosed user’s device shares with the diagnosis server is the set of inputs (k, t) that were used to generate the RPIs it broadcast. Another user’s device can then use these inputs along with the PRF to recompute the RPIs, and check whether any of these RPIs and their corresponding time inputs match the received RPIs and times that the device stored locally.
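Here is a minimal sketch of that matching step as it might run on a user’s device. As a stand-in for the real PRF it uses HMAC-SHA256, which is an assumption, and the keys, time values, and function names are hypothetical.

```python
import hashlib
import hmac


def rpi(key: bytes, t: int) -> bytes:
    # Stand-in PRF (assumption): HMAC-SHA256 truncated to 16 bytes.
    return hmac.new(key, t.to_bytes(8, "big"), hashlib.sha256).digest()[:16]


def find_matches(published_inputs, received_log):
    """Return the published (k, t) pairs whose recomputed RPI was heard
    by this device during the same time interval t."""
    received = set(received_log)
    return [(k, t) for k, t in published_inputs if (rpi(k, t), t) in received]


diagnosed_key = b"diagnosed-user-key"   # hypothetical keys for the example
other_key = b"some-other-user-key"

# What this device stored locally: (received RPI, interval it was heard in).
received_log = [(rpi(diagnosed_key, 42), 42), (rpi(other_key, 43), 43)]

# What the diagnosis server publishes for the diagnosed user.
published = [(diagnosed_key, t) for t in (41, 42, 43)]

matches = find_matches(published, received_log)
```

The comparison happens entirely on the device. In this sketch the server never learns which RPIs the device received or whether a match was found.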

You might ask: Why do it this way? Why not just have users share their RPIs? The reason is for the integrity and security of the system.

Suppose a malicious user wanted to generate false alarms or otherwise create distrust in the system. They might rebroadcast RPIs that were recently uploaded to the diagnosis server. They also might continuously rebroadcast as many RPIs as possible that they received from other users (these malicious behaviors are referred to as “replay attacks”). Other users would then receive the rebroadcast RPIs from the malicious user. If only the RPIs of diagnosed users were uploaded to the diagnosis server rather than their (k, t) inputs, then these users could be falsely notified that they were in contact with diagnosed users. However, this potential integrity attack is prevented because devices can check that the times at which they received the RPIs match the corresponding (k, t) inputs that the diagnosis server stores.
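The difference between the two designs can be sketched as follows. Alice, Bob, and Carol are hypothetical users, HMAC-SHA256 is an assumed stand-in for the PRF, and the interval numbers are arbitrary.

```python
import hashlib
import hmac


def rpi(key: bytes, t: int) -> bytes:
    # Stand-in PRF (assumption): HMAC-SHA256 truncated to 16 bytes.
    return hmac.new(key, t.to_bytes(8, "big"), hashlib.sha256).digest()[:16]


alice_key = b"alice-secret-key"   # Alice is later diagnosed
heard_at, replayed_at = 100, 200  # interval numbers

alice_rpi = rpi(alice_key, heard_at)
bob_log = {(alice_rpi, heard_at)}       # Bob really was near Alice
carol_log = {(alice_rpi, replayed_at)}  # Carol only heard an attacker's replay


def naive_match(published_rpis, log):
    # Design that uploads bare RPIs: the time of receipt cannot be verified.
    return any(r in published_rpis for r, _ in log)


def checked_match(published_inputs, log):
    # Design that uploads (k, t): the RPI must have been heard during interval t.
    return any((rpi(k, t), t) in log for k, t in published_inputs)


carol_naive = naive_match({alice_rpi}, carol_log)                  # false alarm
carol_checked = checked_match([(alice_key, heard_at)], carol_log)  # replay rejected
bob_checked = checked_match([(alice_key, heard_at)], bob_log)      # real contact kept
```

Note that in this simplified sketch a replay made within the same time interval as the original broadcast would still pass the check. The time check narrows the window for replay attacks rather than eliminating it.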

This protocol also protects users from being framed. Suppose that when a user is diagnosed, instead of sharing information for their own RPIs to the diagnosis server, they attempt to dishonestly share RPIs that were broadcast by another user. However, since they must share the (k, t) inputs used to generate the RPIs, this requires knowledge of the other user’s key. Since other users can keep their keys secret until they are diagnosed and choose to share them, this attack is prevented.

These kinds of attacks that the protocol was designed to prevent may seem far-fetched, but keep in mind that the protocol must reliably work without anyone knowing whose data is truly whose and without depending on a central authority to intervene or prevent misuse. Clever use of cryptography and protocols is then necessary to ensure trust in the system.

This was meant as a high level overview of how these Bluetooth protocols will work to improve the privacy and security of contact tracing systems. There are many more details in how keys and RPIs are generated. For the details of the Apple and Google framework, refer to their specifications (https://www.apple.com/covid19/contacttracing/).

For example, the key used by a single device to produce RPIs periodically changes. This way a user’s RPIs will be associated with different sets of keys. This is done so that when they anonymously share their (k, t) data to the diagnosis server, it will be more difficult to link their data points together, making it more difficult to track them across the locations they visited, and more difficult to re-identify them. However, the protocol is still imperfect and cannot guarantee this type of privacy for its users.
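One way to picture key rotation is the sketch below, where a device draws a fresh random key for each day and, on diagnosis, shares only the keys from the medically relevant window. The `KeyRing` class, the daily rotation period, and the 16 byte key size are assumptions for illustration and do not reproduce the key schedule of any particular specification.

```python
import secrets


class KeyRing:
    """Hypothetical per-device store of rotating keys (illustration only)."""

    def __init__(self):
        self._keys = {}

    def key_for_day(self, day: int) -> bytes:
        # A fresh, independent random key per day: knowing one day's key
        # reveals nothing about the keys for other days.
        if day not in self._keys:
            self._keys[day] = secrets.token_bytes(16)
        return self._keys[day]

    def keys_to_share(self, first_day: int, last_day: int) -> dict:
        # On diagnosis, only keys from the relevant period leave the device.
        return {d: k for d, k in self._keys.items() if first_day <= d <= last_day}


ring = KeyRing()
for day in range(1, 6):
    ring.key_for_day(day)

shared = ring.keys_to_share(3, 5)   # days 1 and 2 stay private
```

Because each shared key covers only one day of RPIs, an observer needs extra information to tell that keys from different days belong to the same person.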

For example, if a diagnosed user shares data for an RPI they broadcast that a contact received while receiving no other RPIs, then this contact can easily re-identify them. We can also imagine a future where beacons that listen for Bluetooth signals are present throughout our environment. (This might be done for a variety of reasons, such as improving contact tracing, or tracking customers in stores and elsewhere to better advertise products.) Users’ RPIs could then be recorded throughout the places they go and linked back together once shared, creating a record of their location histories.

Researchers have proposed using mix networks or private set intersection protocols to mitigate these privacy and security issues. Others have considered reversing the above scheme so that instead of users uploading data for their own broadcast RPIs, they share the RPIs they have received from others (see section 4.1 of this paper for the “dual approach”). However, each of these proposals is imperfect.

Tracking people is central to the concept of contact tracing. The newly developed protocols have found clever ways to minimize people’s loss of privacy while they are tracked, but for now some privacy may need to be forfeited for contact tracing to be effective.