Technological Privacy Research Paper

For context, this research paper was written during my senior year of high school.

Introduction

Ever since I began diving into the world of programming and technology, I’ve come across many different positions regarding the ethics of software. Many prominent figures in the computer science community, including Linus Torvalds (the creator of the Linux kernel) and Richard Stallman (the founder of the GNU Project), support free and open-source software. Large tech companies such as Meta (formerly Facebook), ByteDance (creator of TikTok), and Microsoft follow a very different philosophy, one focused on profit and expansion. Their software is proprietary, meaning its source code is kept secret, in order to prevent so-called “intellectual property theft” and to hide tracking code that, if fully exposed, would likely alienate their user bases. This tracking of user data is standard practice among large tech companies, even as some, such as Apple, Google, and Microsoft, have begun open-sourcing many pieces of their software. I am researching technological privacy to find out the extent to which large tech companies take advantage of internet users, and to help my readers understand how and why to protect their personal data.

Privacy and Tech Companies

Large tech companies often monetize their software by charging a usage fee and/or collecting and selling data for advertising purposes. Tracking user data for marketing purposes isn’t a recent phenomenon. Before the internet, shop owners might watch to see what items people bought or how much money they spent. Since that information was collected in the aggregate and no personal information was recorded, customers remained anonymous and were generally comfortable with this kind of information gathering. Online, however, “No longer are [customer] shopping behaviors available only in the aggregate. Instead, individuals are tracked, and information is collected from purchasing transactions as they surf through Web sites” (Caudill and Murphy). Anonymity has been removed from the process, and with it the bare minimum of privacy we should expect when it comes to technology.

As time has progressed, the extent to which large companies collect data on their users has become increasingly worrying. TikTok, the popular social media app, updated its privacy policy in 2021 to allow the collection of biometric data such as faceprints and voiceprints (McCluskey). Because TikTok’s privacy policy is very vague, its intentions are difficult to determine. At worst, some worry that TikTok may use such data for “‘...mass re-identification and surveillance’” (McCluskey). The potential for misuse of user data is ever-increasing, and, as discussed earlier, the current situation is already troubling.

TikTok is not alone in its questionable pursuit of data. As Android has become more accessible to developers, “certain apps have crossed the line from merely displaying ads to pushing (or forcing) products to the user, harvesting private data for future use (e.g. spam or other use), and even extracting fraudulent revenues” (Erturk). Data collection has become so widespread that some researchers have predicted the emergence of a separate market for privacy, in which consumers could purchase certain degrees of privacy (Rust et al.).

Ethics and Technological Privacy

The ethics behind technological privacy are complicated because they differ between cultures. These differences are highlighted in the different approaches governments take to regulating user data collection. The United States’ “privacy laws… [are] rooted in a ‘harms-prevention-based’ hodgepodge of privacy protections,” whereas the European Union has taken a “broader ‘rights-based’ approach” (Bellamy). Although ethical interpretations of current privacy issues differ among people, there are recurring ideas and themes. For example, most people consider privacy itself valuable; however, the extent to which that privacy should be protected is up for debate.

Regardless of which of these approaches one takes, there are reasons to improve privacy and security that satisfy most of them. The simplest is to ask “What could go wrong?” or, more specifically, “What could the company do with my data?” Using TikTok as a continued example, “[TikTok] agreed to pay $92 million to settle a class-action lawsuit alleging that it violated Illinois’ Biometric Information Privacy Act, the federal Video Privacy Protection Act, and other consumer and privacy protection laws” (McCluskey). Even with laws restricting such practices, large corporations have been accused of collecting and using user data with malicious (or at least questionable) intent. Given that there is a demonstrably non-zero probability of large tech companies using data unethically (when viewed through most ethical lenses), a stricter approach to dealing with these companies is justified.

The Open Source Movement

In recent years, the open-source movement has gained momentum within the computer science community and the technology scene in general. Companies like Apple and Microsoft have open-sourced portions of their software, which can improve software quality and lets outside developers inspect what that code actually does. In the same vein, Linux, the most popular fully open-source operating system kernel, has become more popular over the last few years, with its share of desktops rising by about 0.7 percentage points since January 2022 to 2.91% (“Desktop Operating System Market Share Worldwide”). Much of its appeal comes from its strong reputation for security. Like other “open source application teams… [Linux is] able to fix flaws much faster than both internally developed and commercial application teams” because “open-source operating systems allow and encourage more people to work against malware” (Erturk). The growing popularity of an operating system known for its privacy and security suggests that a cultural shift is taking place in the world of technology.

As large tech companies compete directly with open-source projects that provide high-quality, privacy-oriented software, they may open-source more of their own software as people turn to established open-source projects as alternatives to popular but more invasive software. This would reveal the full extent to which these companies track their users, likely resulting in either alienated users or changed data collection practices.

Why Large Tech Companies Collect Data

As discussed earlier, large tech companies often monetize their software by charging a usage fee and/or collecting and selling data for advertising purposes. For the most part, the latter appears to be more popular and successful. Google and Meta, both among the eight largest tech companies by market cap (“Largest Tech Companies by Market Cap”), make the majority of their money from advertising built on data collected through their free services. Google’s free software includes YouTube, Google Search, Google Drive/Docs/Slides/Sheets, Android, Gmail, and many others, while Meta’s includes Facebook, Instagram, WhatsApp, and others.

Meta states that it collects and uses data to “provide, improve and develop [its] services” (“Data Policy”). Most large tech companies claim that data collection helps them improve their software through user-generated metrics and makes patching flaws easier by monitoring usage and tracking errors as they occur. The revenue from data-driven advertising then funds the development of the software and the growth of the company. None of this strictly depends on personalized data collection, however. As discussed earlier, open-source projects (most of which do not collect user data) are often more secure and of higher quality than proprietary software because of direct community involvement and the larger number of people contributing to the source code. If large companies were to open-source their projects and charge usage fees or use non-targeted advertising, they could still make a profit and would likely have a higher-quality product.

Conclusion

Constant, personalized data collection is not necessary to ensure quality software. As established previously, open-source software is often more secure and of higher quality. Most open-source projects collect data only from people who choose to provide it, usually to help patch an error or improve a specific component, so that data is by nature not personalized. If tech companies were honestly trying to improve their software, they would collect data only when an error occurs or with explicit permission. The biggest reason large tech companies collect data is to profit from it through targeted marketing. Even so, non-personalized advertising can still be profitable, as companies like DuckDuckGo demonstrate.

Given that data collection is unnecessary to create quality software and not strictly necessary to generate revenue, we must ask ourselves what else these companies are doing with our data. As discussed earlier, the worst-case scenario is quite bad and the reality isn’t much better (at least with TikTok). At worst, identification and subsequent surveillance could lend power to dangerous people and governments. In reality, fraudulent revenue streams and illegal or hyper-aggressive data collection have been alleged or identified in many services. These problems are rarely found in open-source programs, which fully expose their source code to gather community insight and quell fears of data collection. Linux is one of the best examples of how practical and beneficial open-source software can be. It is often considered faster, more secure, and more customizable than Windows and macOS, and arguably more user-friendly, and its reliability shows in its dominance on servers, with “96.3 percent of the top 1 million web servers… running Linux” (Vaughan-Nichols). The benefits of open-source software extend past privacy and into quality, making such programs excellent options for consumers.

Switching to Linux and using open-source alternatives for everything is possible, but unrealistic for most people because of the learning curve and inconvenience. However, completely changing your technological habits isn’t necessary to protect most of your data from being misused. Simple steps like using Firefox with the Privacy Badger extension and HTTPS-Only Mode enabled (a built-in setting that has replaced the now-retired HTTPS Everywhere extension), disabling cookies where you can, and not leaving your computer idle (letting it run instead of putting it to sleep, hibernating it, or shutting it down) are great ways to limit data collection without causing severe inconvenience (Ryabitsev et al.). By protecting our data and promoting open-source software, we can make technology more private and make it better serve our interests.

Works Cited

Caudill, Eve M., and Patrick E. Murphy. “Consumer Online Privacy: Legal and Ethical Issues.” Journal of Public Policy & Marketing, vol. 19, no. 1, 2000, pp. 7–19. https://doi.org/10.1509/jppm.19.1.7.16951.

“Data Policy.” Facebook, https://www.facebook.com/about/privacy/previous.

“Desktop Operating System Market Share Worldwide.” StatCounter Global Stats, https://gs.statcounter.com/os-market-share/desktop/worldwide.

Erturk, Emre. “A Case Study in Open Source Software Security and Privacy: Android Adware.” World Congress on Internet Security (WorldCIS-2012), IEEE, 2012.

“Largest Tech Companies by Market Cap.” CompaniesMarketCap.com, https://companiesmarketcap.com/tech/largest-tech-companies-by-market-cap/.

McCluskey, Megan. “What TikTok Could Do with 'Faceprints' and 'Voiceprints'.” Time, 14 June 2021, https://time.com/6071773/tiktok-faceprints-voiceprints-privacy/.

Bellamy, Fredric D. “U.S. Data Privacy Laws to Enter New Era in 2023.” Reuters, 12 Jan. 2023, https://www.reuters.com/legal/legalindustry/us-data-privacy-laws-enter-new-era-2023-2023-01-12/.

Rust, Roland T., et al. “The Customer Economics of Internet Privacy.” Journal of the Academy of Marketing Science, vol. 30, no. 4, 2002, pp. 455–464. https://doi.org/10.1177/009207002236917.

Ryabitsev, Konstantin, et al., editors. “Linux Workstation Security Checklist.” 15 Dec. 2017. https://github.com/lfit/itpol/blob/master/linux-workstation-security.md#linux-workstation-security-checklist.

Vaughan-Nichols, Steven. “Can the Internet Exist without Linux?” ZDNet, 15 Oct. 2015, https://www.zdnet.com/home-and-office/networking/can-the-internet-exist-without-linux/.