For context, this is a research paper I wrote during my senior year of high school.
Ever since I began diving into the world of programming and technology, I’ve come across many different positions regarding the ethics of software. Many prominent figures in the computer science community – including Linus Torvalds, the creator of Linux, and Richard Stallman, the founder of the GNU Project – support free and open-source software. Large tech companies such as Meta (formerly Facebook), ByteDance (creator of TikTok), and Microsoft follow a very different philosophy, one focused on profit and expansion. Much of their software is proprietary – meaning its source code is kept secret – in order to prevent so-called “intellectual property theft” and, I will argue, to hide tracking software that would likely alienate their user bases if fully exposed. This tracking of user data is standard practice among large tech companies, even as some, such as Apple, Google, and Microsoft, have begun open-sourcing pieces of their software. In this paper, I examine the extent to which internet users are being taken advantage of by large tech companies, so that my readers can understand how and why to protect their personal data.
Large tech companies often monetize their software by charging a usage
fee and/or collecting and selling data for advertising purposes.
Tracking customer data for marketing purposes isn’t a recent phenomenon.
Before the internet, shop owners might watch to see what items people
bought or how much money they spent. Since that information was
collected in the aggregate and no personal information was recorded,
customers remained anonymous and were thus comfortable with its
collection. Online, however, “No longer are [customer] shopping
behaviors available only in the aggregate. Instead, individuals are
tracked, and information is collected from purchasing transactions as
they surf through Web sites” (Caudill and Murphy). That anonymity has
been removed from the process, and with it the bare minimum of privacy
we should expect when it comes to technology.
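
The difference between aggregate and individual collection can be made concrete with a small sketch. The Python below is illustrative only: the `purchases` records, the customer names, and both function names are invented for this example and do not describe any real company's system.

```python
from collections import Counter, defaultdict

# Hypothetical purchase records as (customer, item) pairs. All names are made up.
purchases = [
    ("alice", "bread"),
    ("bob", "bread"),
    ("alice", "razor"),
    ("carol", "milk"),
    ("bob", "milk"),
]

def aggregate_view(records):
    """What a shop owner watching the shelves learns: totals per item only."""
    return Counter(item for _customer, item in records)

def individual_view(records):
    """What per-user tracking learns: a purchase history for each person."""
    profiles = defaultdict(list)
    for customer, item in records:
        profiles[customer].append(item)
    return dict(profiles)

print(aggregate_view(purchases))
print(individual_view(purchases))
```

The first view can answer "how much bread sold today?" but cannot say who bought it. The second can reconstruct each person's habits, which is the shift the quotation above describes.
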
As time has progressed, the extent to which large companies collect data
on their users has become increasingly worrying. TikTok, the popular
social media app, updated its privacy policy in 2021 to permit the
collection of biometric data such as faceprints and voiceprints
(McCluskey). Since the policy is very vague, TikTok’s intentions are
difficult to decipher. At worst, some worry that TikTok may use such
data for “‘...mass re-identification and surveillance’” (McCluskey).
The potential for malpractice in the handling of user data keeps
growing, and, as the loss of anonymity described above shows, the
baseline is already poor.
TikTok is not alone in its questionable pursuit of data. As Android has
become more accessible to developers, “certain apps have crossed the
line from merely displaying ads to pushing (or forcing) products to the
user, harvesting private data for future use (e.g. spam or other use),
and even extracting fraudulent revenues” (Erturk). This trend is so
persistent that some experts predict that a separate market for privacy
will emerge, allowing consumers to purchase certain degrees of privacy
(Rust et al.).
The ethics of technological privacy are further complicated because
they differ between cultures. These differences show in the different
approaches governments take to regulating the collection of user data.
The United States’ “privacy laws… [are] rooted in a
‘harms-prevention-based’ hodgepodge of privacy protections,” whereas the
European Union has taken a “broader ‘rights-based’ approach” (Bellamy).
Although ethical interpretations of current privacy issues differ among
people, there are recurring ideas and themes. For example, most people
consider privacy itself valuable; the extent to which that privacy
should be protected, however, is up for debate.
Whichever of these approaches you take, there are reasons to improve
privacy and security that satisfy most of them. The simplest to consider
is the question “What could go wrong?” or, more specifically, “What
could the company do with my data?” Using TikTok as a continued example,
“[TikTok] agreed to pay $92
million to settle a class-action lawsuit alleging that it violated
Illinois’ Biometric Information Privacy Act, the federal Video Privacy
Protection Act, and other consumer and privacy protection laws”
(McCluskey). Even with laws restricting such behavior, large
corporations have been accused of collecting and using user data with
malicious (or at least not noble) intent. Given that there’s a
demonstrably non-zero probability of large tech companies using data
unethically (when viewed through most ethical lenses), it’s better to
take a stricter approach in dealing with these companies.
Recently, within the computer science community and the technology scene
in general, there has been something of an open-source movement.
Companies like Apple and Microsoft have open-sourced some of their
software, which can promote software quality and lets outsiders verify
what data that code collects. In the same vein, Linux – the most popular
fully open-source operating system kernel – has become more popular over
the last few years, its share of the desktop market having risen roughly
0.7 percentage points since January 2022 to 2.91% (“Desktop Operating
System Market Share Worldwide”). Its appeal is easy to understand, given
its reputation for security. As with other
“open source application teams… [Linux is] able to fix flaws much faster
than both internally developed and commercial application teams” because
“open-source operating systems allow and encourage more people to work
against malware” (Erturk). The growing popularity of an operating system
known for its privacy and security is indicative of the cultural shift
taking place in the world of technology.
With open-source projects offering high-quality, privacy-oriented
software in direct competition with large tech companies, we are likely
to see those companies open-source more of their own software as people
turn to established open-source projects as alternatives to more
invasive yet popular software. This would reveal the full extent to
which these companies track their users and would likely either
alienate those users or force a change in data collection practices.
As discussed earlier, large tech companies often monetize their software
by charging a usage fee and/or collecting and selling data for
advertising purposes. For the most part, the latter appears to be more
popular and successful. Google and Meta – both of which are among the 8
largest tech companies by market cap (“Largest Tech Companies by Market
Cap”) – make the majority of their money from advertising driven by the
data their free services collect. (Google’s free offerings include
YouTube, Google Drive, Docs, Slides, Sheets, Google Search, Android,
Gmail, and many others, while Meta’s include Facebook, Instagram,
WhatsApp, and others.) According to Meta itself, it uses this data in
order to “provide, improve and develop [our] services” (“Data Policy”).
Most large tech companies claim that data collection is used to improve
their software through user-generated metrics and to make patching flaws
easier by monitoring usage and tracking errors as they occur. The money
earned by monetizing the data for marketing purposes is then used to
fund the development of the software and grow the company. None of this
depends on data collection, however. As discussed earlier, open-source
projects (most of which do not collect user data) can be more secure and
of higher quality than proprietary software because of direct community
involvement and more people contributing to the source code. If large
companies were to open-source their projects and start charging usage
fees or using non-targeted advertising, they could still turn a profit
and offer a higher-quality product.
Constant, personalized data collection is unnecessary for ensuring
quality software. As established previously, open-source software is
often more secure and of higher quality. Most open-source projects only
take data from people who are willing to provide it in order to patch an
error or improve a specific component (and thus, by nature, the data is
not personalized). If tech companies were honestly trying only to
improve their software, they’d collect data only when an error occurs or
with explicit permission. The biggest reason that large tech companies
collect data is to turn a profit by using it for marketing purposes.
Even then, non-personalized advertising is still profitable and is used
by some tech companies, such as DuckDuckGo.
Given that data collection is unnecessary to create quality software and
not strictly necessary to generate revenue, we must then ask ourselves
what else these companies are doing with our data. As discussed earlier,
the worst-case scenario is quite bad and the reality isn’t much better
(at least with TikTok). At worst, identification and subsequent surveillance
could lend power to dangerous people and governments. In reality,
fraudulent revenue streams and illegal and/or hyper-aggressive data
collection have been identified in many services. These problems are
rarely found in open-source programs, which fully expose their source
code to gather community insight and quell fears of data collection.
Open-source projects like Linux offer the best example of how easy and
beneficial it can be to use open-source software. Linux is often faster,
more secure, more customizable, and arguably more user-friendly than
Windows or macOS, and it dominates where reliability matters most, with
“96.3 percent of the top 1 million web servers…
running Linux” (Vaughan-Nichols). The benefits of open-source software
extend past privacy and into quality, making it an excellent option for
consumers.
Switching to Linux and using open-source alternatives for everything is
possible, but for most people it is not a realistic answer due to the
learning curve and inconvenience. However, completely changing your
technological habits isn’t necessary to protect the majority of your
data from being misused. Basic steps like using Firefox with Privacy
Badger and HTTPS Everywhere (web extensions available from your
browser’s extension store), disabling cookies where you can, and not
leaving your computer idling (that is, putting it to sleep, hibernating
it, or shutting it down instead of letting it run) are great ways to
prevent as much data collection as you can without causing a severe
inconvenience (Ryabitsev et al.). By protecting
our data and promoting open-source software, we can make technology more
private and serve our best interests.
Caudill, Eve M., and Patrick E. Murphy. “Consumer Online Privacy: Legal and Ethical Issues.” Journal of Public Policy & Marketing, vol. 19, no. 1, 2000, pp. 7–19. https://doi.org/10.1509/jppm.19.1.7.16951.
“Data Policy.” Facebook, https://www.facebook.com/about/privacy/previous.
“Desktop Operating System Market Share Worldwide.” StatCounter Global Stats, https://gs.statcounter.com/os-market-share/desktop/worldwide.
Erturk, Emre. “A Case Study in Open Source Software Security and Privacy: Android Adware.” World Congress on Internet Security (WorldCIS-2012), IEEE, 2012.
“Largest Tech Companies by Market Cap.” CompaniesMarketCap.com - Companies Ranked by Market Capitalization, https://companiesmarketcap.com/tech/largest-tech-companies-by-market-cap/.
McCluskey, Megan. “What TikTok Could Do with 'Faceprints' and 'Voiceprints'.” Time, 14 June 2021, https://time.com/6071773/tiktok-faceprints-voiceprints-privacy/.
Bellamy, Fredric D. “U.S. Data Privacy Laws to Enter New Era in 2023.” Reuters, Thomson Reuters, 12 Jan. 2023, https://www.reuters.com/legal/legalindustry/us-data-privacy-laws-enter-new-era-2023-2023-01-12/.
Rust, Roland T., et al. “The Customer Economics of Internet Privacy.” Journal of the Academy of Marketing Science, vol. 30, no. 4, 2002, pp. 455–464. https://doi.org/10.1177/009207002236917.
Ryabitsev, Konstantin, et al., editors. “Linux Workstation Security Checklist.” 15 Dec. 2017. https://github.com/lfit/itpol/blob/master/linux-workstation-security.md#linux-workstation-security-checklist.
Vaughan-Nichols, Steven. “Can the Internet Exist without Linux?” ZDNet, 15 Oct. 2015, https://www.zdnet.com/home-and-office/networking/can-the-internet-exist-without-linux/.