As businesses seek to leverage big data for growth, marketing and investment purposes, the legality of scraping data from websites is increasingly being examined by regulators and courts. Several recent cases have highlighted the privacy, contract and copyright risks associated with this form of data collection, including the ongoing fallout associated with Clearview AI, Inc. (Clearview), a firm that used screen scraping to harvest data on individuals’ faces posted online.
Earlier this year, privacy regulators across Canada found that Clearview had violated federal and provincial privacy laws by screen scraping images of people and selling facial recognition services to law enforcement. Parallel Canadian and American litigation against Clearview is ongoing.
The first Canadian privacy class action to result in court-awarded damages involved the screen scraping of data from public Instagram profiles.
Canadian and American courts are beginning to rule on contract- and copyright-based claims for use of data scraped from third-party websites, especially between industry competitors.
While financial risk can be mitigated through appropriate contractual indemnities, companies using scraped data sets are not immune from operational, reputational and regulatory consequences even where they did not perform the initial data collection.
What is screen scraping?
Screen scraping can be conducted in various ways and on different scales. In most cases, software is used to automatically access, aggregate and collect relevant data from websites or mobile applications. The information that is collected, or “scraped”, may be readily accessible (e.g., through search engines) or may require clicking through a website. In some cases, websites may specifically require parties accessing the website to first accept the website’s terms of use or terms of service.
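As a simplified illustration of the mechanics described above (the page structure and field names here are hypothetical, and the sketch is not a reference to any tool discussed in this article), a scraper typically retrieves a page’s HTML and then extracts the fields of interest. The sketch below uses only the Python standard library and parses a hardcoded HTML snippet in place of a live fetch:

```python
# Minimal sketch of the extraction step of a screen scraper.
# In practice the HTML would be fetched over the network (e.g., with
# urllib.request); a hardcoded snippet is used here so the example is
# self-contained.
from html.parser import HTMLParser

SAMPLE_HTML = """
<html><body>
  <img src="/photos/profile1.jpg" alt="profile photo">
  <a href="/users/alice">alice</a>
  <img src="/photos/profile2.jpg" alt="profile photo">
</body></html>
"""

class ImageScraper(HTMLParser):
    """Collects the src attribute of every <img> tag encountered."""
    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

scraper = ImageScraper()
scraper.feed(SAMPLE_HTML)
print(scraper.image_urls)  # ['/photos/profile1.jpg', '/photos/profile2.jpg']
```

Run at scale across many sites, this same extraction loop is what makes the collection both indiscriminate and voluminous, the two risk factors discussed below.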
Background: Clearview AI and the privacy regulators’ findings
Clearview AI is a U.S.-based company that built and sells facial recognition software services. Clearview used screen scraping to collect images of faces from publicly accessible online sources (e.g., social media) and stored this information, along with metadata and Clearview’s own biometric identifiers, in a database. Law enforcement could use Clearview’s service by uploading an individual’s image and receiving a list of results containing all matching images and metadata. Clicking on any of these results directs the user to the original source page of the image. Clearview ultimately amassed a database of over three billion images of faces and corresponding biometric identifiers, including those of a vast number of individuals in Canada, including children.
On February 2, 2021, the Office of the Privacy Commissioner of Canada (OPC) and provincial privacy commissioners from Alberta, British Columbia and Québec (collectively, the Regulators), released their findings with respect to Clearview’s collection, use and disclosure of personal information by means of its facial recognition tool. The Regulators found Clearview had failed to comply with federal and provincial privacy laws, concluding that:
Clearview engaged in the mass and indiscriminate scraping of images from millions of individuals across Canada, including children, amongst over 3 billion images scraped worldwide;
Clearview collected highly sensitive biometric information without Canadians’ knowledge or consent (and without any applicable exception); and
Clearview’s collection, use and disclosure of personal information was unreasonable, and “represents the mass identification and surveillance of individuals by a private entity in the course of commercial activity”.
The Regulators rejected Clearview’s assertion that, because the images in its database were scraped from publicly available sources, no consent was required for Clearview’s collection, use or disclosure of such personal information. The Regulators also noted that while they did not receive direct evidence on the topic, evidence that Clearview had violated website terms of service would have also supported a finding that Clearview’s actions were unreasonable.
On December 14, 2021, the provincial privacy regulators of Alberta, British Columbia and Québec ordered Clearview to comply with the recommendations flowing from their joint investigation with the OPC, including that Clearview stop collecting and sharing images.
Additional legal proceedings
In addition to this investigation, Clearview and organizations that used Clearview’s services have been engaged in legal proceedings related to screen scraping and its facial recognition tool, including:
a parallel investigation by the OPC that found the RCMP had breached the Privacy Act through its use of Clearview’s services;
a proposed class action brought against Clearview in Canada;
a lawsuit brought by Vermont’s Attorney General based on the state’s consumer protection law;
an ongoing class action in Illinois brought against Clearview based on the state’s Biometric Information Privacy Act;
a finding by Sweden’s Data Protection Authority that the Swedish Police Authority had illegally processed personal data through its use of Clearview’s services; and
in November 2021, the United Kingdom’s Information Commissioner’s Office (ICO) announced a provisional intent to fine Clearview just over £17 million. The ICO also issued a provisional notice ordering Clearview to stop further processing the personal data of people in the UK and to delete it, following alleged serious breaches of the UK’s data protection laws1.
Risks of screen scraping
Businesses that are considering using data or insights generated through screen scraping should be aware of the various legal risks associated with this technology, including where the data collection is done by a vendor. Legal risk arises from two common features of screen scraping programs. The first is the often indiscriminate nature of screen scraping. Automated screen scraping tools can collect information from the internet regardless of jurisdiction, the content of the information, whether legal protections apply to the information, and the terms and conditions of the source website.
The second risk factor is screen scraping’s inherently large scale. The utility of screen scraping is correlated with the amount of information collected from the internet, but each additional website visited multiplies the magnitude of privacy, contract and copyright risk assumed, and complicates the scraping company’s ability to trace the source of the data collected. This in turn increases the litigation, regulatory and reputational risks faced by the organization using the data.
As a default, Canadian private sector privacy legislation requires consent to collect personal information. Screen scraping is a risky activity because the collection is typically done without consent, and the further use or disclosure of that personal information can incur additional liability for failure to obtain consent. In addition, collection of personal information at a large scale may negatively influence privacy regulators’ assessments of whether the activity is reasonable (a statutory requirement), since the organization must be able to show that the specific personal information is necessary to accomplish a legitimate business objective.
The scale and indiscriminate nature of screen scraping may also expose organizations to heightened class action and regulatory risk across several jurisdictions, even if an organization does not “operate” in such jurisdiction. This was noted in the Clearview decision, where Canadian privacy regulators determined that Clearview was subject to Canadian privacy laws, despite the company’s headquarters being in the U.S. Companies using scraped data must assess the privacy risks of the various countries of residence of the individuals whose information is being collected, as well as the locations of the source websites and the governing law set out in those websites’ terms.
Note also that personal information publicly posted on websites is often still considered personal information under privacy laws. While this information might be colloquially considered public, the legislative exemption for “publicly available” information under PIPEDA (as well as in the now defunct Bill C-11) is narrow and does not include all information that is publicly accessible on the internet. Indeed, Clearview’s argument that the information it collected was publicly available was rejected by privacy regulators.
A British Columbia court recently awarded privacy tort damages of $10/person to a class of Canadian Instagram users whose information was scraped from their public profiles by a third-party company for marketing purposes. Given the size of the class, the damages award totaled approximately $25 million, even though there was no evidence that the users had suffered harm beyond bare violation of their rights to control the use of their personal information2.
Even if screen scraping does not collect personal information, organizations still risk running afoul of contractual and copyright law.
Websites typically have terms and conditions of use attached to them. These terms typically deem a user to have agreed to them when the user accesses the website. Website terms and conditions will often expressly prohibit screen scraping or other activities that a screen scraping program may engage in. The use of a screen scraping tool may contravene these terms and expose the user to a breach of contract claim. The user may also be agreeing to terms that are unfavourable to it.
The U.S. law on contractual scraping claims is developing more rapidly than in Canada, but remains unsettled. In September 2020, the U.S. Court of Appeals for the Ninth Circuit affirmed a lower court’s decision in hiQ v. LinkedIn to grant a preliminary injunction to hiQ Labs, a data aggregator, enjoining LinkedIn from denying hiQ access to the public profiles of LinkedIn’s users. LinkedIn’s User Agreement prohibits data scraping or copying of users’ public profiles, though the Court’s decision notes that hiQ was no longer bound by the User Agreement since LinkedIn had terminated hiQ’s user status. At issue (in part) was whether the Computer Fraud and Abuse Act (CFAA) prohibited hiQ’s use of bots to scrape LinkedIn profiles. The Ninth Circuit determined that the CFAA did not prohibit hiQ’s screen scraping. LinkedIn then petitioned the Supreme Court of the United States for certiorari, which was granted; the Ninth Circuit’s judgment was vacated and the case remanded for further consideration in light of the Supreme Court’s decision in Van Buren v. United States. Notably, the Supreme Court in Van Buren left open whether unauthorized access under the CFAA turns only on technological restrictions, or whether it also looks to limits contained in contracts or policies.
Screen scraping can also run afoul of copyright law. For example, in Trader v. CarGurus3, the Ontario Superior Court of Justice held that the defendant’s screen scraping had infringed the plaintiff’s copyright. The defendant was engaged in aggregation of data from car dealerships. Unbeknownst to the defendant, the scraped images of cars for sale included images that were sourced from the website of the defendant’s competitor. Because the images were photographed by the competitor’s photographer, they were protected by copyright. The Court therefore found that the defendant had infringed its competitor’s copyright by scraping the images and posting them on its own website.
In the U.S., two main statutes have been implicated in screen scraping cases: the Copyright Act of 1976 and the Digital Millennium Copyright Act of 1998. It should be noted that not all instances of data scraping will trigger copyright infringement laws. This may be largely dependent on the source of the data; for example, data scraped from sites that act as platforms for user-generated content (e.g., social media websites) may be less likely to be subject to a valid claim of copyright by the website owner.
Implications for businesses and key takeaways
Businesses should carefully weigh the risks of screen scraping, regardless of whether the screen scraping is done internally or by a vendor.
Tailor data collection to minimize risk, such as targeting specific sites that have the desired information, and in the desired jurisdictions.
Prevent or limit the chances that the tool will collect higher-risk types of information, including personal information, or information subject to copyright.
Review website terms to determine if scraping is permitted, and negotiate data sharing agreements with website owners.
Consider the likelihood that website owners will object to the screen scraping, and what remedies—if any—they may be entitled to.
Obtain confirmation from the data aggregator or source website that the collection is permitted under all applicable laws, and seek indemnities for claims alleging unauthorized collection.
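One practical starting point for reviewing whether a site permits scraping is its robots.txt file, which states the owner’s crawling preferences in machine-readable form. Complying with robots.txt is not a substitute for reviewing the site’s actual terms of use, but it is an inexpensive first check. The sketch below uses Python’s standard library and parses an illustrative robots.txt inline (the hostname and bot name are hypothetical; in practice the file would be fetched from the site itself):

```python
# Checking a site's robots.txt crawling preferences with the standard
# library. The robots.txt content below is illustrative; in practice it
# would be fetched from the target site via set_url() and read().
from urllib.robotparser import RobotFileParser

SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /users/
Allow: /public/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# The owner permits crawling /public/ but not user profile pages.
print(parser.can_fetch("MyScraperBot", "https://example.com/public/page"))   # True
print(parser.can_fetch("MyScraperBot", "https://example.com/users/alice"))   # False
```

Restricting a scraping tool to paths the owner has allowed also supports the earlier recommendations to tailor collection and to limit higher-risk data.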
The ICO’s provisional fine and orders arose from the joint Clearview investigation by the ICO and the Office of the Australian Information Commissioner, which focused on Clearview AI Inc’s use of images and data scraped from the internet, and its use of biometrics for facial recognition. For more, see the ICO’s announcement here.
To discuss these issues, please contact the author(s).
This publication is a general discussion of certain legal and related developments and should not be relied upon as legal advice. If you require legal advice, we would be pleased to discuss the issues in this publication with you, in the context of your particular circumstances.
For permission to republish this or any other publication, contact Janelle Weed.