GitHub: Updating the Citizen Lab test lists

Citizen Lab test list repo

This guide is meant for GitHub users and provides information related to updating the Citizen Lab test lists for website censorship testing. If you are not a GitHub user, please refer to our Test Lists Editor.

By contributing to the Citizen Lab test lists, you can support website censorship testing by OONI Probe users around the world.

Before getting started, please refer to our documentation to learn all about test lists.

About test lists

Test lists are machine-readable CSV files that include URLs that are tested for censorship.

Censorship measurement projects like OONI rely on a global community of volunteers who run censorship detection tests from local vantage points. In light of bandwidth constraints, testing most websites available on the internet is not practical (nor possible in many cases). Instead, our measurements focus on a sample of websites provided in “test lists”: machine-readable CSV files with a set of curated, interesting domains.

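There are two types of test lists: the global test list, which includes internationally relevant websites that are tested by all OONI Probe users, and country-specific test lists, which include websites relevant to a particular country.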

To maximize the breadth of coverage while reducing research bias, test list URLs are grouped into 30 diverse categories.

We encourage you to learn more here.

Reviewing test lists

All test lists that OONI Probe is designed to test for censorship are hosted in the Citizen Lab Test List repository on GitHub.

All test lists are saved as CSV files. Country-specific test lists are named after country codes. For example, if you would like to review the test list for Azerbaijan, open the az.csv file. The global test list (which includes internationally relevant websites that are tested by all OONI Probe users) is saved as global.csv.
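As a rough illustration of this naming convention, the sketch below maps a country code (or the word "global") to the file name you would look for. The function name list_path_for and the default "lists" directory are assumptions made for this example; check the repository layout before relying on them.

```python
from pathlib import Path


def list_path_for(country_code: str, lists_dir: str = "lists") -> Path:
    """Return the expected CSV path for a country code or for the global list.

    `lists_dir` is an assumed directory name, not something this guide states.
    """
    code = country_code.strip().lower()
    is_two_letters = len(code) == 2 and code.isascii() and code.isalpha()
    if code != "global" and not is_two_letters:
        raise ValueError(
            f"expected a two-letter country code or 'global', got {country_code!r}"
        )
    return Path(lists_dir) / f"{code}.csv"


print(list_path_for("AZ").name)      # az.csv
print(list_path_for("global").name)  # global.csv
```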

Step 1. In the Citizen Lab Test List repository, find the CSV file for the country whose test list you would like to review or update (e.g. az.csv for Azerbaijan).

If you don’t find a CSV file for a country, that’s probably because it doesn’t exist yet. In this case, please refer to the next section on “Creating new test lists”.

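As part of reviewing a test list, you can add websites, edit existing entries, or delete existing entries, as described in the sections below.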

Adding websites

Step 1. Add new URLs to the CSV file under the url column.

Some criteria for adding new URLs can include the following:

For further criteria, please view the URL categories here.

Please try to add URLs which fall under as many (if not all) of these categories as possible.

Important:

Step 2. For each new URL, please add the following in the CSV file:

Step 3. Once you have added all new URL entries, open a pull request on GitHub.

We may provide feedback through the comments section of your pull request. Once the feedback is addressed and your pull request is merged, your recently added URLs will automatically get prioritized for OONI Probe testing.
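To make the steps above more concrete, here is a minimal sketch of how one new entry could be formatted as a CSV row with Python's csv module. The column order in COLUMNS, the YYYY-MM-DD date format, and the example values (category code, source name) are assumptions for illustration only; always follow the header row of the file you are editing and the standardized categories.

```python
import csv
import io
from datetime import date

# Assumed column order; confirm it against the header row of the CSV file.
COLUMNS = ["url", "category_code", "category_description",
           "date_added", "source", "notes"]


def format_row(url, category_code, category_description, source,
               notes="", added=None):
    """Return one CSV line (with trailing newline) for a new test list entry."""
    added = added or date.today()
    row = {
        "url": url,
        "category_code": category_code,
        "category_description": category_description,
        "date_added": added.isoformat(),  # assumed YYYY-MM-DD format
        "source": source,
        "notes": notes,
    }
    buffer = io.StringIO()
    # The csv module quotes fields that contain commas or quotes.
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writerow(row)
    return buffer.getvalue()


# Illustrative values only.
print(format_row("https://www.example.com/", "NEWS", "News Media",
                 "example-contributor", "Illustrative entry"), end="")
```

Letting the csv module build the line avoids hand-written quoting mistakes, for example when a note contains a comma.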

Editing existing entries

As many URLs were added to the test lists many years ago (and the status of websites constantly changes), there is an ongoing need to review existing test list entries to check:

Upon reviewing a test list, you can:

Step 1. Update URLs by replacing existing URLs with updated versions. For example, this may involve updating a URL to HTTPS (e.g. replacing http://www.facebook.com with https://www.facebook.com), or updating a URL if its domain has changed (and the new domain is still relevant for testing).

Step 2. Change the category codes and descriptions for URLs (included under the category_code and category_description columns of the CSV file) only if you think that those URLs have been allocated to wrong category codes and descriptions. In this case, please replace the category codes and descriptions with ones (from the standardized categories) that you think are more suitable. We would also appreciate a comment in your pull request explaining the proposed changes.

Step 3. Add notes (or edit existing notes) in the notes section of URLs to share relevant context.

Step 4. Once you have completed your edits, open a pull request on GitHub.

We may provide feedback through the comments section of your pull request.
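The HTTPS update described in Step 1 can be sketched as a small helper. This is only an illustration: the function name to_https is made up for this example, and the code rewrites the URL text without checking that the website actually works over HTTPS, which you should confirm in a browser first.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url: str) -> str:
    """Return `url` with an http:// scheme replaced by https://.

    URLs that do not use plain HTTP are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    return urlunsplit(parts._replace(scheme="https"))


print(to_https("http://www.facebook.com"))  # https://www.facebook.com
```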

Deleting existing entries

There is occasionally a need to delete existing test list entries when websites are no longer operational or relevant (for example, when a domain has expired, is squatted, or parked).

In such cases, you can delete a test list entry and mention why you propose deletion in your pull request on GitHub.

We may provide feedback through the comments section of your pull request.
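If you need to delete several entries at once, a sketch along the following lines may help. It assumes the URL is the first column and that no CSV field spans multiple lines; the function name drop_urls is hypothetical. Lines that are kept are copied verbatim, so the resulting diff in your pull request only shows the deleted rows.

```python
import csv


def drop_urls(csv_text, urls_to_remove):
    """Return (new_csv_text, removed_urls) with the matching rows deleted.

    Assumes the URL is the first field of each line.
    """
    kept_lines, removed = [], []
    for line in csv_text.splitlines(keepends=True):
        fields = next(csv.reader([line]), [])
        if fields and fields[0] in urls_to_remove:
            removed.append(fields[0])
        else:
            kept_lines.append(line)  # keep the original text untouched
    return "".join(kept_lines), removed


sample = (
    "url,category_code\n"
    "http://expired.example/,NEWS\n"
    "https://www.example.com/,NEWS\n"
)
new_text, removed = drop_urls(sample, {"http://expired.example/"})
print(removed)  # ['http://expired.example/']
```

The returned list of removed URLs can be pasted into the pull request description together with the reason for each deletion.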

Creating new test lists

If you can’t find a test list specific to a country here, then it probably does not exist yet. Please help us create a test list for that country through the steps below:

Step 1. Create a CSV file and name it after the ISO 3166 two-letter code of the country that the URLs are being added for. You can find a reference for international standards for country codes here. For example, a CSV file created for Andorra would be named ad.csv.

Step 2. Include the following columns in the newly created CSV file:

Step 3. Add URLs under the url column of the CSV file.

Some criteria for adding new URLs can include the following:

For further criteria, please view URL categories here.

Important:

Step 4. For each new URL, please add the following in the CSV file:

Step 5. Once you have created a new test list based on the above, open a pull request on GitHub.

We may provide feedback through the comments section of your pull request. Once the feedback is addressed and your pull request is merged, your recently added URLs will automatically get prioritized for OONI Probe testing.
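The file creation steps above could be sketched as follows. The column names in COLUMNS are an assumption for this example (copy the header row from an existing test list to be sure), and create_test_list is a hypothetical helper name.

```python
import csv
import re
from pathlib import Path

# Assumed header; copy the header row of an existing test list to be sure.
COLUMNS = ["url", "category_code", "category_description",
           "date_added", "source", "notes"]


def create_test_list(country_code, directory):
    """Create an empty test list named after a two-letter country code."""
    code = country_code.lower()
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"not a two-letter country code: {country_code!r}")
    path = Path(directory) / f"{code}.csv"
    # Mode "x" refuses to overwrite a test list that already exists.
    with path.open("x", newline="", encoding="utf-8") as handle:
        csv.writer(handle, lineterminator="\n").writerow(COLUMNS)
    return path
```

For example, create_test_list("AD", "some/directory") would create ad.csv for Andorra containing only the header row, ready for URLs to be added.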

Important tips

  1. Always include the full URL, including the HTTP or HTTPS prefix, exactly as it appears when you type it into a browser. If you include example.com in a test list, OONI Probe won’t be able to test it. Rather, it should be included as http://www.example.com, if that is what it looks like in a browser.

  2. Always use the format described in the sections above. The test lists are meant to be machine-readable, and OONI Probe will not parse test lists that don’t strictly follow the prescribed format.

  3. Please use the categories provided here and refrain from adding your own categories. The categories may not be perfect, and we welcome your suggestions for additional/alternative categories. But if you don’t use the prescribed category codes, OONI Probe will not be able to test those URLs, since test lists are meant to be machine-readable.

  4. Please do not scrape and add “the top 1,000 Alexa sites”. Community contributions are more useful when they include URLs that (a) fall under these 30 diverse categories and (b) reflect local insight. Given that many OONI Probe users around the world have bandwidth constraints, we favour quality over quantity in terms of what is tested.
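Tips 1 and 3 lend themselves to a quick self-check before you open a pull request. The sketch below is not an official OONI tool: check_entry is a hypothetical helper, and the set of allowed category codes must be supplied by you from the standardized categories (the codes used here are illustrative).

```python
from urllib.parse import urlsplit


def check_entry(url, category_code, allowed_codes):
    """Return a list of problems found in one entry (empty if none)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("URL must start with http:// or https://")
    elif not parts.netloc:
        problems.append("URL has no host name")
    if category_code not in allowed_codes:
        problems.append(f"unknown category code: {category_code}")
    return problems


allowed = {"NEWS", "HUMR"}  # illustrative subset, not the full category list
print(check_entry("example.com", "NEWS", allowed))
# ['URL must start with http:// or https://']
print(check_entry("http://www.example.com", "NEWS", allowed))
# []
```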

Thanks for contributing!