Investigating TLS blocking in India

This report investigates Transport Layer Security (TLS)-based blocking in India. Previous research by the Centre for Internet & Society, India (CIS) has already exposed TLS blocking based on the value of the SNI field. OONI has also implemented and started testing SNI-based TLS blocking measurements.

Recently, the Magma Project documented cases where CIS India and OONI’s methodologies could be improved. They specifically found that blocking sometimes appears to depend not only on the value of the SNI field but also on the address of the web server being used. These findings were later confirmed by OONI measurements in Spain and Iran through the use of an extended measurement methodology.

We were therefore curious to see whether such an extended methodology would discover further cases of TLS blocking in India. To answer this research question we ran experiments on the networks of three popular Indian Internet Service Providers (ISPs) (ACT Fibernet, Bharti Airtel, and Reliance Jio) which account for over 70% of the internet subscribers in India.

We recorded SNI-based blocking on both Bharti Airtel and Reliance Jio. We also discovered that Reliance Jio blocks TLS traffic not just based on the SNI value, but also on the web server involved with the TLS handshake. Moreover, we noticed that ACT Fibernet’s DNS resolver directs users towards servers owned by ACT Fibernet itself. Such servers caused the TLS handshake to fail, but the root cause of censorship was the DNS.

We also document that one of the endpoints we tested, collegehumor.com:443, does not accept TCP connections from several vantage points and in control measurements. Yet, on Reliance Jio, we see cases where connections to this endpoint complete successfully and a timeout occurs during the TLS handshake. We believe this is caused by some kind of proxy that terminates the TCP connection and performs the TLS handshake.

Index

TLS Blocking Measurements

Web Censorship in India

Aladdin: our Experimental Implementation

Description of the Experiments

Results Analysis & Discussion

Conclusion & Future Work

TLS Blocking Measurements

Transport Layer Security (TLS) is a cryptographic protocol that provides end-to-end secure communication with guarantees of confidentiality and authenticity; it is widely used for encrypting web traffic, as in HTTPS. The Server Name Indication (SNI), currently specified in RFC 6066, is an extension to TLS that facilitates multiplexing, i.e. the hosting of multiple HTTPS websites on the same server. In other words, the SNI gives content providers the opportunity to host a variety of websites under the same IP address. For example, the 216.58.209.36 IP (belonging to Google) allows accessing both www.google.com and www.youtube.com using HTTPS, depending on the SNI being used. When a client wants to establish a secure connection, it fills in the SNI with the hostname of the website it wants to connect to.

Unfortunately, the SNI travels on the network in cleartext, even though there are experimental efforts to work around this technical limitation. Thus, network operators can use deep packet inspection to track the websites someone is visiting, and also to filter traffic based on the SNI. The use of SNI-based filtering in state-directed web censorship is increasingly being documented. In 2019, the use of SNI filtering was documented in China and South Korea; OONI reported on the SNI-based filtering of Wikipedia in Venezuela and China, as well as of Facebook live-streaming in Jordan; and CIS has earlier documented the use of this technique by Reliance Jio, the most popular ISP in India.

Therefore, researchers in the internet freedom community have started proposing and implementing techniques to measure SNI-based blocking. As part of their research on the blocking practices of Indian ISPs, CIS proposed a methodology to detect SNI blocking. Around the same time, researchers at Jigsaw proposed improvements in detecting domain blocking, which also included SNI blocking measurements. OONI later implemented SNI blocking measurements based on Jigsaw’s methodology and successfully tested them in the field.

At their core, these methodologies detect SNI-based blocking by connecting to an unrelated host that is not blocked (e.g. example.com), and checking whether it is possible to successfully complete a TLS handshake even if the SNI is filled with a hostname that is potentially blocked (e.g. pornhub.com in India). The rationale of this technique is to measure whether there is a specific filtering rule in the network that blocks a given SNI.
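The core of this technique can be sketched with Python's standard `ssl` module. This is only an illustration under stated assumptions, not the code used by CIS, Jigsaw, or OONI: the function name `handshake_with_sni` is hypothetical, and the mapping from Python exceptions to the failure names used later in this report is approximate.

```python
import socket
import ssl


def handshake_with_sni(ip: str, sni: str, port: int = 443, timeout: float = 10.0) -> str:
    """Connect to ip:port, send `sni` in the ClientHello, and name the outcome."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as sock:
            # server_hostname sets the SNI and the name the certificate is checked against.
            with context.wrap_socket(sock, server_hostname=sni):
                return "success"
    except ssl.SSLCertVerificationError:
        # The handshake reached certificate validation, so the ClientHello got through.
        # A hostname mismatch is the expected cause here, although any other
        # verification failure also lands in this branch.
        return "ssl_invalid_hostname"
    except ssl.SSLEOFError:
        return "eof_error"
    except socket.timeout:
        return "generic_timeout_error"
    except ConnectionResetError:
        return "connection_reset"
    except OSError as exc:
        return f"unknown_failure: {exc}"
```

Calling `handshake_with_sni(helper_ip, "blocked.com")`, where `helper_ip` is the address of an unrelated, unblocked helper, would then return one of the outcome names above.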

A recent report published by the Magma project, however, shows that there are other ways of blocking TLS that are not detected by this measurement methodology. In particular, they showed that TLS connections to www.womenonweb.org were being blocked, but the SNI blocking measurement methodology did not detect it. TLS blocking, in fact, only occurred when the SNI was equal to www.womenonweb.org and the IP address was the one of www.womenonweb.org.

Thus, OONI wrote an experimental implementation, based on the new Go engine, that performed two experiments. The first experiment connected to www.example.org using the www.womenonweb.org SNI. The second experiment, instead, connected directly to the IP address used by www.womenonweb.org. The results confirmed the findings of the Magma project’s blog post, and sparked additional curiosity about whether using the same methodology in other contexts (e.g. India) could reveal more forms of blocking. A measurement campaign run by OONI while we were researching this report documented cases of TLS blocking solely based on the endpoint being used for DNS over TLS connections in Iran. Specifically, OONI found cases where the TLS handshake with 1.1.1.1:853 was blocked regardless of the SNI.

Web Censorship in India

India has a decentralised model of web censorship, where state authorities order Internet Service Providers (ISPs) to block certain websites for their users. State authorities draw these powers from Section 69A and Section 79 of the Information Technology (IT) Act. Since there are no technical specifications given by the government, each ISP is at liberty to adopt its own method of blocking websites. A recent study of censorship in the Indian state of Manipur using OONI data concluded, in this regard, that “website blocking within the country varies primarily from ISP to ISP, rather than from region to region”. Furthermore, regulations notified under Section 69A require ISPs to maintain confidentiality over certain website blocking orders.

Recent research at the Centre for Internet and Society (CIS) revealed how Indian ISPs are using a variety of techniques, including DNS-based blocking, HTTP host header inspection, and SNI-based filtering. In the absence of a publicly available official list of blocked hostnames, CIS India compiled a list of potentially blocked websites from (i) publicly-available government orders, (ii) court orders, and (iii) user reports from various sources. They devised network tests to identify the methods that different ISPs are using, and recorded how India’s most popular ISP, Reliance Jio (which serves 50% of Indian internet subscribers), is using SNI inspection for blocking websites. Out of the 4379 websites that the authors tested for, they found Jio to be censoring 2951 websites via SNI inspection.

In addition to the opaqueness surrounding the lists of websites being blocked, CIS India also found inconsistencies in the list of websites being blocked by each ISP. Furthermore, only some of the ISPs explicitly relayed a censorship notice to their users. Simply put, Indian internet users can have wildly different experiences of web censorship depending on their ISP.

Aladdin: our Experimental Implementation

Aladdin is a bash script that uses the new OONI Probe engine written in Go. Given an input domain (e.g. blocked.com), Aladdin performs a series of experiments loosely inspired by the domain-blocking measurement methodology proposed by Jigsaw. OONI wrote this script to collect data that could be useful in better understanding how to evolve its Web Connectivity nettest.

In this section, we only describe the experiments that are relevant to this report. This text describes the performed experiments at a functional level; the actual implementation may be different, typically for efficiency reasons. For further information, we encourage you to read the script source code and reach out with questions and feedback on OONI’s Slack channels.

Because Aladdin is based on the OONI Engine, all experiment results are submitted to the OONI collector and automatically published as part of OONI S3 buckets.

sni_check

The first experiment we discuss is called sni_check. It is similar to the OONI experiment called sni_blocking, except that sni_check does not check whether the helper website being used (example.org) is actually reachable. This is not an issue because we know it was reachable while we were running these manual experiments. The following diagram shows the interactions that occur when performing this experiment with blocked.com as its input.

Figure 1: description of the sni_check experiment.

We use example.org as a test helper. The first step is to use Google’s DNS over HTTPS (DoH) resolver, to map example.org to its IP address. Once we know the IP address, we connect to this address on port 443 and we initiate a TLS handshake with blocked.com as the SNI. What happens next determines the result of the experiment.

If there is blocking, we expect the connection to just be closed (eof_error) or interrupted (connection_reset). A timeout (generic_timeout_error) also in general implies that there is interference. By repeating the experiment, we gain more confidence that such an error is not just a temporary disruption.

If there is no interference, the handshake completes. Because the web server for example.org does not handle blocked.com, the client code should emit the ssl_invalid_hostname error indicating that the server returned a certificate that is not valid for the requested SNI. In such a case, we can inspect the returned certificate to have further confidence that we are indeed speaking with the legitimate server that handles the example.org domain.
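The interpretation rules above can be written down as a small decision function. This is a sketch, not the logic shipped in Aladdin: the verdict labels and the function names are assumptions introduced here, while the failure strings are the ones named in the text.

```python
from collections import Counter

# Failures that the text associates with interference during sni_check.
INTERFERENCE_FAILURES = {"eof_error", "connection_reset", "generic_timeout_error"}


def classify_sni_check(failure):
    """Map one sni_check failure string to a (hypothetical) verdict label."""
    if failure == "ssl_invalid_hostname":
        # The helper answered with its own certificate: the SNI was not filtered.
        return "no_interference"
    if failure in INTERFERENCE_FAILURES:
        return "likely_interference"
    return "inconclusive"


def summarize_runs(failures):
    """Repeat-and-vote: return the most common verdict plus the full tally."""
    tally = Counter(classify_sni_check(f) for f in failures)
    return tally.most_common(1)[0][0], dict(tally)
```

Repeating the experiment and looking at the tally, rather than at a single run, is what gives confidence that an error is not a transient disruption.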

dns_check

The second experiment is called dns_check. It is conceptually similar to the dns_consistency OONI experiment. The following diagram illustrates the dns_check experiment.

Figure 2: description of the dns_check experiment.

We basically resolve the same domain (e.g. blocked.com) using the system resolver (i.e. the resolver configured on the system where OONI is running) as well as using a DNS over HTTPS (DoH) resolver that we trust. In this set of experiments, we used Google’s DoH resolver. The objective of this experiment is to understand whether we can trust the answer of the system resolver, by comparing its results to those of the DoH resolver.

When a DNS resolver claims that a domain name does not exist, the corresponding error is dns_nxdomain_error. When a resolver returns private addresses (e.g. 10.0.0.1), the corresponding error is dns_bogon_error. If there are no errors, we expect this experiment to return two lists: a list of IP addresses for the domain obtained using the system resolver and a similar list obtained instead using the trusted DoH resolver. As we will see in the following sections, we will then use the IP addresses from both lists to perform further checks.
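As an illustration of how a single resolver answer could be mapped to these error names, here is a minimal sketch using Python's `ipaddress` module. The function names and the `nxdomain` flag are assumptions made for this example, and the exact set of ranges that the OONI engine treats as bogons may differ.

```python
import ipaddress


def is_bogon(address: str) -> bool:
    """Return True for addresses that should never host a public website."""
    ip = ipaddress.ip_address(address)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_unspecified


def dns_failure(addresses, nxdomain=False):
    """Return the error name for a resolver answer, or None when it looks sane."""
    if nxdomain:
        return "dns_nxdomain_error"
    if any(is_bogon(address) for address in addresses):
        return "dns_bogon_error"
    return None
```

A `None` result only means that the answer is not obviously bogus; it still has to be compared with the answer from the trusted resolver.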

system_resolver_validation

The third experiment we discuss is called system_resolver_validation. This experiment is roughly a subset of OONI’s Web Connectivity experiment.

Figure 3: description of the system_resolver_validation experiment.

It is called system_resolver_validation because we use the IP addresses collected by the system resolver in the previous step to access the target website using HTTPS, and verify that the IP address indeed serves the target website. We connect on port 443 and, if we are successful, we perform a TLS handshake using the target SNI. If the handshake succeeds, we assume that the specific IP address we are using is valid for the domain. This means that either we are speaking with the legitimate web server or, in a less likely but still quite possible scenario, with a proxy that is willing to let us through.

We consider the experiment successful if we are able to perform the HTTP GET request fetching the home page of the domain without any TCP or TLS errors. Failures during the TLS handshake, or later, are flagged as likely interference. All the failures described previously may occur during the handshake. It is worth noting that, in this context, ssl_invalid_hostname is an error, because we should be able to establish a TLS connection with the domain, given that we are attempting to speak to a web server serving such a domain.

doh_resolver_validation

The fourth experiment we discuss is called doh_resolver_validation. This experiment is basically the same as the previous one, except that here we are using the results returned by the DoH resolver as opposed to the results returned by the system resolver. This experiment gives us an opportunity to run our test with a valid IP for the domain, which is useful for cases wherein the system resolver returns an error, or a list of IPs not related to the domain. This experiment can therefore help us measure SNI-based blocking when the test network is also blocking websites using DNS poisoning or injections.

psiphon_check

The fifth experiment we discuss is called psiphon_check. This experiment uses the Psiphon network to fetch the input domain over HTTPS. It consists of the following two steps.

The first step establishes an encrypted tunnel to one of the thousands of geographically distributed proxy servers managed by Psiphon, Inc. The technology used to establish such a tunnel depends on the censorship techniques implemented in the country in which the experiment is run. Psiphon, in fact, is optimised to select the censorship evasion technique that provides the best performance, choosing among techniques such as obfuscated protocols and domain fronting. Once the encrypted tunnel with the remote proxy server is established, Psiphon exposes it using a SOCKS5 proxy listening on a local port.

The second step performs an HTTPS measurement of the target domain using the encrypted tunnel via the SOCKS5 proxy. Psiphon’s implementation of the SOCKS5 protocol is such that when Aladdin requests Psiphon to connect to a specific domain name on port 443, it will also rely on Psiphon for the domain name resolution. For this reason, we do not need to worry about DNS tampering in this experiment. In turn, Psiphon will ask the remote proxy server to establish a TCP connection to the specified domain and port. If the connection is successful, Aladdin will then perform the TLS handshake and issue a GET request for the homepage. Otherwise, the SOCKS5 server returns a byte indicating the error that occurred. Because the set of error codes specified by SOCKS5 is rather limited, the same error code may actually map to a variety of error conditions. In our experience, the two most frequent errors we have seen in this context are 0x01 (“general failure”) and 0x05 (“connection refused”).
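For reference, the reply codes defined by the SOCKS5 specification (RFC 1928) can be decoded as in the following sketch. The helper name `describe_socks5_reply` is hypothetical and the code is unrelated to Psiphon's own implementation; only the code table comes from the RFC.

```python
# Reply codes from RFC 1928, section 6.
SOCKS5_REPLIES = {
    0x00: "succeeded",
    0x01: "general SOCKS server failure",
    0x02: "connection not allowed by ruleset",
    0x03: "network unreachable",
    0x04: "host unreachable",
    0x05: "connection refused",
    0x06: "TTL expired",
    0x07: "command not supported",
    0x08: "address type not supported",
}


def describe_socks5_reply(reply: bytes) -> str:
    """Decode the REP field of a SOCKS5 reply (layout: VER, REP, RSV, ATYP, ...)."""
    if len(reply) < 2 or reply[0] != 0x05:
        raise ValueError("not a SOCKS5 reply")
    return SOCKS5_REPLIES.get(reply[1], f"unassigned (0x{reply[1]:02x})")
```

With only nine assigned values, many distinct failures on the proxy side end up reported as 0x01 or 0x05.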

Figure 4: description of the psiphon_check experiment.

In the context of this report, we will use the results of the Psiphon experiment to attempt to access the same domain from another vantage point. This will give us further confidence of whether errors in connecting to a website could be caused by interference by the local ISP or, instead, by the website not currently being reachable.

To learn more about Psiphon, we encourage you to watch the presentation on Psiphon from the 2020 edition of the Internet Measurement Village.

Description of the Experiments

We ran the Aladdin script from three different ISPs in India: ACT Fibernet (AS24309), Bharti Airtel (AS45609), and Reliance Jio (AS55836). We attempted to measure four domains for TLS blocking: facebook.com and google.com (both accessible in India via all ISPs); and collegehumor.com and pornhub.com (both usually blocked by Indian ISPs). We ran experiments on May 11th, 12th, 14th, and 19th, 2020 using github.com/bassosimone/aladdin@5471390. We also ran follow-up experiments on June 22nd and 23rd, 2020.

According to the latest Telecom Regulatory Authority of India’s report (Table 1.30), the three ISPs we tested together constitute 74.5% of the internet subscribers in India. All the measurements were made in Bengaluru to preclude any potential regional variations. The tests for Reliance Jio and Bharti Airtel were run via mobile internet connections. As ACT Fibernet does not provide a retail mobile connection, we used a fixed internet connection to run tests for their network. As far as this report is concerned, we assume that ISPs do not alter their behaviour based on the type of connection (mobile or fixed).

Results Analysis & Discussion

This section describes the results of all the experiments we performed. We fetched measurements from OONI’s S3. For brevity, we are going to include only the results for collegehumor.com and pornhub.com, since facebook.com and google.com were not blocked in any of the experiments that we ran.

sni_check

The following table shows the sni_check experiment results. In this experiment we performed a TLS handshake with the web server serving example.org using the SNI indicated in the table, to detect cases where the presence of this SNI was sufficient to trigger blocking. The script to generate the table is published as a GitHub gist.

| ISP | SNI | Failure | Count |
|---|---|---|---|
| ACT Fibernet | collegehumor.com | ssl_invalid_hostname | 2 |
| ACT Fibernet | pornhub.com | ssl_invalid_hostname | 2 |
| Bharti Airtel | collegehumor.com | eof_error | 3 |
| Bharti Airtel | pornhub.com | eof_error | 2 |
| Bharti Airtel | pornhub.com | ssl_invalid_hostname | 1 |
| Reliance Jio | collegehumor.com | ssl_invalid_hostname | 1 |
| Reliance Jio | pornhub.com | ssl_invalid_hostname | 1 |

Table 1. Results of connecting to example.org’s IP address when using specific SNIs.
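The gist that produced Table 1 is not reproduced here. As a rough sketch of the kind of aggregation involved, and assuming each measurement has already been reduced to a record with the hypothetical keys `isp`, `sni`, and `failure`, the counts could be computed like this:

```python
from collections import Counter


def failure_table(records):
    """Count how often each (ISP, SNI, failure) combination occurs."""
    counts = Counter(
        # A missing failure (JSON null) is rendered as the string "null".
        (r["isp"], r["sni"], r["failure"] or "null")
        for r in records
    )
    return sorted((isp, sni, failure, n) for (isp, sni, failure), n in counts.items())
```

Each returned tuple corresponds to one row of the table: ISP, SNI, failure, and the number of measurements with that outcome.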

As mentioned above, the eof_error result indicates that the connection was closed during the TLS handshake, likely because some middlebox rejected the provided SNI. The following JSON snippet shows a measurement for collegehumor.com from Bharti Airtel:

{
  "test_keys": {
    "network_events": [
      {
        "failure": null,
        "operation": "connect",
        "address": "93.184.216.34:443",        // (1)
        "t": 0.3199562,
        "proto": "tcp"
      },
      {
        "failure": null,
        "operation": "tls_handshake_start",    // (2)
        "t": 0.3199999
      },
      {
        "failure": null,
        "operation": "write",                  // (3)
        "num_bytes": 286,
        "t": 0.3213545
      },
      {
        "failure": "eof_error",                // (4)
        "operation": "read",
        "t": 0.382416
      },
      {
        "failure": "eof_error",
        "operation": "tls_handshake_done",
        "t": 0.383602
      }
    ]
  },
  "resolver_asn": "AS9498",
  "probe_cc": "IN",
  "probe_network_name": "Bharti Airtel Ltd. AS for GPRS Service",
  "input": "tlshandshake://93.184.216.34:443",
  "probe_asn": "AS45609",
  "annotations": {
    "step": "sni_blocking",
    "session": "38c221ed-5fc6-4897-984d-b612bc43dd24"
  },
  "resolver_network_name": "BHARTI Airtel Ltd.",
  "measurement_start_time": "2020-05-19 08:05:07"
}

Here we basically (1) connect to example.org's IP address, (2) start the TLS handshake, (3) write the ClientHello, and (4) observe that the connection is closed.
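A measurement like the one above can also be inspected programmatically. The sketch below assumes the annotation comments have been stripped so that the document is valid JSON, and it relies only on the `test_keys.network_events` layout visible in the snippet. The helper name `first_failure` is our own.

```python
import json

# Trimmed copy of the measurement shown above, without the annotation comments.
SAMPLE = """
{
  "test_keys": {
    "network_events": [
      {"failure": null, "operation": "connect", "address": "93.184.216.34:443", "t": 0.3199562},
      {"failure": null, "operation": "tls_handshake_start", "t": 0.3199999},
      {"failure": null, "operation": "write", "num_bytes": 286, "t": 0.3213545},
      {"failure": "eof_error", "operation": "read", "t": 0.382416},
      {"failure": "eof_error", "operation": "tls_handshake_done", "t": 0.383602}
    ]
  }
}
"""


def first_failure(measurement):
    """Return (operation, failure, t) for the first failed event, or None."""
    for event in measurement["test_keys"]["network_events"]:
        if event.get("failure") is not None:
            return event["operation"], event["failure"], event["t"]
    return None


print(first_failure(json.loads(SAMPLE)))
```

Here the first failing operation is the `read` that follows the ClientHello `write`, which matches steps (3) and (4) of the walkthrough.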

It is also interesting to note that in one specific case we could complete the TLS handshake with example.org's IP address with the pornhub.com SNI. We also observed the same pattern in the follow-up measurements for collegehumor.com (see #1, #2, #3, #4, #5, and #6) and for pornhub.com (see #1, #2, #3, #4, and #5) performed on June 22nd and 23rd, 2020. Typically there is blocking, but on three occasions we succeeded in completing the TLS handshake.

Instead, the ssl_invalid_hostname result hints that there was no blocking. To be sure about this, we fetched the returned certificate from the measurement, ensured it was unique, and computed its fingerprint. In all cases, we received the following X.509 certificate:

Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            0f:d0:78:dd:48:f1:a2:bd:4d:0f:2b:a9:6b:60:38:fe
    Signature Algorithm: sha256WithRSAEncryption
        Issuer: C=US, O=DigiCert Inc, CN=DigiCert SHA2 Secure Server CA
        Validity
            Not Before: Nov 28 00:00:00 2018 GMT
            Not After : Dec  2 12:00:00 2020 GMT
        Subject: C=US, ST=California, [...] CN=www.example.org
[...]
SHA1 Fingerprint=7B:B6:98:38:69:70:36:3D:29:19:CC:57:72:84:69:84:FF:D4:A8:89

This certificate is indeed the one used by example.org. We can therefore conclude that, in all these cases, we were able to speak to the test helper without interference.
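Computing such a fingerprint is straightforward once the certificate is available in DER form. The following sketch uses only the standard library. The function name is hypothetical, and how the DER bytes are extracted from a measurement is left out because it depends on the measurement format.

```python
import hashlib


def sha1_fingerprint(der_bytes: bytes) -> str:
    """Format the SHA-1 digest of a DER certificate as colon-separated hex pairs."""
    digest = hashlib.sha1(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

For a PEM-encoded certificate, `ssl.PEM_cert_to_DER_cert` from the standard library can produce the DER bytes first. The result uses the same notation as the `SHA1 Fingerprint=` line above.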

Establishing a baseline

In the following sections, we are going to comment on the experiments where we attempt to connect to the target websites using HTTPS. Before diving into that, let us check whether we could access such websites using Psiphon. This check will give us an opportunity to establish whether the websites were reachable when we performed the measurement. In fact, the usage of the Psiphon circumvention tool allowed us to access the websites we wanted to test using alternative routes over encrypted and obfuscated tunnels to Psiphon managed proxies.

The following table shows the results of the psiphon_check experiment, which was computed using a script published at GitHub.

| ISP | Domain | Bootstrap Time (s) | Failure |
|---|---|---|---|
| ACT Fibernet | pornhub.com | 6.5 | null |
| ACT Fibernet | collegehumor.com | 6.6 | general SOCKS server failure |
| ACT Fibernet | pornhub.com | 5.7 | null |
| ACT Fibernet | collegehumor.com | 5.3 | SOCKS: connection refused |
| Reliance Jio | pornhub.com | 7.2 | null |
| Reliance Jio | collegehumor.com | 6.8 | SOCKS: connection refused |
| Bharti Airtel | pornhub.com | 6.3 | null |
| Bharti Airtel | collegehumor.com | 6.1 | SOCKS: connection refused |
| Bharti Airtel | pornhub.com | 6.5 | http: unexpected EOF reading trailer |
| Bharti Airtel | collegehumor.com | 6.7 | SOCKS: connection refused |
| Bharti Airtel | pornhub.com | 8.1 | null |
| Bharti Airtel | collegehumor.com | 6.6 | SOCKS: connection refused |

Table 2. Results of fetching the homepage of specific domains using Psiphon.

The Bootstrap Time column indicates the number of seconds it took Psiphon to establish an encrypted tunnel with a proxy server. Because the bootstrap time is defined (and not null) for each measurement, it means that it was always possible to establish a tunnel. The OONI Engine implementation, in fact, does not set the bootstrap time unless the tunnel has been successfully established.

Regarding the Failure column, we notice that pornhub.com is available on and off, and sometimes there are HTTP protocol errors. Yet, there are definitely cases in which this domain is reachable via Psiphon. On the contrary, collegehumor.com is consistently not reachable.

We could explain this consistent failure with either the website not being reachable over HTTPS or with censorship experienced by the Psiphon proxy we were using.

As a follow-up, we ran subsequent measurements targeting collegehumor.com from Bharti Airtel, Vodafone (Italy), and Google Cloud (europe-west4 zone) on June 22nd and 23rd, 2020 using the OONI Probe Engine, OONI Probe for iOS, and netcat. We attempted to connect to collegehumor.com using HTTPS with and without Psiphon. When using Psiphon, we obtained the same errors reported in the table above. When connecting directly, we noticed a timeout when trying to establish a TCP connection on port 443.

While investigating further, we also found an “ancient” measurement using OONI Probe 1.2 that timed out when attempting to connect to collegehumor.com on port 443 from the Telx ISP in the United States in 2015. However, we also found that records of an X.509 certificate for this website exist in the Certificate Transparency Log.

We concluded that collegehumor.com was quite likely not reachable on port 443 when we measured it, which is why we will not flag as censorship any failure in connecting to it on port 443 that we may encounter in the following sections. At the same time, it is interesting to note that Bharti Airtel was censoring the collegehumor.com SNI (as we have seen in the previous section) even though it was not a properly working HTTPS website during our measurements.

dns_check

The following table shows the domain names resolved using the system resolver. For each target domain in the Domain column, we performed a DNS resolution using the resolver configured in the operating system. The ISP owning the resolver is indicated in the Resolver column. The table has been computed using a script published at GitHub. We then manually annotated IP addresses with their autonomous system number (ASN) and network name.

| ISP | Resolver | Domain | Failure | IP | IP Network | Count |
|---|---|---|---|---|---|---|
| ACT Fibernet | ACT Fibernet | collegehumor.com | null | 202.83.21.15 | ACT Fibernet | 1 |
| ACT Fibernet | ACT Fibernet | collegehumor.com | null | 49.205.75.6 | ACT Fibernet | 1 |
| ACT Fibernet | ACT Fibernet | pornhub.com | null | 202.83.21.15 | ACT Fibernet | 1 |
| ACT Fibernet | ACT Fibernet | pornhub.com | null | 49.205.75.6 | ACT Fibernet | 1 |
| Bharti Airtel | Bharti Airtel | collegehumor.com | dns_nxdomain_error | | | 3 |
| Bharti Airtel | Bharti Airtel | pornhub.com | dns_nxdomain_error | | | 3 |
| Reliance Jio | Reliance Jio | collegehumor.com | null | 52.8.26.172 | AMAZON | 1 |
| Reliance Jio | Reliance Jio | collegehumor.com | null | 54.193.47.52 | AMAZON | 1 |
| Reliance Jio | Reliance Jio | pornhub.com | null | 66.254.114.41 | SWIFTMILL | 1 |

Table 3. Results of resolving specific domains using the system’s default resolver.

Let us compare the above table with a similar table, where we show the results obtained using Google public DNS over HTTPS (DoH) resolver and a similar analysis script. (No errors are included into the table because no DoH query ever failed.)

| ISP | Domain | DoH URL | IP | IP ASN | IP Network | Count |
|---|---|---|---|---|---|---|
| ACT Fibernet | collegehumor.com | https://dns.google/dns-query | 52.8.26.172 | AS16509 | AMAZON | 5 |
| ACT Fibernet | collegehumor.com | https://dns.google/dns-query | 54.193.47.52 | AS16509 | AMAZON | 2 |
| ACT Fibernet | pornhub.com | https://dns.google/dns-query | 66.254.114.41 | AS30361 | SWIFTMILL | 2 |
| Bharti Airtel | collegehumor.com | https://dns.google/dns-query | 52.8.26.172 | AS16509 | AMAZON | 3 |
| Bharti Airtel | collegehumor.com | https://dns.google/dns-query | 54.193.47.52 | AS16509 | AMAZON | 3 |
| Bharti Airtel | pornhub.com | https://dns.google/dns-query | 66.254.114.41 | AS30361 | SWIFTMILL | 3 |
| Reliance Jio | collegehumor.com | https://dns.google/dns-query | 52.8.26.172 | AS16509 | AMAZON | 1 |
| Reliance Jio | collegehumor.com | https://dns.google/dns-query | 54.193.47.52 | AS16509 | AMAZON | 1 |
| Reliance Jio | pornhub.com | https://dns.google/dns-query | 66.254.114.41 | AS30361 | SWIFTMILL | 1 |

Table 4. Results of resolving specific domains using Google’s DoH resolver.

By comparing the two tables, we conclude that ACT Fibernet and Bharti Airtel resolvers are lying to us. Reliance Jio’s resolver instead returns answers that are consistent with Google’s DoH resolver, which we assume to not be lying.

We say that ACT Fibernet’s resolver is lying because it claims that the two domains we are testing are hosted by ACT Fibernet itself and it also claims that they share the same IP addresses. We say that Bharti Airtel’s resolver is lying because it claims that the two tested domains do not exist (dns_nxdomain_error), but in fact they do. This is consistent with the findings in CIS’ recent study, which found ACT Fibernet and Bharti Airtel to be tampering with DNS responses in this precise way.
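The comparison between the two tables can be expressed as a simple set check. This is a sketch of the reasoning rather than of the published analysis script, and the function name and verdict labels are assumptions. Disjoint answers are a strong hint but not proof of tampering, since large sites may legitimately return different addresses to different resolvers.

```python
def compare_resolvers(system_ips, doh_ips, system_failure=None):
    """Compare the system resolver's answer with the trusted DoH answer."""
    if system_failure is not None:
        # e.g. dns_nxdomain_error while the trusted resolver returned addresses.
        return system_failure
    if set(system_ips) & set(doh_ips):
        return "consistent"
    return "inconsistent"


# Answers for collegehumor.com taken from Tables 3 and 4.
doh = ["52.8.26.172", "54.193.47.52"]
print(compare_resolvers(["202.83.21.15", "49.205.75.6"], doh))      # ACT Fibernet
print(compare_resolvers([], doh, "dns_nxdomain_error"))             # Bharti Airtel
print(compare_resolvers(["52.8.26.172", "54.193.47.52"], doh))      # Reliance Jio
```

Annotating each address with its ASN, as done in the tables, adds a second signal: answers that all point into the ISP's own network are harder to explain as ordinary load balancing.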

While Bharti Airtel’s resolver’s answer prevents us from accessing these websites, we cannot exclude that ACT Fibernet’s answer is just directing us to some cache. To investigate this hypothesis, we will need to attempt to use such IPs and see what happens.

system_resolver_validation

The following table shows the results of the system_resolver_validation experiment. This means that we used the IP addresses previously resolved using the system resolver to connect to the specific websites to which they belong, according to such a resolver. The script used to generate the table is published on GitHub.

ISPSNIFailureCount
ACT Fibernetcollegehumor.comeof_error2
ACT Fibernetpornhub.comeof_error2
Reliance Jiocollegehumor.comgeneric_timeout_error1
Reliance Jiopornhub.comeof_error1

Table 5. Results of using the IP address for a domain returned by the system’s default resolver to fetch the homepage of such a domain using the HTTPS protocol.

Of course, the table does not include Bharti Airtel entries, because Bharti Airtel’s resolver told us that the domains we were looking for do not exist.

The following snippet shows the recorded measurement for pornhub.com from ACT Fibernet:

{
  "test_keys": {
    "dns_cache": [
      "www.pornhub.com 49.205.75.6"                   // (1)
    ],
    "network_events": [
      {
        "failure": null,
        "operation": "connect",
        "address": "49.205.75.6:443",                 // (2)
        "t": 0.002680552,
        "proto": "tcp"
      },
      {
        "failure": null,
        "operation": "tls_handshake_start",           // (3)
        "t": 0.00271161
      },
      {
        "failure": null,
        "operation": "write",                         // (4)
        "num_bytes": 285,
        "t": 0.003049326
      },
      {
        "failure": "eof_error",                       // (5)
        "operation": "read",
        "t": 0.005205896
      }
    ]
  },
  "input": "https://www.pornhub.com/",
  "probe_asn": "AS24309",
  "annotations": {
    "step": "system_resolver_validation",
    "session": "c44b5e74-ecb4-4310-b5d8-f9d6b27a1599"
  }
}

We see that (1) we use the DNS cache to force the previously discovered IP, (2) we connect successfully to such IP on port 443, (3) we start the TLS handshake and then (4) write the ClientHello message, and (5) after that the connection is closed. Because we know that the IP address we are using is suspicious, given that it does not belong to the correct ASN for pornhub.com, we assume that this is either blocking or a misconfigured cache.

The same could of course be said for collegehumor.com failures on ACT Fibernet. Because the IP address is suspicious, it may either be blocking or a misconfigured cache.

The following snippet shows a collegehumor.com measurement inside Reliance Jio:

{
  "test_keys": {
    "dns_cache": [
      "collegehumor.com 52.8.26.172 54.193.47.52"     // (1)
    ],
    "network_events": [
      {
        "failure": null,
        "operation": "connect",
        "address": "52.8.26.172:443",                 // (2)
        "t": 0.0405582,
        "proto": "tcp"
      },
      {
        "failure": null,
        "operation": "tls_handshake_start",           // (3)
        "t": 0.0406031
      },
      {
        "failure": null,
        "operation": "write",                         // (4)
        "num_bytes": 286,
        "t": 0.0418597
      },
      {
        "failure": "generic_timeout_error",           // (5)
        "operation": "tls_handshake_done",
        "t": 10.0432773
      }
    ]
  },
  "probe_cc": "IN",
  "probe_network_name": "Reliance Jio Infocomm Limited",
  "test_runtime": 10.0446836,
  "input": "https://collegehumor.com/",
  "probe_asn": "AS55836",
  "annotations": {
    "step": "system_resolver_validation",
    "session": "40504ce1-94e0-4d6d-a4e5-6ede4272e385"
  }
}

The sequence of events we see here is basically the same as above. The two main differences are that (1) the IP addresses we are using seem legitimate and (5) the TLS handshake fails with a timeout rather than with the connection being closed.

What is particularly interesting here, though, is that we should not have been able to connect to collegehumor.com:443/tcp, according to the results of the psiphon_check experiment. This fact seems to indicate the presence of some kind of proxy that terminates our TCP/IP connection and then forwards TLS bytes to the remote server. Because we know that the remote server is misconfigured on port 443, it is reasonable to assume that the timeout error we see is caused by such a misconfiguration rather than by censorship.

We consider the successful connection to collegehumor.com:443 an anomaly because the psiphon_check experiment, run around a minute afterwards, failed to connect. Also, the same measurement was later performed using the result from doh_resolver_check, which yielded the same result. Subsequent measurements run between May 23rd and June 22nd 2020 confirmed this anomaly pattern (see #1, #2, #3), but we also saw cases where the measurement failed with a TCP connect timeout (see #1, #2, #3), consistent with what we observed from other ISPs. We will further investigate this behavior as part of our future work.

The failure we see for pornhub.com, instead, is more suspicious. The following is a relevant snippet of the JSON measurement archived at OONI’s S3:

{
  "test_keys": {
    "dns_cache": [
      "pornhub.com 66.254.114.41"
    ],
    "network_events": [
      {
        "failure": null,
        "operation": "connect",
        "address": "66.254.114.41:443",
        "t": 0.0369444,
        "proto": "tcp"
      },
      {
        "failure": null,
        "operation": "tls_handshake_start",
        "t": 0.0369909
      },
      {
        "failure": null,
        "operation": "write",
        "num_bytes": 281,
        "t": 0.0380669
      },
      {
        "failure": "eof_error",
        "operation": "read",
        "t": 0.2627324
      },
      {
        "failure": "eof_error",
        "operation": "tls_handshake_done",
        "t": 0.2640703
      }
    ]
  },
  "probe_cc": "IN",
  "probe_network_name": "Reliance Jio Infocomm Limited",
  "input": "https://pornhub.com/",
  "probe_asn": "AS55836",
  "annotations": {
    "step": "system_resolver_validation",
    "session": "285e6728-8c92-4034-8c0d-5c62034a71bd"
  },
}

We see the same events as before, except that the TLS handshake fails with eof_error. This failure is similar to the one Magma noticed in Spain and OONI noticed in Iran. Also in this case, blocking only happens when the IP address is consistent with the SNI. We have in fact seen previously that the pornhub.com SNI could be successfully used with example.org's IP. (We also saw this blocking pattern for all subsequent measurements run between June 22nd and June 23rd, 2020: see #1, #2, #3, #4, #5, and #6).

Regarding ACT Fibernet and Bharti Airtel, we need to check what happens when using the IP addresses returned by the DoH resolver before drawing any conclusion concerning TLS blocking. We will do that in the following subsection.

doh_resolver_validation

The following table shows the results of the doh_resolver_validation experiment. This means we used the IP addresses returned by Google’s DoH resolver to connect to the websites to which, according to this resolver, they belong. We assume that Google’s DoH resolver is not returning false answers. This experiment therefore gives us another chance to verify whether there are additional forms of blocking beyond the system resolver returning errors or wrong entries. The script used to generate the table is available on GitHub.

| ISP | SNI | Failure | Count |
|---|---|---|---|
| ACT Fibernet | collegehumor.com | generic_timeout_error | 2 |
| ACT Fibernet | pornhub.com | null | 2 |
| Bharti Airtel | collegehumor.com | generic_timeout_error | 3 |
| Bharti Airtel | pornhub.com | eof_error | 3 |
| Reliance Jio | collegehumor.com | generic_timeout_error | 1 |
| Reliance Jio | pornhub.com | eof_error | 1 |

Table 6. Results of using the IP address for a domain returned by Google’s DoH resolver to fetch the homepage of such a domain using the HTTPS protocol.

In ACT Fibernet collegehumor.com is not accessible and pornhub.com is reachable. The failure for collegehumor.com matches the following pattern (the full measurement can of course be accessed on OONI Explorer):

{
  "test_keys": {
    "dns_cache": [
      "collegehumor.com 52.8.26.172 54.193.47.52"  // (1)
    ],
    "network_events": [
      {
        "failure": "generic_timeout_error",
        "operation": "connect",
        "address": "52.8.26.172:443",              // (2)
        "t": 30.001980387,
        "proto": "tcp"
      },
      {
        "failure": "generic_timeout_error",
        "operation": "connect",
        "address": "54.193.47.52:443",             // (3)
        "t": 60.002742623,
        "proto": "tcp"
      }
    ]
  },
  "probe_asn": "AS24309",
  "annotations": {
    "step": "doh_resolver_validation",
    "session": "1ada324e-736e-45ab-8127-d1066b23c5f5"
  }
}

Here we see that (1) we are using the DNS cache to force the correct IPs and that we time out when we attempt to connect to both IP addresses for the domain (2, 3). This is consistent with our previous observation that collegehumor.com:443 is misconfigured and that attempting to reach it fails with a connection timeout when using Psiphon, as well as when connecting from vantage points in which it should not be censored (e.g. Vodafone Italy and Google Cloud).
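The check described above (every address forced through the DNS cache fails at connect time) can be written as a short Python sketch. The function names parse_dns_cache and all_connects_failed are hypothetical and not part of the OONI engine. The sketch assumes the "domain ip ip ..." entry format and the IPv4 "ip:port" address format visible in the snippet.

```python
def parse_dns_cache(entries):
    """Map each domain to the list of IP addresses forced for it.

    Assumes each entry looks like "domain ip1 ip2 ...", as in the snippets.
    """
    cache = {}
    for entry in entries:
        domain, *addresses = entry.split()
        cache[domain] = addresses
    return cache


def all_connects_failed(network_events, addresses,
                        failure="generic_timeout_error", port=443):
    """True if every forced address has a connect event that failed with `failure`.

    Assumes IPv4 endpoints formatted as "ip:port".
    """
    failed = {
        event["address"]
        for event in network_events
        if event["operation"] == "connect" and event["failure"] == failure
    }
    return all(f"{ip}:{port}" in failed for ip in addresses)


# Values copied from the ACT Fibernet measurement above.
cache = parse_dns_cache(["collegehumor.com 52.8.26.172 54.193.47.52"])
act_events = [
    {"failure": "generic_timeout_error", "operation": "connect",
     "address": "52.8.26.172:443", "t": 30.001980387, "proto": "tcp"},
    {"failure": "generic_timeout_error", "operation": "connect",
     "address": "54.193.47.52:443", "t": 60.002742623, "proto": "tcp"},
]

print(all_connects_failed(act_events, cache["collegehumor.com"]))
```

A result of True for a vantage point means that no TLS handshake was ever attempted there, so the measurement says nothing about TLS blocking for that endpoint.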

In Bharti Airtel collegehumor.com:443 is also failing with a connect timeout as expected, as shown by the following measurement:

{
  "test_keys": {
    "dns_cache": [
      "collegehumor.com 52.8.26.172 54.193.47.52"
    ],
    "network_events": [
      {
        "failure": "generic_timeout_error",
        "operation": "connect",
        "address": "52.8.26.172:443",
        "t": 30.0091558,
        "proto": "tcp"
      },
      {
        "failure": "generic_timeout_error",
        "operation": "connect",
        "address": "54.193.47.52:443",
        "t": 60.014621,
        "proto": "tcp"
      }
    ]
  },
  "probe_asn": "AS45609",
  "annotations": {
    "step": "doh_resolver_validation",
    "session": "38c221ed-5fc6-4897-984d-b612bc43dd24"
  }
}

What we see here is again consistent with the results of the psiphon_check experiment. The collegehumor.com:443/tcp endpoint, in fact, is not working correctly and consistently fails with a timeout when attempting to connect to it from several vantage points.

Pornhub.com is blocked by Bharti Airtel during the TLS handshake. This is not surprising, since we have seen above that Bharti Airtel blocks any handshake towards any host as long as the SNI contains pornhub.com. The pattern that we see is roughly the same as what we previously saw when discussing sni_check measurements for Bharti Airtel. Yet, in one specific instance, we were able to perform a handshake for pornhub.com, only to be redirected to www.pornhub.org, for which the TLS handshake failed with eof_error. The following is the relevant snippet of the measurement:

{
  "test_keys": {
    "dns_cache": [
      "pornhub.com 66.254.114.41"
    ],
    "requests": [{
        "failure": "eof_error",
        "request": {
          "headers_list": [[
              "Referer",
              "https://pornhub.com/"
            ], [
              "Host",
              "www.pornhub.org"
            ]
          ],
          "url": "https://www.pornhub.org/",
          "method": "GET"
        }
    }, {
        "failure": null,
        "request": {
          "headers_list": [[
              "Host",
              "pornhub.com"
            ]
          ],
          "url": "https://pornhub.com/",
          "method": "GET"
        },
        "response": {
          "headers_list": [[
              "Location",
              "https://www.pornhub.org/"
          ]],
          "code": 302
        }
    }],
    "tls_handshakes": [
      {
        "tls_version": "TLSv1.2",
        "no_tls_verify": false,
        "server_name": "pornhub.com",
        "peer_certificates": [
          {
            "data": "...",
            "format": "base64"
          },
          {
            "data": "...",
            "format": "base64"
          }
        ],
        "cipher_suite": "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
        "failure": null,                // (1)
        "negotiated_protocol": "",
        "t": 0.4989724
      },
      {
        "tls_version": "",
        "no_tls_verify": false,
        "server_name": "www.pornhub.org",
        "peer_certificates": null,
        "cipher_suite": "",
        "failure": "eof_error",         // (2)
        "negotiated_protocol": "",
        "t": 0.956141
      }
    ]
  },
  "probe_asn": "AS45609",
  "annotations": {
    "step": "doh_resolver_validation",
    "session": "f0cfc2f4-ea96-4e99-bf35-8a283db9d3e9"
  }
}

The tls_handshakes field of the above measurement shows, in particular, that the eof_error indeed occurs when attempting to perform a TLS handshake for www.pornhub.org, the host named in the Location header of the redirect. In fact, we see that the first handshake is successful (1) and the second fails (2).
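To make the link between the redirect and the failed handshake explicit, one could correlate the requests and tls_handshakes fields. The following Python sketch is only an illustration: the function names are hypothetical, and the input is a trimmed copy of the Bharti Airtel measurement above, keeping only the fields the sketch reads.

```python
from urllib.parse import urlsplit


def redirect_targets(requests):
    """Yield (source URL, Location URL) for each response with a Location header."""
    for entry in requests:
        response = entry.get("response") or {}
        for name, value in response.get("headers_list", []):
            if name.lower() == "location":
                yield entry["request"]["url"], value


def failed_redirect_handshakes(requests, tls_handshakes):
    """List (source URL, redirect host, failure) where the redirect host's handshake failed."""
    failures = {
        h["server_name"]: h["failure"]
        for h in tls_handshakes
        if h["failure"] is not None
    }
    results = []
    for source, target in redirect_targets(requests):
        host = urlsplit(target).hostname
        if host in failures:
            results.append((source, host, failures[host]))
    return results


# Trimmed copy of the Bharti Airtel measurement shown above.
airtel_requests = [
    {"failure": "eof_error",
     "request": {"url": "https://www.pornhub.org/", "method": "GET"}},
    {"failure": None,
     "request": {"url": "https://pornhub.com/", "method": "GET"},
     "response": {"headers_list": [["Location", "https://www.pornhub.org/"]],
                  "code": 302}},
]
airtel_handshakes = [
    {"server_name": "pornhub.com", "failure": None},
    {"server_name": "www.pornhub.org", "failure": "eof_error"},
]

print(failed_redirect_handshakes(airtel_requests, airtel_handshakes))
```

Keying the failures by server_name is a simplification that works here because each SNI appears only once in the measurement.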

Regarding Reliance Jio, we see exactly the same results as in the system_resolver_validation experiment, since Reliance Jio’s resolver returned IP addresses consistent with the answer of Google’s DoH resolver. This result, therefore, confirms our previous findings.

Conclusion & Future Work

We investigated TLS blocking in India. The research question was to understand whether there were cases of TLS blocking caused not only by the value of the Server Name Indication (SNI) field in the ClientHello TLS message, but also by the destination IP address. That is, cases in which the SNI blocking methodology we previously developed was not sufficient.

We measured four domains (facebook.com, google.com, collegehumor.com, and pornhub.com) in three popular Indian ISPs: ACT Fibernet (fixed line), Bharti Airtel, and Reliance Jio (mobile). For each domain, we performed a series of experiments to answer the research question using a bash script driving the new OONI measurement engine in Go.

Neither facebook.com nor google.com was blocked in any of the measured ISPs.

The following table recaps our findings regarding the blocking of collegehumor.com and pornhub.com on the ACT Fibernet, Bharti Airtel, and Reliance Jio networks. We call “SNI blocking” the case where we observed that a specific SNI was blocked when connecting to a control server (the one managing example.org). “SNI+IP blocking” instead covers the cases where we observed TLS blocking only when connecting to the correct IP for the domain.

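| ISP | Domain | SNI blocking | DNS lying | SNI+IP blocking |
|---|---|---|---|---|
| ACT Fibernet | collegehumor.com | | ✔️ | |
| ACT Fibernet | pornhub.com | | ✔️ | |
| Bharti Airtel | collegehumor.com | ✔️ | ✔️ | |
| Bharti Airtel | pornhub.com | ✔️ | ✔️ | |
| Reliance Jio | collegehumor.com | | | |
| Reliance Jio | pornhub.com | | | ✔️ |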

Table 7. Summary of the measured blocking techniques for each measured ISP.
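The distinction between “SNI blocking” and “SNI+IP blocking” can be summarised as a small decision rule. The Python sketch below is a deliberate simplification and not the logic of our scripts: the function name, its parameters, and the "inconclusive" label are all hypothetical. It assumes that a handshake failure towards the control server is triggered by the SNI, and it treats an endpoint that also fails in control measurements as inconclusive.

```python
from typing import Optional


def classify_tls_blocking(control_ip_failure: Optional[str],
                          real_ip_failure: Optional[str],
                          endpoint_works_in_controls: bool) -> str:
    """Classify one (ISP, domain) pair from two handshake outcomes.

    control_ip_failure: failure seen when using the SNI with the control
        server's IP (None means the handshake succeeded).
    real_ip_failure: failure seen when using the SNI with the domain's own IP.
    endpoint_works_in_controls: whether the endpoint is reachable from
        vantage points where no censorship is expected.
    """
    if control_ip_failure is not None:
        # The SNI alone is enough to break the handshake.
        return "SNI blocking"
    if real_ip_failure is not None:
        if not endpoint_works_in_controls:
            # The endpoint is broken everywhere, so failures prove nothing.
            return "inconclusive"
        return "SNI+IP blocking"
    return "no TLS blocking observed"


# Reliance Jio, pornhub.com: the SNI works with example.org's IP, but the
# handshake fails with eof_error when using the domain's own IP.
print(classify_tls_blocking(None, "eof_error", True))

# Reliance Jio, collegehumor.com: the endpoint also fails in controls.
print(classify_tls_blocking(None, "generic_timeout_error", False))
```

The rule leaves “DNS lying” out on purpose, since that column depends on comparing the system resolver’s answers with the DoH resolver’s answers rather than on TLS handshake outcomes.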

ACT Fibernet does not implement TLS blocking. ACT Fibernet’s resolver is configured to lie to users and redirect them to specific servers that cause TLS handshakes to fail. While we see TLS failures, the root cause of censorship is ACT Fibernet’s resolver answer. We observed this behaviour for both collegehumor.com and pornhub.com.

Bharti Airtel seems to be blocking collegehumor.com and pornhub.com by inspecting the SNI field. Bharti Airtel’s DNS resolver also claims that the collegehumor.com and pornhub.com domain names do not exist. While we could not connect to collegehumor.com:443/tcp, we are confident that this is not censorship, but the endpoint not behaving correctly. We have, in fact, observed deterministic TCP connection timeout failures for this domain across the measurement period, as well as during subsequent follow-up measurements run from Vodafone Italy, Google Cloud, and using the Psiphon censorship-evasion network.

Reliance Jio does not block pornhub.com solely based on the SNI. We can successfully use this SNI when connecting to example.org. Yet, when connecting to the legitimate web server for the domain, TLS handshakes are actually blocked. This kind of blocking, where both the SNI value and the destination IP address matter, is similar to the one that Magma and OONI previously observed in Spain for www.womenonweb.org.

Surprisingly, we can sometimes connect to the collegehumor.com:443 endpoint from Reliance Jio, even though all other measurements, including control measurements, suggest that the endpoint is misbehaving: any connection attempt to it fails to establish a TCP connection and times out. We conclude that Reliance Jio implements some sort of proxying that terminates the TCP connection, then performs the TLS handshake and forwards the bytes back to the client. Therefore, we cannot determine whether the timeout we see during the TLS handshake to collegehumor.com:443 is censorship, or just the proxy failing to establish the upstream connection (as expected, given how the endpoint behaves) and reporting an error back to us.

In the future, we would like to better characterise TLS blocking in India. Previous measurements by CIS India, for example, indicate that Reliance Jio was blocking specific SNIs regardless of the destination address being used. For destination addresses where they did not notice SNI censorship, they hypothesise that a middlebox was not present on that particular network path. The experiments included in this report, however, open up the additional possibility that Reliance Jio’s blocking intentionally depends on the destination IP address as well. It would be interesting to understand whether it is possible to perform TLS handshakes for other domains with the servers of one of the blocked websites. For example, we could check what happens when connecting to the IP address of collegehumor.com with the SNI being, e.g., example.org. It would also be interesting to determine whether the blocking depends on the TLS version and other TLS options.