Exercises

IXPs

Find the top 10 IXPs in terms of members.

MATCH (m:AS)-[:MEMBER_OF]->(i:IXP)
RETURN i.name AS ixp_name, count(DISTINCT m) as num_members
ORDER BY num_members DESC
LIMIT 10

Find the top 10 ASes in terms of IXP membership.

MATCH (m:AS)-[:MEMBER_OF]->(i:IXP)
RETURN m.asn AS asn, count(DISTINCT i) as num_ixps
ORDER BY num_ixps DESC
LIMIT 10

Which are the top 10 IXPs for which CAIDA reports more members than PeeringDB?

Note: IXP data provided by CAIDA is an aggregation of PeeringDB, PCH, and HE data.

Bonus: for some of these ASes we have looking glass information too (reference_org: 'Alice-LG'). Show also the counts from the looking glass.

MATCH (c:AS)-[:MEMBER_OF {reference_org: 'CAIDA'}]->(i:IXP)
MATCH (p:AS)-[:MEMBER_OF {reference_org: 'PeeringDB'}]->(i)
WITH i.name AS ixp_name, count(DISTINCT c) AS num_members_caida, count(DISTINCT p) AS num_members_pdb
WHERE num_members_caida > num_members_pdb
RETURN ixp_name, num_members_caida, num_members_pdb
ORDER BY num_members_caida DESC
LIMIT 10

Bonus:

MATCH (c:AS)-[:MEMBER_OF {reference_org: 'CAIDA'}]->(i:IXP)
MATCH (p:AS)-[:MEMBER_OF {reference_org: 'PeeringDB'}]->(i)
OPTIONAL MATCH (lg_member:AS)-[:MEMBER_OF {reference_org: 'Alice-LG'}]->(i)
WITH i.name AS ixp_name, count(DISTINCT c) AS num_members_caida, count(DISTINCT p) AS num_members_pdb, count(DISTINCT lg_member) AS num_members_lg
WHERE num_members_caida > num_members_pdb
RETURN ixp_name, num_members_caida, num_members_pdb, num_members_lg
ORDER BY num_members_caida DESC
LIMIT 10

The query above is very slow. You can optimize it by only searching for looking glass information for IXPs that will be included in the output:

MATCH (c:AS)-[:MEMBER_OF {reference_org: 'CAIDA'}]->(i:IXP)
MATCH (p:AS)-[:MEMBER_OF {reference_org: 'PeeringDB'}]->(i)
WITH i, count(DISTINCT c) AS num_members_caida, count(DISTINCT p) AS num_members_pdb
WHERE num_members_caida > num_members_pdb
ORDER BY num_members_caida DESC
LIMIT 10
OPTIONAL MATCH (lg_member:AS)-[:MEMBER_OF {reference_org: 'Alice-LG'}]->(i)
RETURN i.name AS ixp_name, num_members_caida, num_members_pdb, count(DISTINCT lg_member) AS num_members_lg
MATCH (c:AS)-[:MEMBER_OF {reference_org:'CAIDA'}]->(i:IXP)
MATCH (p:AS)-[:MEMBER_OF {reference_org:'PeeringDB'}]->(i)
RETURN i.name AS ixp_name, count(DISTINCT p) AS num_members_pdb, count(DISTINCT c) AS num_members_caida, abs(count(DISTINCT c) - count(DISTINCT p)) AS diff
ORDER BY diff DESC
LIMIT 10

Which are the top 10 IXPs having the most members from a different country?

Hint: there is a COUNTRY relationship between IXP and Country nodes, and AS and Country nodes. This may mean different things (e.g., AS registered in a country, AS “geolocated” to a country etc.). First try with reference_org: 'PeeringDB' for IXPs and reference_org: 'NRO' for ASes.

MATCH (i:IXP)-[:COUNTRY {reference_org: 'PeeringDB'}]->(i_c:Country)
MATCH (m_c:Country)<-[:COUNTRY {reference_org: 'NRO'}]-(m:AS)-[:MEMBER_OF]->(i)
WHERE i_c <> m_c
RETURN i.name AS ixp_name, count(DISTINCT m) AS num_members_diff_country
ORDER BY num_members_diff_country DESC
LIMIT 10

RIPE Atlas

Find the top 10 ASes (or countries) hosting the most Atlas probes.

Bonus: Can you do the same but counting only “Connected” probes?

ASes:

MATCH (p:AtlasProbe)-[:LOCATED_IN]->(a:AS)
RETURN a.asn AS asn, count(DISTINCT p) AS num_probes
ORDER BY num_probes DESC
LIMIT 10

Countries:

MATCH (p:AtlasProbe)-[:COUNTRY]->(c:Country)
RETURN c.name AS country, count(DISTINCT p) AS num_probes
ORDER BY num_probes DESC
LIMIT 10

Only connected probes:

MATCH (p:AtlasProbe {status_name: 'Connected'})-[:LOCATED_IN]->(a:AS)
RETURN a.asn AS asn, count(DISTINCT p) AS num_probes
ORDER BY num_probes DESC
LIMIT 10

Find the top 10 “Connected” probes that are part of the largest number of measurements.

Return the id, type, and description of the probe.

MATCH (p:AtlasProbe {status_name: 'Connected'})-[:PART_OF]->(m:AtlasMeasurement)
RETURN p.id AS prb_id, p.type AS prb_type, p.description AS prb_description, count(DISTINCT m) AS num_measurements
ORDER BY num_measurements DESC
LIMIT 10

Find hostnames targeted by more than 20 different Atlas measurements.

Hint: see the Cypher WITH clause.

MATCH (m:AtlasMeasurement)-[:TARGET]->(h:HostName)
WITH h.name as hostname, count(DISTINCT m) as num_measurements
WHERE num_measurements > 20
RETURN hostname, num_measurements
ORDER BY num_measurements DESC

What are the countries with the highest population/#probes ratio?

Hint: Total population of a country is available in the relation:

(:Country)-[:POPULATION {reference_org: 'WorldBank'}]-(:Estimate)

You can first aggregate the number of probes per country with a WITH clause.

MATCH (p:AtlasProbe)-[:COUNTRY]->(c:Country)
WITH c, count(p) AS num_probes
MATCH (c)-[pop:POPULATION {reference_org: 'WorldBank'}]->(:Estimate)
RETURN c.name AS country, num_probes, pop.value AS population, pop.value / num_probes AS pop_probes_ratio
ORDER BY pop_probes_ratio DESC

Without WITH:

MATCH (p:AtlasProbe)-[:COUNTRY]->(c:Country)
    -[pop:POPULATION {reference_org: 'WorldBank'}]->(:Estimate)
RETURN c.name AS country, count(p) AS num_probes, pop.value AS population, pop.value / count(p) AS pop_probes_ratio
ORDER BY pop_probes_ratio DESC

Content (Google CrUX)

Find the top 1k websites in Denmark.

MATCH (h:HostName)-[r:RANK {reference_name: 'google.crux_top1m_country'}]->(:Ranking)
    -[:COUNTRY]->(:Country {country_code: 'DK'})
WHERE r.rank <= 1000
RETURN h.name AS hostname

Find in the top 10k websites in Denmark that are accessed via HTTP (not HTTPS).

Hint: the RANK relationship has a property that contains the whole URL.

MATCH (h:HostName)-[r:RANK {reference_name: 'google.crux_top1m_country'}]->(:Ranking)
    -[:COUNTRY]->(:Country {country_code: 'DK'})
WHERE r.rank <= 10000
AND r.origin STARTS WITH 'http:'
RETURN h.name AS hostname

Find which ASes host the largest number of the top 1k popular websites in Denmark.

MATCH (a:AS)-[:ORIGINATE]->(:BGPPrefix)
  <-[:PART_OF]-(:IP)
  <-[:RESOLVES_TO]-(h:HostName)
  -[r:RANK {reference_name: 'google.crux_top1m_country'}]->(:Ranking)
  -[:COUNTRY]->(:Country {country_code: 'DK'})
WHERE r.rank <= 1000
RETURN a.asn AS asn, count(DISTINCT h) AS num_websites
ORDER By num_websites DESC
MATCH (h:HostName)-[r:RANK {reference_name:'google.crux_top1m_country'}]-(:Ranking)
  -[:COUNTRY]-(:Country {country_code:'DK'})
WHERE r.rank <= 1000
WITH h
MATCH (h)-[:RESOLVES_TO]->(:IP)-[:PART_OF]->(:BGPPrefix)<-[:ORIGINATE]-(a:AS)
RETURN a.asn AS asn, count(DISTINCT h) AS num_websites
ORDER BY num_websites DESC

In how many countries is www.google.com in the top 1k most popular websites?

MATCH (:HostName {name: 'www.google.com'})-[r:RANK {reference_name: 'google.crux_top1m_country'}]->(:Ranking)
  -[:COUNTRY]->(c:Country)
WHERE r.rank <= 1000
RETURN count(c) AS num_countries

How many websites appear only in the top 1k of one country (websites popular in only one country)?

Hint: see the Cypher WITH clause to count the number of countries.

Simple:

MATCH (h:HostName)-[r:RANK {reference_name: 'google.crux_top1m_country'}]->(ran:Ranking)
-[:COUNTRY]->(c:Country)
WHERE r.rank <= 1000
WITH h, count(DISTINCT c) AS num_countries
WHERE num_countries = 1
RETURN count(h) AS num_websites

Better performance with cheating by using the country code in the RANK relationship (avoids matching Country nodes):

MATCH (h:HostName)-[r:RANK {reference_name: 'google.crux_top1m_country'}]->(:Ranking)
WHERE r.rank <= 1000
WITH h, count(DISTINCT r.country_code) AS num_countries
WHERE num_countries = 1
RETURN count(h) AS num_websites

Routing

Which AS announces the largest number of IPv4 prefixes? IPv6 prefixes?

Bonus: Do the same but for hyper-specific prefixes, i.e., /25 and smaller for IPv4 and /49 and smaller for IPv6.

Hint: look at the properties of the Prefix nodes.

IPv4 (use af: 6 for IPv6):

MATCH (a:AS)-[:ORIGINATE]->(p:BGPPrefix {af: 4})
RETURN a.asn AS asn, count(DISTINCT p) as num_pfx
ORDER BY num_pfx
LIMIT 1

Hyper-specific prefixes (use af: 6 and prefixlen < 48 for IPv6):

MATCH (a:AS)-[:ORIGINATE]->(p:BGPPrefix {af: 4})
WHERE p.prefixlen < 24
RETURN a.asn AS asn, count(DISTINCT p) as num_pfx
ORDER BY num_pfx DESC
LIMIT 1

Find 10 prefixes that are originated by more than one AS (multi-origin prefixes).

Bonus: Find prefixes that are originated by more than 10 ASes and the number of origins.

Normal:

MATCH (a:AS)-[:ORIGINATE]->(p:Prefix)<-[:ORIGINATE]-(b:AS)
WHERE a <> b
RETURN p.prefix
LIMIT 10

Bonus:

MATCH pa = (a:AS)-[:ORIGINATE]->(p:Prefix)<-[:ORIGINATE]-(b:AS)
WHERE a <> b
WITH p.prefix AS prefix, count(DISTINCT a) AS num_origins
WHERE num_origins > 10
RETURN prefix, num_origins
ORDER BY num_origins DESC

Find the 10 ASes that peer with the most ASes (according to BGP data) in IPv4 / IPv6.

IPv4 (use af: 6 for IPv6):

MATCH (a:AS)-[:PEERS_WITH {af: 4}]-(b:AS)
RETURN a.asn AS asn, count(DISTINCT b) AS num_peers
ORDER BY num_peers DESC
LIMIT 10

Find the 10 ASes that peer with the most RIPE RIS and RouteViews collectors.

MATCH (a:AS)-[:PEERS_WITH]-(rc:BGPCollector)
RETURN a.asn AS asn, count(DISTINCT rc) AS num_rc_peers
ORDER BY num_rc_peers DESC
LIMIT 10