IP and hostname
Now we will see how to query more complex patterns and introduce other types of
nodes and relationships: IP, HostName, PART_OF, RESOLVES_TO.
We will also learn about the Cypher aggregating
functions
collect() and count().
Finding popular IPs in a prefix
Some of the datasets integrated into IYP provide IP addresses and hostnames. A good example of that are the top popular websites and DNS nameservers provided by Tranco and OpenINTEL.
The query to fetch prefixes originated by the University of Tokyo (AS2501) related to hostnames (from any of the integrated datasets) is:
MATCH (:AS {asn:2501})-[:ORIGINATE]-(pfx:BGPPrefix)-[:PART_OF]-(:IP)-[:RESOLVES_TO]-(h:HostName)
RETURN pfx.prefix, collect(DISTINCT h.name)
Note the usage of collect in the RETURN clause.
This function is used to compile a list of all HostName names per prefix.
If you use aggregation functions in the return clause, it implies (in SQL terms)
a “GROUP BY” on all returned elements that are not aggregated (like pfx.prefix
in this example).
You might also notice that for longer queries the order of nodes and relationship “verbs” can get mixed up (e.g., “IP RESOLVES_TO HostName” instead of “HostName RESOLVES_TO IP”). This is sometimes unavoidable and should not confuse you as the order does not matter for Cypher.
The above query is returning only prefixes that are related to hostnames.
It will not return an empty hostname list.
To list all prefixes and their corresponding hostnames (if they have any) we
should break down the previous query into two parts and make one of the parts
optional.
Optional parts of a pattern are preceded by the keyword OPTIONAL, hence the
previous query becomes:
MATCH (:AS {asn:2501})-[:ORIGINATE]-(pfx:BGPPrefix)
OPTIONAL MATCH (pfx)-[:PART_OF]-(:IP)-[:RESOLVES_TO]-(h:HostName)
RETURN pfx.prefix, collect(DISTINCT h.name)
Finding DNS authoritative nameservers and corresponding domains
Looking at the results of the above query you may see a couple of hostnames that
start with “dns” or “ns”.
Those are typically DNS nameservers.
In IYP a node can have multiple types.
The DNS nameservers are both HostName and AuthoritativeNameServer.
Hence, the following query finds all authoritative nameservers hosted by AS2501
and the number of domains they manage.
MATCH (:AS {asn:2501})-[:ORIGINATE]-(pfx:BGPPrefix)
MATCH (pfx)-[:PART_OF]-(:IP)-[:RESOLVES_TO]-(ns:AuthoritativeNameServer)
OPTIONAL MATCH (dn:DomainName)-[:MANAGED_BY]-(ns)
RETURN ns.name, count(DISTINCT dn.name) AS nb_domains, collect(DISTINCT dn.name)
ORDER BY nb_domains DESC
Note the use of:
- the
countfunction (similar to SQL) to count the number of domain names, - the
ASkeyword to name a result column, - the
ORDER BYandDESCkeywords to sort the results (similar to SQL).
Exercises
- Write a query that finds the AS hosting the hostname
example.com. - Write a query fetching hostnames that resolves to IP addresses located in multiple ASes.
- Find hostnames that are located in prefixes which are RPKI invalid (hint:
look for
Tagnodes).