Overview of IYP data

Given the diversity of datasets integrated into IYP, one of the first difficulties when working with it is getting an overview of the available data. The following two sections address this:

Internet Health Report

The easiest way to browse the IYP database is to visit the Internet Health Report website. There you can search for an Internet resource (e.g. AS, prefix, domain name) and get IYP data related to that resource.

  1. Enter the resource into the search field in the top left corner.
  2. Select the Routing, DNS, Peering, Registration, or Rankings tab.
    1. The time-series Monitor tab is external to IYP.
IYP routing data shown on IHR website.

The figure above shows the Routing tab for the Internet Initiative Japan network (AS2497). All other tabs (except the Monitor tab) provide IYP data via different widgets.

Each widget has four tabs: Chart, Data, Cypher Query, and Metadata.

  • The Chart tab shows a visual representation of the data.
  • The Data tab gives the raw data in a table format.
  • The Cypher Query tab gives you the exact query we used to pull the data from IYP. You can reuse it to query the IYP database directly or as a starting point for your own queries. The next section explains how to write your own queries.
  • The Metadata tab gives links to the original datasets and the freshness of the data.

IYP data modelling

Under the hood, IYP is stored in a graph database where nodes mostly represent Internet resources and links represent relationships between them. To do that, we had to model all datasets integrated into IYP as graphs. For some datasets this makes a lot of sense; for others it may seem counter-intuitive. In any case, IYP’s GitHub repository contains a readme for each dataset where the modelling is documented.

Let’s look at a simple example. The figure below is an extract of BGP data showing the two ASes connected to the University of Tokyo (AS2501):

Networks peering with AS2501

This is very similar to the AS graphs we usually draw on a whiteboard. In addition, the number displayed in each node is the ASN, which is actually a property set on the node.
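As a minimal sketch, the graph in the figure could be retrieved with a Cypher query along the following lines. The AS node type and the asn property come from the text above; the PEERS_WITH relationship type and the helper name peers_query are assumptions made for illustration, so check the list of node and relationship types before relying on them. The snippet only builds the query text and its parameters; running it requires a Neo4j client connected to an IYP instance, which is not shown here.

```python
def peers_query(asn: int) -> tuple:
    """Build a Cypher query returning the ASes connected to the given AS.

    PEERS_WITH is an assumed relationship type; replace it with the
    relationship type documented for the dataset you are interested in.
    """
    query = (
        "MATCH (a:AS {asn: $asn})-[r:PEERS_WITH]-(b:AS) "
        "RETURN a, r, b"
    )
    # Pass the ASN as a query parameter instead of formatting it into the text.
    return query, {"asn": asn}


query, params = peers_query(2501)
print(query)
print(params)
```

The pattern selects the AS node by its asn property and follows relationships in either direction, which mirrors how the figure is drawn.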

For a counter-intuitive example see the graph below. These are the names associated with AS2501:

Names for AS2501.

Although we could have set a ‘name’ property on our AS nodes, we wouldn’t know which value to use for it, because different datasets give us different AS names. So instead of being a property, a name is a separate entity (a node) that can be linked to other entities.

The previous example also illustrates two different types of nodes: one AS node and three Name nodes. In IYP, every node and every relationship is typed. The list of node and relationship types is available on GitHub and grows as IYP integrates more datasets.
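To make the modelling choice concrete, here is a small in-memory sketch of typed nodes and relationships. It is not how IYP is implemented: the Node and Relationship classes, the NAME relationship type, and all names and dataset identifiers below are placeholders invented for illustration. It only shows why storing names as separate nodes lets each dataset contribute its own value.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                      # node type, e.g. "AS" or "Name"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str                                   # relationship type, e.g. "NAME"
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


as2501 = Node("AS", {"asn": 2501})

# Placeholder names and dataset identifiers, not real IYP data.
names_by_dataset = {
    "dataset_a.as_names": "EXAMPLE-NAME-A",
    "dataset_b.as_names": "Example Name B",
    "dataset_c.as_names": "example name c",
}

# One Name node per dataset, each linked to the same AS node.
relationships = [
    Relationship("NAME", as2501, Node("Name", {"name": name}),
                 {"reference_name": dataset})
    for dataset, name in names_by_dataset.items()
]

for rel in relationships:
    print(rel.start.properties["asn"], rel.rel_type,
          rel.end.properties["name"], rel.properties["reference_name"])
```

With a single name property on the AS node, two of the three values would have to be discarded; with Name nodes, each relationship keeps its value and records which dataset it came from.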

Nodes and links have properties. For example, the asn property uniquely identifies the AS represented by an AS node. In the IYP console (introduced on the next page), click on a link or node to see all its properties.
IYP uses properties for three different purposes:

  • Identification: Some properties are immutable values that enable the user to select a specific thing in the database. For example, all AS nodes have the property asn, which distinguishes the different ASes. Country nodes have the property country_code, HostName nodes have the property name, etc. These properties are documented in the list of node and relationship types.
  • Non-modelled data: Some datasets provide a lot of detailed information, more than what IYP models. In this case, IYP keeps the non-modelled data in the form of properties. For example, PeeringDB provides IXP member lists. This is modelled by connecting AS nodes to IXP nodes with the MEMBER_OF relationship. PeeringDB also provides the general peering policy of these ASes, which is not modelled in IYP but is available via the policy_general property of the relationships. These properties are generally not documented in IYP and require a good understanding of the original dataset.
  • Reference: Every relationship in IYP contains metadata referring to the origin of the data:
    • The reference_org property refers to the organization providing the original data.
    • reference_time_fetch is the date at which the data was imported.
    • reference_time_modification is the time at which the data was produced.
    • reference_url_data is the URL of the original dataset.
    • reference_url_info is the URL of the documentation of the original dataset.
    • reference_name is unique per dataset and is composed of the organization name and dataset name (e.g. caida.as_relationships_v4). This is particularly useful for filtering by dataset.
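The three uses of properties can be combined in one query. The sketch below builds a Cypher query that selects an AS by its asn property (identification), returns the policy_general property of its MEMBER_OF relationships (non-modelled data), and optionally keeps only the relationships coming from one dataset via reference_name (reference). The helper name ixp_membership_query and the dataset identifier example.dataset_name are hypothetical; look up the actual reference_name of the dataset you need. As before, the code only prepares the query and parameters and does not connect to a database.

```python
from typing import Optional


def ixp_membership_query(asn: int, reference_name: Optional[str] = None) -> tuple:
    """Build a Cypher query listing the IXPs an AS is a member of."""
    params = {"asn": asn}
    lines = ["MATCH (a:AS {asn: $asn})-[r:MEMBER_OF]-(i:IXP)"]
    if reference_name is not None:
        # Keep only relationships imported from the given dataset.
        lines.append("WHERE r.reference_name = $reference_name")
        params["reference_name"] = reference_name
    lines.append(
        "RETURN i, r.policy_general AS policy, "
        "r.reference_org AS source, r.reference_time_fetch AS fetched"
    )
    return "\n".join(lines), params


# All datasets that provide MEMBER_OF relationships for AS2497.
query_all, params_all = ixp_membership_query(2497)
print(query_all)

# Restricted to a single dataset (placeholder identifier).
query_one, params_one = ixp_membership_query(2497, "example.dataset_name")
print(query_one)
print(params_one)
```

Because policy_general is non-modelled data, it may be missing on relationships that come from datasets other than PeeringDB; in that case the returned policy value is null.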