The Internet Structure
The internet is the most complex network in the world right now. It is a massive hub of information, and an enormous amount of traffic flows through it every second. It hosts millions of networks that are owned independently but are still connected with each other.
No one owns the internet or runs it. It's a totally free place. Billions of users surf the web every day, and thousands of networks join and leave each day without disturbing the rest of the Internet.
Even so, there are rules, regulations and protocols that keep the Internet from becoming chaotic. There are bodies that govern and take care of the basic addressing schemes of the Internet, namely IP addresses, domain names, autonomous system numbers and port numbers.
We all use the internet. Many wonder how it works, many just don't care, and few know its actual architecture.
This is the actual Internet map. Of course, it seems a bit overwhelming and confusing. But if we break down the basic principles that run the internet and try to understand them, it won't be so difficult.
Here I attempt to create a picture of the internet: its backbone structure, its governing bodies, and how it all fits together to make everything run so smoothly and efficiently. I will try to explain how IP addresses, domain names and other such namespaces in the internet are distributed, and who controls them.
This article will by no means try to explain networking fundamentals like TCP/IP or routing protocols; it assumes that the reader is familiar with such concepts. I will focus only on the Internet side of networking and will make sure that I don't miss anything important for explaining the internet structure.
2. WARMING UP!
Before we continue, it is important that we look at various terms and internet-related authorities and understand each with a brief description.
The internet works on the underlying TCP/IP protocol suite. The network-level addressing scheme for TCP/IP is the IP address. Considering the internet as one huge logical network, every device inside this network must use a unique IP address. As the internet is completely decentralized, there must be something or someone who can at least make sure that everyone on the internet gets a unique IP.
In the same way, we all use the Domain Name System, and every domain name within the internet must be unique. Besides these we have the autonomous system numbers and the port numbers, which a normal internet user may not come across or need to worry about, but which also need to be kept unique. We will look into autonomous systems in more detail later on.
The IP addressing scheme and the domain naming system follow a hierarchical allocation method. There are various organizations, some non-profit, that help manage parts of these namespaces. But the primary ones, which have complete authority and sit at the root of all these namespaces, are IANA and ICANN. ICANN is the parent organization and IANA works under it. Both organizations deal with the administration and management of all four namespaces in the internet, i.e. IP addresses, domain names, autonomous system numbers and port numbers. They ensure uniqueness inside these namespaces.
For many people, the term World Wide Web or WWW seems synonymous with the internet. In fact the two terms are quite different, though related. The World Wide Web is just a part of the internet. It is one service or application, among many others like mail, FTP or the BitTorrent protocol, that runs on the internet. The application-level protocol that runs the World Wide Web is HTTP. One can loosely say that as TCP/IP is to the internet, HTTP is to the World Wide Web.
3. IP ADDRESS ALLOCATION
Allocation vs Assignment
Let us first begin with an understanding of IP address allocation vs assignment. Allocation means that a top-level authoritative organization hands part of its pool of IP addresses to a lower-level authoritative organization, which either allocates it further down the hierarchy or assigns it directly to end users. Allocated IP addresses are not routable. An IP address only becomes routable on the internet when it is assigned to someone. Assigning an IP address means giving the IP address (or a pool of IP addresses) to the end user. We can compare this with a traditional market scenario. A company manufacturing goods first allocates its goods to its distributors and/or resellers, who eventually assign (sell) the product to the customer, who can then use it. The distributors or the resellers can't use the product themselves.
IANA and RIRs
It all begins with the IANA. For IPv4 we have 2^32 IP addresses (about 4.3 billion). Not all of these addresses can be used, but the sole authority over all of them lies with the IANA. The entire pool for IPv6 also lies with the IANA. The distribution of these IP addresses is done in a hierarchical manner. The first level of allocation is from the IANA to the Regional Internet Registries (RIRs). There are currently 5 Regional Internet Registries, each covering a specific portion of our world. These regional registries have evolved over time. The 5 regional internet registries currently are:
- African Network Information Centre (AfriNIC) for Africa
- American Registry for Internet Numbers (ARIN) for the United States, Canada, several parts of the Caribbean region, and Antarctica.
- Asia-Pacific Network Information Centre (APNIC) for Asia, Australia, New Zealand, and neighboring countries
- Latin America and Caribbean Network Information Centre (LACNIC) for Latin America and parts of the Caribbean region
- Réseaux IP Européens Network Coordination Centre (RIPE NCC) for Europe, Russia, the Middle East, and Central Asia
IANA allocates blocks of IP addresses to these RIRs as and when needed, according to these policy guidelines. These policy guidelines are for IPv4 allocations. Now that the entire IPv4 address space has been allocated, with no unallocated IP addresses left, these policy guidelines have become obsolete. The new allocation guidelines for IPv6 can be found here.
Other Internet Registries
The Regional Internet Registries then further allocate the IP addresses allocated to them by IANA to the National Internet Registries (NIRs), or even directly to Internet Service Providers (ISPs). The hierarchy may even go down from National Internet Registries to Local Internet Registries and then to the ISPs. So the allocation chain goes something like this -
IANA -> Regional Internet Registries -> National Internet Registries/large ISPs -> Local Internet Registries/ISPs/big companies -> end users/companies/small ISPs
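The chain above can be sketched with Python's standard ipaddress module. This is only an illustration: the starting block and the prefix lengths at each level are assumptions chosen for the example (10.0.0.0/8 is private address space, so no real allocation is implied).

```python
import ipaddress

# Hypothetical block held at the top of the hierarchy (illustrative only).
iana_block = ipaddress.ip_network("10.0.0.0/8")

# The top level allocates a /12 to a regional registry ...
rir_block = next(iana_block.subnets(new_prefix=12))
# ... which allocates a /16 to an ISP ...
isp_block = next(rir_block.subnets(new_prefix=16))
# ... which finally assigns a /24 to an end user.
customer_block = next(isp_block.subnets(new_prefix=24))

chain = [iana_block, rir_block, isp_block, customer_block]
for parent, child in zip(chain, chain[1:]):
    # Every block must lie entirely inside the block it was carved from.
    assert child.subnet_of(parent)
    print(f"{parent} -> {child}")
```

Each level only ever hands out pieces of what it received from the level above, which is how uniqueness is preserved without a single central list of every end user.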
4. IP ADDRESS ASSIGNMENTS
Assignment of an IP address means giving it to the end user for their use and making it routable on the internet. But who needs an IP address?
- Of course, the top-level organizations like IANA, ICANN, RIRs, NIRs and ISPs also need IP addresses for their own needs, e.g. their network equipment and servers. So they get a whole block of IP addresses for themselves.
- Large organizations (large in the sense that they need big blocks of IP addresses, e.g. web hosting companies and cloud service providers) generally buy IP addresses directly from the RIRs or the NIRs.
- Companies that need public IP addresses for their workstations and servers buy blocks of IP addresses from ISPs or NIRs.
- Small companies that just need a few IP addresses or a small block of IP addresses buy them from the ISPs and use them with NAT (Network Address Translation) for their offices.
- Home users also need a single IP address. They get one from their local ISP on a temporary basis, whenever they wish to use the internet.
Assigned IP addresses that are no longer needed can be returned to where they were obtained, so that they become reusable again.
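The NAT case in the list above works because an office can use private addresses internally and share a few public ones. As a rough sketch, the check below tells whether an address falls in one of the private ranges. The three ranges are the ones defined in RFC 1918, which this article does not name, so treat them as an assumption brought in for the example; the function name is made up.

```python
import ipaddress

# Private IPv4 ranges from RFC 1918 (an assumption added for this example).
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def needs_nat(address):
    """Return True if the address is private and so needs NAT to reach the internet."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in PRIVATE_RANGES)

for candidate in ("192.168.1.10", "172.32.0.1", "8.8.8.8"):
    print(candidate, needs_nat(candidate))
```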
5. AUTONOMOUS SYSTEMS AND INTERNET ROUTING
Now that we have seen how IP addresses are distributed in the internet and who is responsible for that job, let us continue and understand how one network device on the internet finds another.
Routing is a process of finding the best path between two nodes in a network. There are various routing protocols to do this job. But these routing protocols are divided into 2 main categories -
- internal routing protocols e.g. OSPF, RIP, EIGRP etc.
- external routing protocols e.g. BGP (Border Gateway Protocol)
But internal and external with reference to what? To understand this, let us assume a scenario where a single routing protocol is used for all the networks and subnetworks in the internet. A network can be divided into subnetworks almost any number of times. Once a pool of IP addresses is assigned to a company, the company is free to partition that pool and create subnetworks within it. For all these networks to be found and identified in the internet, they must participate in the routing process. So our routing protocol must be able to locate and know about every subnetwork anyone creates, and give that information to every other network in the whole internet. Imagine how complex and huge the routing information would become. It gets even worse when a company decides to make a slight change in its subnetworks. Companies do need the flexibility to change their network design once in a while in order to meet their requirements. Even such small changes would have to be reflected in the entire internet's routing information. This could mean a lot of work for the routers. So what should we do? Autonomous Systems come to the rescue.
An Autonomous System (AS) is a network or a group of networks under a single administrative domain. Each autonomous system has a single, clearly defined routing policy for its networks. Everything inside the autonomous system is internal. Thus the autonomous system draws the line between external routing and internal routing. Routing inside an autonomous system is done by the internal routing protocols, and the external routing protocol is responsible for routing between autonomous systems. The external routers only see each autonomous system as a group of a few large networks, while internally these networks may be subnetworked as many times and in whatever way the owner pleases, without affecting the external routing information. So a large company can create an autonomous system for itself, divide its networks as many times as it likes, and use any internal routing policy it wishes. Autonomous systems greatly reduce the load of routing information by reducing the number of network entries and by handing the responsibility for internal routing to the company that owns the network.
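To make the reduction in routing entries concrete, here is a minimal sketch using the standard ipaddress module. The prefixes are hypothetical, and collapse_addresses is used only as a stand-in for the idea of summarizing internal subnets into one external entry; it is not how a BGP router is actually configured.

```python
import ipaddress

def external_prefixes(internal_subnets):
    """Collapse an AS's internal subnets into the fewest covering prefixes."""
    networks = [ipaddress.ip_network(s) for s in internal_subnets]
    return [str(n) for n in ipaddress.collapse_addresses(networks)]

# Hypothetical internal design: four /18 subnets inside one company block.
design_v1 = ["10.20.0.0/18", "10.20.64.0/18", "10.20.128.0/18", "10.20.192.0/18"]

# The company later re-subnets the same space into sixteen /20 networks.
design_v2 = [str(n) for n in ipaddress.ip_network("10.20.0.0/16").subnets(new_prefix=20)]

print(external_prefixes(design_v1))  # one prefix covers all four subnets
print(external_prefixes(design_v2))  # the outside view is unchanged
```

Both designs collapse to the same single prefix, so the internal redesign never needs to be announced to the rest of the internet.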
An Autonomous System is identified by a unique 32-bit (previously 16-bit) number called the autonomous system number (ASN). The authority over all ASNs lies, unsurprisingly, with the IANA. The IANA allocates ASNs in blocks to the RIRs. The RIRs assign them directly to the organizations needing an ASN. Any company or organization needs to fulfill certain criteria and obey these policies to get an ASN from its appropriate RIR. The complete list of all allocated ASNs can be found here.
Some very large organizations might want to run BGP internally as well, so they need private ASNs for their private ASs. For that reason there exists, as with IP addresses, a range of ASNs reserved for private use, i.e. 64512-65534.
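A small check for the private range quoted above could look like the following. The function name is made up for illustration, and the sketch deliberately covers only the 64512-65534 range mentioned in the text.

```python
# Private-use range quoted in the text: 64512-65534 inclusive.
PRIVATE_ASNS = range(64512, 65535)

def is_private_asn(asn):
    """Return True if the ASN falls inside the private-use range from the text."""
    if not 0 <= asn <= 2**32 - 1:
        raise ValueError("an ASN must fit in 32 bits")
    return asn in PRIVATE_ASNS

for asn in (64511, 64512, 65534, 65535):
    print(asn, is_private_asn(asn))
```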
6. INTERNET BACKBONE
The internet can be imagined as a collection of individual islands, called Autonomous Systems, separated from each other. In order to communicate, they must be connected with each other in some way. Any new device or network that joins the internet seems to get connected, as if by magic, to every other network and device on the internet. What makes this possible is the Internet Backbone.
The Internet Backbone is simply the collection of the physical infrastructure (layer 1, layer 2, or layer 3) that connects one large network (i.e. an autonomous system) with another large network. The majority of these networks are ISPs and NSPs (Network Service Providers), and a few might be other giant companies. The internet backbone is decentralized, distributed and managed by no single organization or entity. A network may be connected with more than one network for redundancy and fault tolerance. There are two ways by which one network can form a connection with another network -
- Transit - Internet transit is a service where one network (say, network A) carries the traffic of another network (say, network B) through its own network to the rest of the internet, for a price. Transit is usually bought by a smaller ISP from a larger ISP. The larger ISP is called the upstream provider of the smaller ISP; it sells its bandwidth to such smaller ISPs.
- Peering - When two ISPs are of roughly similar size, they connect with each other and let each other's network traffic pass through. Such a mutual agreement between two networks is called peering. The ISPs establish the connection without charging each other; both benefit and earn their revenue from their own customers. Peering can be done privately, by establishing a direct connection between the two networks, or publicly via Internet Exchange Points.
Internet Exchange Points
Internet Exchange Points or IXPs are physical infrastructure (generally layer 2) that various networks can join to exchange their traffic with one another. ISPs and other networks (Autonomous Systems) connect at an Internet Exchange Point so that they can transfer traffic directly to the other participating networks. No network needs to pay any kind of fees to any other network for data transfers. In this way networks can save money by not using their expensive upstream transit provider's bandwidth, and can also increase reliability by creating more redundant connections.
Types of Network
Depending on the size of an ISP and how it connects to other networks, networks are divided into 3 types. This categorization is not a standard, just an informal convention.
- Tier 1 networks - are at the top of the hierarchy. They don't buy upstream transit from any other network and only peer with similarly sized networks. They mostly have international connections to large ISPs of other countries.
- Tier 2 networks - have some transit connections and some peering connections with other networks. They are the most common kind of network.
- Tier 3 networks - have only transit connections with other networks. They are generally local ISPs and are small in size.
So ideally, traffic generated by a local internet user would first reach the local ISP, i.e. a tier 3 network (it could even be a tier 2 network, but let's consider tier 3 for now). From there it would reach its upstream tier 2 provider. After crossing a couple of tier 2 networks it would ultimately reach a tier 1 network. From there it would pass to a few other tier 1 networks, come downstream to a tier 2 network, and finally reach some other local ISP, which is a tier 3 network, and then the destination device. In reality the number of hops differs vastly depending on where the sender and destination are. For example, if the destination is local to the sender, the traffic might reach it quickly from the tier 2 network via an Internet Exchange Point.
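The idealized up-and-down journey can be sketched with a toy model. All network names below are invented, each network is assumed to have exactly one upstream provider, and all tier 1 networks are assumed to peer with each other. Real topologies are far messier, and the sketch ignores Internet Exchange Points.

```python
# Hypothetical transit relationships: customer -> upstream provider.
UPSTREAM = {
    "local-isp-a": "regional-a",
    "local-isp-c": "regional-a",
    "regional-a": "tier1-x",
    "local-isp-b": "regional-b",
    "regional-b": "tier1-y",
}

def climb(network):
    """Return the chain from a network up to the tier 1 network above it."""
    chain = [network]
    while chain[-1] in UPSTREAM:
        chain.append(UPSTREAM[chain[-1]])
    return chain

def path(source, destination):
    """Go up from the source, then down to the destination."""
    up, down = climb(source), climb(destination)
    for i, network in enumerate(up):
        if network in down:
            # A shared provider exists, so traffic turns around there.
            return up[: i + 1] + down[: down.index(network)][::-1]
    # No shared provider: cross the peering link between the two tier 1s.
    return up + down[::-1]

print(path("local-isp-a", "local-isp-b"))
print(path("local-isp-a", "local-isp-c"))
```

The second call shows the shortcut case: two local ISPs behind the same regional provider never need to reach a tier 1 network at all.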
7. DOMAIN NAME SYSTEM
The Domain Name System, as we are all familiar with, is something that translates human-readable names into IP addresses. That something is, at its core, a simple protocol, but the whole thing is much more than just a protocol; it's a whole system. So we call it the Domain Name System instead of the Domain Name Protocol. The DNS is a distributed and hierarchical system. Ironically, the domain name system is both important and unnecessary. Important because without it we couldn't possibly remember all those IP addresses by ourselves, and unnecessary because the network devices and computers don't care about the domain name. They need only the IP address.
The Hierarchy and The DNS Root Zone
The domain name database is in fact so big that making it hierarchical and distributed was a very wise decision. Any domain name is divided into different levels, each in its own namespace, separated by dots and traversed from right to left. The rightmost name is called the top-level domain, the second from the right is called the second-level domain, and so on. Each name on the left is a subdomain of the name on its right.
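The right-to-left traversal is easy to show in a few lines. The helper name is made up, and the example name is simply one used elsewhere in this article.

```python
def domain_levels(name):
    """Split a domain name into its levels, top-level domain first."""
    labels = name.rstrip(".").split(".")
    return labels[::-1]

levels = domain_levels("www.linux.rawbytes.com")
for depth, label in enumerate(levels, start=1):
    print(f"level {depth}: {label}")
```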
ICANN has created 2 main types of top-level domain names -
- generic top-level domain name e.g. com, edu, net, org etc.
- country code top-level domain name e.g. us (USA), in (India), au (Australia), ae (UAE) etc.
The responsibility for administering and maintaining the various generic top-level domain names has been given to various companies; for example, VeriSign Global Registry Services handles the com domain. The country code top-level domain names are handled by the respective countries. The complete list of the generic and country code top-level domain names, along with the responsible authority for each, can be found here. Second-level domain names can be bought from the domain name registrars that are officially accredited by ICANN. A list can be found here. From there on, further subdomains fall under the authority of the company or individual who bought that second-level domain name. A total of 127 levels of subdomains, counting from the top-level domain name, is possible.
Authoritative Name Servers
Authoritative Name Servers are the servers for the Domain Name Protocol. An authoritative name server holds all the information about its DNS zone and answers queries regarding the zone for which it is authoritative. For example, for rawbytes.com. the authoritative name servers might be ns1.rawbytes.com. and ns2.rawbytes.com., which would answer all queries for the DNS zone rawbytes.com., such as www.rawbytes.com., mail.rawbytes.com., ftp.rawbytes.com. etc.
At the top of the DNS hierarchy we have the Root Servers. It is often said that there are 13 root servers, named from a.root-servers.net to m.root-servers.net, but to be honest that's not the complete truth. There are logically 13 root servers, but physically each of the 13 has multiple redundant copies, many spread across the globe, using the anycast routing methodology. These root servers provide everyone with the information about the name servers of the top-level domains like com, org, net etc. The top-level domain name databases are administered and managed by various organizations assigned by ICANN. So these organizations have the information for the second-level domain names that are registered as subdomains of their top-level domain, e.g. VeriSign Global Registry Services will have the authoritative name server information for the rawbytes.com DNS zone.
DNS Resolvers are the clients for the Domain Name Protocol. They actually communicate with other name servers and finally give us the IP Address we need. There are two types of DNS Resolving Request - Iterative and recursive. A DNS client can request the name server in any one of these ways -
- Iterative DNS Resolving request - The DNS client asks the name server to provide the final answer or, if it can't, to at least refer it to the closest other name server in the hierarchy to ask next. The resolver keeps sending iterative queries to the name servers it has been referred to by the previous name servers, until it finds the IP address it seeks or gets an error message. e.g. the DNS resolvers that your ISP assigns you, or public DNS services like Google Public DNS and OpenDNS, perform the iterative search for your domain name.
- Recursive DNS Resolving request - The DNS client asks the name server to provide the final answer or return an error message. The client expects the name server to perform the iterative DNS resolution on its behalf if it doesn't already know the IP address, and to return the final IP address. e.g. the operating system or your personal home router sends a recursive DNS request to the name server.
Fully Qualified Domain Name
Fully qualified domain names or the absolute domain names are domain names that uniquely and absolutely identify themselves. Only a fully qualified domain name can be used in a name resolution and is accepted by any resolver. Such names include all their domain levels and cannot arouse ambiguity. The domain names we use and type in the browser are only partially qualified domain names. But don't be scared. For most times the difference between a partially qualified domain name and a fully qualified domain name is just a small dot (also a notation for the DNS root) at the end of any domain name.
e.g. when we type www.rawbytes.com, it's not an FQDN. But if we simply add a dot at the end, like www.rawbytes.com., it becomes an FQDN. The www.rawbytes.com domain name could actually be a subdomain of another domain name, say www.rawbytes.com.xyz.org. or www.rawbytes.com.abc.net. So just www.rawbytes.com does not uniquely identify where it belongs and can be ambiguous.
But I just said that a resolver won't accept partially qualified domain names, and we never type FQDNs in our browser (I bet you don't). So why and how do our domain names ever get resolved? Well, thank thy browser. It knows we are too lazy to put a boring trailing dot at the end of every domain name we want to surf, and so it puts it there for us.
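The ambiguity described above can be illustrated with a short sketch. The function name and the idea of passing in a list of parent domains are assumptions made for the example; the parent domains xyz.org and abc.net are the ones used in the text.

```python
def candidate_fqdns(name, parent_domains):
    """List the fully qualified names a partially qualified name could mean."""
    if name.endswith("."):
        # Already fully qualified: the trailing dot removes all ambiguity.
        return [name]
    candidates = [name + "."]
    candidates += [f"{name}.{parent}." for parent in parent_domains]
    return candidates

parents = ["xyz.org", "abc.net"]
print(candidate_fqdns("www.rawbytes.com", parents))
print(candidate_fqdns("www.rawbytes.com.", parents))
```

Without the trailing dot there are three plausible readings; with it there is exactly one.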
DNS Name Resolution Process
Let's now take a look at the actual name resolution process. Imagine you want to visit the website www.linux.rawbytes.com. Let's find out what happens within those few seconds, from the moment you type that domain name into your browser and press enter until the page gets loaded.
- Your browser, with the help of your operating system, will take a look at the hosts file, which every operating system has for local name resolution. If it finds the name there, the story is over. But let's not make things so easy.
- Next it will try to find the domain name www.linux.rawbytes.com. in its local DNS cache. Still no luck.
- It will look for the Primary DNS resolver's address that is set in the networking options of the OS, and send a recursive DNS query to that address.
- If that resolver is the address of a DNS server that your ISP gave you or if you happen to use any public DNS then that DNS resolver will send iterative queries to resolve your domain name. If the resolver set is the address of the local home router, then that router will again send a recursive query for your domain name to the primary DNS resolver's address given by your ISP.
- The DNS resolver now has to iteratively query various name servers to resolve your domain name. But where to start from? Well, remember the dot at the end of every domain name that we called the root? It starts from there - the Root Servers. Now how and where does it get the addresses of the root servers? Every DNS resolver must have this file (commonly called the root hints file) with it. It contains the addresses of all the root servers.
- From the root server, the DNS resolver finds out the name server for the top level domain com and queries it.
- The name server for com domain, will point to the rawbytes.com. name server. After iteratively querying the rawbytes.com. name server it will get the address of linux.rawbytes.com. name server, which will eventually give the final address of the subdomain www, meaning the final IP Address of www.linux.rawbytes.com.
- At every point, the DNS resolver will first try to find the name server address in its cache. If it finds it there, it will use it; otherwise it will send the query. Also, after every query is answered by a name server, the DNS resolver will save that address to its cache for further use.
Finally the user, ignorant of this whole, lengthy and transparent background process, will get to enjoy the contents of the website.
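The iterative part of this process can be simulated without touching the network. Everything in the sketch below is a toy: the server names are invented, the address comes from the 192.0.2.0/24 documentation range, and the cache stores only final answers, whereas a real resolver also caches the name server addresses it learns along the way.

```python
# Toy, in-memory stand-in for the DNS hierarchy (all names and addresses made up).
NAME_SERVERS = {
    "root": {"referral": {"com.": "com-server"}},
    "com-server": {"referral": {"rawbytes.com.": "rawbytes-server"}},
    "rawbytes-server": {"referral": {"linux.rawbytes.com.": "linux-server"}},
    "linux-server": {"answer": {"www.linux.rawbytes.com.": "192.0.2.10"}},
}

def query(server, name):
    """One iterative query: an answer, a referral to another server, or an error."""
    data = NAME_SERVERS[server]
    if name in data.get("answer", {}):
        return "answer", data["answer"][name]
    for zone, next_server in data.get("referral", {}).items():
        if name == zone or name.endswith("." + zone):
            return "referral", next_server
    return "error", None

def resolve(name, cache):
    """Resolve a fully qualified name, starting at the root and following referrals."""
    if name in cache:
        return cache[name], []          # answered locally, no servers contacted
    server, visited = "root", []
    while True:
        visited.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            cache[name] = value         # remember the result for next time
            return value, visited
        if kind == "error":
            return None, visited
        server = value                  # follow the referral one level down

cache = {}
print(resolve("www.linux.rawbytes.com.", cache))
print(resolve("www.linux.rawbytes.com.", cache))
```

The first call walks root, com, rawbytes.com. and linux.rawbytes.com. in turn; the second is served from the cache and contacts no server.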
The Internet has changed the world more than any other technology in the history of mankind. Its adoption curve is also the fastest among all the great technologies in history. The most important thing about the internet is that its intelligence is scattered evenly across all points and edges, not just concentrated in the center. But all the credit for the current state and success of the internet goes to us alone. We made the internet happen. The internet is truly for the people, of the people and by the people.