Using the Internet leaves many kinds of tracks behind as network packets travel through switches, routers, and other types of electronic appliances. With recent revelations that governments and corporations are watching the network traffic metadata of individuals, some knowledge of what this metadata is could support a more meaningful discussion of the issues surrounding its collection.
A packet is a small electronic package of information that is sent as a unit across the Internet. Packet sizes vary greatly and can be dynamic depending upon hardware and protocols, but a sample standard is the 1,500 byte maximum size for the Ethernet technology. The amount of text in this paragraph can roughly be compared to the size of a single packet. A network connection is broken down into numerous packets for transmission across the Internet. Even access to a simple, almost empty, web page with Hypertext Transfer Protocol (HTTP) can consist of approximately 50 network packets.
A packet contains data and metadata. For the imaginary packet that contains the above paragraph, the data is the paragraph, itself. The metadata in this imaginary packet is data about the data. An example of metadata for the imaginary packet containing the above paragraph is that its length is 523 characters (counting the opening tab character and spaces). Other examples of metadata for Internet packets are the source and destination of the packet, the type of protocol that is being used to send the packet, and the time the packet is being sent across the network. Sometimes the data part of a packet is called the content.
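To make the split between data and metadata concrete, here is a minimal sketch in Python. It is only an illustration: the `Packet` class, the `packetize` function, and the example addresses are hypothetical, and the sketch ignores the header bytes that real protocols also fit inside the 1,500-byte limit.

```python
import time
from dataclasses import dataclass
from typing import List

MAX_PAYLOAD = 1500  # Ethernet's 1,500-byte maximum, used here as a simple chunk size


@dataclass
class Packet:
    # Metadata: data about the data
    source: str
    destination: str
    protocol: str
    timestamp: float
    length: int
    # Data (sometimes called the content)
    payload: bytes


def packetize(message: bytes, source: str, destination: str,
              protocol: str = "TCP") -> List[Packet]:
    """Break a message into packets of at most MAX_PAYLOAD bytes each."""
    packets = []
    for start in range(0, len(message), MAX_PAYLOAD):
        chunk = message[start:start + MAX_PAYLOAD]
        packets.append(
            Packet(source, destination, protocol, time.time(), len(chunk), chunk)
        )
    return packets


message = b"x" * 4000  # stand-in for a 4,000-byte transmission
packets = packetize(message, "192.0.2.10", "198.51.100.7")
print(len(packets))                  # 3
print([p.length for p in packets])   # [1500, 1500, 1000]
```

Everything in a `Packet` except `payload` is metadata, and all of it can be logged without ever reading the content.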
Some metadata and evidence of network activity are created before a network packet is formed, even before a computer is turned on, leading into the first of 13 kinds of Internet metadata tracks:
1. MAC Address Tracks. A MAC address is an online serial number for the hardware being used to connect to the Internet. MAC stands for Media Access Control, and a MAC address is a unique hexadecimal number that identifies the manufacturer and the individual item. MAC addresses are usually shown in the form 00:00:00:00:00:00 or 00-00-00-00-00-00. MAC addresses are assigned to the subunits of a computer that make the network connections, meaning that a computer can have multiple MAC addresses, say one for a wired connection and another for a wireless connection.
A MAC address is determined when a piece of hardware is produced and thus exists before a computer is even turned on. One of the purposes of a MAC address is to distinguish computers on the same local network from one another, meaning computers that are attached to the same modem, switch, or router. This would typically be within a home or an office. Many switches and routers keep logs of these MAC address connections. Although MAC addresses are used on local networks, records of their use can be propagated beyond the local network, meaning that someone can often tell not only that you are connected to the Internet, but also which exact computer you are using to connect.
A computer’s hostname is similar to a MAC address but is not used as much in tracing network usage. A hostname is the name given to a computer by a user when a new computer is set up.
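The two MAC address notations mentioned above, and the manufacturer portion of the number, can be sketched in a few lines of Python. This is an illustration only: `parse_mac` is a hypothetical helper and the example address is made up. The first three bytes of a MAC address form the manufacturer prefix.

```python
import re


def parse_mac(text: str):
    """Return (normalized_mac, manufacturer_prefix) for a MAC address string."""
    # Accept either 00:00:00:00:00:00 or 00-00-00-00-00-00
    digits = re.sub(r"[:-]", "", text.strip()).lower()
    if not re.fullmatch(r"[0-9a-f]{12}", digits):
        raise ValueError(f"not a MAC address: {text!r}")
    pairs = [digits[i:i + 2] for i in range(0, 12, 2)]
    normalized = ":".join(pairs)
    prefix = ":".join(pairs[:3])  # first three bytes identify the manufacturer
    return normalized, prefix


print(parse_mac("00-1A-2B-3C-4D-5E"))
# ('00:1a:2b:3c:4d:5e', '00:1a:2b')
```

Looking the prefix up in a public manufacturer registry is what lets someone reading a log guess the brand of the hardware.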
2. IP Address Tracks. An IP address is a network address that is used to communicate across long distances of the Internet. IP stands for Internet Protocol, and an IP address is like a postal address that tells switches and routers on the Internet where to send network packets. IP addresses in the common IPv4 form are written as four numbers, each from 0 to 255, separated by periods, such as 0.0.0.0. 74.125.224.72 is an IP address for Google, for example. IP addresses can be static or dynamic. A static IP address is set by your Local Area Network (LAN) administrator and is entered into the configuration (such as the Control Panel) of your computer; a dynamic IP address is assigned automatically, as described under DHCP below. IP addresses are sent long distances across the Internet and can be used to determine general (sometimes specific) locations of network connection endpoints. With an IP address and access to local logs, someone can determine to some extent your location and (with the MAC address) what hardware you used.
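As a small illustration, Python's standard `ipaddress` module can take an address apart. The `describe_ip` helper below is hypothetical, and 192.168.1.20 is just an example of an address from a range reserved for local networks.

```python
import ipaddress


def describe_ip(text: str) -> dict:
    """Summarize a few properties of an IP address string."""
    addr = ipaddress.ip_address(text)  # raises ValueError for malformed input
    return {
        "version": addr.version,
        # An IPv4 address is four bytes, each 0 to 255
        "octets": list(addr.packed) if addr.version == 4 else None,
        # True for addresses reserved for local networks
        "is_private": addr.is_private,
    }


print(describe_ip("74.125.224.72"))
# {'version': 4, 'octets': [74, 125, 224, 72], 'is_private': False}
print(describe_ip("192.168.1.20")["is_private"])  # True
```

Only a public address such as the first one is useful for tracing a connection across the Internet; a private address has meaning only inside its own local network.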
3. DNS Log Tracks. Domain names make it easier to navigate the Internet than IP addresses. The domain name google.com, for example, is easier to remember than 74.125.224.72. Domain Name System (DNS) appliances keep logs of translations between domain names and IP addresses. When you access google.com, for example, a DNS appliance somewhere notes in a log, along with a timestamp, that you asked for the IP address of Google. (An appliance is, generally speaking, a computer dedicated to a specialized task.)
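The following Python sketch imitates what such a log entry might contain. It does not talk to a real DNS server: the `RECORDS` table, the `resolve_and_log` function, and the log field names are all hypothetical stand-ins chosen for illustration.

```python
from datetime import datetime, timezone

# Hypothetical name-to-address table standing in for a real DNS server's data
RECORDS = {"google.com": "74.125.224.72"}

query_log = []  # each entry records who asked for what, and when


def resolve_and_log(client_ip, domain, when=None):
    """Answer a lookup from RECORDS and note the request in query_log."""
    when = when or datetime.now(timezone.utc)
    name = domain.lower()
    answer = RECORDS.get(name)
    query_log.append({
        "time": when.isoformat(),
        "client": client_ip,
        "query": name,
        "answer": answer,
    })
    return answer


stamp = datetime(2013, 7, 1, 12, 0, tzinfo=timezone.utc)
print(resolve_and_log("192.0.2.10", "google.com", stamp))  # 74.125.224.72
print(query_log[-1]["time"])  # 2013-07-01T12:00:00+00:00
```

Note that the log records the lookup itself, so it reveals which sites you asked about even if you never go on to visit them.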
4. DHCP Log Tracks. If a computer is configured to use dynamic IP addresses, then the computer dynamically obtains an IP address each time that it first connects to a network. The dynamic IP address is assigned by a network appliance using the Dynamic Host Configuration Protocol (DHCP). DHCP assignments and timestamps can be logged so that match-ups can later be made between IP addresses and the MAC addresses of the computers using them at specific times. So, even if your computer is changing IP addresses via DHCP, Internet connections can still be traced back to you when this logging is enabled.
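The match-up between an IP address, a time, and a MAC address can be sketched as a simple lookup. The lease records below are invented for illustration, and `mac_for_ip_at` is a hypothetical helper, not part of any DHCP software.

```python
from datetime import datetime

# Hypothetical DHCP lease log: (IP address, MAC address, lease start, lease end)
LEASES = [
    ("192.168.1.20", "00:1a:2b:3c:4d:5e",
     datetime(2013, 7, 1, 8, 0), datetime(2013, 7, 1, 12, 0)),
    ("192.168.1.20", "66:77:88:99:aa:bb",
     datetime(2013, 7, 1, 12, 0), datetime(2013, 7, 1, 18, 0)),
]


def mac_for_ip_at(ip, when, leases=LEASES):
    """Return the MAC address that held the given IP address at the given time."""
    for lease_ip, mac, start, end in leases:
        if lease_ip == ip and start <= when < end:
            return mac
    return None  # no logged lease covers that moment


print(mac_for_ip_at("192.168.1.20", datetime(2013, 7, 1, 9, 30)))
# 00:1a:2b:3c:4d:5e
print(mac_for_ip_at("192.168.1.20", datetime(2013, 7, 1, 14, 0)))
# 66:77:88:99:aa:bb
```

The same IP address maps to two different computers on the same day, which is why the timestamp is as important as the address.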
5. Username Tracks. In addition to the username that you log on with, your computer also has a hostname that someone gave the computer when the operating system was installed. While these names normally stay on the local computer, they can be sent across the Internet. A computer normally keeps logs of the login times of individual users.
6. Active Directory (AD) Log Tracks. Organizations manage local networks with systems such as Microsoft’s Active Directory, which keeps logs such as when computers are on the network and when users log on and off.
7. Wireless Log Tracks. Anytime that you log on to a wireless access point, such as at home, work, or a coffee shop, logs may be kept of the MAC address of your computer and when you were logged on. Virtual Private Network (VPN) logs with timestamps are also normally kept when you use a VPN to log on to wireless or wired networks.
8. Firewall Log Tracks. Firewalls prevent unwanted network traffic and are generally of two types: host and network. Host firewalls are on a computer itself, and any host firewall logs are normally kept on the same computer where the firewall is running, but can be sent in real time to other locations. Network firewalls are located at borders between subnets, for example between a company’s network and the rest of the Internet. When you access a company’s web site, you are probably going through that company’s firewall before accessing that company’s web site. You may be going through three firewalls: one at that company’s border with the Internet, one at that company’s border with its data center, and one on that company’s server that is providing the web site that you are accessing.
IP addresses are subdivided into ports so that multiple network connections, or services, can be made by a single computer at the same time. Your computer can check your email at the same time as you browse the World Wide Web (WWW), for example. Ports are numbered and many ports also have service names. Port 80 is usually assigned to the HTTP service. Port 25 is usually assigned to the Simple Mail Transfer Protocol (SMTP) service. There are 65,536 port numbers. Firewall logs keep track of port numbers as well as IP addresses. The service being used can often be determined by the port number.
Network traffic also consists of various types of protocols, such as Transmission Control Protocol (TCP) and User Datagram Protocol (UDP), which, like ports, can indicate the kind of network transmissions that are occurring.
Firewalls may be logging only suspicious network traffic that indicates possible attacks on their systems, or they could be logging all activity that passes through the firewall. Timestamps are included in the logs.
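How a port number in a firewall log hints at the service being used can be sketched as follows. The `FirewallEntry` record layout and the `likely_service` helper are hypothetical (real firewall log formats vary), and the small `SERVICES` table lists only a few well-known assignments.

```python
from dataclasses import dataclass

PORT_COUNT = 2 ** 16  # 65,536 port numbers, 0 through 65535

# A few well-known port assignments
SERVICES = {25: "SMTP", 53: "DNS", 80: "HTTP", 443: "HTTPS"}


@dataclass
class FirewallEntry:
    timestamp: str
    protocol: str   # for example TCP or UDP
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def likely_service(entry: FirewallEntry) -> str:
    """Guess the service from the destination port; only a guess, not proof."""
    return SERVICES.get(entry.dst_port, "unknown")


entry = FirewallEntry("2013-07-01T09:30:00", "TCP",
                      "192.168.1.20", 49152, "74.125.224.72", 80)
print(PORT_COUNT)             # 65536
print(likely_service(entry))  # HTTP
```

The guess is usually right because services conventionally run on their assigned ports, but nothing prevents a service from running on a different one.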
9. Intrusion Detection System (IDS) Tracks. Many organizations have IDS in addition to firewalls. Generally speaking, while a firewall can log metadata from network packets, IDS can go deeper and look inside the packets for indications of malicious network intrusions (or other things). These entire packets, both data and metadata, can then be logged, including timestamps.
10. Web Log Tracks. Web sites can keep logs with timestamps of all of the web pages that an IP address accesses along with any information sent to that web site. A cookie is just a number or phrase that a web site sends back and forth to a computer in order to keep track of multiple accesses to the web site. These cookies are kept by the web site in a log or database. A referral field is information sent to a web site that indicates the previous web site that you were just on. This way webmasters can keep track of how users get to their web sites.
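As an illustration of what a web log entry holds, the sketch below parses one line in the widely used Combined Log Format. The log line itself is invented, and `parse_line` is a hypothetical helper; a given web site may log in a different format and may store cookies separately.

```python
import re

# One line of Combined Log Format: IP, identity, user, [time], "request",
# status, size, "referrer", "user agent"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Invented example line
LINE = ('192.0.2.10 - - [01/Jul/2013:09:30:00 +0000] '
        '"GET /books/123 HTTP/1.1" 200 5120 '
        '"http://blog.example.com/post" "Mozilla/5.0"')


def parse_line(line):
    """Return the fields of a log line as a dict, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None


entry = parse_line(LINE)
print(entry["ip"], entry["path"], entry["referrer"])
# 192.0.2.10 /books/123 http://blog.example.com/post
```

A single line ties together who (the IP address), when (the timestamp), what (the page), and where from (the referral field).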
11. Expanded Email Header Tracks. The email headers that you normally see include To, From, Subject, and Date, but there is a lot more information in email headers than what is normally shown. This additional information is commonly called expanded email header information and can be accessed in various ways depending upon the software that you are using to read your emails. (Do a Google or other search on expanded email header for information on how to look at this additional information.) The information in expanded email headers can be used to trace the origins of emails via IP addresses. Looking at this process in reverse, it means that other people can trace your emails back to you. Sometimes the trace goes all of the way back to the computer used to originate the email; other times the trace only goes back to the server from which the email originated.
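The tracing described here relies mostly on the Received lines that each mail server adds to the top of the headers as a message passes through. The Python sketch below reads them with the standard `email` module. The message, host names, and addresses are invented, and `trace_hops` is a hypothetical helper that assumes each Received line shows the sending address in square brackets, which is common but not guaranteed.

```python
import re
from email import message_from_string

# Invented message; each server adds its Received line above the earlier ones
RAW_EMAIL = """\
Received: from mail.sender.example (mail.sender.example [203.0.113.5])
    by mail.recipient.example; Mon, 1 Jul 2013 09:30:05 +0000
Received: from sender-pc (unknown [198.51.100.7])
    by mail.sender.example; Mon, 1 Jul 2013 09:30:01 +0000
From: you@sender.example
To: friend@recipient.example
Subject: A political issue
Date: Mon, 1 Jul 2013 09:30:00 +0000

Have you read anything good on this?
"""

BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def trace_hops(raw_message):
    """List the sending IP addresses from the origin to the final server."""
    msg = message_from_string(raw_message)
    hops = []
    # The bottom Received line is the oldest, so read them in reverse
    for received in reversed(msg.get_all("Received", [])):
        match = BRACKETED_IP.search(received)
        if match:
            hops.append(match.group(1))
    return hops


print(trace_hops(RAW_EMAIL))  # ['198.51.100.7', '203.0.113.5']
```

In this invented message the first address in the result is the sender's own computer, which is the case where the trace goes all of the way back.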
12. Net Flow Tracks. Net flow data is voluminous metadata produced by routers and switches that includes information about every network packet. It can be summarized over time in ways such as how many times each local computer accessed each remote computer out on the Internet. For a large organization this could be a database of 1 billion or more network packet records per day, so keeping track of all of this data is expensive. Most organizations probably do not keep all of it, but the possibility exists that this information is being collected.
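The kind of summary mentioned above, how many times each local computer accessed each remote computer, can be sketched with a counter. The packet records below are invented, and `summarize` is a hypothetical helper rather than the output format of any particular router.

```python
from collections import Counter

# Invented per-packet records: (source IP, destination IP, destination port, protocol)
PACKETS = [
    ("192.168.1.20", "74.125.224.72", 80, "TCP"),
    ("192.168.1.20", "74.125.224.72", 80, "TCP"),
    ("192.168.1.21", "74.125.224.72", 80, "TCP"),
    ("192.168.1.20", "203.0.113.5", 25, "TCP"),
]


def summarize(packets):
    """Count packets for each (local computer, remote computer) pair."""
    return Counter((src, dst) for src, dst, _port, _proto in packets)


summary = summarize(PACKETS)
print(summary[("192.168.1.20", "74.125.224.72")])  # 2
print(len(summary))                                # 3
```

At 1 billion records per day, a collector would need to handle roughly 11,600 records per second on average, which gives a sense of why summaries are kept more often than the raw records.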
13. Internet Service Provider (ISP) Tracks. The ISP is the company through which you connect to the Internet. This could be the telephone company, the cable TV company, or a company that specializes in Internet networking. Since all of your Internet traffic goes through your ISP, all of your packets can be seen by them, much as a telephone technician can listen to landline calls from the line on a telephone pole. Also, anyone can trace any IP address to the corresponding ISP simply by doing a whois search.
Examples of Internet Tracks
Suppose that you send an email to a friend about a political issue and your friend replies by email recommending a blog post on the subject of your discussion. If all device logging was turned on, what tracks have you left?
- Your MAC address (#1) was sent to at least the next network device on the way to the first email server.
- Your IP address (#2) was sent in the packets at least as far as the first email server. It was then sent in the expanded email header to the recipient of the email.
- A log entry was made in your DNS server (#3) that you were looking up the IP address of the email server.
- If you were using a dynamic IP address, then a log entry was made tying your IP address to the MAC address of your computer in a DHCP server (#4).
- Your computer logged your username (#5) when you logged on and your name was passed in the email header along with your From email address.
- If you were using a work computer, then your (and your computer’s) login times were recorded in your employer’s Active Directory logs (#6).
- If you were using a wireless access point, at work, say, or in a coffee shop, then your MAC address and IP address were stored in the wireless logs (#7). If you used a Virtual Private Network (VPN), say to get a private connection to work before sending the email, then the VPN server logged your IP address and MAC address.
- If any firewalls were between your computer and the email server, then the source and destination IP addresses, ports, and protocols, were logged by the firewalls (#8).
- If an IDS was protecting the email server, then your entire transmission could have been saved by the IDS (#9). If your email was not encrypted, then your email could have been read by whoever has access to the IDS.
- All of the routers and switches between your computer and the email server would have kept track of the source and destination IP addresses, ports, protocols, and how many packets were sent (#12).
- Anyone with access to your IP address could look up your Internet Service Provider (ISP). Your ISP could look at all of your packets (#13).
- If your friend is using a different email server than you are, then expanded email header information is forwarded from your email server to the recipient’s email server (#11) and then on to the recipient’s computer.
Suppose that you then read the recommended blog post, click on an advertisement there for a book, and arrive at the vendor’s web site. What additional tracks have you left?
- Your MAC address (#1) was sent at least as far as the next hop in the network on the way to the vendor’s web site.
- Your IP address (#2) was sent through the Internet to the vendor’s web site.
- A log entry was made in your DNS server (#3) that you were looking up the IP address of the vendor’s web site.
- If you were using a dynamic IP address, then the DHCP server (#4) logged your MAC address and IP address.
- Your computer has a log that you were logged on at the time (#5).
- If you were at work, your (and your computer’s) login times were recorded by your employer (#6) with Active Directory.
- If you were using a wireless access point and/or a VPN, your IP address and MAC address were logged by those devices (#7).
- Firewalls between your computer and the vendor’s web server logged your IP addresses and ports and the protocol used (#8).
- IDSs between your computer and the vendor’s web server could have saved all of the network packets (#9).
- The vendor’s web server logged your IP address, all of the web pages that you accessed, and all information that you sent to the web site, including the book that you were interested in (#10). The vendor’s web site also knows which blog contained the advertisement that you clicked on in order to get to the vendor’s web site. By exchanging cookies with your computer, the vendor’s web site also knows that it is you the next time that you access their web site.
- All of the routers and switches between your computer and the vendor’s web server logged your IP addresses, ports, protocols used, and how many network packets were sent (#12).
- The vendor’s web site can determine your ISP, and your ISP can see all of the network packets that went to the vendor’s web site (#13).
Conclusion
Using the Internet leaves many kinds of tracks behind as network packets travel through switches, routers, and other types of electronic appliances. Some 13 types of them were noted here. Others also exist. With recent revelations that governments and corporations are watching the network traffic metadata of individuals, some knowledge of what this metadata is could support a more meaningful discussion of the issues surrounding its collection.
Like muddy footprints,
Internet metadata
Shows where you have been.
Suggested Comments:
What other types of metadata tracking are there?