Reading time:
6 minutes
Date:
April 20, 2021

IPFS - InterPlanetary File System

Introduction and background

The internet consists of a vast number of centralized and decentralized nodes talking to each other. The challenge with these networks is that a failing intermediate node can disrupt connections. Protocol Labs created the InterPlanetary File System (IPFS) to solve this. It is a distributed network that connects nodes peer-to-peer and can load data faster than the traditional client-server model. It uses content-based addressing instead of location-based addressing. A file is version controlled and is accessed by its cryptographic hash instead of through DNS and a specific server, which avoids broken links and 404 pages [1]. The name is InterPlanetary because the idea would theoretically work anywhere as long as there is an internet connection.

Technical details

NETWORK TOPOLOGY

IPFS is a distributed network, which means every node can connect to every other node. As shown in the first figure, a centralized network has a single point of failure: if the server goes down, every client loses access. The second figure shows how a server failure affects the nodes in a decentralized network. Decentralization gives us redundancy: if one server goes down, we connect through the others. IPFS is based on a similar concept, but instead of having several points of failure, every device communicates directly with the others, which is known as a distributed network. Each peer offers available space on its device to share content and make it available globally. Instead of crossing the world to ask a server for a picture, you can ask your neighbor, with less latency, as long as they have the same picture you are asking for [1-2].

Centralized Network
Decentralized Network
Distributed Network

CID

IPFS hashes are referred to as Content Identifiers (CIDs). A CID is built from a cryptographic hash of the content, with SHA2-256 as the default, plus self-describing prefixes (multihash and multicodec) that record which hash algorithm was used and how the content is encoded, so any device can verify the data it receives [3]. Note that hashing verifies integrity; it does not encrypt the content. It works similarly to the most used P2P service, BitTorrent. If you download a picture of a cat, the content address for that cat.jpeg will be a unique hash value. Editing a single pixel of cat.jpeg results in a completely different hash value, which protects the file's integrity.

Thankfully, there is no need to remember the hash addresses. The InterPlanetary Name System (IPNS) gives you a stable address that can be updated to point at the latest version of a file, and DNSLink uses DNS records to map a human-readable domain name to an IPFS address [4-5].

When garbage collection runs, for example because there is little space left on your device, it deletes all unpinned blocks from your local repo/directory. It is crucial to pin the critical files to ensure they remain on your device and are not just cached temporarily. Pinning makes it possible to access the files at any time, even offline [6].

To find the providers (peers) of a given CID, type ipfs dht findprovs <CID> in a terminal. It returns the IDs of peers that have the content; behind the scenes, the lookup asks peers that are progressively closer to the hash until it reaches one that knows who has a matching hash. Bitswap then transfers the data. Bitswap has two jobs:

  1. To request blocks you need from your connected peers.
  2. To send blocks you have to peers who want them.
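
The two Bitswap jobs listed above can be pictured with a toy in-memory model. This is only a sketch under loud assumptions: the Peer class, its want_list, and the use of a plain SHA-256 hex digest in place of a real CID are all hypothetical, and the real Bitswap wire protocol is considerably more involved.

```python
import hashlib
from typing import Dict, Set


def block_id(data: bytes) -> str:
    """Stand-in for a CID (assumption): the SHA-256 hex digest of the block."""
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Hypothetical peer that stores blocks keyed by their content hash."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}
        self.want_list: Set[str] = set()

    def add(self, data: bytes) -> str:
        key = block_id(data)
        self.blocks[key] = data
        return key

    def serve(self, wanted: Set[str]) -> Dict[str, bytes]:
        # Job 2: send the blocks we have to a peer that wants them.
        return {k: self.blocks[k] for k in wanted if k in self.blocks}

    def fetch_from(self, other: "Peer") -> None:
        # Job 1: request the blocks we need from a connected peer.
        for key, data in other.serve(self.want_list).items():
            # Content addressing lets the receiver verify each block.
            if block_id(data) == key:
                self.blocks[key] = data
        # Anything we now hold is no longer wanted.
        self.want_list -= set(self.blocks)


mac = Peer("mac")
rpi = Peer("rpi")
key = mac.add(b"cat picture bytes")
rpi.want_list.add(key)
rpi.fetch_from(mac)
print(rpi.blocks[key] == b"cat picture bytes")
```

Because the key is derived from the data, the receiving peer does not have to trust the sender: a block that does not hash to the requested key is simply dropped.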

DAG

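IPFS stores content in a Merkle DAG (Directed Acyclic Graph), a generalization of a Merkle tree in which multiple nodes can point at the same child. Because every node is addressed by the hash of its content and its links, circular references are impossible unless you use a vulnerable hash like SHA-1 [7].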
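
To make the idea concrete, here is a minimal sketch of a Merkle DAG in which two parents share one child. The store dictionary, the put_node helper, and the way payload and links are combined before hashing are assumptions for illustration; IPFS uses its own node encoding.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

# Hypothetical in-memory store: node id -> (payload, ids of child nodes)
store: Dict[str, Tuple[bytes, List[str]]] = {}


def put_node(payload: bytes, children: Sequence[str] = ()) -> str:
    """Store a node under the hash of its payload and its child links."""
    h = hashlib.sha256()
    h.update(len(payload).to_bytes(8, "big"))  # length prefix avoids ambiguity
    h.update(payload)
    for child in children:
        h.update(bytes.fromhex(child))
    node_id = h.hexdigest()
    store[node_id] = (payload, list(children))
    return node_id


leaf = put_node(b"shared chunk")
dir_a = put_node(b"folder A", [leaf])
dir_b = put_node(b"folder B", [leaf])
root = put_node(b"root", [dir_a, dir_b])
print(len(store))  # 4 nodes: the shared leaf is stored only once
```

A parent's id can only be computed after the ids of its children exist, which is why a node cannot end up pointing back at one of its own ancestors as long as the hash function resists collisions.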

The following website visualizes the DAG in different ways: https://dag.ipfs.io. I inserted a small picture called ipfs_file.png (6 kB), and it looks like this.

DAG Chain Calculation

Look at the figure above to understand how the DAG builder splits a file into chunks. Each green box represents a 512-byte chunk. As the file is ~6000 bytes, there are 12 boxes in total. The chunks are wrapped in UnixFS, so the total number of bytes differs from the raw file size. There are also layout options: Balanced vs. Trickle. Trickle is more suitable for streaming because it is faster, while Balanced is more reliable. The trade-off is loosely comparable to how UDP and TCP prioritize speed versus reliability.
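
The box counts follow from simple fixed-size chunking. The sketch below assumes the file is exactly 6000 bytes (the article only says ~6000) and uses a hypothetical split_into_chunks helper; it ignores the UnixFS wrapping entirely.

```python
from typing import List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Cut data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


data = bytes(6000)  # assumed file size, filled with zero bytes

chunks_512 = split_into_chunks(data, 512)
chunks_1024 = split_into_chunks(data, 1024)

print(len(chunks_512))   # 12
print(len(chunks_1024))  # 6
```

With 512-byte chunks the last chunk holds the remaining 368 bytes, and with 1024-byte chunks it holds 880 bytes, which is why both counts round up.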

DAG Chain Tree

This figure shows the raw format of a balanced DAG tree with a 1024-byte chunk size, which results in 6 red boxes in total. As you can see, the total number of bytes and the raw bytes in the tree are the same, because there is no wrapping around the file.

To handle massive numbers of nodes, IPFS uses a Distributed Hash Table (DHT). Kademlia is one of the algorithms used here; it defines who to tell and who to ask about where to find data. It is a protocol for storing and retrieving key/value pairs across a changing network of peers [8]. By default, IPFS is not meant for sharing large files because of the way it processes data chunk by chunk. If you want to optimize it for larger data sets, a companion project called IPFS Cluster is designed to handle it. I will not go in-depth on IPFS Cluster in this article.
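
Kademlia decides "who to ask" with the XOR metric from its title [8]: the distance between two IDs is their bitwise XOR, read as a number. The sketch below is a simplified illustration with tiny 4-bit IDs; the function names are hypothetical, and real implementations add routing tables (k-buckets) and iterative lookups on top.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance: bitwise XOR of two IDs, interpreted as an integer."""
    return a ^ b


def closest_peers(key: int, peers: List[int], k: int = 3) -> List[int]:
    """Return the k peer IDs with the smallest XOR distance to the key."""
    return sorted(peers, key=lambda peer: xor_distance(peer, key))[:k]


peers = [0b0001, 0b0100, 0b0111, 0b1000]
key = 0b0101  # ID derived from the content we are looking for

print(closest_peers(key, peers, k=2))  # [4, 7]
```

Each peer only needs to know a few peers that are closer to the key than itself, so a lookup can hop toward the target without any node holding a full map of the network.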

IMPROVEMENTS

The CID comes in two versions, v0 and v1. v0 starts with Qm and uses base58 encoding; it is used widely in the protocol but is less flexible. v1 typically starts with bafy or bafk and adds multibase, a prefix that tells a reader which encoding is used from the first character of the string [3].
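
The prefixes are a direct consequence of how the bytes are laid out. The following sketch builds both CID versions from a SHA2-256 digest. It is an illustration only: the helper names are hypothetical, and the input here is an arbitrary byte string, whereas IPFS normally hashes its own encoded node for a file, so these values will generally not match what ipfs add prints.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is written as "1".
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def multihash_sha256(data: bytes) -> bytes:
    """Prefix the digest with 0x12 (sha2-256) and its length (0x20 = 32)."""
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest


def cid_v0(data: bytes) -> str:
    # v0 is just the base58-encoded multihash, with no version or codec.
    return base58btc(multihash_sha256(data))


def cid_v1(data: bytes, codec: int = 0x70) -> str:
    # 0x01 = CID version 1; codec 0x70 = dag-pb, 0x55 = raw.
    raw = bytes([0x01, codec]) + multihash_sha256(data)
    b32 = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for base32


print(cid_v0(b"hello")[:2])        # Qm
print(cid_v1(b"hello")[:4])        # bafy
print(cid_v1(b"hello", 0x55)[:4])  # bafk
```

The Qm prefix comes from the fixed 0x12 0x20 multihash header, and bafy or bafk from the version and codec bytes, so the start of a CID already tells you how to decode the rest.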

OS SUPPORT

The reference implementation is written in Go and supports Windows, macOS, Linux, and FreeBSD, including ARM-based devices like the Raspberry Pi and smartphones [9].

In March 2020, the Norwegian browser Opera integrated and enabled IPFS by default for Android and was the first browser to do so [10].

SIMILARITIES AND WEAKNESS

We can compare the privacy of IPFS to the Tor network in terms of traceability. It is a one-way track, which makes it hard to trace content back to the peer. When we compare nodes, every device on the IPFS network acts as a node, while Tor has ~6000 known nodes. To get the best of both worlds, you can use IPFS through a Tor gateway [11] to make it even more private if set up correctly. There are warnings that this is experimental and should be used with caution. For example, if a file is hashed using SHA-1, the file is vulnerable to exploitation. A problem with both IPFS and Tor is that criminals can misuse the privacy for illegal activities online. If illegal content is shared, every node that retrieves it will hold a copy of that content. If it is a child abuse picture, it can be hard to remove it from IPFS due to the hashing and versioning. A person can change a pixel in the picture and continue sharing it under a completely different hash [1].

TOOLS

The two devices used to test IPFS were a Macintosh and a Raspberry Pi 3. The goal was to send three versions of ipfs_file.png (original, copy, and modified) from the Mac to the Raspberry Pi. I used the GUI version on the Mac to demonstrate the flexibility, while the Raspberry Pi used the command-line version. You can also use a browser extension, use IPFS Cluster for large files, or build your own app and integrate it with IPFS. Go and JavaScript are currently supported, and Rust was recently announced as a supported language on the IPFS blog (18-03-2020).

Installation

INSTALLING IPFS ON MAC

  1. Go to https://github.com/ipfs-shipyard/ipfs-desktop
  2. Download IPFS-Desktop-xxxx.dmg
  3. Open IPFS from applications
  4. IPFS is now successfully running on the Mac

USE IPFS DESKTOP

  1. The Status tab shows the traffic on the IPFS network and the discovered peers.
  2. The Files tab is your private directory that keeps your files. Here you can add a file or folder and get the shareable hash value.
  3. The Peers tab shows the location of the peers and their latency in milliseconds.
IPFS Desktop Version
IPFS Desktop Connected

INSTALLING IPFS ON A RASPBERRY PI 3 (RPI)

Install Node

sudo apt-get install nodejs

Install Go

sudo apt-get install golang

Install IPFS

  1. Find go-ipfs here: https://dist.ipfs.io/#go-ipfs
  2. Download the ARM version for Linux
  3. Extract the file in a terminal: tar xzf <filename>
  4. Go to the extracted folder and run the install script: sudo ./install.sh
  5. Check if it is installed by typing ipfs version

Start using IPFS

  1. Initialize and start IPFS by typing:
  • ipfs init (one time only)
  • ipfs daemon (every time you want to run IPFS)

IPFS is now running on the RPI

Use this link to find commands: https://docs.ipfs.io/reference/api/cli/.

FILE STRUCTURE AND TRANSFER

I created a folder, 4RPI3, with three png files in the GUI version on my Mac. Two of the files were identical, with the names ipfs_file.png and ipfs_file_copy.png, while the third file, ipfs_file_mod.png, was modified by just a few pixels. As we can see, the hash value of the modified file is different from the other two.
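
The same behavior can be reproduced without IPFS. In this sketch the byte strings are made-up stand-ins for the three png files, and a plain SHA-256 digest stands in for the CID, so the values are not the ones shown in the figure.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest, used here as a stand-in for a CID (assumption)."""
    return hashlib.sha256(data).hexdigest()


original = bytes(range(256)) * 4      # stand-in for ipfs_file.png
copy = bytes(original)                # stand-in for ipfs_file_copy.png
modified = bytearray(original)        # stand-in for ipfs_file_mod.png
modified[10] ^= 0x01                  # flip a single bit

print(digest(original) == digest(copy))             # True
print(digest(original) == digest(bytes(modified)))  # False
```

The file name plays no role in the hash: two files with identical bytes share one address, while the smallest change to the content produces an unrelated one.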

IPFS File Structure

The Mac shared the IPFS hash values with the Raspberry Pi. The files were downloaded to the RPI with the command ipfs get <hash value>. To list the files in the downloaded folder, use the command ipfs ls <hash value>. It displayed the same hash values as the GUI version on the Mac. I opened the files in the Raspberry Pi home folder and confirmed that the pictures had transferred successfully.

IPFS RPI Version

Summary

PROS OF IPFS

It has good integrity because a CID is a one-way, unique hash of the content: you cannot tamper with the data without changing the hash. IPFS supports freedom of information (or speech), which makes it harder to suppress journalists and others in countries like China. By pinning a file, you can access it at any time on your device, even offline. Services like https://globalupload.io make it easy for people without IPFS installed to access a file through either its hash or a standard web address.

CONS OF IPFS

It still supports SHA-1, which makes files hashed with it less secure. By default, anyone with the hash can access the file. IPFS was designed first and foremost for sharing files quickly and without servers, so avoid sharing sensitive information with IPFS in its default state. To ensure confidentiality, apply asymmetric encryption to sensitive files before adding them. The encryption must be done manually with GPG, which is a drawback. There is no easy way to limit storage and bandwidth usage. For now, use the commands node.repo.storage_max and node.network_bandwidth_max to work around this limitation. There should be an easier way to do this.

References

All figures used in this article were created by the author.

[1] J. Benet, “IPFS - Content Addressed, Versioned, P2P File System (Draft 3),” 2014.
[2] A. Segall, "Distributed network protocols," in IEEE Transactions on Information Theory, vol. 29, no. 1, pp. 23-35, January 1983.
[3] “Content addressing,” IPFS Docs. [Online]. Available: https://docs-beta.ipfs.io/concepts/content-addressing/#identifier-formats. [Accessed: 01-Apr-2020].
[4] “IPNS,” IPFS Docs. [Online]. Available: https://docs-beta.ipfs.io/concepts/ipns/#example-ipns-setup. [Accessed: 01-Apr-2020].
[5] “DNSLink,” IPFS Docs. [Online]. Available: https://docs-beta.ipfs.io/concepts/dnslink/#publish-using-a-subdomain. [Accessed: 01-Apr-2020].
[6] “Persistence,” IPFS Docs. [Online]. Available: https://docs-beta.ipfs.io/concepts/persistence/#pinning-in-context. [Accessed: 02-Apr-2020].
[7] B. Karrer and M. E. J. Newman, “Random graph models for directed acyclic networks,” Physical Review E, vol. 80, no. 4, 2009.
[8] P. Maymounkov and D. Mazières, “Kademlia: A Peer-to-Peer Information System Based on the XOR Metric,” Peer-to-Peer Systems, Lecture Notes in Computer Science, pp. 53–65, 2002.
[9] “fs-repo-migrations,” IPFS Distributions. [Online]. Available: https://dist.ipfs.io/#go-ipfs. [Accessed: 02-Apr-2020].
[10] D. Ayala, “IPFS in Opera for Android,” IPFS Blog. [Online]. Available: https://blog.ipfs.io/2020-03-30-ipfs-in-opera-for-android/. [Accessed: 06-Apr-2020].
[11] “Lesson: Access IPFS content through Tor gateways (experimental),” IPFS Primer. [Online]. Available: https://dweb-primer.ipfs.io/avenues-for-access/tor-gateways. [Accessed: 06-Apr-2020].