Merkle DAG- The Data Structure of web 3

April 9, 2022

Introduction

It does not matter if you are a programmer or not, you are surrounded by structured data. Various data structures like list, dictionary, trees etc. help us organise information. For example in your phone you have categorised various photos according to the events. You have put photos clicked during diwali in the folder named diwali, photos clicked when you were on trip to Manali in a folder named Manali and so on.

Structuring data enables us to find relevant information more quickly, gives meaning to information and helps us manage large amounts of data. In programming Data structure is everywhere.

If you want to know more about Data Structure, we have discussed it in this article (Data Structures).

In this article our main focus will be to explore DAG (Directed Acyclic Graph) and Merkle DAG.

In this article our main focus will be to explore DAG (Directed Acyclic Graph) and Merkle DAG.

What is a Graph?

A Graph is a non-linear data structure which consists of nodes and edges. Graph shows the relationship between two or more objects.Where relationship refers to edges and nodes represent objects.

Hashing

Graph as a data structure is the most flexible data structure and it can represent almost any kind of data. For example Graphs are ideal for representing highway networks between cities, social networks, and friendship between students in school.

highway_network

Acyclic Graph:

If there are no loops in the graph then the graph is called Acyclic Graph. given any node in the graph, there is no way to navigate from that node back to itself along the graph’s edges.

Let’s understand this with an example. Suppose you place an order from amazon. Your parcel travels from the Amazon office to you. But it doesn’t go back to the Amazon office.

The transport which delivers the parcel goes back to the Amazon office hence it is an example of a cyclic graph. In an acyclic graph the data can move only in one direction i.e from parent node to child node.

Directed Graph:

When you order the parcel you give your delivery address. That address is nothing but the destination of the parcel. That address provides a direction to the delivery van where the parcel is to be delivered. The relationships between nodes (Amazon office and your home) only properly associate in one direction, and this direction is indicated by a single-headed arrow. If each edge in a graph has some sort of direction then it is called directed graph.

Now the Amazon office contains parcels of various customers in your area. So the amazon office will be called parent node and all the customers are called child nodes;

unless any customer is gifting that parcel to their loved ones. In that case the customer who gives the parcel is called intermediate node because now the destination of the parcel is not that of the customer but the person whom they have gifted the parcel.

DAG (Directed Acyclic Graph):

DAG is a combination of directed and acyclic graphs. Let’s understand this with an example. Your grandparents gave birth to your parents and you parents gave birth to you. Now this can’t go in the reverse manner like, you gave birth to your parents and your parents gave birth to your grandparents. That is impossible. This is how DAG works. It flows in just one direction.

Merkle DAG:

In DAG we use an arrow to point out the next node but in real life applications we would not have the luxury to draw an arrow. But we can give our nodes a unique hash which we can use to express an edge from one node to another. Doing this we create a special data structure known as Merkle DAG. Merkle allows us to map large amounts of data and easily identify where changes have occurred.

Let’s first try to understand how the Merkle tree works.

Consider a glass of juice. If you want to encode a juice glass this is how it is done:

img1

If You want to encode 2 juice glasses this is how it’s done.

img2

If You want to encode 4 juice glasses this is how it’s done using Naive approach and Merkle tree.

img3

Now what if one juice glass is changed then how is it going to affect our final hash?

img4

If we want to verify that data which is a part of a bunch represented by a final hash then we have to use the entire dataset. Whereas using a merkle tree we just need to use a little data.

img5

Using Naive approach verifying membership in a bunch requires rerunning the entire process but in Merkle tree the job is done in a few computations.

img5(1)

The above example shows the binary Merkle tree. 10 glasses of juice represent small blocks of a large dataset. If you want to download the entire dataset you need a final hash. If any of the blocks is tampered, it will cause a change in final hash and that change can be traced easily as shown before.

In Merkle DAG first leaves are created or nodes without children and then parent nodes are created, and then leaves are linked to parent nodes. Similar to the Merkle tree, Merkle DAG are immutable. Any change in node can cause a change in final hash creating another DAG.

Merkle-DAGs are similar to Markle trees but balancing is not required. Nodes can have many parent nodes. In Merkle DAG objects are identified using a unique hash called Content Identifier or CID.

Merkle-DAGs are self-verified and immutable structures. The CID of a node is linked to the contents of its payload and those of all its descendants. Thus two nodes with the same CID represent exactly the same DAG.

Merkle DAGs are widely used by Source control systems like git, In distributed databases like Dynamo , In Blockchains and Cryptocurrencies like Bitcoin ETC. IPFS also uses Merkle DAG to store and distribute files. They are key data structures in web 3.

Do follow our LinkedIn page for updates: [ Myraah IO on LinkedIn ]