Project#

This is a distributed filesystem written in C!

To look at extra information about the executables, visit the executables README.md. To look at extra information about the rest of the project, visit the lib README.md.

For internal use only: link to DFS explanation video.

Preface#

This project ended up being a lot larger than I thought. Here is an overview of all the files:

Language             files    blank    comment    code
C                       14      341        157    1760
C/C++ Header             9      218        632     460
Markdown                 4      146          0     429
Bourne Shell             1       16          0      39
Nix                      1        5          8      35
YAML                     1        6          0      34
make                     1        8          1      20
Bourne Again Shell       1        3          5      11
SUM:                    33      841       2391    3227

Because of this, being able to quickly find documentation for the code became very important to me. However, I make a distinction between "implementation files" (*.c) and "header files" (*.h): I did not put a lot of comments inside the implementation files, but I did comment absolutely all of the header files. I ended up doing this because I realized that I was using the header files as a way to quickly find out what a function did and/or how I would use it; commenting the insides of the implementation files was not as useful to me.
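
To give an idea of what that looks like, here is the style of Doxygen comment you will find in the headers. The declaration below is made up for illustration and is not copied from the project:

	/**
	 * @brief Recursively creates every directory in fpath (like `mkdir -p`).
	 *
	 * @param fpath Path whose directories should be created.
	 * @param mode  Permission bits passed to mkdir(2).
	 *
	 * @return true on success, false if any mkdir call fails for a reason
	 *         other than the directory already existing.
	 */
	bool mkdir_recursive(const char *fpath, mode_t mode);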

As a result, the implementation files only have comments in places where I noticed that I was doing something weird and needed to justify that decision. For example, ./src/lib/lib.c:47 has the following lines:

	// INFO: This needs to get done one more time in case there isn't a '/' at
	// the end of the fpath
	if (mkdir(tmp, mode) != 0) {
		if (errno != EEXIST)
			try(-1);
	}
	return true;

At first sight, a line like this would look like a bug to me, so the comment specifies why it is being done and why it is not a bug.
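
For context, that snippet comes from a helper that creates a directory path one component at a time, in the spirit of mkdir -p. A minimal sketch of such a helper, written here as an assumption rather than copied from the project, shows why the extra mkdir call after the loop is needed:

	/* Sketch of a mkdir -p style helper; the project's real code in
	 * ./src/lib/lib.c may differ (for example, it uses a try() error
	 * macro instead of returning false directly). */
	#include <errno.h>
	#include <limits.h>
	#include <stdbool.h>
	#include <string.h>
	#include <sys/stat.h>
	#include <sys/types.h>

	static bool mkdir_recursive(const char *fpath, mode_t mode) {
		char tmp[PATH_MAX];
		strncpy(tmp, fpath, sizeof(tmp) - 1);
		tmp[sizeof(tmp) - 1] = '\0';

		// Create every intermediate directory up to each '/'.
		for (char *p = tmp + 1; *p != '\0'; p++) {
			if (*p == '/') {
				*p = '\0';
				if (mkdir(tmp, mode) != 0 && errno != EEXIST)
					return false;
				*p = '/';
			}
		}

		// One more call in case fpath does not end with a '/'; without
		// it the last path component would never be created.
		if (mkdir(tmp, mode) != 0 && errno != EEXIST)
			return false;
		return true;
	}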

A consequence of commenting all the headers is that the documentation is easily viewable through tools like Doxygen. Check it out here.

Documentation#

You can read the documentation by:

  • simply opening the source files in ./src
  • running make docs. This will create a directory called docs. Open it in your browser like file:///path/to/this/project/docs/index.html.
  • visiting the website: dfs-docs.

NOTE: In order for make docs to work, you must have Doxygen installed; the ./Doxyfile in the project root is what configures it. Running make docs will put all the documentation inside a folder called ./docs in this project. Open it using your favorite browser!

Usage#

First, check out the different parts in the project. Then, compile everything by running:

make all

This will compile all the necessary executables and put them in a directory called ./.build. The executables provided are:

  • metadata-server
  • data-node
  • ls
  • copy

NOTE: The make all command in this project produces lots of other intermediary files that are useful for debugging and testing. Please ignore these!

If this is not the first time you have run the project, you might want to clear the data directories. With the configuration described below, you can do this by simply running make clean-data.

The first thing you have to do after compiling everything is create the database. To do this, run the following command:

createdb

This command does not take any parameters.

Then, you have to start the metadata server:

metadata-server Port

where:

  • Port is any valid port number. This is the port that the metadata server will be listening on.

After starting the metadata server, start a couple data nodes:

data-node IPv4 Port Path Port

where:

  • IPv4 is any valid IPv4 address. This is the IP address of the metadata server.
  • Port is any valid port number. This is the port that the metadata server is listening on.
  • Path is the file path to the data directory for this data node.
  • Port is any valid port number. This is the port that the data node will be listening on.

You can now copy files to and from the server. To do this, use the copy command:

copy IPv4 Port [-s] Path [-s] Path

where:

  • IPv4 is any valid IPv4 address. This is the IP address of the metadata server.
  • Port is any valid port number. This is the port that the metadata server is listening on.
  • Path is the file path to the source file that you want to copy.
  • Path is the file path to the destination of the file you want to copy. Notice the [-s] flag: it indicates that the path that follows it refers to a file on the server, and it must be supplied exactly once (for either the source or the destination, but not both). A small sketch of this rule appears after the examples below. For example, the following are correct ways to use this command:
copy 136.145.10.2 42069 -s /home/root/.bashrc /home/cheo/.bashrc
copy 136.145.10.2 42069 /etc/passwd -s /home/sona/important_files.txt

The following would be an incorrect way to use this command:

copy 127.0.0.1 58008 -s /home/root/.bashrc -s /home/cheo/.bashrc
# ERROR:             ^^                    ^^
# The -s flag appears twice!

copy 127.0.0.1 58008 /etc/passwd /home/sona/important_files.txt
# ERROR:            ^           ^
# The -s flag does not appear!
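
If it helps, here is a hypothetical sketch of how a client could enforce this "exactly one -s" rule. It is only an illustration of the rule above, not the project's actual argument parsing:

	/* Hypothetical argument validation for: copy IPv4 Port [-s] Path [-s] Path.
	 * Illustration only; the project's real copy client may parse differently. */
	#include <stdbool.h>
	#include <stdio.h>
	#include <string.h>

	int main(int argc, char **argv) {
		const char *src = NULL, *dst = NULL;
		bool src_remote = false, dst_remote = false;
		bool next_remote = false;

		// argv[1] is the IPv4 address and argv[2] is the port.
		for (int i = 3; i < argc; i++) {
			if (strcmp(argv[i], "-s") == 0) {
				next_remote = true;
			} else if (src == NULL) {
				src = argv[i];
				src_remote = next_remote;
				next_remote = false;
			} else if (dst == NULL) {
				dst = argv[i];
				dst_remote = next_remote;
				next_remote = false;
			}
		}

		// Exactly one of the two paths must be marked with -s.
		if (src == NULL || dst == NULL || src_remote == dst_remote) {
			fprintf(stderr, "usage: copy IPv4 Port [-s] Path [-s] Path\n");
			return 1;
		}

		printf("copy %s%s to %s%s\n",
		       src_remote ? "server:" : "", src,
		       dst_remote ? "server:" : "", dst);
		return 0;
	}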

To list the files that are on the server you can use the ls command:

ls IPv4 Port

where:

  • IPv4 is any valid IPv4 address. This is the IP address of the metadata server.
  • Port is any valid port number. This is the port that the metadata server is listening on.

Example#

If you want to run this project as an example, try the following:

First, make sure you compile all the executables.

make all

Remember, this will put all the executables in a directory called .build.

Then, you have to create the database:

./.build/createdb

Now, you can start launching things! First, start the metadata server, like this:

./.build/metadata-server 127.0.0.1 42069

Leave this running. Now, launch several data nodes, each in its own terminal:

Terminal 1:

./.build/data-node 127.0.0.1 8001 ./.build/d1 42069

Terminal 2:

./.build/data-node 127.0.0.1 8002 ./.build/d2 42069

Terminal 3:

./.build/data-node 127.0.0.1 8003 ./.build/d3 42069

I recommend that you put the data directories for the data nodes inside .build so that cleaning up is a lot easier.

Now we're ready to start copying files!

In order to do this, use the copy command. For example, let's say that we want to copy this README.md to the server in /tmp. We would do that like this:

./.build/copy 127.0.0.1 42069 README.md -s /tmp/README.md

Then, to verify that the file was copied correctly, you can use the ls command. Do it like this:

./.build/ls 127.0.0.1 42069

This will show output like:

/tmp/README.md 8421 bytes

Now, we can copy that file back here:

./.build/copy 127.0.0.1 42069 -s /tmp/README.md README2.md

In order to check that both files are actually the same, use the Linux diff command:

diff -s README.md README2.md

This should say that both files are identical!

Advanced Usage#

I put a file called ./process-compose.yml in the project directory. This file is used by a program called process-compose, which lets you launch several commands at the same time and watch their output. This means we can use it to prepare the project, launch the metadata server and the data nodes in the right order, and monitor their status.

To do this, install process-compose and then run the following command:

process-compose up --keep-project

The --keep-project flag makes sure that process-compose doesn't exit as soon as the metadata server and data nodes close.

Additionally, you might want to automate copying and testing files. For this, I have created a ./test.sh file that can help you test out your files. It will create a 500MB file. I tested this with files up to 5GB, so if you want to try that, just add another 0 to the line in ./test.sh that has the following:

cat /dev/random | head -c500000000 > 500MB.bin