Project#
This is a distributed filesystem written in C!
For more information about the executables, see the executables README.md. For more information about the rest of the project, see the lib README.md.
For internal use only: link to DFS explanation video.
Preface#
This project ended up being a lot larger than I expected. Here is a breakdown of all the files:
| Language | files | blank | comment | code |
|---|---|---|---|---|
| C | 14 | 341 | 157 | 1760 |
| C/C++ Header | 9 | 218 | 632 | 460 |
| Markdown | 4 | 146 | 0 | 429 |
| Bourne Shell | 1 | 16 | 0 | 39 |
| Nix | 1 | 5 | 8 | 35 |
| YAML | 1 | 6 | 0 | 34 |
| make | 1 | 8 | 1 | 20 |
| Bourne Again Shell | 1 | 3 | 5 | 11 |
| SUM: | 33 | 841 | 2391 | 3227 |
Because of this, being able to find documentation for the code quickly became
very important to me. However, I will make a distinction between
"implementation files" (*.c) and "header files" (*.h). This is necessary
because I did not put a lot of comments inside the implementation files, but I
did comment every header file. I ended up doing this because I realized that I
was using the header files as a way to quickly find what a function did and/or
how I would use it. Commenting inside the implementation files was not useful
to me.
In conclusion, the implementation files only have comments in places where I
noticed that I was doing something weird and I needed to justify that decision.
For example, ./src/lib/lib.c:47 has the following lines:
// INFO: This needs to get done one more time in case there isn't a '/' at
// the end of the fpath
if (mkdir(tmp, mode) != 0) {
if (errno != EEXIST)
try(-1);
}
return true;
At first sight, a line like this would look like a bug to me. So the comment there specifies why that is being done and why it's not a bug.
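To see why that last call matters, here is a minimal sketch of a recursive directory creator. It is not the code from ./src/lib/lib.c: the name `mkdir_recursive`, the fixed buffer size, and the plain `bool` error handling (instead of the project's `try` macro) are all assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create every directory along fpath, like `mkdir -p`.
 * Returns true on success, false on failure (errno is set by mkdir). */
bool mkdir_recursive(const char *fpath, mode_t mode)
{
    char tmp[4096]; /* assumed upper bound on the path length */
    size_t len = strlen(fpath);

    if (len == 0 || len >= sizeof(tmp))
        return false;
    memcpy(tmp, fpath, len + 1);

    /* Create each prefix that is followed by a '/'. Start at index 1 so a
     * leading '/' (the root directory) is skipped. */
    for (char *p = tmp + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(tmp, mode) != 0 && errno != EEXIST)
            return false;
        *p = '/';
    }

    /* The loop only creates prefixes that are followed by a '/'. For a path
     * like "a/b/c" (no trailing slash) the directory "a/b/c" itself does not
     * exist yet, so one more mkdir call is needed. */
    if (mkdir(tmp, mode) != 0 && errno != EEXIST)
        return false;

    return true;
}
```

With a trailing slash the final call simply hits EEXIST, which is treated as success, so the extra call is harmless in that case.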
A consequence of this is that the documentation is easily viewable through mechanisms like Doxygen. Check it out here
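As a rough illustration of the style, this is what a Doxygen comment on a header declaration can look like. The function `path_has_trailing_slash` is hypothetical and is not part of this project; only the comment layout (`@brief`, `@param`, `@return`) is the point here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ---- what would live in the header file (*.h) ---- */

/**
 * @brief Check whether a path ends with a '/' character.
 *
 * Hypothetical helper, shown only to illustrate the comment style.
 *
 * @param fpath NUL-terminated path to inspect. Must not be NULL.
 * @return true if the last character of fpath is '/', false otherwise
 *         (including when fpath is empty).
 */
bool path_has_trailing_slash(const char *fpath);

/* ---- what would live in the implementation file (*.c) ---- */

bool path_has_trailing_slash(const char *fpath)
{
    size_t len = strlen(fpath);
    return len > 0 && fpath[len - 1] == '/';
}
```

Doxygen picks up the `/** ... */` block attached to the declaration, so the implementation can stay uncommented.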
Documentation#
You can read the documentation by:
- simply opening the source files in ./src
- running make docs. This will create a directory called docs. Open it in your browser like file:///path/to/this/project/docs/index.html.
- visiting the website: dfs-docs.
NOTE: In order for make docs to work, you must have
Doxygen installed. This is why there is a file called
./Doxyfile in the folder. This will put all the documentation inside a folder
called ./docs in this project. Open it using your favorite browser!
Usage#
First, check out the different parts in the project:
Running make all will compile all the necessary executables and put them in a
directory called ./.build. The executables provided are:
- metadata-server
- data-node
- ls
- copy
NOTE: The make all command also produces lots of other intermediary files that
are useful for debugging and testing. Please ignore these!
If this is not the first time you run the project, you might want to clear the data directory. In the following configuration, you can do this by simply running
make clean-data
The first thing you have to do after compiling everything is create the database. To do this, run the following command:
createdb
This command does not take any parameters.
Then, you have to start the metadata server:
metadata-server Port
where:
- Port is any valid number that can be a port. This is the port that the metadata server will be listening on.
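If you are wondering what "any valid number that can be a port" could mean in practice, here is a small sketch. It assumes the usable range is 1 to 65535 and uses a hypothetical helper called `parse_port`; the real executables may validate their arguments differently.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a command-line port argument.
 * Returns the port (1-65535) on success, or -1 if the text is not a valid
 * port number. */
int parse_port(const char *text)
{
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);

    /* Reject overflow, empty input, and trailing junk such as "80a". */
    if (errno != 0 || end == text || *end != '\0')
        return -1;
    /* Assumption: port 0 is not accepted, 65535 is the largest port. */
    if (value < 1 || value > 65535)
        return -1;
    return (int)value;
}
```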
After starting the metadata server, start a couple data nodes:
data-node IPv4 Port Path Port
where:
- IPv4 is any valid IPv4 address. This is the IP address of the metadata server.
- Port is any valid number that can be a port. This is the port that the metadata server is listening on.
- Path is the file path to the data directory for this data node.
- Port is any valid number that can be a port. This is the port that the data node will be listening on.
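For the IPv4 argument, one way to check that a string is a valid dotted-quad address is the standard `inet_pton` call. This is only a sketch: `is_valid_ipv4` is a hypothetical name, and I am not claiming the executables check their arguments this way.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper: true if text is a dotted-quad IPv4 address such as
 * "127.0.0.1". Host names like "localhost" are rejected. */
bool is_valid_ipv4(const char *text)
{
    struct in_addr addr;
    /* inet_pton returns 1 only when the text parses as an AF_INET address. */
    return inet_pton(AF_INET, text, &addr) == 1;
}
```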
You can now copy files to and from the server. To do this, use the copy
command:
copy IPv4 Port [-s] Path [-s] Path
where:
- IPv4 is any valid IPv4 address. This is the IP address of the metadata server.
- Port is any valid number that can be a port. This is the port that the metadata server is listening on.
- Path is the file path to the source file that you want to copy.
- Path is the file path to the destination of the file you want to copy.

Notice that there is a [-s] before each path. This flag must be supplied exactly once. It indicates that the next path represents a file that is on the server. For example, the following are correct ways to use this command:
copy 136.145.10.2 42069 -s /home/root/.bashrc /home/cheo/.bashrc
copy 136.145.10.2 42069 /etc/passwd -s /home/sona/important_files.txt
The following would be an incorrect way to use this command:
copy 127.0.0.1 58008 -s /home/root/.bashrc -s /home/cheo/.bashrc
# ERROR:             ^^                    ^^
# The -s flag appears twice!
copy 127.0.0.1 58008 /etc/passwd /home/sona/important_files.txt
# ERROR:             ^           ^
# The -s flag does not appear!
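The "exactly one -s" rule can be sketched in C like this. The struct `copy_paths` and the function `parse_copy_paths` are hypothetical names made up for this example; the real copy executable may parse its arguments in a different way. The function only looks at the arguments that come after IPv4 and Port.

```c
#include <stdbool.h>
#include <string.h>

/* Result of parsing the two path arguments of `copy` (hypothetical type). */
struct copy_paths {
    const char *src;
    const char *dst;
    bool src_on_server; /* true: server -> local, false: local -> server */
};

static bool is_server_flag(const char *arg)
{
    return strcmp(arg, "-s") == 0;
}

/* args holds what follows `IPv4 Port`: either {"-s", src, dst} or
 * {src, "-s", dst}. Returns true and fills out on success; returns false
 * when -s is missing, repeated, or not followed by a path. */
bool parse_copy_paths(int nargs, const char *const *args,
                      struct copy_paths *out)
{
    /* One flag plus two paths is the only accepted shape. */
    if (nargs != 3)
        return false;

    if (is_server_flag(args[0]) && !is_server_flag(args[1]) &&
        !is_server_flag(args[2])) {
        out->src = args[1];
        out->dst = args[2];
        out->src_on_server = true;
        return true;
    }
    if (!is_server_flag(args[0]) && is_server_flag(args[1]) &&
        !is_server_flag(args[2])) {
        out->src = args[0];
        out->dst = args[2];
        out->src_on_server = false;
        return true;
    }
    return false;
}
```

Both incorrect invocations above fail the `nargs != 3` check: the first one has four arguments after the port and the second one has only two.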
To list the files that are on the server you can use the ls command:
ls IPv4 Port
where:
- IPv4 is any valid IPv4 address. This is the IP address of the metadata server.
- Port is any valid number that can be a port. This is the port that the metadata server is listening on.
Example#
If you want to run this project as an example, try the following:
First, make sure you compile all the executables.
make all
Remember, this will put all the executables in a directory called .build.
Then, you have to create the database:
./.build/createdb
Now, you can start launching things! First, start the metadata server like this:
./.build/metadata-server 127.0.0.1 42069
Leave this running in the background. Now, launch several data nodes, each one in its own terminal:
Terminal 1:
./.build/data-node 127.0.0.1 8001 ./.build/d1 42069
Terminal 2:
./.build/data-node 127.0.0.1 8002 ./.build/d2 42069
Terminal 3:
./.build/data-node 127.0.0.1 8003 ./.build/d3 42069
I recommend that you put the data directory for the data nodes inside .build
so that cleaning up is a lot easier.
Now we're ready to start copying files!
In order to do this, use the copy command. For example, let's say that we want
to copy this README.md to the server in /tmp. We would do that like this:
./.build/copy 127.0.0.1 42069 README.md -s /tmp/README.md
Then, to verify that the file was copied correctly, you can use the ls
command. Do it like this:
./.build/ls 127.0.0.1 42069
This will show an output of:
/tmp/README.md 8421 bytes
Now, we can copy that file back to here:
./.build/copy 127.0.0.1 42069 -s /tmp/README.md README2.md
In order to check that both files are actually the same, use the Linux diff
command:
diff -s README.md README2.md
This should say that both files are identical!
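If you would rather do the same check from C (for example inside a test program), a byte-by-byte comparison can be sketched like this. `files_identical` is a hypothetical helper, not something that exists in this project.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true only if both files can be opened and
 * hold exactly the same bytes. */
bool files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    bool same = (a != NULL && b != NULL);

    while (same) {
        unsigned char buf_a[4096], buf_b[4096];
        size_t n_a = fread(buf_a, 1, sizeof(buf_a), a);
        size_t n_b = fread(buf_b, 1, sizeof(buf_b), b);

        if (n_a != n_b || memcmp(buf_a, buf_b, n_a) != 0)
            same = false; /* different length or different content */
        else if (n_a == 0)
            break;        /* both files ended at the same offset */
    }

    /* A read error on either file means we cannot claim they match. */
    if (same && (ferror(a) || ferror(b)))
        same = false;

    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return same;
}
```

Calling `files_identical("README.md", "README2.md")` after the round trip above should return true if the copy worked.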
Advanced Usage#
I put a file called ./process-compose.yml in the project directory. This file
is used by a program called process-compose.
The process-compose command lets you launch several commands at the same time
and watch their output. This means that we can use it to prepare the project,
launch the metadata server and the data nodes correctly, and monitor their
status.
To do this, install process-compose and then run the following command:
process-compose up --keep-project
The --keep-project flag will make sure that your project doesn't close as soon
as the metadata server and data nodes close.
Additionally, you might want to automate copying and testing files. For this I have created the following:
There is a ./test.sh file that can help you test out your files. It will create
a 500MB file. I tested this with files up to 5GB so if you want to try that,
just add another 0 to the line in ./test.sh that has the following:
cat /dev/random | head -c5000000000 > 5GB.bin