Distributed File System written in Python
# Assignment 04: Distributed File Systems

University of Puerto Rico at Rio Piedras

Department of Computer Science

CCOM4017: Operating Systems

# Introduction

In this project the student will implement the main components of a file system
by building a simple, yet functional, distributed file system (DFS). The
project will expand students' knowledge of the main components of a file system
(inodes and data blocks), further develop their skills in inter-process
communication, and increase their awareness of system security.

The components to implement are:

* **Metadata server**, which will function as an inode repository
* **Data servers**, which will serve as the disk space for file data blocks
* **List client**, which will list the files available in the DFS
* **Copy client**, which will copy files from and to the DFS

# Objectives

* Study the main components of a distributed file system
* Become familiar with file management
* Implement a distributed system

# Prerequisites

* Python:
    * [www.python.org](http://www.python.org/)
* Python socketserver library: for **TCP** socket communication.
    * [https://docs.python.org/3/library/socketserver.html](https://docs.python.org/3/library/socketserver.html)
* uuid: to generate unique IDs for the data blocks
    * [https://docs.python.org/3/library/uuid.html](https://docs.python.org/3/library/uuid.html)
* **Optionally** you may read about the json and sqlite3 libraries used in the
  skeleton of the program.
    * [https://docs.python.org/3/library/json.html](https://docs.python.org/3/library/json.html)
    * [https://docs.python.org/3/library/sqlite3.html](https://docs.python.org/3/library/sqlite3.html)

### **The metadata server's database manipulation functions.**

No expertise in database management is required to accomplish this project.
However, sqlite3 is used to store the file inodes in the metadata server. You
don't need to understand how the functions are implemented, but you do need to
read the documentation of the functions that interact with the database. The
metadata server database functions are defined in the file mds\_db.py.

#### **Inode**

For this implementation an **inode** consists of:

* File name
* File size
* List of blocks

#### **Block List**

The **block list** consists of a list of:

* data node address \- to know the data node where the block is stored
* data node port \- to know the service port of the data node
* data node block\_id \- the id assigned to the block

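
As an illustration of the two structures above, here is a minimal sketch of an inode and its block list as plain Python data. The names `Inode` and `Block` are hypothetical and chosen only for this example; the skeleton stores this information through sqlite3 in mds\_db.py, and its actual schema may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One block list entry: (data node address, data node port, block_id).
Block = Tuple[str, int, str]


@dataclass
class Inode:
    """Illustrative inode record (hypothetical, not the mds_db.py schema)."""
    fname: str                                          # file name (path in the DFS)
    fsize: int                                          # file size in bytes
    blocks: List[Block] = field(default_factory=list)   # ordered block list


# A 200-byte file whose two blocks live on two different data nodes.
inode = Inode("/home/hola.txt", 200)
inode.blocks.append(("127.0.0.1", 8001, "block-a"))
inode.blocks.append(("127.0.0.1", 8002, "block-b"))
```

The order of the block list matters: reading the file back means fetching the blocks in the same order in which they were written.
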
Functions:

* AddDataNode(address, port): Adds a new data node to the metadata server.
  Receives the IP address and port, i.e., the information needed to connect to
  the data node.
* GetDataNodes(): Returns a list of the registered data node tuples
  **(address, port)**. Useful to know to which data nodes the data blocks can be
  sent.
* InsertFile(filename, fsize): Inserts a filename with its file size into the
  database.
* GetFiles(): Returns a list of the attributes of the files stored in the DFS.
  (addr, file size)
* AddBlockToInode(filename, blocks): Adds the list of data block information of
  a file. The data block information consists of (address, port, block\_id).
* GetFileInode(filename): Returns the file size and the list of data block
  information of a file. (fsize, block\_list)

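
To make the role of each function concrete, the following sketch is an in-memory stand-in that reuses the function names listed above. It is not the code in mds\_db.py: the class name `InMemoryMetadata`, the return values, and the handling of duplicates and missing files are assumptions made for illustration, so check the documentation in mds\_db.py for the real behavior.

```python
class InMemoryMetadata:
    """Hypothetical stand-in for the mds_db.py functions (no sqlite3)."""

    def __init__(self):
        self.nodes = []   # [(address, port), ...]
        self.files = {}   # fname -> [fsize, block_list]

    def AddDataNode(self, address, port):
        # Assumption: registering the same node twice is ignored.
        if (address, port) not in self.nodes:
            self.nodes.append((address, port))

    def GetDataNodes(self):
        return list(self.nodes)

    def InsertFile(self, fname, fsize):
        # Assumption: returns False if the file already exists.
        if fname in self.files:
            return False
        self.files[fname] = [fsize, []]
        return True

    def GetFiles(self):
        # Assumption: (file name, file size) pairs, as the list client prints.
        return [(fname, rec[0]) for fname, rec in self.files.items()]

    def AddBlockToInode(self, fname, blocks):
        # Each block is (address, port, block_id).
        self.files[fname][1].extend(blocks)

    def GetFileInode(self, fname):
        # Assumption: (None, None) signals a missing file.
        if fname not in self.files:
            return None, None
        fsize, blocks = self.files[fname]
        return fsize, list(blocks)


db = InMemoryMetadata()
db.AddDataNode("127.0.0.1", 8001)
db.InsertFile("/home/hola.txt", 200)
db.AddBlockToInode("/home/hola.txt", [("127.0.0.1", 8001, "block-a")])
fsize, blocks = db.GetFileInode("/home/hola.txt")
```
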
### **The packet manipulation functions:**

The packet library is designed to serialize the communication data using the
json library. No expertise with json is required to accomplish this assignment.
These functions were developed to ease the packet generation process of the
project. The packet library is defined in the file Packet.py.

In this project all packet objects have a packet type chosen from the following
command types:

* reg: to register a data node
* list: to ask for a list of files
* put: to put a file in the DFS
* get: to get files from the DFS
* dblks: to add the data block ids to the files.

#### **Functions:**

##### **General Functions**

* getEncodedPacket(): Returns a serialized packet ready to send through the
  network. First you need to build the packet. See the Build**\<X\>**Packet
  functions.
* DecodePacket(packet): Receives a serialized message and turns it into a
  packet object.
* getCommand(): Returns the command type of the packet.

##### **Packet Registration Functions**

* BuildRegPacket(addr, port): Builds a registration packet.
* getAddr(): Returns the IP address of a server. Useful for registration
  packets.
* getPort(): Returns the port number of a server. Useful for registration
  packets.

##### **Packet List Functions**

* BuildListPacket(): Builds a list packet for file listing.
* BuildListResponse(filelist): Builds a list response packet with the list of
  files.
* getFileArray(): Returns a list of files.

##### **Get Packet Functions**

* BuildGetPacket(fname): Builds a get packet to request a file by name.
* BuildGetResponse(metalist, fsize): Builds a response with the list of data
  node servers that hold the blocks of a file, and the file size.
* getFileName(): Returns the file name in a packet.
* getDataNodes(): Returns a list of data servers.

##### **Put Packet Functions (Put Blocks)**

* BuildPutPacket(fname, size): Builds a put packet to put fname and file size
  in the metadata server.
* getFileInfo(): Returns the file info in a packet.
* BuildPutResponse(metalist): Builds a list of data node servers where the data
  blocks of a file can be stored, i.e., a list of available data servers.
* BuildDataBlockPacket(fname, block\_list): Builds a data block packet.
  Contains the file name and the list of blocks for the file. See
  [block list](http://ccom.uprrp.edu/~jortiz/clases/ccom4017/asig04/#block_list)
  to review the content of a block list.
* getDataBlocks(): Returns a list of data blocks.

##### **Get Data Block Functions (Get Blocks)**

* BuildGetDataBlockPacket(blockid): Builds a get data block packet. Useful
  when requesting a data block from a data node.
* getBlockID(): Returns the block\_id from a packet.

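
Since the packet library serializes with json, the following sketch shows the general idea of encoding a packet for the network and decoding it on the other side. It does not use Packet.py: the helpers `encode_packet` and `decode_packet` and the dictionary keys (`command`, `addr`, `port`, `fname`, `blocks`) are hypothetical, and the real packet layout may differ.

```python
import json


def encode_packet(packet):
    """Serialize a packet dictionary into bytes ready for a TCP socket."""
    return json.dumps(packet).encode("utf-8")


def decode_packet(data):
    """Turn received bytes back into a packet dictionary."""
    return json.loads(data.decode("utf-8"))


# A registration packet, as a data node might send it (hypothetical keys).
reg = {"command": "reg", "addr": "127.0.0.1", "port": 8001}
wire = encode_packet(reg)
received = decode_packet(wire)

# A data block packet: the block list is a list of (address, port, block_id).
dblks = {"command": "dblks", "fname": "/home/hola.txt",
         "blocks": [("127.0.0.1", 8001, "block-a")]}
round_trip = decode_packet(encode_packet(dblks))
```

JSON has no tuple type, so each `(address, port, block_id)` tuple comes back as a list after decoding. Code that compares or unpacks block entries should not depend on them being tuples.
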
# Instructions

Write and complete code for an unreliable and insecure distributed file server
following the specifications below.

### **Design specifications.**

For this project you will design and complete a distributed file system. You
will write a DFS with tools to list the files, and to copy files from and to
the DFS.

Your DFS will consist of:

* A metadata server: which will contain the metadata (inode) information of the
  files in your file system. It will also keep a registry of the data servers
  that are connected to the DFS.
* Data nodes: The data nodes will contain chunks (some blocks) of the files that
  you are storing in the DFS.
* List command: A command to list the files stored in the DFS.
* Copy command: A command that will copy files from and to the DFS.

### **The metadata server**

The metadata server contains the metadata (inode) information of the files in
your file system. It will also keep a registry of the data servers that are
connected to the DFS.

Your metadata server must provide the following services:

1. Listen to the data nodes that are part of the DFS. Every time a new data
   node registers with the DFS, the metadata server must keep the contact
   information of that data node, that is, (IP address, listening port).
    * To ease the implementation of the DFS, the directory file system must
      contain three things:
        * the path of the file in the file system (filename)
        * the nodes that contain the data blocks of the files
        * the file size
2. Every time a client (commands list or copy) contacts the metadata server
   for:
    * get: requesting to read a file: the metadata server must check if the file
      is in the DFS database, and if it is, it must return the nodes with the
      block\_ids that contain the file.
    * put: requesting to write a file: the metadata server must:
        * insert in the database the path of the new file (with its name), and
          its size.
        * return a list of available data nodes where to write the chunks of the
          file
        * dblks: then store the data blocks that have the information of the
          data nodes and the block ids of the file.
    * list: requesting to list files:
        * the metadata server must return a list with the files in the DFS and
          their size.

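
The services above amount to a dispatch on the packet command. The sketch below shows one possible shape for that logic as a pure function over an in-memory state, without sockets or sqlite3. The function `handle_request`, the `state` layout, the dictionary keys, and the status strings (`ok`, `dup`, `nf`, `error`) are all assumptions made for this example and are not taken from the skeleton.

```python
def handle_request(state, packet):
    """Return the response for one decoded request (hypothetical layout).

    state is {"nodes": [(addr, port), ...],
              "files": {fname: {"size": int, "blocks": [...]}}}.
    """
    cmd = packet["command"]
    files = state["files"]
    if cmd == "reg":      # a data node registers itself
        node = (packet["addr"], packet["port"])
        if node not in state["nodes"]:
            state["nodes"].append(node)
        return {"status": "ok"}
    if cmd == "list":     # list client asks for file names and sizes
        return {"files": [(name, f["size"]) for name, f in files.items()]}
    if cmd == "put":      # copy client announces a new file
        if packet["fname"] in files:
            return {"status": "dup"}
        files[packet["fname"]] = {"size": packet["fsize"], "blocks": []}
        return {"status": "ok", "nodes": list(state["nodes"])}
    if cmd == "dblks":    # copy client reports where the blocks were stored
        if packet["fname"] not in files:
            return {"status": "nf"}
        files[packet["fname"]]["blocks"].extend(packet["blocks"])
        return {"status": "ok"}
    if cmd == "get":      # copy client asks where the blocks of a file are
        f = files.get(packet["fname"])
        if f is None:
            return {"status": "nf"}
        return {"status": "ok", "fsize": f["size"], "blocks": list(f["blocks"])}
    return {"status": "error"}


state = {"nodes": [], "files": {}}
handle_request(state, {"command": "reg", "addr": "127.0.0.1", "port": 8001})
put = handle_request(state, {"command": "put", "fname": "/home/hola.txt", "fsize": 200})
handle_request(state, {"command": "dblks", "fname": "/home/hola.txt",
                       "blocks": [("127.0.0.1", 8001, "block-a")]})
got = handle_request(state, {"command": "get", "fname": "/home/hola.txt"})
listing = handle_request(state, {"command": "list"})
```

Keeping the dispatch separate from the socket handler makes it possible to test the metadata logic without starting a server.
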
The metadata server must be run as follows:

python meta-data.py \<port, default=8000\>

If no port is specified, port 8000 will be used by default.

### **The data node server**

The data node is the process that receives and saves the data blocks of the
files. It must register with the metadata server as soon as it starts its
execution. The data node receives data from a client when the client wants to
write a file, and returns data when the client wants to read a file.

Your data node must provide the following services:

1. put: Listen to writes:
    * The data node will receive blocks of data, store them using a unique id,
      and return the unique id.
    * Each node must have its own block storage path. You may run more than one
      data node per system.
2. get: Listen to reads:
    * The data node will receive requests for data blocks, and it must read the
      data block, and return its content.

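
The storage side of the put and get services can be sketched as two small functions that write a block under a fresh uuid and read it back. The names `store_block` and `read_block` are hypothetical, and storing each block as one file named after its id is an assumption, not something the assignment requires.

```python
import os
import tempfile
import uuid


def store_block(data_path, data):
    """Write one block under a new unique id and return that id."""
    block_id = str(uuid.uuid4())
    with open(os.path.join(data_path, block_id), "wb") as f:
        f.write(data)
    return block_id


def read_block(data_path, block_id):
    """Return the content of a stored block.

    uuid.UUID() raises ValueError for anything that is not a UUID, which
    keeps a request such as "../secret" from escaping the data path.
    """
    name = str(uuid.UUID(block_id))
    with open(os.path.join(data_path, name), "rb") as f:
        return f.read()


# Each data node would use its own directory; a temporary one is used here.
with tempfile.TemporaryDirectory() as data_path:
    block_id = store_block(data_path, b"hello dfs")
    content = read_block(data_path, block_id)
```

Opening the files in binary mode (`"wb"` and `"rb"`) matters because blocks are arbitrary slices of a file and need not be valid text.
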
The data nodes must be run as follows:

python data-node.py \<server address\> \<port\> \<data path\> \<metadata port,default=8000\>

Server address is the metadata server address, port is the data-node port
number, data path is a path to a directory to store the data blocks, and
metadata port is the optional metadata server port, needed only if the metadata
server runs on a port other than the default.

**Note:** Since you most probably do not have many different computers at your
disposal, you may run more than one data node on the same computer, but their
listening ports and data block directories must be different.

### **The list client**

The list client just sends a list request to the metadata server and then waits
for a list of file names with their sizes.

The output must look like:

```txt
/home/cheo/asig.cpp 30 bytes
/home/hola.txt 200 bytes
/home/saludos.dat 2000 bytes
```

The list client must be run as follows:

python ls.py \<server\>:\<port, default=8000\>

Where server is the metadata server IP and port is the metadata server port. If
no port is given, the default port 8000 is used and no ':' character is
necessary.

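
Two details of the list client are easy to get wrong: the optional port in the argument and the output format. A minimal sketch of both follows; the helper names `parse_server` and `format_listing` are hypothetical, and the input to `format_listing` is assumed to be a list of (file name, size) pairs.

```python
def parse_server(arg, default_port=8000):
    """Split '<server>' or '<server>:<port>' into (host, port)."""
    host, _sep, port = arg.partition(":")
    # An absent or empty port falls back to the default.
    return host, (int(port) if port else default_port)


def format_listing(files):
    """Format (file name, size) pairs in the required output layout."""
    return "\n".join(f"{name} {size} bytes" for name, size in files)


host, port = parse_server("localhost")
print(format_listing([("/home/cheo/asig.cpp", 30), ("/home/hola.txt", 200)]))
```
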
### **The copy client**

The copy client is more complicated than the list client. It is in charge of
copying files from and to the DFS.

The copy client must:

1. Write files in the DFS
    * The client must send to the metadata server the file name and size of the
      file to write.
    * Wait for the metadata server response with the list of available data
      nodes.
    * Send the data blocks to each data node.
        * You may decide to divide the file over the number of data servers.
        * You may divide the file into X size blocks and send it to the data
          servers in round robin.
2. Read files from the DFS
    * Contact the metadata server with the file name to read.
    * Wait for the block list with the block id and data server information.
    * Retrieve the file blocks from the data servers.
        * This part will depend on the division algorithm used in step (1).

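
The second division strategy mentioned in step (1), fixed-size blocks sent in round robin, can be sketched as follows. The function names and the block size of 4 bytes are chosen only to keep the example small; a real client would use a much larger block size and send each block over a socket instead of keeping it in a list.

```python
def split_blocks(data, block_size):
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_round_robin(blocks, nodes):
    """Pair each block with a data node, cycling through the node list."""
    return [(nodes[i % len(nodes)], block) for i, block in enumerate(blocks)]


data = b"abcdefghij"
nodes = [("127.0.0.1", 8001), ("127.0.0.1", 8002)]

# Three blocks (4 + 4 + 2 bytes) alternate between the two data nodes.
plan = assign_round_robin(split_blocks(data, 4), nodes)

# Reading is the reverse: fetch the blocks in block list order and join them.
rebuilt = b"".join(block for _node, block in plan)
```

With this strategy the read side does not need to know the block size. It only needs the block list in the order in which the blocks were written.
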
The copy client must be run as follows:

Copy from DFS:

python copy.py \<server\>:\<port\>:\<dfs file path\> \<destination file\>

To DFS:

python copy.py \<source file\> \<server\>:\<port\>:\<dfs file path\>

Where server is the metadata server IP address, and port is the metadata server
port.

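
The DFS argument packs three fields into one string, so the copy client has to split it. A minimal sketch, assuming a hypothetical helper named `parse_dfs_target`, is shown below. Limiting the split to two separators keeps any ':' characters that appear inside the DFS file path.

```python
def parse_dfs_target(arg):
    """Split '<server>:<port>:<dfs file path>' into (server, port, path)."""
    server, port, path = arg.split(":", 2)
    return server, int(port), path


server, port, path = parse_dfs_target("localhost:8000:/home/hola.txt")
```

If the argument has fewer than three fields, the unpacking raises `ValueError`, which the client can catch to print a usage message.
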
# Creating an empty database

The script createdb.py generates an empty database *dfs.db* for the project.

    python createdb.py

# Deliverables

* The source code of the programs (well documented)
* A README file with:
    * a description of the programs, including a brief description of how they
      work.
    * who helped you or discussed issues with you to finish the program.
* A video description of the project with implementation details. If you have
  any doubts, please consult the professor.

# Rubric

* (10 pts) the programs run
* (80 pts) quality of the working solutions
    * (20 pts) Metadata server implemented correctly
    * (25 pts) Data server implemented correctly
    * (10 pts) List client implemented correctly
    * (25 pts) Copy client implemented correctly
* (10 pts) quality of the README
    * (10 pts) description of the programs and how they work.
* No project will be graded without submission of the video explaining how the
  project was implemented.