Distributed File System written in C
# Assignment 04: Distributed File systems

University of Puerto Rico at Rio Piedras

Department of Computer Science

CCOM4017: Operating Systems

# Introduction

In this project the student will implement the main components of a file system
by implementing a simple, yet functional, distributed file system (DFS). The
project will expand students' knowledge of the main components of a file system
(inodes and data blocks), will further develop the students' skills in
inter-process communication, and will increase their system security awareness.

The components to implement are:

* **Metadata server**, which will function as an inode repository
* **Data servers**, which will serve as the disk space for file data blocks
* **List client**, which will list the files available in the DFS
* **Copy client**, which will copy files from and to the DFS

# Objectives

* Study the main components of a distributed file system
* Become familiar with file management
* Implement a distributed system

# Prerequisites

* Python:
  * [www.python.org](http://www.python.org/)
* Python socketserver library: for **TCP** socket communication.
  * [https://docs.python.org/3/library/socketserver.html](https://docs.python.org/3/library/socketserver.html)
* uuid: to generate unique IDs for the data blocks
  * [https://docs.python.org/3/library/uuid.html](https://docs.python.org/3/library/uuid.html)
* **Optionally** you may read about the json and sqlite3 libraries used in the
skeleton of the program.
  * [https://docs.python.org/3/library/json.html](https://docs.python.org/3/library/json.html)
  * [https://docs.python.org/3/library/sqlite3.html](https://docs.python.org/3/library/sqlite3.html)

### **The metadata server's database manipulation functions.**

No expertise in database management is required to accomplish this project.
However, sqlite3 is used to store the file inodes in the metadata server. You
don't need to understand how the functions work internally, but you need to read
the documentation of the functions that interact with the database. The metadata
server database functions are defined in file mds\_db.py.

#### **Inode**

For this implementation an **inode** consists of:

* File name
* File size
* List of blocks

#### **Block List**

The **block list** consists of a list of:

* data node address \- to know the data node where the block is stored
* data node port \- to know the service port of the data node
* data node block\_id \- the id assigned to the block

Functions:

* AddDataNode(address, port): Adds a new data node to the metadata server.
Receives the IP address and port, i.e. the information needed to connect to the
data node.
* GetDataNodes(): Returns a list of the registered data node tuples
**(address, port)**. Useful to know to which data nodes the data blocks can be
sent.
* InsertFile(filename, fsize): Inserts a filename with its file size into the
database.
* GetFiles(): Returns a list of the attributes of the files stored in the DFS.
(file name, file size)
* AddBlockToInode(filename, blocks): Adds the list of data block information of
a file. The data block information consists of (address, port, block\_id)
* GetFileInode(filename): Returns the file size and the list of data block
information of a file. (fsize, block\_list)

### **The packet manipulation functions:**

The packet library is designed to serialize the communication data using the
json library. No expertise with json is required to accomplish this assignment.
These functions were developed to ease the packet generation process of the
project. The packet library is defined in file Packet.py.
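
To give a feel for what json serialization of a packet involves, here is a minimal sketch. It is not the code in Packet.py: the helper names `encode_packet` and `decode_packet` and the dictionary keys `command`, `addr`, and `port` are assumptions chosen only for illustration.

```python
import json


def encode_packet(fields):
    """Serialize a dict of packet fields into UTF-8 JSON bytes for a socket."""
    return json.dumps(fields).encode("utf-8")


def decode_packet(raw):
    """Turn bytes received from a socket back into a dict of packet fields."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical registration packet; the key names are assumptions,
# not the layout that Packet.py actually uses.
reg = {"command": "reg", "addr": "127.0.0.1", "port": 8001}

wire = encode_packet(reg)       # bytes that could be passed to socket.sendall()
received = decode_packet(wire)  # what the receiving side would reconstruct
print(received["command"], received["addr"], received["port"])
```

Because json only carries basic types (strings, numbers, lists, dictionaries), binary file data would need separate handling in a scheme like this one.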

In this project all packet objects have a packet type among the following
command type options:

* reg: to register a data node
* list: to ask for a list of files
* put: to put a file in the DFS
* get: to get files from the DFS
* dblks: to add the data block ids to the files.

#### **Functions:**

##### **General Functions**

* getEncodedPacket(): Returns a serialized packet ready to send through the
network. First you need to build the packet. See the Build**\<X\>**Packet
functions.
* DecodePacket(packet): Receives a serialized message and turns it into a
packet object.
* getCommand(): Returns the command type of the packet

##### **Packet Registration Functions**

* BuildRegPacket(addr, port): Builds a registration packet.
* getAddr(): Returns the IP address of a server. Useful for registration
packets
* getPort(): Returns the port number of a server. Useful for registration
packets

##### **Packet List Functions**

* BuildListPacket(): Builds a list packet for file listing
* BuildListResponse(filelist): Builds a list response packet with the list of
files.
* getFileArray(): Returns a list of files

##### **Get Packet Functions**

* BuildGetPacket(fname): Builds a get packet to request a file by name.
* BuildGetResponse(metalist, fsize): Builds a get response with the list of
data node servers that hold the blocks of a file, and the file size.
* getFileName(): Returns the file name in a packet.
* getDataNodes(): Returns a list of data servers.

##### **Put Packet Functions (Put Blocks)**

* BuildPutPacket(fname, size): Builds a put packet to put fname and file size
in the metadata server.
* getFileInfo(): Returns the file info in a packet.
* BuildPutResponse(metalist): Builds a list of data node servers where the data
blocks of a file can be stored, i.e. a list of available data servers.
* BuildDataBlockPacket(fname, block\_list): Builds a data block packet.
Contains the file name and the list of blocks for the file. See [block
list](http://ccom.uprrp.edu/~jortiz/clases/ccom4017/asig04/#block_list) to
review the content of a block list.
* getDataBlocks(): Returns a list of data blocks

##### **Get Data block Functions (Get Blocks)**

* BuildGetDataBlockPacket(blockid): Builds a get data block packet. Useful
when requesting a data block from a data node.
* getBlockID(): Returns the block\_id from a packet.

# Instructions

Write and complete code for an unreliable and insecure distributed file server
following the specifications below.

### **Design specifications.**

For this project you will design and complete a distributed file system. You
will write a DFS with tools to list the files, and to copy files from and to
the DFS.

Your DFS will consist of:

* A metadata server: which will contain the metadata (inode) information of the
files in your file system. It will also keep a registry of the data servers
that are connected to the DFS.
* Data nodes: The data nodes will contain chunks (some blocks) of the file that
you are storing in the DFS.
* List command: A command to list the files stored in the DFS.
* Copy command: A command that will copy files from and to the DFS.

### **The metadata server**

The metadata server contains the metadata (inode) information of the files in
your file system. It will also keep a registry of the data servers that are
connected to the DFS.

Your metadata server must provide the following services:

1. Listen to the data nodes that are part of the DFS. Every time a new data
node registers to the DFS the metadata server must keep the contact information
of that data node. This is (IP Address, Listening Port).
   * To ease the implementation of the DFS, the directory file system must
contain three things:
     * the path of the file in the file system (filename)
     * the nodes that contain the data blocks of the files
     * the file size
2. Every time a client (commands list or copy) contacts the metadata server
for:
   * get: requesting to read a file: the metadata server must check if the file
is in the DFS database, and if it is, it must return the nodes with the
block\_ids that contain the file.
   * put: requesting to write a file: the metadata server must:
     * insert in the database the path of the new file (with its name), and its
size.
     * return a list of available data nodes where to write the chunks of the
file
   * dblks: then store the data blocks that have the information of the data
nodes and the block ids of the file.
   * list: requesting to list files:
     * the metadata server must return a list with the files in the DFS and
their size.

The metadata server must be run:

python meta-data.py \<port, default=8000\>

If no port is specified, port 8000 will be used by default.

### **The data node server**

The data node is the process that receives and saves the data blocks of the
files. It must first register with the metadata server as soon as it starts its
execution. The data node receives the data from the clients when the client
wants to write a file, and returns the data when the client wants to read a
file.

Your data node must provide the following services:

1. put: Listen to writes:
   * The data node will receive blocks of data, store them using a unique id,
and return the unique id.
   * Each node must have its own block storage path. You may run more than one
data node per system.
2. get: Listen to reads
   * The data node will receive requests for data blocks, and it must read the
data block, and return its content.
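
The put and get services above come down to writing a block under a unique id and reading it back later. The following is a minimal sketch of that storage step only, with no networking, assuming each block is kept as one file named after its id inside the node's data path. The function names `store_block` and `read_block` are hypothetical and are not part of the provided skeleton.

```python
import os
import tempfile
import uuid


def store_block(data_path, data):
    """Write one block under data_path and return the id it was stored under."""
    block_id = str(uuid.uuid4())  # random id, unique for practical purposes
    with open(os.path.join(data_path, block_id), "wb") as f:
        f.write(data)
    return block_id


def read_block(data_path, block_id):
    """Return the bytes of a previously stored block."""
    with open(os.path.join(data_path, block_id), "rb") as f:
        return f.read()


# A throwaway directory stands in for the node's <data path> argument.
demo_dir = tempfile.mkdtemp()
bid = store_block(demo_dir, b"hello block")
print(bid, read_block(demo_dir, bid))
```

Giving every data node its own directory keeps two nodes on the same computer from overwriting each other's blocks. A real data node would probably also want to check that a requested block id cannot point outside its storage directory.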

The data nodes must be run:

python data-node.py \<server address\> \<port\> \<data path\> \<metadata port, default=8000\>

Server address is the metadata server address, port is the data-node port
number, data path is a path to a directory to store the data blocks, and
metadata port is the optional metadata server port, needed only if the metadata
server was run on a port other than the default.

**Note:** Since you most probably do not have many different computers at your
disposal, you may run more than one data-node in the same computer but the
listening port and their data block directory must be different.

### **The list client**

The list client just sends a list request to the metadata server and then waits
for a list of file names with their size.

The output must look like:

```txt
/home/cheo/asig.cpp 30 bytes
/home/hola.txt 200 bytes
/home/saludos.dat 2000 bytes
```

The list client must be run:

python ls.py \<server\>:\<port, default=8000\>

Where server is the metadata server IP and port is the metadata server port. If
the port is not indicated, the default port 8000 is used and no ':' character
is necessary.

### **The copy client**

The copy client is more complicated than the list client. It is in charge of
copying the files from and to the DFS.

The copy client must:

1. Write files in the DFS
   * The client must send to the metadata server the file name and size of the
file to write.
   * Wait for the metadata server response with the list of available data
nodes.
   * Send the data blocks to each data node.
     * You may decide to divide the file over the number of data servers.
     * You may divide the file into X size blocks and send it to the data
servers in round robin.
2. Read files from the DFS
   * Contact the metadata server with the file name to read.
   * Wait for the block list with the block id and data server information
   * Retrieve the file blocks from the data servers.
     * This part will depend on the division algorithm used in step (1).

The copy client must be run:

Copy from DFS:

python copy.py \<server\>:\<port\>:\<dfs file path\> \<destination file\>

To DFS:

python copy.py \<source file\> \<server\>:\<port\>:\<dfs file path\>

Where server is the metadata server IP address, and port is the metadata server
port.

# Creating an empty database

The script createdb.py generates an empty database *dfs.db* for the project.

    python createdb.py

# Deliverables

* The source code of the programs (well documented)
* A README file with:
  * description of the programs, including a brief description of how they
work.
  * who helped you or discussed issues with you to finish the program.
* Video description of the project with implementation details. If in doubt,
please consult the professor.

# Rubric

* (10 pts) the programs run
* (80 pts) quality of the working solutions
  * (20 pts) Metadata server implemented correctly
  * (25 pts) Data server implemented correctly
  * (10 pts) List client implemented correctly
  * (25 pts) Copy client implemented correctly
* (10 pts) quality of the README
  * (10 pts) description of the programs, including how they work.
* No project will be graded without submission of the video explaining how the
project was implemented.
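
# Appendix: round-robin block division sketch

As an illustration of the second division option mentioned for the copy client (fixed-size blocks sent to the data servers in round robin), here is a minimal sketch of the bookkeeping only, with no networking. The function names, the block size, and the node addresses are assumptions made for the example; they are not part of the skeleton.

```python
def split_into_blocks(data, block_size):
    """Cut data into consecutive chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def assign_round_robin(blocks, data_nodes):
    """Pair each block with a data node, cycling through the node list."""
    return [(data_nodes[i % len(data_nodes)], block)
            for i, block in enumerate(blocks)]


# Made-up (address, port) pairs standing in for the metadata server's reply.
nodes = [("127.0.0.1", 8001), ("127.0.0.1", 8002)]

blocks = split_into_blocks(b"abcdefghij", 4)
plan = assign_round_robin(blocks, nodes)
for (addr, port), block in plan:
    print(addr, port, block)

# Reading the file back means fetching the blocks in this same order.
restored = b"".join(block for _, block in plan)
```

Whatever division is chosen, the order of the block list has to be preserved, because the read path reassembles the file by concatenating blocks in that order.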