mr-user
4y

I am thinking about how to make data uploads reliable. I am sure I am making this more complex than it needs to be, and I need some pointers.

My goal is to support pausing and resuming file uploads.

Here is how it would work.

To start an upload, you give the server:

1) File Name
2) Folder path you want to upload it to
3) Checksum of the file
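
On the client side, gathering these three values might look like this sketch. The question does not name a hash, so SHA-256 is an assumption, as are the helper names `file_checksum` and `build_init_request` and the dictionary keys.

```python
import hashlib
import os


def file_checksum(path: str, block_size: int = 64 * 1024) -> str:
    """SHA-256 of a file, read in blocks so a large file is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_init_request(path: str, folder_path: str) -> dict:
    """Hypothetical init_upload() payload: file name, target folder, checksum."""
    return {
        "file_name": os.path.basename(path),
        "folder_path": folder_path,
        "checksum": file_checksum(path),
    }
```

Hashing in blocks keeps memory use flat, which matters for the large files that make pause/resume worthwhile in the first place.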

The server then checks whether you are allowed to upload to that folder and whether the file has already been uploaded (by file name and checksum).

If you are allowed to upload to the folder, the server returns a "unique file token", the "folder path", and a "unique byte token". Let's call this operation init_upload().
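
A minimal in-memory sketch of the server side of init_upload(). It assumes the permission check is a plain set of allowed folders and that finished uploads are remembered as (folder, name, checksum) triples; the class and field names are hypothetical, and the tokens come from Python's `secrets` module.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class UploadSession:
    file_token: str
    folder_path: str
    file_name: str
    checksum: str
    byte_token: str


@dataclass
class UploadServer:
    allowed_folders: set                              # assumed permission model
    completed: set = field(default_factory=set)       # (folder, name, checksum) already uploaded
    sessions: dict = field(default_factory=dict)      # file_token -> UploadSession

    def init_upload(self, file_name: str, folder_path: str,
                    checksum: str) -> Optional[Tuple[str, str, str]]:
        if folder_path not in self.allowed_folders:
            raise PermissionError(f"cannot upload to {folder_path}")
        if (folder_path, file_name, checksum) in self.completed:
            return None  # same name and checksum already uploaded: nothing to do
        session = UploadSession(
            file_token=secrets.token_urlsafe(16),
            folder_path=folder_path,
            file_name=file_name,
            checksum=checksum,
            byte_token=secrets.token_urlsafe(16),
        )
        self.sessions[session.file_token] = session
        # "unique file token", "folder path", "unique byte token"
        return session.file_token, session.folder_path, session.byte_token
```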

For each piece of data, the client sends the "unique file token" (to identify the file), the folder path (to know where to upload it), the current "unique byte token", and a byte[] (the data to upload). Let's call this operation data_upload().

If the operation succeeds, the server returns a new "unique byte token".

Internally, it would work like this. Say we want to upload "file.mp3". When the client calls init_upload(), the server creates file.mp3 and unique_byte_token.file.mp3.

When the client uploads data for the first time, the server appends it to unique_byte_token.file.mp3.

When the client uploads data the second time (and every time after that), the server checks whether the "byte token" the client sent matches the previously issued "unique byte token". If it matches, the server:
1) Moves the data from unique_byte_token.file.mp3 to file.mp3
2) Deletes unique_byte_token.file.mp3
3) Creates a new unique_byte_token.file.mp3
4) Appends the new data to unique_byte_token.file.mp3
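
The steps above can be sketched for a single upload as follows. It assumes the pending file is named after the byte token the client must present next and that a mismatched token is simply rejected; the class name is hypothetical. Because init_upload() creates an empty pending file, the first data_upload() call commits nothing and only stores the first part, which matches the description.

```python
import os
import secrets
import shutil


class TwoFileUpload:
    """Hypothetical sketch of one upload in the two-file scheme.

    file.mp3               -> parts the client has confirmed
    <byte_token>.file.mp3  -> the latest part, not yet confirmed
    """

    def __init__(self, directory: str, file_name: str):
        self.directory = directory
        self.file_name = file_name
        self.final_path = os.path.join(directory, file_name)
        # init_upload(): create file.mp3 and an empty <byte_token>.file.mp3
        self.byte_token = secrets.token_hex(8)
        open(self.final_path, "wb").close()
        open(self._pending_path(), "wb").close()

    def _pending_path(self) -> str:
        return os.path.join(self.directory, f"{self.byte_token}.{self.file_name}")

    def data_upload(self, byte_token: str, data: bytes) -> str:
        if byte_token != self.byte_token:
            raise ValueError("byte token does not match the last one issued")
        # A matching token shows the client received our previous reply, so
        # 1) move the pending part into file.mp3 and 2) delete the pending file.
        pending = self._pending_path()
        with open(pending, "rb") as src, open(self.final_path, "ab") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(pending)
        # 3) issue a new token (and pending file) and 4) store the new part there.
        self.byte_token = secrets.token_hex(8)
        with open(self._pending_path(), "wb") as f:
            f.write(data)
        return self.byte_token
```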

I use the "byte token" to check whether the previous upload actually succeeded.

Say the file takes 50 data_upload() calls. Afterwards, 49 parts will be in file.mp3 and 1 part will still be in unique_byte_token.file.mp3.

Finally, the client needs to call data_upload_complete(), which will

1) Put the remaining part into file.mp3
2) Remove unique_byte_token.file.mp3 as cleanup
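
A sketch of data_upload_complete() under the same two-file layout, with a hypothetical signature. Verifying the assembled file against the checksum sent to init_upload() is an assumption (the question only lists the append and cleanup steps), and SHA-256 is assumed as the hash.

```python
import hashlib
import os
import shutil
from typing import Optional


def data_upload_complete(final_path: str, pending_path: str,
                         expected_sha256: Optional[str] = None) -> bool:
    """Fold in the last part, clean up, and optionally verify the result."""
    # 1) Append the remaining (still pending) part to file.mp3
    with open(pending_path, "rb") as src, open(final_path, "ab") as dst:
        shutil.copyfileobj(src, dst)
    # 2) Remove the pending file as cleanup
    os.remove(pending_path)
    if expected_sha256 is None:
        return True
    # Assumed extra step: compare with the checksum given to init_upload()
    digest = hashlib.sha256()
    with open(final_path, "rb") as f:
        for block in iter(lambda: f.read(64 * 1024), b""):
            digest.update(block)
    return digest.hexdigest() == expected_sha256
```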

Comments
  •
    Wait! Would just checking the file size of the uploaded file be enough to provide a pause/resume upload feature?
  •
    I don't see why you'd even need that file token stuff. There is only one party that actually knows whether an upload was successful, and that's the server, so ask the server.

    I'd probably go for chunks of a fixed size, say up to 64k (except the last one, which can be shorter, so the chunks need a data-length field anyway), with a CRC32 appended. The kicker is that the chunks can be transmitted in any order, not with an ACK/NACK protocol where latency would kill your throughput, especially on mobile.

    So the plan would be to initiate the upload; since the client knows the file size, it knows how many chunks there will be, and it tells the server. The server can create a temporary directory like upload-yourfilename holding chunk-x fragments plus a file storing the expected total number of chunks. Upon completion, it assembles the chunks into yourfilename and deletes the directory.
  •
    The client could ask the server at any time which chunk numbers it already has. Chunks with a bad CRC would have been discarded immediately. The chunk list would use ranges where possible and individual numbers where not, like "yourfilename:1-10:13:17-20:24", indicating that chunks 11-12, 14-16 and anything beyond 24 are still missing.

    Or, if a file is already complete on the server side and therefore no chunks are there, the server could return an answer like "yourfilename:DONE:N:X" with N being the file size in bytes and X its hash. This hash shouldn't be CRC32, rather something like SHA-256. That would allow the client to additionally check whether everything went fine, and even to compare its local file with the server version.
  •
    Wouldn't that fuck with the server if someone uploads a massive file and then just pauses and leaves for a couple of hours?
    The server would just wait and waste valuable resources, right?
  •
    @Ranchu What resources? The ports? You'll always be exposed to Slowloris-style attacks, but after a period of inactivity the connection can be closed. Resuming a partial upload has to work over a new connection anyway.
  •
    @Ranchu It shouldn't be a problem, since the server can drop the connection and discard the incomplete upload after a set amount of time.
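
The chunk-based approach from the comments can be sketched like this. The wire layout (a big-endian 4-byte chunk number and 4-byte length, then the data, then a CRC32 over header and data), 1-based chunk numbers, and all function names are assumptions for illustration. The 64k limit, length field, CRC32, range notation, DONE reply with SHA-256, and idle timeout come from the comments.

```python
import hashlib
import struct
import zlib

CHUNK_SIZE = 64 * 1024           # "up to 64k" per chunk
HEADER = struct.Struct(">II")    # assumed layout: chunk number, data length
TRAILER = struct.Struct(">I")    # CRC32 appended after header + data


def make_chunk(number: int, data: bytes) -> bytes:
    """Frame one chunk: header, data, then CRC32 over header and data."""
    if len(data) > CHUNK_SIZE:
        raise ValueError("chunk larger than CHUNK_SIZE")
    body = HEADER.pack(number, len(data)) + data
    return body + TRAILER.pack(zlib.crc32(body))


def parse_chunk(raw: bytes):
    """Return (number, data), or None if the chunk is truncated or fails its CRC."""
    if len(raw) < HEADER.size + TRAILER.size:
        return None
    body, (crc,) = raw[:-TRAILER.size], TRAILER.unpack(raw[-TRAILER.size:])
    if zlib.crc32(body) != crc:
        return None  # discarded immediately; the client will resend it
    number, length = HEADER.unpack_from(body)
    data = body[HEADER.size:]
    return (number, data) if len(data) == length else None


def received_summary(file_name: str, received) -> str:
    """Encode received chunk numbers as ranges where possible, e.g. name:1-10:13."""
    parts = [file_name]
    nums = sorted(received)
    i = 0
    while i < len(nums):
        start = end = nums[i]
        # Extend the run while the next number is consecutive
        while i + 1 < len(nums) and nums[i + 1] == end + 1:
            i += 1
            end = nums[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ":".join(parts)


def done_summary(file_name: str, content: bytes) -> str:
    """Reply for a finished file: name:DONE:<size in bytes>:<SHA-256>."""
    return f"{file_name}:DONE:{len(content)}:{hashlib.sha256(content).hexdigest()}"


def stale_uploads(last_activity: dict, now: float, timeout_s: float) -> list:
    """Uploads idle for longer than timeout_s, which the server may drop."""
    return sorted(name for name, t in last_activity.items() if now - t > timeout_s)
```

Since every chunk carries its own number and checksum, the server can accept them in any order and the client only needs the occasional summary request, instead of waiting for an ACK per chunk.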