19
Cyanite
7y

Uggg..... I'm trying to encode a binary file in Python, which may be an image or an executable, and then decode it back into a file (I plan on editing it in the middle, but baby steps for now...), but nothing is working!!

My plan is to:

Open binary file.

Encode as base64, or something else that could easily handle binary.

Convert byte data to string (for editing purposes - I won't be editing bytes, I'll be doing custom encoding, but that's irrelevant for this test)

Convert back to a byte string/array (with .encode(), probably)

Write to file.

I do this, yet the output has been altered... though I haven't touched anything.

It's so infuriating.. x.x

Comments
  • 2
    It's altered by the write even if the data is identical.
  • 4
    Well, maybe it starts the base64 string with ==? Also, why don't you use the binary file directly instead of growing it by about a third with base64? If you want binary, then use binary.
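For reference, here is a minimal sketch of the round trip described in the question, assuming base64 is an acceptable text form. The helper names `bytes_to_text` and `text_to_bytes` are made up for illustration. Note that base64 padding (`=`) only ever appears at the end of the encoded text, never at the start.

```python
import base64

def bytes_to_text(data: bytes) -> str:
    """Encode arbitrary bytes as an ASCII-only base64 string."""
    return base64.b64encode(data).decode("ascii")

def text_to_bytes(text: str) -> bytes:
    """Reverse bytes_to_text: decode the base64 string back to the original bytes."""
    return base64.b64decode(text.encode("ascii"))

original = bytes(range(256))          # every possible byte value once
as_text = bytes_to_text(original)     # plain str, safe to store as text
restored = text_to_bytes(as_text)

print(restored == original)           # True
print(len(as_text) / len(original))   # 1.34375
```

The encoded text is roughly a third larger than the input, which is the price of staying inside printable ASCII.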
  • 1
    @krister-alm

    How do I get around that?
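One common way a write alters identical data (an assumption here, the thread does not confirm this is the cause) is opening the output file in text mode, where Python translates every `\n` on write. The sketch below forces the Windows-style translation with `newline="\r\n"` so it behaves the same on any platform, and compares it with a binary-mode write.

```python
import os
import tempfile

data = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, used as sample binary data

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "text_mode.bin")
    bin_path = os.path.join(tmp, "binary_mode.bin")

    # Text mode: newline="\r\n" forces the translation that is the default on Windows.
    with open(text_path, "w", encoding="latin-1", newline="\r\n") as f:
        f.write(data.decode("latin-1"))

    # Binary mode: bytes are written exactly as given.
    with open(bin_path, "wb") as f:
        f.write(data)

    with open(text_path, "rb") as f:
        text_result = f.read()
    with open(bin_path, "rb") as f:
        bin_result = f.read()

print(text_result == data)  # False: each "\n" became "\r\n"
print(bin_result == data)   # True
```

Opening both the input and the output with a `b` in the mode (`"rb"`, `"wb"`) avoids the translation entirely.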
  • 1
    @ilikeglue

    I need to ensure compatibility with Unicode characters (from non-binary files), and the characters are heavily manipulated in the full version of my program anyway, so if it cannot survive this then I'm hopeless.
  • 3
    Python and binary is tricky.

    First of all: Don't use Python 2; Python 3 has lots of improvements for binary data access.

    I never got binary data working with decode() and encode(). I always use a combination of bytearray() and the struct-module.

    The struct-module can convert from and to binary data, e.g. reading files, converting for internal use or writing data into a file, as long as you know the binary structure of the given data.

    For fiddling with small chunks of binary data, or when there is no structure to handle, I usually use bytearray() to access the bytes directly.

    The binascii-module seems to have some nice functions, but it's always annoying when some functions only use ASCII (0-127).

    I'm really missing a consistent module for binary data handling in Python. Often it would be enough to directly access single bytes within any datatype (at least in a string would be nice), like in Lua. But I have not found a nice way to do so; there's always some kind of conversion involved.
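To illustrate the struct approach mentioned above, here is a small sketch. The record layout (a 4-byte magic value, a 16-bit version, a 32-bit length, all little-endian) is invented for the example and does not belong to any real file format.

```python
import struct

# Hypothetical layout: 4-byte magic, unsigned 16-bit version,
# unsigned 32-bit payload length, little-endian, no padding.
HEADER_FORMAT = "<4sHI"

packed = struct.pack(HEADER_FORMAT, b"DEMO", 3, 1024)
print(packed)                          # b'DEMO\x03\x00\x00\x04\x00\x00'
print(struct.calcsize(HEADER_FORMAT))  # 10

magic, version, length = struct.unpack(HEADER_FORMAT, packed)
print(magic, version, length)          # b'DEMO' 3 1024

# A bytearray can be patched in place without rebuilding the whole record.
buf = bytearray(packed)
struct.pack_into("<H", buf, 4, 4)      # overwrite the version field at offset 4
print(struct.unpack(HEADER_FORMAT, buf)[1])  # 4
```

Indexing a `bytes` or `bytearray` object (`buf[4]`) also gives direct access to a single byte as an integer from 0 to 255.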
  • 1
    @ddephor

    I do use Python 3. And, could you help me over Telegram? Or send me a pastebin of an example of how to simply read a binary file (say .exe or .png) and then write it back as a new file?
  • 2
    One way could be to read a file in binary mode, which returns bytes (immutable); these can be copied into a bytearray to make changes.

    Small example:

    #!/usr/bin/python3
    # Content of test.bin: "abcdefg"
    # Hex: 0x61626364656667
    with open('test.bin', 'rb') as binFile:
        # read() returns immutable bytes; bytearray() makes a mutable copy
        binData = bytearray(binFile.read())

    binData[2] = 0xFF  # overwrite the third byte ("c")

    with open('testout.bin', 'wb') as binOutFile:
        binOutFile.write(binData)
    # Content of testout.bin: "ab", then the byte 0xFF, then "defg"
    # Hex: 0x6162FF64656667
  • 1
    @ddephor

    The only edit I made was changing "test.bin" to "test.png" (my test file) and "testout.bin" to "testout.png".

    I get this error in Windows Photo Viewer:

    testout.png: We can't open this file.
  • 1
    @ddephor

    Ohh!

    It works after commenting out "binData[2] = 0xFF"
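That result fits how PNG files begin: the first 8 bytes are a fixed signature, and index 2 is the "N" of "PNG", so overwriting it makes viewers reject the file. A quick sketch of the check, with `looks_like_png` being a made-up helper name and the sample data a stand-in for real file content:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

def looks_like_png(data) -> bool:
    """Return True if the data starts with the PNG signature."""
    return data[:8] == PNG_SIGNATURE

sample = bytearray(PNG_SIGNATURE + b"rest of file")  # stand-in for a real PNG
print(looks_like_png(sample))  # True

sample[2] = 0xFF               # the same edit as binData[2] = 0xFF
print(looks_like_png(sample))  # False
```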
  • 1
    @ddephor

    But, now how does one convert to a string and back to bytes..? I HAVE to have a standard string to edit..

    with open('test.png', 'rb') as inFile:
        fileData = inFile.read()

    with open('thisisthetestout.png', 'wb') as outFile:
        outFile.write(bytearray(fileData))

    # The above code works to write a bytearray to file... However, I need to edit the file... I'll either be opening a standard ASCII text file, decrypting it with my algo into a bytearray and writing that to a binary file, or opening a binary file and encrypting it with my algo into an ASCII encryption ( Example: From: b"\x01\x01\x01\x01\x01\x01\x01\x01\x01\x01" To: "V9iWCo/m8&|k/`7qVM.3K7uZWgK3,eMeck8YEx~jh/NV`UrcCCxTMKHe?.qju<YH,!|qw*-e&O;<rcNnDO<s_;mXqsN+&^AK7plP>69yR,<yP,^/" ) and outputting it into an ASCII text file.

    So I'll need to edit it between opening it and closing it, so it'll need to know how to turn it into a standard string ( str() ) and back.

    Edit: Obviously .encode() didn't work.
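A likely reason `.encode()` and `.decode()` fail here is that they default to UTF-8, and arbitrary binary data is usually not valid UTF-8. One workaround, sketched below under the assumption that a one-character-per-byte string is good enough for editing, is the `latin-1` codec, which maps every byte value 0-255 to exactly one character.

```python
data = bytes(range(256))  # stand-in for content read with open(path, "rb")

# UTF-8 rejects many byte sequences, so it cannot hold arbitrary bytes.
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 failed:", exc.reason)

# latin-1 assigns one character to every byte value, so the round trip is lossless.
text = data.decode("latin-1")
restored = text.encode("latin-1")

print(type(text).__name__)  # str
print(restored == data)     # True
```

The resulting string contains control characters, so it is fine for in-memory editing but not for a printable ASCII file. If the text form also has to be printable, base64 from the standard library is one option.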
  • 1
    Quick update: It seems like there is no way to determine the encoding of raw bytes (at least outside of Linux, and this is a cross-platform app and I dev on Windows) so...

    By default I'll write the bytes to file as plain text (UTF-8 or UTF-16) and have an argument where users can set the encoding to whatever they want.. I might also change the default encoding for images (if the utility gets a specific flag or if the output file has a specific extension).

    This utility is mostly intended for ASCII/Unicode text, so if you want to encode images or .exe files, going through extra hassle is expected (most devs wouldn't even support it).
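The plan above could be sketched roughly like this. Every name here (`build_parser`, `pick_encoding`, the `--encoding` option, the `"binary"` value and the extension list) is hypothetical and not taken from the actual utility.

```python
import argparse

BINARY_EXTENSIONS = (".png", ".exe")  # assumed list, extend as needed

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Encode or decode a file.")
    parser.add_argument("input")
    parser.add_argument("output")
    parser.add_argument("--encoding", default=None,
                        help="output text encoding, or 'binary' for raw bytes")
    return parser

def pick_encoding(args: argparse.Namespace) -> str:
    if args.encoding is not None:
        return args.encoding  # an explicit user choice always wins
    if args.output.lower().endswith(BINARY_EXTENSIONS):
        return "binary"       # default for known binary extensions
    return "utf-8"            # default for everything else

args = build_parser().parse_args(["in.txt", "out.png"])
print(pick_encoding(args))  # binary
```

When the result is `"binary"`, the output file would be opened with `"wb"`; otherwise with `"w"` and the chosen encoding.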
  • 1
    @Cyanite Well, even if you use Unicode characters in strings in a binary file, it'll still be the same bytes. You just have to know how long a string is. Ffaf is still Ffaf in any binary. Also, your other binary data can look like Unicode text. For example, you could have "nahfjehdishxjwh" in a binary file and that could be an int array; it depends on how you make sense of it.
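A short sketch of that point, using the first 8 bytes of the example string: the bytes never change, only the interpretation does. Reading them as two little-endian 32-bit integers is an arbitrary choice made for the demonstration.

```python
import struct

raw = b"nahfjehd"  # first 8 bytes of the example string

as_text = raw.decode("ascii")        # read as ASCII text
as_ints = struct.unpack("<2I", raw)  # read as two little-endian 32-bit integers
as_bytes = list(raw)                 # read as eight individual byte values

print(as_text)
print(as_ints)
print(as_bytes)

# Packing the integers again gives back exactly the same bytes.
print(struct.pack("<2I", *as_ints) == raw)  # True
```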
  • 1
    @Cyanite Binary representation has no encoding, it's just plain bits/bytes. The usage then defines semantics like encoding, UTF8/16, etc.

    I don't know what you wanna do, but it sounds more like you want to use the data in a specific format, rather than plain bytes. To do so you have to convert the data to your format, or let your code interpret the binary representation in your given format.

    My code was just a small example of how it could be done, but it may not fit your needs. Maybe using struct would be better to decode the interesting parts of the data.
    And talking about PNG or other image formats, which can be quite large, it may not be a good idea to read the whole file at once. I don't know PNG internals, but image formats usually have a header describing the file, so it may be better to parse the header and read/modify just the interesting parts of the file. Or maybe use a PNG- or image-library for easy access and modifications of the data.
    But that all depends on your task.
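As a sketch of the "parse the header only" idea for PNG: the file starts with an 8-byte signature, followed by the IHDR chunk, whose first two fields are the width and height as big-endian 32-bit integers. `read_png_size` is a made-up helper, and the test data is built in memory rather than read from a real image.

```python
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_size(stream):
    """Return (width, height) by reading only the first 24 bytes of a PNG stream."""
    header = stream.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Bytes 16-23: width and height as big-endian unsigned 32-bit integers.
    return struct.unpack(">II", header[16:24])

# Fake start of a PNG: signature, chunk length 13, chunk type, width, height.
fake = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 640, 480)
print(read_png_size(io.BytesIO(fake)))  # (640, 480)
```

With a real file, the stream would come from `open(path, "rb")`, and the rest of the image is never loaded into memory.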
  • 1
    @ddephor

    I'm heading to bed now, but if you're willing, I would love to show you my project and get your opinion on it and advice. Maybe even have you look at some code.

    You obviously know much more about this than me.
  • 0
    @Cyanite Oh, good god, I'm a Python noob, I use it mainly for quick hacks and small tools to simplify my work. Sometimes some rapid prototyping as well. It's really handy with so many modules available.

    Feel free to ask, but your GitHub projects look like you're way better at Python than I am (additionally I'm syntactically messed up from years of C/C++).
    My main interests are in the embedded world, so I'm used to working with data on binary level and low-level interfaces.
  • 0
    @ddephor

    If you're looking at my Python projects, firecoder is what I need help with.

    I may be reading/writing ASCII files or Unicode or binary.. I will always be working with custom ASCII encryptions regardless of whether the input is ASCII compatible or not (I encode the bytes or Unicode data in an ASCII format or whatever to work around that)