Reading binary files in C++

I learned how to read binary files in C++ today. My function (which creates an MD5 hash of a file) ended up looking a little different to the example that I learned from:

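  // Requires <fstream>, <iostream>, <cstring>, <cerrno> and <cstdlib>, plus the
  // header of the MD5 library that provides MD5_CTX, MD5Init and MD5Update.
  const int BUFFER_SIZE = 1024;

  char buffer[ BUFFER_SIZE ];

  MD5_CTX ctx;
  MD5Init( &ctx );

  ifstream is( path.c_str(), ios::binary );

  // If the file failed to open, good() is already false and the loop is skipped.
  while ( is.good() ) {

    is.read( buffer, BUFFER_SIZE );

    // The last read() usually returns fewer than BUFFER_SIZE bytes (and sets
    // both eofbit and failbit); gcount() says how many bytes actually arrived.
    streamsize count = is.gcount();

    MD5Update( &ctx, (unsigned char *)buffer, count );

  }

  if ( is.bad() ) {

    // bad bit is set: an unrecoverable I/O error occurred mid-read.
    // Checked first because a failed read may also have set the eof bit.
    cerr << "Bad bit is set while reading '" << path << "'." << endl;

    // iostreams are not required to set errno, so this may be unrelated.
    cerr << strerror( errno ) << endl;

    exit( 1 );

  }
  else if ( is.eof() ) {

    // it's ok, we're at the end of the file

  }
  else if ( is.fail() ) {

    // e.g. the file could not be opened in the first place
    cerr << "Fail bit is set while reading '" << path << "'." << endl;

    cerr << strerror( errno ) << endl;

    exit( 1 );

  }

  is.close();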
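To see why the loop and the eof() check behave as they do, here is a small standalone sketch that runs the same read/gcount loop and records each chunk size plus the final stream state. It is not part of the MD5 function; the names read_in_chunks, ReadResult and write_test_file are made up for illustration. For a 2,500-byte file and a 1,024-byte buffer it records chunks of 1024, 1024 and 452 bytes and ends with both eof() and fail() true, which is why hitting the end of the file counts as the normal exit.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// What happened while reading a file chunk by chunk (illustrative type).
struct ReadResult {
    std::vector<std::streamsize> chunk_sizes;  // gcount() after each read()
    bool eof = false;   // eofbit after the loop
    bool fail = false;  // failbit after the loop
    bool bad = false;   // badbit after the loop
};

// Same loop shape as the MD5 function: read while good(), then use gcount().
ReadResult read_in_chunks(const std::string& path, std::size_t buffer_size) {
    // A zero-sized read never reaches EOF, so the loop would never end.
    assert(buffer_size > 0);

    ReadResult result;
    std::vector<char> buffer(buffer_size);
    std::ifstream is(path, std::ios::binary);

    while (is.good()) {
        is.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        result.chunk_sizes.push_back(is.gcount());
    }

    result.eof = is.eof();
    result.fail = is.fail();
    result.bad = is.bad();
    return result;
}

// Hypothetical helper for trying it out: writes `size` bytes of 'x' to `path`.
void write_test_file(const std::string& path, std::size_t size) {
    std::ofstream os(path, std::ios::binary);
    std::string data(size, 'x');
    os.write(data.data(), static_cast<std::streamsize>(data.size()));
}
```

Two edge cases are worth knowing. A file whose size is an exact multiple of the buffer size ends with a zero-byte chunk, so the hashing code should cope with a zero-length update. A file that cannot be opened produces no chunks at all, with fail() true and eof() false, which is the case the final fail branch catches.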

File names on Windows

I’ve been reading up on file names on Windows because my C++ code has a problem processing a file with an odd trademark (™) character in its name. I’m not sure why, but it seems that a file name returned by the POSIX readdir function doesn’t necessarily exist when it is then passed to ifstream::open, or else some character-encoding conversion is going on. Hopefully I’ll get to the bottom of it. It was a complete fluke that I happened to have a file that triggers the failure and tested with it; lucky, I guess.
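One way to narrow this down is a small diagnostic that lists a directory with readdir and immediately tries to stat and open each name it gets back. This is a sketch, assuming a POSIX-style <dirent.h> (Linux, or a Windows toolchain such as MinGW that provides one); the function name unopenable_entries is made up. On Linux it should normally return nothing for a directory of ordinary readable files; on Windows, any name it reports is a candidate for a name that does not survive the trip from readdir to the file-opening functions.

```cpp
#include <dirent.h>
#include <sys/stat.h>

#include <fstream>
#include <string>
#include <vector>

// Hypothetical diagnostic: returns the names that readdir() lists in `dir`
// but that cannot then be stat()ed, or that are regular files ifstream
// fails to open. Returns an empty list if `dir` itself cannot be opened.
std::vector<std::string> unopenable_entries(const std::string& dir) {
    std::vector<std::string> failures;

    DIR* d = opendir(dir.c_str());
    if (d == nullptr) {
        return failures;
    }

    while (dirent* entry = readdir(d)) {
        std::string name = entry->d_name;
        if (name == "." || name == "..") {
            continue;
        }
        std::string full = dir + "/" + name;

        // A name that cannot even be stat()ed did not round-trip.
        // (On Linux a dangling symlink will also be reported here.)
        struct stat st;
        if (stat(full.c_str(), &st) != 0) {
            failures.push_back(name);
            continue;
        }

        // Only regular files are worth trying to open as streams.
        if (!S_ISREG(st.st_mode)) {
            continue;
        }

        std::ifstream is(full.c_str(), std::ios::binary);
        if (!is.is_open()) {
            failures.push_back(name);
        }
    }

    closedir(d);
    return failures;
}
```

Running it over the directory that holds the problem file would show whether the failure is in the name itself (stat fails too) or specific to opening the stream.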

I did some more reading and discovered that Windows has ‘wide character’ versions of the file functions, which take UTF-16 encoded strings rather than strings encoded in the current ‘code page’. Anyway, I don’t think I’m going to bother with them. If a file can’t be opened because it has a weird character in its name, I’ll just fail with an error message and the user can look at fixing it. This program is only being “developed and tested” on Windows; there are no plans to actually run it there. It will run on Linux, where file names are plain byte strings that readdir returns and open accepts unchanged, so this encoding issue shouldn’t arise.
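To make the UTF-16 versus code page difference concrete, here is a sketch that encodes a single Unicode code point as UTF-16, the form the wide (W) functions expect, and as UTF-8, which is what file names on a typical Linux system contain. For the single-byte comparison it assumes Windows-1252 as the ‘ANSI’ code page; other code pages may encode ™ differently or not at all. The function and constant names are illustrative. The trademark sign U+2122 comes out as one UTF-16 unit (0x2122), three UTF-8 bytes (E2 84 A2), and the single byte 0x99 in Windows-1252.

```cpp
#include <cassert>
#include <string>

// True for valid Unicode scalar values (excludes surrogates and > U+10FFFF).
bool is_scalar_value(char32_t cp) {
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Encode one code point as UTF-16, the encoding taken by the Windows
// "wide" file functions. Values above U+FFFF need a surrogate pair.
std::u16string to_utf16(char32_t cp) {
    assert(is_scalar_value(cp));
    if (cp < 0x10000) {
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    cp -= 0x10000;
    std::u16string out;
    out += static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
    out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
    return out;
}

// Encode one code point as UTF-8: 1 to 4 bytes depending on its size.
std::string to_utf8(char32_t cp) {
    assert(is_scalar_value(cp));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The trademark sign in Windows-1252 (assumed code page): one byte, 0x99.
const char TM_CP1252 = '\x99';
```

The same character therefore has three different byte sequences depending on which API and encoding is in play, which is exactly the kind of mismatch that can make a name from one function unusable in another.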