What is the difference between the hex bytes output by different encoding schemes in C#?


Consider the following C# code:

    int x = 126;
    string s = "126";
    FileStream fs = new FileStream("test.txt", FileMode.Create);
    StreamWriter sw = new StreamWriter(fs);
    sw.WriteLine(x);
    sw.WriteLine(s);
    sw.Close(); // flushes the buffered text to test.txt and closes fs

The output (the hex bytes stored in test.txt) is: 31 32 36 0d 0a 31 32 36 0d 0a

If I change line 4 to:

    StreamWriter sw = new StreamWriter(fs, Encoding.Unicode); // Encoding is in System.Text

then the output is: ff fe 31 00 32 00 36 00 0d 00 0a 00 31 00 32 00 36 00 0d 00 0a 00
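Both dumps can be reproduced in memory. This is a minimal sketch, assuming .NET 5 or later with C# top-level statements; it writes to a MemoryStream instead of test.txt so nothing touches the disk. The helper name HexDump is hypothetical, and pinning NewLine to "\r\n" is an added assumption so the 0d 0a bytes match the question on non-Windows systems too.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper: writes the same two lines as the question into a
// MemoryStream and returns the raw bytes as lowercase hex.
// encoding == null means "use StreamWriter's default encoding".
static string HexDump(Encoding? encoding)
{
    using var ms = new MemoryStream();
    using (var sw = encoding == null ? new StreamWriter(ms) : new StreamWriter(ms, encoding))
    {
        sw.NewLine = "\r\n";  // assumption: match the Windows CR LF seen in the question
        sw.WriteLine(126);    // int overload: formatted as the text "126"
        sw.WriteLine("126");  // string overload: the same text
    }                         // disposing flushes the buffer (and closes ms)
    // MemoryStream.ToArray still works after the stream has been closed.
    return string.Join(" ", ms.ToArray().Select(b => b.ToString("x2")));
}

Console.WriteLine(HexDump(null));             // 31 32 36 0d 0a 31 32 36 0d 0a
Console.WriteLine(HexDump(Encoding.Unicode)); // ff fe 31 00 32 00 36 00 0d 00 0a 00 31 00 32 00 36 00 0d 00 0a 00
```

The int and the string produce identical bytes because StreamWriter converts the int to its text form before encoding; only the choice of encoding changes the bytes.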

Could someone explain the logic? Is there a reference on the different encoding schemes and how they behave when writing files in C#?

I suggest you read Joel Spolsky's excellent article on the subject of character sets and encodings. In short:

  • A file is a sequence of bytes.
  • A string is a sequence of characters.
  • A character set defines a collection of characters and assigns a unique code point to each character (a code point is an integer that represents the character; note "integer", not int).
  • When you want to store a string in a file, you need to convert the character sequence into a byte sequence. For character sets with 256 characters or fewer, there is a one-to-one correspondence between characters and bytes; for bigger character sets, such as Unicode, it gets more complicated.
  • An encoding defines how the code points of a string's characters are translated into bytes.
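To make the code point versus encoding distinction concrete, here is a small sketch (assuming .NET 5+ for Encoding.Latin1 and top-level statements) that takes the single character é, whose code point is U+00E9, and asks several encodings for its bytes. The Hex helper is only an illustrative wrapper around BitConverter.ToString.

```csharp
using System;
using System.Text;

// 'é' is one character with the code point U+00E9 (decimal 233).
string s = "\u00e9";
int codePoint = char.ConvertToUtf32(s, 0);
Console.WriteLine($"U+{codePoint:X4}");                        // U+00E9

// Illustrative helper: bytes as dash-separated uppercase hex, e.g. "C3-A9".
static string Hex(byte[] bytes) => BitConverter.ToString(bytes);

// The same code point becomes a different byte sequence in each encoding.
Console.WriteLine(Hex(Encoding.UTF8.GetBytes(s)));             // C3-A9 (two bytes)
Console.WriteLine(Hex(Encoding.Unicode.GetBytes(s)));          // E9-00 (UTF-16LE)
Console.WriteLine(Hex(Encoding.BigEndianUnicode.GetBytes(s))); // 00-E9 (UTF-16BE)
Console.WriteLine(Hex(Encoding.Latin1.GetBytes(s)));           // E9 (256-character set: one byte per character)
Console.WriteLine(Hex(Encoding.ASCII.GetBytes(s)));            // 3F ('?': not representable in 7-bit ASCII)
```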

Therefore, when you change the encoding, the same string gets translated into a different sequence of bytes.
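A sketch of that point using Encoding.GetBytes, which (unlike StreamWriter) never emits a byte order mark. It assumes top-level statements; the labels are written by hand rather than read from Encoding.WebName. The last lines show the reverse direction: decoding bytes with an encoding other than the one that produced them yields a different string.

```csharp
using System;
using System.Text;

string text = "126";

// The same three characters, translated to bytes by four encodings.
var encodings = new (string Label, Encoding Enc)[]
{
    ("UTF-8", Encoding.UTF8),                // 31-32-36
    ("UTF-16LE", Encoding.Unicode),          // 31-00-32-00-36-00
    ("UTF-16BE", Encoding.BigEndianUnicode), // 00-31-00-32-00-36
    ("UTF-32LE", Encoding.UTF32),            // 31-00-00-00-32-00-00-00-36-00-00-00
};
foreach (var (label, enc) in encodings)
    Console.WriteLine($"{label,-9} {BitConverter.ToString(enc.GetBytes(text))}");

// Decoding UTF-16LE bytes as UTF-8 turns every 00 byte into a NUL character.
string wrong = Encoding.UTF8.GetString(Encoding.Unicode.GetBytes(text));
Console.WriteLine(wrong.Length); // 6: '1', '\0', '2', '\0', '6', '\0'
```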

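Note that the behavior of character sets and encodings is independent of the programming language; a language only changes how you refer to and use the various encodings and character sets (usually an encoding is tied to a particular character set, so selecting an encoding implicitly selects the character set). In C#'s case, Encoding.Unicode is poorly named: it is the Unicode character set in the UTF-16LE encoding, in which every second byte is 00 for characters in the ASCII range (such as the digits and CR/LF in your example). The leading ff fe in your second file is the UTF-16LE byte order mark (BOM), which StreamWriter writes at the start of the stream for this encoding.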
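The BOM can be inspected directly through Encoding.GetPreamble(). This is a sketch assuming top-level statements and current .NET behavior, where the encoding StreamWriter uses by default is UTF-8 with an empty preamble; Stream.Null is used only so the writer has somewhere to point.

```csharp
using System;
using System.IO;
using System.Text;

// Encoding.Unicode is UTF-16 little-endian; its preamble (BOM) is FF FE.
Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetPreamble()));          // FF-FE
Console.WriteLine(BitConverter.ToString(Encoding.BigEndianUnicode.GetPreamble())); // FE-FF

// The static Encoding.UTF8 instance does carry a BOM (EF BB BF)...
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetPreamble()));             // EF-BB-BF

// ...but the encoding StreamWriter picks by default has an empty preamble,
// which is why the first file in the question starts directly with 31.
using var sw = new StreamWriter(Stream.Null);
Console.WriteLine(sw.Encoding.GetPreamble().Length);                                // 0

// Explicitly asking for UTF-8 without a BOM gives the same empty preamble.
Console.WriteLine(new UTF8Encoding(false).GetPreamble().Length);                   // 0
```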

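Also note that strings are represented internally in your program as arrays of char, where each char value is one 16-bit UTF-16 code unit (2 bytes), so characters outside the Basic Multilingual Plane, such as many emoji, are represented by 2 char values (a surrogate pair). You can't access that array directly, and most string functionality tries to abstract this detail away. The internal representation doesn't affect how strings are written to files: either you select the encoding manually, or you get the default encoding of the operation you're invoking (for StreamWriter that is UTF-8 without a BOM, which is why your first file contains plain single-byte digits; thanks @xanatos for the correction).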
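The surrogate-pair case can be observed with any character outside the Basic Multilingual Plane. This sketch assumes top-level statements and uses U+1F600 purely as an example; it shows that the in-memory UTF-16 layout and the bytes written out are separate things.

```csharp
using System;
using System.Text;

// U+1F600 lies outside the Basic Multilingual Plane, so a .NET string
// stores it as two UTF-16 code units (a surrogate pair).
string emoji = "\U0001F600";
Console.WriteLine(emoji.Length);                             // 2 char values
Console.WriteLine(char.IsSurrogatePair(emoji[0], emoji[1])); // True
Console.WriteLine($"U+{char.ConvertToUtf32(emoji, 0):X}");   // U+1F600

// The bytes on disk depend on the encoding chosen, not on the in-memory layout.
Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(emoji)));    // F0-9F-98-80
Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(emoji))); // 3D-D8-00-DE
```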

