32 bit - Why is sort.exe truncating large output on 32-bit Windows?
2014-07
We have a data file from a client which is 1,443,777,659 bytes in size.
Sorted output has lines missing and is only 1,269,801,985 bytes in size.
Sample command: sort -k 1,10 -T . -s -i file_to_sort.txt -o out.txt
We've tried on 32-bit Win 7 and XP systems.
We've tried the sort.exe that ships with Windows, as well as binaries from UnxUtils and GNU coreutils.
None of them gives an error, but all produce exactly the same output size. I've also tried another freeware utility, which works but is much slower.
I believe this may be due to a 32-bit limitation. However, the file size doesn't seem near any of the usual suspects, and these programs work by writing and merging smaller files, none of which approach 2 GB in size.
Any tips on how to get to the bottom of this? Thanks.
OK, so the issue was not related to the size of the file at all. It seems that the file is opened in text mode and contains a 0x1A character (^Z, treated as EOF on Windows) near the end.
Once sort hits this character during input, it stops reading. There's no way around this, as there is no flag to open the file as binary.
I should have found this quicker, but it's not so easy to dig around a 1.5GB file :)
Related query: http://stackoverflow.com/questions/13582804/why-can-windows-not-read-beyond-the-0x1a-eof-character-but-unix-can
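To check whether a large file contains a stray 0x1A byte without opening it in an editor, something like the following could be used. This is only a minimal sketch: the function name `find_first_byte` and the 1 MiB chunk size are my own choices, not anything from the tools above. It reads the file in binary mode, so 0x1A is treated as ordinary data rather than end-of-file.

```python
def find_first_byte(path, target=b"\x1a", chunk_size=1 << 20):
    """Return the byte offset of the first `target` byte in the file, or -1.

    `target` must be a single byte, so a match can never straddle
    two chunks and no overlap handling is needed.
    """
    offset = 0  # number of bytes read before the current chunk
    with open(path, "rb") as f:  # binary mode: 0x1A is not treated as EOF
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1  # reached the real end of the file
            pos = chunk.find(target)
            if pos != -1:
                return offset + pos
            offset += len(chunk)
```

Calling `find_first_byte("file_to_sort.txt")` returns the offset of the first ^Z byte, which can then be compared with the size of the truncated output to see whether the two line up.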
You don't want to ignore non-printable characters if the file contains them. Drop the -i option and run with LC_ALL=C.
e.g.
export LC_ALL=C
sort -k 1,10 -s <file_to_sort.txt >out.txt
I was looking for GNU tools for Windows and came across two links:
http://unxutils.sourceforge.net/
and
http://gnuwin32.sourceforge.net/
Does anyone know the difference between them, and which one has the more comprehensive or better tools?
Some solutions:
Finally, if you want comprehensive, you're after Cygwin, which is the "standard" method of getting GNU tools on Windows, but is ... rather bulky. And the moment you have some third-party software installed, where the Windows binary was built using Cygwin and which bundles the DLL, you enter DLL hell.
In a comment, Madhur Ahuja asks if Cygwin is portable. The answer is no and yes. The standard install of Cygwin will NOT support portability, because it relies on a large set of files.
BUT, if you only need a few of the tools that Cygwin provides, for example sed, gawk and grep, you can put those and the files they depend on onto a USB drive and they will work.
The files listed below, all located in Cygwin's /bin directory, will allow you to run find, gawk, grep, ls and sed from a USB drive.
cyggcc_s-1.dll
cygicons-0.dll
cygiconv-2.dll
cygintl-8.dll
cygpcre-0.dll
cygreadline7.dll
cygsigsegv-2.dll
cygwin1.dll
find.exe
gawk.exe
grep.exe
ls.exe
sed.exe
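Copying that set by hand is error-prone, so here is a rough sketch of how it could be scripted. The function name `copy_portable_tools` and any paths passed to it are hypothetical; the file names are taken verbatim from the list above, and your Cygwin version may need different DLLs.

```python
import shutil
from pathlib import Path

# File names copied from the list above; adjust for your Cygwin version.
PORTABLE_FILES = [
    "cyggcc_s-1.dll", "cygicons-0.dll", "cygiconv-2.dll", "cygintl-8.dll",
    "cygpcre-0.dll", "cygreadline7.dll", "cygsigsegv-2.dll", "cygwin1.dll",
    "find.exe", "gawk.exe", "grep.exe", "ls.exe", "sed.exe",
]

def copy_portable_tools(cygwin_bin, dest_dir, names=PORTABLE_FILES):
    """Copy the named files from cygwin_bin to dest_dir.

    Returns the names that were not found in cygwin_bin, so the
    caller can see which dependencies still need to be tracked down.
    """
    src = Path(cygwin_bin)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in names:
        source_file = src / name
        if source_file.is_file():
            shutil.copy2(source_file, dest / name)  # keeps timestamps
        else:
            missing.append(name)
    return missing
```

For example, `copy_portable_tools(r"C:\cygwin\bin", r"E:\tools")` would copy the files to a USB drive mounted as E: (both paths are assumptions) and return whichever names it could not find.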
I recently came across Gow [1]. I installed it and it works quite well. Cygwin is too heavy.
MobaXterm is a rich FTP/SSH/VNC/RDP/Telnet/rsh client and X server, and it also has a lot of GNU tools built in for interactive use.