bash - How can you tell if a file is a text file without using the file utility?

I'm in the middle of a bash script, at a point that needs to distinguish between two categories of files: text files vs. non-text files (images, core dumps, binaries).

Normally, to find out whether a mystery file foo is a text file without relying on the file name extension, I'd call file foo and see if "text" appears somewhere in the output.

What can I do if the OS doesn't have the file utility? Edit: alas, I do not have permission to install it on the OS.

I'd prefer a fast, local, and portable solution if possible (one which would work on any Linux machine, vs. sending the file to another computer and asking it to run file).

Installing file should be the first choice. If that is not possible, here is a simple attempt at testing whether a file is text or not. The following reads the first 1000 bytes of the file and tests for the presence of non-printable characters:

head -c1000 file | sed 's/[[:print:][:blank:]]//g' | grep -q . && echo "not text"

Or:

head -c1000 file | tr -d '[:print:][:blank:]' | grep -q . && echo "not text"
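For use inside a script, the second pipeline can be wrapped in a function so the result is available as an exit status. This is only a sketch: the name is_text is hypothetical, and pinning LC_ALL=C is my assumption, made so that tr and grep behave identically on every machine. Under that assumption only ASCII counts as text, an empty file counts as text, and a file with carriage returns (CRLF line endings) is reported as non-text.

```shell
# is_text FILE (hypothetical helper)
#   exit 0: the first 1000 bytes look like ASCII text
#   exit 1: a non-printable byte was found
#   exit 2: FILE is missing, unreadable, or not a regular file
is_text() {
    [ -f "$1" ] && [ -r "$1" ] || return 2
    # Delete printable and blank characters; any byte that survives on a
    # non-empty line is taken as evidence of non-text content.
    # LC_ALL=C is an assumption: it forces byte-wise, ASCII-only matching.
    if head -c 1000 < "$1" | LC_ALL=C tr -d '[:print:][:blank:]' | LC_ALL=C grep -q .; then
        return 1
    fi
    return 0
}

# Example call:
#   if is_text foo; then echo "text"; else echo "not text"; fi
```

Returning a status instead of printing lets the caller branch with a plain if, and the separate status 2 keeps "could not check" apart from "not text".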

Character encoding issues

For the above to work, sed (in the first command above) or tr (in the second) needs to understand the file's character encoding. The encoding that GNU sed expects depends on the current locale, and it supports many encodings, including UTF-8. As mklement0 notes in the comments, however, GNU tr does not support UTF-8. According to Wikipedia, this is true of most versions of tr:

Most versions of tr, including GNU tr and classic Unix tr, operate on single-byte characters and are not Unicode compliant. An exception is the Heirloom Toolchest implementation, which provides basic Unicode support.
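To see the locale dependence in practice, the sed variant can be run on the same UTF-8 input under two different locales. This is a rough sketch: classify_stdin is a hypothetical helper, and en_US.UTF-8 is an assumed locale name that may not be installed on a given machine (locale -a lists the available ones).

```shell
# classify_stdin LOCALE (hypothetical helper)
# Reads stdin and prints "text" or "not text", running the sed-based
# check with LC_ALL set to LOCALE.
classify_stdin() {
    if LC_ALL="$1" sed 's/[[:print:][:blank:]]//g' | LC_ALL="$1" grep -q .; then
        echo "not text"
    else
        echo "text"
    fi
}

# "café" encoded as UTF-8: the é is the two bytes 0xC3 0xA9.
printf 'caf\303\251\n' | classify_stdin C
printf 'caf\303\251\n' | classify_stdin en_US.UTF-8   # assumed locale name
```

In the C locale the two bytes of é are not printable, so the first call should report "not text". With GNU sed and a working UTF-8 locale, the second call should report "text"; if the named locale is missing, the tools typically fall back to C behavior and the two results match.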
