Bash: unbom (to remove UTF-8 BOMs)

Tests files for a UTF-8 BOM and removes it where found.

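#!/bin/bash
# Loop over every argument so that "./unbom *.txt" handles all matched files.
for F in "$@"
do
  # $'\xef\xbb\xbf' is the 3-byte UTF-8 BOM
  if [[ -f "$F" && "$(head -c 3 -- "$F")" == $'\xef\xbb\xbf' ]]; then
      # file exists and has UTF-8 BOM; keep the original as a .bak copy
      mv -- "$F" "$F.bak"
      tail -c +4 -- "$F.bak" > "$F"
      echo "removed BOM from $F"
  fi
done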

USAGE: ./unbom *.txt

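The magic is tail -c +4, which starts output at the 4th byte and so strips the first 3 bytes (the BOM). The untouched original is kept alongside as a .bak file.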
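To see the byte arithmetic in isolation, here is a minimal sketch that builds a throwaway file with a BOM and strips it. The names tmpdir, with_bom.txt and stripped.txt are made up for the demo, and it assumes mktemp -d is available. The BOM is written with octal escapes (\357\273\277) because those work in any POSIX printf.

```shell
# Hypothetical demo files; nothing here touches your real data.
tmpdir=$(mktemp -d)

# 3-byte UTF-8 BOM (octal for EF BB BF) followed by "hello\n": 9 bytes total.
printf '\357\273\277hello\n' > "$tmpdir/with_bom.txt"

# tail -c +N starts output AT byte N, so +4 skips bytes 1 through 3.
tail -c +4 "$tmpdir/with_bom.txt" > "$tmpdir/stripped.txt"

# $(( )) normalizes any padding some wc implementations add.
before=$(( $(wc -c < "$tmpdir/with_bom.txt") ))
after=$(( $(wc -c < "$tmpdir/stripped.txt") ))
content=$(cat "$tmpdir/stripped.txt")

echo "before: $before bytes, after: $after bytes"   # before: 9 bytes, after: 6 bytes
echo "content: $content"                            # content: hello

rm -rf "$tmpdir"
```

Note the off-by-one: +3 would leave the last BOM byte in place, since the number names the first byte to keep rather than the count to drop.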

2 thoughts on “Bash: unbom (to remove UTF-8 BOMs)”

Alessandro says:

2015 Sep 30 at 4:27 am

Good & thanks for sharing. You may want to change:
for F in $1
do

to something like:

while [[ x$1 != x ]]
do
F="$1"
shift

Charles Lakewood says:

2018 Sep 13 at 1:40 pm

Even more terse:
sed -i '1s/^\xEF\xBB\xBF//' $1


mrclay.org
