unix - I have a file called songs and I'm trying to find the longest and shortest song names using AWK
2014-07
RANK NAME BAND YEAR GENERE DOMESTIC/INTERNATIONAL
206:Reach Out, I'll Be There:The Four Tops:1978:Pop:3/2
207:Bye Bye Love:The Everly Brothers:1950:Classic:3/2
208:Gloria:Them:1965:Classic:1/1
209:In My Room:The Beach Boys:1985:Classic:5/7
210:96 Tears:? & the Mysterians:1964:Classic:20/15
211:Caroline, No:The Beach Boys:1975:Classic:5/7
212:1999:Prince:1958:Classic:5/7
213:Your Cheatin' Heart:Hank Williams:1988:Soul:7/6
214:Rockin' in the Free World:Neil Young:1960:Pop:5/7
215:Sh-Boom:The Chords:1967:Alternative:3/2
216:Do You Believe in Magic:The Lovin' Spoonful:1988:Classic
217:Jolene:Dolly Parton:1998:Classic:7/6
218:Boom Boom:John Lee Hooker:1966:Classic:7/6
Assuming the header is not part of the file:
awk -F: '
NR == 1 {max=$2; min=$2; next}
length($2) > length(max) {max=$2}
length($2) < length(min) {min=$2}
END {print "longest=" max; print "shortest=" min}
' songs
longest=Rockin' in the Free World
shortest=1999
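If the header line does turn out to be part of the file, a small variation can skip it. This is only a sketch, assuming the header is exactly the first line and the data is colon-separated as in the sample; the temporary file and the trimmed sample rows are made up for illustration.

```shell
# Hypothetical sample: header on line 1, a few data rows after it
songs_file=$(mktemp)
cat > "${songs_file}" <<'EOF'
RANK NAME BAND YEAR GENERE DOMESTIC/INTERNATIONAL
208:Gloria:Them:1965:Classic:1/1
212:1999:Prince:1958:Classic:5/7
214:Rockin' in the Free World:Neil Young:1960:Pop:5/7
EOF

# Skip line 1 (the header), seed max/min from line 2, compare the rest
result=$(awk -F: '
NR == 1 {next}
NR == 2 {max=$2; min=$2; next}
length($2) > length(max) {max=$2}
length($2) < length(min) {min=$2}
END {print "longest=" max; print "shortest=" min}
' "${songs_file}")

printf '%s\n' "${result}"
rm -f "${songs_file}"
```

As in the original command, ties go to whichever name appears first, because the comparisons are strict.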
I'm on OS X, and I have a folder that contains a number of subfolders. There are two things I want to do. The first is to ensure that each subfolder has a file in it of the form [subfolder-name].grade.xml, then I need to search and replace within the appropriate file to make a couple of changes.
For the second part, I know how to use sed on an individual file to do what I need to, but I'm having the problem of verifying that the file is there, and then running the command on it. Any tips on doing this would be appreciated.
Note: I don't necessarily need a full answer, especially since I'm trying to learn here. A pointer in the right direction would be nice though.
(I realize that there may be better ways than the command line to accomplish this, but I've needed and will in the future need to do similar things on other Unix-based systems, so I'd rather know :)
#!/bin/bash
# Get directory name from argument, default is .
DIR=${1:-.}
# For each subfolder in DIR (mindepth/maxdepth limit the search to direct children)
find "${DIR}" -mindepth 1 -maxdepth 1 -type d | while IFS= read -r dir; do
    # Check that there is a file called <subfolder-name>.grade.xml in this dir
    dirname=$(basename "${dir}")
    gradefile="${dir}/${dirname}.grade.xml"
    if [ -e "${gradefile}" ]; then
        sed -i .bak "s/foo/bar/g" "${gradefile}"
    else
        echo "Warning: ${dir} does not contain ${dirname}.grade.xml"
    fi
done
Minor tweaks around Raphink's framework.
Key points:
- check directly for file existence with [ -e filename ] rather than running ls
- put all variables in ${variablename}; often not strictly necessary, but it avoids ambiguity for the reader (${variablename} and ${variable}name are clearly distinct, whereas $variablename is easy to misread)
- pass an extension to sed to make backup files. This is both good practice (in case your munging goes wrong) and, on OS X, mandatory (raphink's version interprets s/foo/bar/g as being the extension you want on the backup files, then tries to parse the filename as a command).
- Okay, I lied, it's not actually mandatory: you could use sed -i "" "s/foo/bar/g" "${gradefile}" to pass an empty extension, which would cause sed not to make a backup.
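One way to sidestep the difference between the two sed flavours is to attach the backup extension directly to -i, with no space. A minimal sketch, assuming a sed that supports -i at all (both GNU sed and the BSD sed shipped with OS X accept the attached form); the throwaway file and its contents are invented for the demo.

```shell
# Throwaway file standing in for a real grade file (hypothetical content)
gradefile=$(mktemp)
printf '<grade>foo</grade>\n' > "${gradefile}"

# Extension attached to -i: edit in place and keep ${gradefile}.bak
sed -i.bak 's/foo/bar/g' "${gradefile}"

edited=$(cat "${gradefile}")
backup=$(cat "${gradefile}.bak")
echo "edited: ${edited}"
echo "backup: ${backup}"

rm -f "${gradefile}" "${gradefile}.bak"
```

The trade-off is that a backup file is always left behind, so a cleanup step may be wanted once the result has been checked.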
OK, you need to make a script for that. I'll pick bash for it.
#!/bin/bash
# Get directory name from argument, default is .
DIR=${1:-.}
# For each subfolder in DIR (maxdepth limits to one level in depth)
find "$DIR" -maxdepth 1 -type d | while read -r dir; do
    # Check that there is a file called *.xml in this dir
    if ls "$dir"/*.xml &>/dev/null; then
        # Loop through xml files found in $dir
        # or do you actually need to check that there is only ONE file?
        for xmlfile in "$dir"/*.xml; do
            # Do whatever treatment with sed you wish to do
            sed -i 's/foo/bar/g' "$xmlfile"
        done
    else
        echo "Warning: $dir does not have a *.xml file in it"
    fi
done
Save that file as a .sh script, then run
$ chmod +x yourscript.sh
$ ./yourscript.sh /path/to/dir # Path is optional, defaults to .
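As an aside, the per-subfolder loop can also be written with a shell glob instead of find. The following is a sketch under assumptions rather than a replacement for the script above: check_grades is a hypothetical helper name, it only reports whether [subfolder-name].grade.xml exists, and the demo tree (alice, bob) is made up.

```shell
#!/bin/bash
# Report, for each direct subfolder of a root directory, whether it
# contains a file named <subfolder-name>.grade.xml.
check_grades() {
    local root="${1:-.}"
    local dir name gradefile
    for dir in "${root}"/*/; do
        [ -d "${dir}" ] || continue      # glob matched nothing
        dir=${dir%/}                     # strip the trailing slash
        name=${dir##*/}                  # subfolder name without its path
        gradefile="${dir}/${name}.grade.xml"
        if [ -e "${gradefile}" ]; then
            echo "found: ${gradefile}"
        else
            echo "missing: ${gradefile}"
        fi
    done
}

# Demo on a throwaway directory tree
demo=$(mktemp -d)
mkdir "${demo}/alice" "${demo}/bob"
touch "${demo}/alice/alice.grade.xml"
report=$(check_grades "${demo}")
printf '%s\n' "${report}"
rm -rf "${demo}"
```

The glob form copes with spaces in folder names without extra care, but unlike find it skips hidden subfolders unless the shell is told otherwise.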