16 Dec 21 “Subtracting” one file from another

I recently had occasion to refactor a script (not mine) in which there was some convoluted logic to ensure that the contents of file A (an ASCII file) were fully contained within file B (another ASCII file). Lots of looping and grepping were the order of the day. I imagined that there had to be a better way and first went down the path of using grep with a file as the source of the things I was looking for. However, while that would get me partially there, it would not tell me if File A was wholly contained within File B. I am sure I could have made it work (somehow) using this idea, but again, I thought that there had to be a better way.

With the help of Google (because of course), I stumbled across an HP-UX command (not unique to HP-UX of course) that in my 25 years of scripting on the HP-UX platform, I had never encountered nor used before: the comm command. This command lets one implement set logic on ascii files.

Let’s have a look . . .

From the man page for comm:

 NAME
      comm - select or reject lines common to two sorted files

 SYNOPSIS
      comm [-[123]] file1 file2

DESCRIPTION
      comm reads file1 and file2, which should be ordered in increasing
      collating sequence (see sort(1) and Environment Variables below), and
      produces a three-column output:

       Column 1:   Lines that appear only in file1,
       Column 2:   Lines that appear only in file2,
       Column 3:   Lines that appear in both files.

  If - is used for file1 or file2, the standard input is used.

  Options 1, 2, or 3 suppress printing of the corresponding column.
  Thus comm -12 prints only the lines common to the two files; comm -23
  prints only lines in the first file but not in the second; comm -123
  does nothing useful.
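
To make the three columns concrete, here is a small sketch run against two throwaway files. The file names and their contents are made up for illustration; only the comm options come from the man page above.

```shell
# Two small sample files (hypothetical contents), already in sorted order.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/file1"
printf '%s\n' banana cherry date  > "$workdir/file2"

# No options: all three columns, with columns 2 and 3 indented by tabs.
comm "$workdir/file1" "$workdir/file2"

# Intersection: suppress columns 1 and 2, keep lines found in both files.
both=$(comm -12 "$workdir/file1" "$workdir/file2")

# Differences: lines only in file1, then lines only in file2.
only1=$(comm -23 "$workdir/file1" "$workdir/file2")
only2=$(comm -13 "$workdir/file1" "$workdir/file2")

echo "only in file1: $only1"
echo "only in file2: $only2"
echo "in both files:"
echo "$both"

rm -rf "$workdir"
```

With these sample files, "apple" is the only line unique to file1, "date" is the only line unique to file2, and "banana" and "cherry" are common to both.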

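So, “comm -12” performs the intersection operation on the files. But “comm -23” does the trick for what I was after: it prints the lines of the first file that are not in the second, in effect subtracting the second file from the first. In my use case every line of File B should also appear in File A, that is, File B should be a subset of File A. I can test for that by passing File B as the first file: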

checkCount=$(comm -23 $FILE_B_SORTED $FILE_A_SORTED | wc -l)

If the resulting count is “0”, I know that File B is wholly contained within File A. If the count is not “0”, then I can consume what is in File B but not in File A and take appropriate action.
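
The same check can be wrapped in a small function so that it reads like a yes/no question. This is only a sketch: the name is_subset and its return codes are my own invention, and it sorts both inputs itself so that callers do not have to.

```shell
# is_subset CANDIDATE CONTAINER   (hypothetical helper name)
# Returns 0 when every line of CANDIDATE also appears in CONTAINER,
# 1 when at least one line is missing, 2 if a temp file cannot be created.
is_subset() {
    _cand_sorted=$(mktemp) || return 2
    _cont_sorted=$(mktemp) || { rm -f "$_cand_sorted"; return 2; }

    # comm needs sorted input; -u also drops duplicate lines.
    sort -u "$1" > "$_cand_sorted"
    sort -u "$2" > "$_cont_sorted"

    # Count the lines that appear only in the candidate file.
    _extra=$(( $(comm -23 "$_cand_sorted" "$_cont_sorted" | wc -l) ))

    rm -f "$_cand_sorted" "$_cont_sorted"
    [ "$_extra" -eq 0 ]
}
```

It could then be used as `if is_subset "$FILE_B" "$FILE_A"; then ...; fi`, which keeps the sorting and the temp files out of the calling script.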

Note that the two ASCII files need to be in sorted order, which is easy enough. Here is an example of all of this put together to accomplish “subtracting File A from File B”:

FILE_B_SORTED=$(mktemp)
FILE_A_SORTED=$(mktemp)

sort -u "$FILE_B" > "$FILE_B_SORTED"
sort -u "$FILE_A" > "$FILE_A_SORTED"

#
# 'comm -23 file1 file2' says show me all the lines that appear in
# file1 but not in file2.  Here file1 is File B and file2 is File A,
# so we are ensuring that there is nothing in File B that is NOT in
# File A.
#
# We count how many such lines there are - we expect there to be zero.
#

checkCount=$(comm -23 "$FILE_B_SORTED" "$FILE_A_SORTED" | wc -l)
if (( checkCount == 0 )); then
   echo "All entries in File B are in File A (goodness)"
else
   # Do something with the entries in File B that are not in File A
   checkList=$(mktemp)
   comm -23 "$FILE_B_SORTED" "$FILE_A_SORTED" > "$checkList"
   exec 4< "$checkList"
   while read -r entry <&4; do
     # Do whatever needs to be done; ":" keeps the loop body from being empty
     :
   done
   exec 4<&-   # close the descriptor once the list has been consumed
fi
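
The example above leaves its mktemp files behind. Here is a sketch of one way to tidy that up: keep everything in a single scratch directory, remove it with a trap on exit, and run comm only once. The sample file contents and the names workdir and missing_count are made up so that the sketch runs on its own.

```shell
# Scratch directory for all intermediate files; removed when the script exits.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT

# Hypothetical sample inputs so the sketch is self-contained.
FILE_A="$workdir/a.txt"
FILE_B="$workdir/b.txt"
printf '%s\n' alpha bravo charlie > "$FILE_A"
printf '%s\n' bravo delta         > "$FILE_B"

sort -u "$FILE_A" > "$workdir/a.sorted"
sort -u "$FILE_B" > "$workdir/b.sorted"

# Run comm once and keep the lines that are in File B but not in File A.
comm -23 "$workdir/b.sorted" "$workdir/a.sorted" > "$workdir/missing"

# Walk the leftovers; redirecting into the loop avoids a separate exec.
missing_count=0
while IFS= read -r entry; do
    missing_count=$((missing_count + 1))
    echo "In File B but not in File A: $entry"
done < "$workdir/missing"

echo "$missing_count line(s) need attention"
```

Because the loop reads from a redirect rather than a pipe, missing_count is still set after the loop finishes, so the zero/non-zero test from the original can be applied to it directly.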

All good stuff 🙂

Tags: "set logic on files", "subtracting one file from another", hp-ux script

Filed in Scripting with 0 Comments

09 Feb 17 A Script to identify entries for a particular user

Starting a series on automation scripting.

This one is meant to be run from a "master of the universe" host, i.e. a host whose root public key has been placed on all work servers.

cat searchforid.ksh

#!/usr/bin/ksh
#
# test script
#
. ./.scriptenv
# provides standardization for example SSH_CMD="ssh -q -f -o ConnectionAttempts=3 -o ConnectTimeout=10 -o PasswordAuthentication=no -o BatchMode=yes"

# Strip any leading path from $0 so the log file always lands in LOGDIR
LF="${LOGDIR}/${0##*/}.logfile.txt"
> "${LF}"
sc=0

uid=$1
if [ -z "${uid}" ]; then
  echo "usage: ${0##*/} userid" >&2
  exit 1
fi
date >> ${LF}

awk '{ print $1 }' $serverlist | while read -r hn
do
echo "################### ${hn} searching for user ${uid} ######################"
echo "################### ${hn} searching for user ${uid} ######################" >> ${LF}
if [ "${hn}" != "mygush0" ]
then
  ${SSH_CMD} ${hn} "grep ${uid} /opt/iexpress/sudo/etc/sudoers;grep ${uid} /etc/passwd"
  sleep 5
  ${SSH_CMD} ${hn} "grep ${uid} /opt/iexpress/sudo/etc/sudoers;grep ${uid} /etc/passwd" >> ${LF}

else
  grep ${uid} /opt/iexpress/sudo/etc/sudoers;grep ${uid} /etc/passwd
  grep ${uid} /opt/iexpress/sudo/etc/sudoers >> ${LF};grep ${uid} /etc/passwd >> ${LF}
  echo "#######################################################################################################"
  echo "#######################################################################################################" >> ${LF}

fi
done
echo "Success count: ${sc} " >> ${LF}
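
One thing to watch with a plain grep on the user id is that it matches substrings, so a search for "bob" would also report "bobby". Here is a sketch of a stricter lookup for passwd-style files, assuming the usual colon-separated layout with the login name in the first field. The helper name passwd_entry and the sample entries are made up for illustration.

```shell
# passwd_entry USER [FILE]   (hypothetical helper name)
# Print the passwd-style line whose first colon-separated field is exactly USER.
# Exits 0 if a matching line was found, 1 otherwise. FILE defaults to /etc/passwd.
passwd_entry() {
    awk -F: -v user="$1" '
        $1 == user { print; found = 1 }
        END        { exit !found }
    ' "${2:-/etc/passwd}"
}

# Demonstration against a throwaway file with made-up entries.
sample=$(mktemp)
printf '%s\n' \
    'bob:x:1001:20::/home/bob:/usr/bin/ksh' \
    'bobby:x:1002:20::/home/bobby:/usr/bin/ksh' > "$sample"

passwd_entry bob "$sample"      # prints only the line for bob
rm -f "$sample"
```

In the script above, the remote command could call awk the same way in place of the second grep. The sudoers file has a looser format, so it would need a different approach.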

Tags: automation script, hp-ux script, user search script

Filed in Scripting, Systems Administration with 0 Comments
