Synchronizing a Windows network file share to your webserver

In a recent project that I’ve been working on, users required a way to update documentation and files for their website, mostly PDFs for things such as spec sheets and technical notes. The process that had been used for years was to send the file to the marketing department, who would put it into the ticket system of the outside web design vendor, who would then upload it via the Drupal 7 CMS and attach it to the content.

Well, how about that for a bottleneck in what should be a pretty simple thing to do? Lots of manual intervention, e-mail chains, a third party, uploads, not to mention how badly Drupal manages files. So I set about getting this process fixed up and automated, with a little shell scripting project I ended up calling SyncStat.

So here is the main problem that needed solving: “Have several thousand files hosted on the web, sometimes the same file will have updates or a new version, sometimes a file needs to be removed, and make the current process of getting these files there less labor intensive.”

Some of the initial roadblocks to solving this problem:

  1. Many document publishers: There are several individuals who create, edit and update these documents, so the system needs to be universally usable by them, to directly publish their documents to the web.
  2. Document publishers may be non-technical or non-privileged users who would not have credentials to log in to a webserver and upload directly.
  3. Accountability for what gets uploaded and when.

I thought of a few ideas. Initially I thought using Git would be great because it provides versioning and tracking, and I could use webhooks to get the latest changes pushed to the webserver in real time. But the roadblock was the document publishers: getting them set up and trained on Git, even with friendly Windows GUIs like SourceTree, seemed like too much to expect.

So the solution was going to be a network share that everyone could dump their files into: a well-organized one, hopefully, with nice, strict file naming conventions and Active Directory-controlled access rights, pretty much standard for a Windows file share. Now the trick would be getting that to sync with the RHEL-based webserver where the files would ultimately be accessible. I found a number of applications and services for this, but none really satisfied me on cost, licensing, or general ease of use. So I hopped onto a utility CentOS VM on the network and started to whip up a script. Some things that would need to be addressed:

  1. Mount the Windows network share in Linux.
  2. Use rsync to replicate the share in the cloud.
  3. Bonus points:
    1. Keep the cloud relatively up to date at any given time.
    2. Don’t sync if you don’t have to.

Mounting the file share on CentOS 7:

mount -t cifs -o username=<share user>,password=<share password>,domain=example.com,dir_mode=0755,file_mode=0755 //WIN_PC_IP/<share name> /mnt

To make this persistent across reboots, add the mount to /etc/fstab.
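A rough sketch of what that might look like, assuming you would rather not keep the share password in /etc/fstab itself: mount.cifs accepts a credentials=<file> option pointing at a small file of username, password and domain lines. The helper name write_cifs_credentials and the path /root/.smbcredentials are my own placeholders for illustration, not something from the setup described here.

```shell
# Hypothetical helper: write a CIFS credentials file readable only by its owner.
# Arguments: <file> <user> <password> <domain>
write_cifs_credentials() {
    cred_file="$1"
    (
        umask 077   # a newly created file gets mode 600
        printf 'username=%s\npassword=%s\ndomain=%s\n' "$2" "$3" "$4" > "$cred_file"
    )
    chmod 600 "$cred_file"   # also tighten the mode if the file already existed
}

# Example (run as root; path and values are placeholders):
#   write_cifs_credentials /root/.smbcredentials '<share user>' '<share password>' example.com
#
# Matching /etc/fstab entry, all on one line
# (_netdev delays the mount until the network is up):
#   //WIN_PC_IP/<share name>  /mnt  cifs  credentials=/root/.smbcredentials,dir_mode=0755,file_mode=0755,_netdev  0  0
```

After editing /etc/fstab, running "mount -a" as root is a quick way to check the entry before the next reboot.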

Then there is the rsync command to sync the mounted folder to the cloud server:

rsync -avz --delete --exclude-from "exclude-list.txt" /mnt/myNFS/ myUser@myHost.com:/home/myUser/web/
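Since --delete removes anything on the destination that is no longer in the source, it can be worth previewing a run first. Here is a minimal sketch using rsync’s -n (--dry-run) and -i (--itemize-changes) flags; the function name preview_sync and the exclude patterns in the comments are assumptions for illustration only.

```shell
# Hypothetical helper: show what rsync would transfer or delete, without changing anything.
# Arguments: <source dir> <destination> <exclude file>
preview_sync() {
    # -n (--dry-run) does a trial run; -i (--itemize-changes) lists each change
    rsync -avzni --delete --exclude-from "$3" "$1" "$2"
}

# Example exclude-list.txt contents (assumed patterns, one per line):
#   Thumbs.db
#   ~$*
#   *.tmp
#
# Example:
#   preview_sync /mnt/myNFS/ myUser@myHost.com:/home/myUser/web/ exclude-list.txt
```

If the itemized list looks right, dropping the n and i flags gives the real command shown above.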

So we have the basics of how to sync the file share to the cloud. Now we just need to do it regularly, which is easy enough: make this into a shell script and add it to cron. However, there is a problem: we don’t want to run rsync if it’s not necessary, and there is no reason to establish an SSH session if nothing on the file system has changed. This is a bit of a nitpick really, since rsync only transfers what has changed, so it’s not a full 50GB upload every time. But my thinking is, why even run rsync if there are no new files on the share? So I wrote up this little script.

#!/bin/sh

echo "Checking SyncStat $(date)"

# LOCAL CONFIGURATION
HOME_DIR="/home/myuser/syncstat/"
# To mount a Windows network share using CIFS
# mount -t cifs -o username=<share user>,password=<share password>,domain=example.com,dir_mode=0755,file_mode=0755 //WIN_PC_IP/<share name> /mnt
DIR_TO_CHECK="/mnt/pub_docs/"
OLD_STAT_FILE="${HOME_DIR}old_stat.txt"
TMP_STAT_FILE="${HOME_DIR}tmp_stat.txt"
# Add any files to exclude from RSYNC to this text file
EXCLUDE_FILE="${HOME_DIR}exclude_list.txt"

# REMOTE CONFIGURATION - assumes SSH KEY has been setup for this user
REMOTE_USER="myuser@myhost.com"
REMOTE_HOME="/home/myuser/web/"

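# Bail out if the share is not mounted, so we never sync from a missing directory
if [ ! -d "$DIR_TO_CHECK" ]
then
    echo "Directory $DIR_TO_CHECK not found, is the share mounted?"
    exit 1
fi

# Get the last check info
if [ -e "$OLD_STAT_FILE" ]
then
    OLD_STAT=$(cat "$OLD_STAT_FILE")
else
    OLD_STAT="nothing"
fi

# Query the directory to sync to get the most recently updated file
# (-printf requires GNU find; %T@ is the modification time in seconds since the epoch)
NEW_STAT=$(find "$DIR_TO_CHECK" -printf "%T@ %Tc %p\n" | sort -n | tail -n 1)
# Round-trip through the temp file so both sides of the comparison are stored the same way
echo "$NEW_STAT" > "$TMP_STAT_FILE"
TMP_STAT=$(cat "$TMP_STAT_FILE")

# Compare OLD vs NEW to see if anything has changed
if [ "$OLD_STAT" != "$TMP_STAT" ]
then
    echo "Contents of directory have changed: "
    # BEGIN Sync and custom code

    if rsync -avz --delete --exclude-from "$EXCLUDE_FILE" "$DIR_TO_CHECK" "${REMOTE_USER}:${REMOTE_HOME}"
    then
        # Maybe we want to run some other script on the remote server as well after the rsync completes
        ssh "$REMOTE_USER" "./runOtherScript.sh"

        # END Sync and update the OLD_STAT_FILE, only after a successful rsync
        echo "$NEW_STAT" > "$OLD_STAT_FILE"
    else
        echo "rsync failed, will retry on the next run"
    fi
fi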

echo "Finished SyncStat $(date)"

So, that’s it. It seems to work pretty well. I have it on a 15-minute cron job: using “find” on the mounted directory, I get the information about the most recently updated file and compare it to the last check. If it’s different, then we execute our rsync.
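For reference, a 15-minute schedule in crontab syntax is "*/15 * * * *". The small sketch below just prints such a line; the helper name cron_entry, the script name syncstat.sh and the log path are assumptions, since the actual file names aren’t given above.

```shell
# Hypothetical helper: print a crontab line that runs a script every 15 minutes
# and appends its output (stdout and stderr) to a log file.
# Arguments: <script path> <log path>
cron_entry() {
    printf '*/15 * * * * %s >> %s 2>&1\n' "$1" "$2"
}

# Example (script name and log path are assumptions):
#   cron_entry /home/myuser/syncstat/syncstat.sh /home/myuser/syncstat/syncstat.log
#
# To append the line to the current user's crontab:
#   (crontab -l 2>/dev/null; cron_entry /home/myuser/syncstat/syncstat.sh /home/myuser/syncstat/syncstat.log) | crontab -
```

Redirecting to a log keeps the "Checking SyncStat" and "Finished SyncStat" timestamps around, which helps with the accountability requirement.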

If you can think of any improvements, or you think this is just plain overkill when simply running rsync every 15 minutes would have been fine, feel free to let me know. I did get to learn a couple of things writing this up, mainly that I had never actually mounted a Windows network share on Linux before.
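One possible refinement, offered as a sketch rather than something tested against the real share: checking only the newest timestamp could miss a change that doesn’t produce a new “latest” entry, for example a file overwritten by a copy that keeps an older modification time. An alternative is to fingerprint the whole listing (path, size and modification time of every entry) and compare that instead. The function name tree_fingerprint is hypothetical, and -printf assumes GNU find.

```shell
# Hypothetical alternative: fingerprint the whole tree (path, size, mtime of every
# entry) instead of tracking only the newest entry. Requires GNU find for -printf.
tree_fingerprint() {
    find "$1" -printf '%p %s %T@\n' | LC_ALL=C sort | cksum
}

# Demo on a throwaway directory
demo_dir=$(mktemp -d)
echo "v1" > "$demo_dir/spec.pdf"
before=$(tree_fingerprint "$demo_dir")
echo "v2 with more text" > "$demo_dir/spec.pdf"   # the file size changes here
after=$(tree_fingerprint "$demo_dir")
if [ "$before" != "$after" ]
then
    echo "Contents of directory have changed"
fi
rm -rf "$demo_dir"
```

The script already walks the whole tree with find, so this shouldn’t cost noticeably more; the fingerprint string would simply take the place of NEW_STAT in the comparison.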
