this post was submitted on 25 Jul 2024

Edit

After a long process of roaming the web, re-running and troubleshooting the script with this wonderful community, the script is functional and does what it's intended to do. It can probably still be improved in terms of efficiency and logic, but I lack the skills and knowledge to do so. Feel free to copy it, edit it, or propose a more efficient way of doing the same thing.

I'm very thankful to @AernaLingus@hexbear.net, @GenderNeutralBro@lemmy.sdf.org, @hydroptic@sopuli.xyz and Phil Harvey (exiftool) for their help, their time and all the great ideas (and for spoon-feeding me simple and comprehensive examples!).

How to use

Prerequisites:

  • parallel package installed on your distribution

Copy/paste the script below into a file and make it executable. Change start_range and end_range to suit your needs, then run the following command:

time find /path/to/your/image/directory/ -type f | parallel ./script-name.sh

This sorts only the pictures from your specified time range into a YEAR/MONTH structure in your current directory, based on 5 different time tags (DateTimeOriginal, CreateDate, FileModifyDate, ModifyDate, DateAcquired).

You may want to swap ModifyDate and FileModifyDate in the script: ModifyDate is more reliable, because FileModifyDate changes to the current date as soon as you modify the picture. I needed the order used in the script for my specific use case.

From: '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/'

To: '-directory<$DateAcquired/' '-directory<$FileModifyDate/' '-directory<$ModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/'

As per exiftool's documentation:

ExifTool evaluates the command-line arguments left to right, and latter assignments to the same tag override earlier ones.
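
The ordering rule is easier to see in a small simulation. The sketch below does not call exiftool; it only imitates the behaviour described above in plain Bash, under the assumption that an assignment whose source tag is missing is skipped and so leaves the earlier value in place. The function name `last_available` and the sample values are made up for illustration.

```shell
#!/usr/bin/env bash
# Imitates "later assignments override earlier ones": walk the values left to
# right and keep the last one that is not empty (an empty string stands in
# for a tag that is missing from the file).
last_available() {
  local result="" value
  for value in "$@"; do
    [[ -n "$value" ]] && result="$value"
  done
  printf '%s\n' "$result"
}

# Same order as in the script: DateAcquired, ModifyDate, FileModifyDate,
# CreateDate, DateTimeOriginal (sample values, not real metadata).
last_available "" "2019/March" "2024/July" "2019/March" "2019/February"

# DateTimeOriginal missing: the next one to the left, CreateDate, wins.
last_available "" "2019/March" "2024/July" "2019/March" ""
```

The first call prints the DateTimeOriginal sample and the second falls back to the CreateDate sample, which is why the tag you trust most goes last on the command line.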

#!/bin/bash

if [ $# -eq 0 ]; then
    echo "Usage: $0 <filename>"
    exit 1
fi

# Concatenate all arguments into one string for the filename, so calling "./script.sh /path/with spaces.jpg" should work without quoting
filename="$*"

start_range=20170101
end_range=20201230

FIRST_DATE=$(exiftool -m -d '%Y%m%d' -T -DateTimeOriginal -CreateDate -FileModifyDate -DateAcquired -ModifyDate "$filename" | tr -d '-' | awk '{print $1}')

# Both comparisons are exclusive: the boundary dates themselves are skipped
if [[ "$FIRST_DATE" != '' ]] && [[ "$FIRST_DATE" -gt $start_range ]] && [[ "$FIRST_DATE" -lt $end_range ]]; then
    exiftool -api QuickTimeUTC -d %Y/%B '-directory<$DateAcquired/' '-directory<$ModifyDate/' '-directory<$FileModifyDate/' '-directory<$CreateDate/' '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
else
    echo "Not in the specified time range"
fi



Hi everyone !

Please no bash-shaming: I did my utmost to put everything together and make it work without any prior Bash programming knowledge. It took me a lot of effort and time.

While I'm pretty happy with the result, I find the execution time very slow: 16min for 2288 files.

At that rate, a big folder with 50,062 files would take about 6 hours!!!
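
As a rough sanity check on that estimate, the per-file cost can be extrapolated with integer arithmetic. This sketch assumes the run time scales linearly with the number of files, which ignores caching and differences in file size.

```shell
#!/usr/bin/env bash
measured_seconds=$((16 * 60))   # 16 minutes for the small folder
measured_files=2288
target_files=50062

# Scale linearly: seconds * target / measured (integer division).
estimated_seconds=$((measured_seconds * target_files / measured_files))
estimate="$((estimated_seconds / 3600))h $((estimated_seconds % 3600 / 60))m"
echo "$estimate"
```

With these rounded inputs it prints `5h 50m`, so roughly six hours for the big folder.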

If someone could have a look and give me some easy to understand hints, I would greatly appreciate it.

What am I trying to achieve?

Create a Bash script that uses exiftool to extract the date from images in a comparable format (20240101) and compares it with a start_range and an end_range, so that only images from that specific date range are sorted (e.g. 2020-01-01 -> 2020-12-30).
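
Because a date written as YYYYMMDD sorts the same way numerically as it does chronologically, the range check reduces to two integer comparisons. A minimal sketch, with a hypothetical helper name `in_range`; note that the comparison is strict, so the boundary dates themselves are excluded, just like with the `-gt` and `-lt` tests used in the scripts in this post.

```shell
#!/usr/bin/env bash
start_range=20200101
end_range=20201230

# Succeeds when the YYYYMMDD date lies strictly between the two bounds.
in_range() {
  local date=$1
  [[ $date =~ ^[0-9]{8}$ ]] || return 1   # reject "-" or empty values
  # 10# forces base 10 so a leading zero is not read as octal
  (( 10#$date > start_range && 10#$date < end_range ))
}

for d in 20200615 20200101 20210301 -; do
  if in_range "$d"; then echo "$d: in range"; else echo "$d: skipped"; fi
done
```

Only 20200615 is reported as in range; the lower bound, the out-of-range date and the `-` placeholder are all skipped.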

Also, some images have lost part of their EXIF data, so I have to loop through specific time fields:

  • DateTimeOriginal
  • CreateDate
  • FileModifyDate
  • DateAcquired
  • ModifyDate
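
All of these fields can be requested in a single exiftool call: with `-T` the values come back on one tab-separated line and a missing tag is printed as `-`, which is what the final script at the top of this post relies on. The sketch below only shows the parsing side in plain Bash, fed with a made-up sample line rather than real exiftool output; `first_valid_date` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the first tab-separated field that looks like a YYYYMMDD date.
# Returns non-zero when no field qualifies.
first_valid_date() {
  local line=$1 field
  local -a fields
  IFS=$'\t' read -r -a fields <<< "$line"
  for field in "${fields[@]}"; do
    if [[ $field =~ ^[0-9]{8}$ ]]; then
      printf '%s\n' "$field"
      return 0
    fi
  done
  return 1
}

# Sample line: the first two tags are missing, the third one is present.
sample=$'-\t-\t20190314\t-\t20190310'
first_valid_date "$sample"
```

The order of the fields on the line decides which tag takes priority, so it should match the order of the tags passed to exiftool.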

The script in question

#!/bin/bash

shopt -s globstar

folder_name=/home/user/Pictures
start_range=20170101
end_range=20180130


for filename in "$folder_name"/**/*; do

	if [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal "$filename") =~ ^[0-9]+$ ]]; then
		DateTimeOriginal=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateTimeOriginal "$filename")
		if [ "$DateTimeOriginal" -gt $start_range ] && [ "$DateTimeOriginal" -lt $end_range ]; then
			/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-FileName=%f%-c.%e' "$filename"
			echo "Found a value"
			echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"
		fi

	elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -CreateDate "$filename") =~ ^[0-9]+$ ]]; then
		CreateDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -CreateDate "$filename")
		if [ "$CreateDate" -gt $start_range ] && [ "$CreateDate" -lt $end_range ]; then
			/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$CreateDate/' '-FileName=%f%-c.%e' "$filename"
			echo "Found a value"
			echo "Okay its $(tput setab 27)CreateDate$(tput sgr0)"
		fi

	elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -FileModifyDate "$filename") =~ ^[0-9]+$ ]]; then
		FileModifyDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -FileModifyDate "$filename")
		if [ "$FileModifyDate" -gt $start_range ] && [ "$FileModifyDate" -lt $end_range ]; then
			/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$FileModifyDate/' '-FileName=%f%-c.%e' "$filename"
			echo "Found a value"
			echo "Okay its $(tput setab 202)FileModifyDate$(tput sgr0)"
		fi

	elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateAcquired "$filename") =~ ^[0-9]+$ ]]; then
		DateAcquired=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateAcquired "$filename")
		if [ "$DateAcquired" -gt $start_range ] && [ "$DateAcquired" -lt $end_range ]; then
			/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateAcquired/' '-FileName=%f%-c.%e' "$filename"
			echo "Found a value"
			echo "Okay its $(tput setab 172)DateAcquired$(tput sgr0)"
		fi

	elif [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -ModifyDate "$filename") =~ ^[0-9]+$ ]]; then
		ModifyDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -ModifyDate "$filename")
		if [ "$ModifyDate" -gt $start_range ] && [ "$ModifyDate" -lt $end_range ]; then
			/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$ModifyDate/' '-FileName=%f%-c.%e' "$filename"
			echo "Found a value"
			echo "Okay its $(tput setab 135)ModifyDate$(tput sgr0)"
		fi

	else
		echo "No EXIF field found"
	fi

done

Things I have tried

  1. Reducing the number of if calls

But it barely improved the execution time (maybe by a few ms?). The syntax is far less readable: I chained the checks with or ( || ) to reduce everything to a single if call. It's not finished; I only gave it a test drive with 2 EXIF fields (DateTimeOriginal and CreateDate) to see whether it would improve the time. But meeeh :/.

#!/bin/bash

shopt -s globstar

folder_name=/home/user/Pictures
start_range=20170101
end_range=20201230

for filename in "$folder_name"/**/*; do

	if [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -DateTimeOriginal "$filename") =~ ^[0-9]+$ ]] || [[ $(/usr/bin/vendor_perl/exiftool -m -d '%Y%m%d' -T -CreateDate "$filename") =~ ^[0-9]+$ ]]; then
		DateTimeOriginal=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -DateTimeOriginal "$filename")
		CreateDate=$(/usr/bin/vendor_perl/exiftool -d '%Y%m%d' -T -CreateDate "$filename")
		# Group each pair: && and || have equal precedence in the shell and are evaluated left to right
		if { [ "$DateTimeOriginal" -gt $start_range ] && [ "$DateTimeOriginal" -lt $end_range ]; } || { [ "$CreateDate" -gt $start_range ] && [ "$CreateDate" -lt $end_range ]; }; then
			/usr/bin/vendor_perl/exiftool -api QuickTimeUTC -r -d %Y/%B '-directory<$DateTimeOriginal/' '-directory<$CreateDate/' '-FileName=%f%-c.%e' "$filename"
			echo "Found a value"
			echo "Okay its $(tput setab 22)DateTimeOriginal$(tput sgr0)"
		else
			echo "FINISH YOUR SYNTAX !!"
		fi

	fi
done

  2. Playing around with find

To recursively find the image files in all my folders I first tried the find command, but that gave me a lot of headaches... When an image file name had spaces in it, the path got broken up strangely... All the answers I found on the web were gibberish to me, and I couldn't make it work properly in my script... I lost over 4 hours on that specific issue alone!

To overcome the hurdle, someone suggested using shopt -s globstar with for filename in $folder_name/**/*, and this works perfectly. But I have no idea whether this could be the culprit for the slow execution time.
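
For reference, the usual reason `find` output breaks on spaces is that the list gets split on whitespace when it is expanded unquoted, for example in `for f in $(find ...)`. A common workaround is to have `find` separate names with a NUL byte and read them back one at a time. The sketch below builds a throwaway directory with `mktemp` so it can be run safely; the file names are made up.

```shell
#!/usr/bin/env bash
# Throwaway test tree with awkward names.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/holiday 2019"
touch "$tmpdir/holiday 2019/beach day.jpg" "$tmpdir/plain.jpg"

# -print0 ends every path with a NUL byte; read -d '' splits on that byte,
# so spaces (and even newlines) inside a name stay intact.
found=()
while IFS= read -r -d '' path; do
  found+=("$path")
done < <(find "$tmpdir" -type f -print0)

printf 'found %d files\n' "${#found[@]}"
printf '  %s\n' "${found[@]}"
rm -rf "$tmpdir"
```

Both approaches only list files, so the choice between `find` and `globstar` is probably not what dominates the run time here.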

  3. Changing all [ ] into [[ ]]

That also didn't do the trick.

How to improve the processing time?

I have no idea whether it's my script or the exiftool calls that make it so slow. This isn't a complicated script: it's a comparison between 2 integers, not a hashing of complex numbers.
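
One likely contributor is the number of times exiftool gets started: every `$(exiftool ...)` launches a new process, and the loop in the script can do that several times per file. The sketch below only counts launches for three strategies. `fake_exiftool` is a stand-in that increments a counter, no real exiftool is run, and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for exiftool: it only counts how often it is called.
calls=0
fake_exiftool() { calls=$((calls + 1)); }

tags=(DateTimeOriginal CreateDate FileModifyDate DateAcquired ModifyDate)
files=(a.jpg b.jpg c.jpg)   # made-up file names

# Strategy 1: one launch per tag per file (probing every field separately).
calls=0
for f in "${files[@]}"; do
  for t in "${tags[@]}"; do fake_exiftool "-$t" "$f"; done
done
per_tag_calls=$calls

# Strategy 2: one launch per file, asking for all tags at once.
calls=0
for f in "${files[@]}"; do
  fake_exiftool "${tags[@]/#/-}" "$f"   # prefixes every tag name with "-"
done
per_file_calls=$calls

# Strategy 3: a single launch for the whole batch.
calls=0
fake_exiftool "${tags[@]/#/-}" "${files[@]}"
batch_calls=$calls

echo "$per_tag_calls $per_file_calls $batch_calls"
```

For three files this prints `15 3 1`. If most of the time goes into starting exiftool, the gap between the first and last strategy grows with every file added.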

I hope someone could guide me in the right direction :)

Thanks !

[–] AernaLingus@hexbear.net 2 points 3 months ago (1 children)

Wow, nice find! I was going to handle it by just arbitrarily picking the first tag which ended with CreateDate, FileModifyDate, etc., but this is a much better solution which relies on the native behavior of exiftool. I feel kind of silly for not looking at the documentation more carefully: I couldn't find anything immediately useful when looking at the documentation for the class used in the script (ExifToolHelper) but with the benefit of hindsight I now see this crucial detail about its parameters:

All other parameters are passed directly to the super-class constructor: exiftool.ExifTool.__init__()

And sure enough, that's where the common_args parameter is detailed which handles this exact use case:

common_args (list of str, or None) –

Pass in additional parameters for the stay-open instance of exiftool.

Defaults to ["-G", "-n"] as this is the most common use case.

  • -G (groupName level 1 enabled) separates the output with groupName:tag to disambiguate same-named tags under different groups.

  • -n (print conversion disabled) improves the speed and consistency of output, and is more machine-parsable

Passed directly into common_args property.

As for the renaming, you could handle this by using os.path.exists as with the directory creation and using a bit of logic (along with the utility functions os.path.basename and os.path.splitext) to generate a unique name before the move operation:

# Ensure uniqueness of path
basename = os.path.basename(d['SourceFile'])
filename, ext = os.path.splitext(basename)
count = 1        
while os.path.exists(f'{subdirectory}/{basename}'):
  basename = f'{filename}-{count}{ext}'
  count += 1

shutil.move(d['SourceFile'], f'{subdirectory}/{basename}')
[–] N0x0n@lemmy.ml 2 points 3 months ago* (last edited 3 months ago) (1 children)

Hey ha :) !!

Wow, nice find! I was going to handle it by just arbitrarily picking the first tag which ended with CreateDate, FileModifyDate, etc., but this is a much better solution which relies on the native behavior of exiftool. I feel kind of silly for not looking at the documentation more carefully

Yeah, I know that feeling. I posted and added unnecessary noise to Phil Harvey's forum about something I thought was a "bug" or odd behavior in exiftool, while it was just my lacking reading skills... I felt so dumb :/. Because I'm unable to build it from the ground up myself like you did (great work, thanks again!!), I can only fiddle around and do my best reading the documentation to somehow find my way out. I was pretty happy and had a little surge of dopamine :D !

#Ensure uniqueness of path
basename = os.path.basename(d['SourceFile'])
...

THAT did the trick! Thank you. I somehow "wrote" something similar, but don't look at it: it's nonfunctional and ugly XD. I gave it a try while roaming the web.

        try:
          shutil.move(d['SourceFile'], subdirectory)
        except:
          i = 0
          while os.path.exists(d['SourceFile']):
            i += 1
            base_name, extension = os.path.splitext(d['SourceFile'])
            new_filename = f"{base_name}-{i}{extension}"
            print (new_filename)
            os.rename(d['SourceFile'], new_filename)
            shutil.move(new_filename, subdirectory)


Final words

First, your script is the bomb! Blazing fast, and it does everything I wanted! And you were right with your first impression about the -stay_open switch: that's what PyExifTool uses under the hood (I read it somewhere in the docs)! I tried to implement that switch with an arg file in my old/ugly/painful Bash script, but it didn't work as expected. I will give it another try sometime in the near future. Right now I'm exhausted from reading and from all the re-runs to troubleshoot and test things, and more than happy with your script (thanks again for everything!!!).

Second, I hope you won't be mad, but after a thorough re-reading of the exiftool documentation and playing around a bit, I even managed to get exiftool to do the same thing. It looks something like this:

exiftool -P -d %Y:%m:%d -if '$DateTimeOriginal gt "2018:01:01" and $DateTimeOriginal lt "2021:01:01"' -api QuickTimeUTC -r '-directory<${DateTimeOriginal#;DateFmt("%Y/%B")}/' '-FileName=%f%-c.%e' .

In plain English, this translates to:

Recursively scan the current directory and keep only the files whose DateTimeOriginal tag, formatted as %Y:%m:%d, falls within the given time range. Move every image that meets the condition into a Year/Month directory structure derived from the DateTimeOriginal tag. If a file with the same name already exists, rename the file incrementally (filename-1.extension, filename-2.extension, ...).
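
To avoid editing the dates inside the quoted condition by hand, the `-if` string can be assembled from two Bash variables. This is a sketch only: the variable names are made up, and the exiftool call is printed rather than executed, so nothing gets moved.

```shell
#!/usr/bin/env bash
start_date="2018:01:01"   # exclusive lower bound, in the -d format used above
end_date="2021:01:01"     # exclusive upper bound

# Single quotes keep $DateTimeOriginal literal so that exiftool, not Bash,
# expands it; only the two bounds are filled in by printf.
condition=$(printf '$DateTimeOriginal gt "%s" and $DateTimeOriginal lt "%s"' \
  "$start_date" "$end_date")
echo "$condition"

# Dry run, and only where exiftool is installed: drop the leading "echo"
# to really move the files.
if command -v exiftool >/dev/null 2>&1; then
  echo exiftool -P -d %Y:%m:%d -if "$condition" -api QuickTimeUTC -r \
    '-directory<${DateTimeOriginal#;DateFmt("%Y/%B")}/' '-FileName=%f%-c.%e' .
fi
```

The bounds have to use the same format as the `-d` option, otherwise the comparison is no longer meaningful.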

This was the first command I worked on before trying a Bash script, but it somehow messed up the folder creation. Long story short: it was because of how my command formatted the date in the condition (-d %Y/%B), which made it compare 2018/June gt "2018:01:01" (yeah, this will cause some strange behavior xD). However, your script is faster!!!! For the same batch:
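
The strange behaviour follows from `gt` and `lt` being string comparisons: the values are compared character by character, and `/` sorts before `:`. Bash's `[[ a > b ]]` can reproduce the effect. This is only an analogy for what the exiftool condition does, `string_gt` is a hypothetical helper, and `LC_ALL=C` is set so that Bash compares by byte value.

```shell
#!/usr/bin/env bash
LC_ALL=C   # plain byte-order comparison

# Hypothetical helper: succeeds when "left" sorts after "right" as a string.
string_gt() { [[ $1 > $2 ]]; }

check() {
  if string_gt "$1" "$2"; then
    echo "\"$1\" gt \"$2\": true"
  else
    echo "\"$1\" gt \"$2\": false"
  fi
}

check "2018/June"  "2018:01:01"   # same year: decided by "/" versus ":"
check "2018:06:15" "2018:01:01"   # matching formats compare as intended
check "2019/June"  "2018:01:01"   # decided by the year digits alone
```

The first check comes out false, so a photo from June 2018 fails the `gt "2018:01:01"` test even though it lies inside the range.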

2200 files
ExifTool: 24s
PyExifTool: 11s

Compared to my painful and ugly 11-minute script... uuuhg!


Again, thank you very much for sharing your knowledge, your help/time and staying with me. 👍 😁 I hope we will meet again and maybe/hopefully have a proper conversation on programming/scripting !

Thanks 🙏.

[–] AernaLingus@hexbear.net 1 points 3 months ago* (last edited 3 months ago)

Yeah, I know that feeling. I posted and added unnecessary noise to Phil Harvey's forum about something I thought was a "bug" or odd behavior in exiftool, while it was just my lacking reading skills... I felt so dumb :/

Happens to the best of us! As long as you make a genuine effort to find a solution, I think most people will be happy to help regardless.

As for the version of the unique name code you wrote, you got the spirit! The problem is that the try block will only catch the exception the first time around, so if there are two duplicates the uncaught exception will terminate the script. Separately, when working with exceptions it's important to be mindful of which particular exceptions the code in the try block might throw and when. In this case, if the move is to another directory in the same filesystem, shutil.move will match the behavior of os.rename which throws different types of exceptions depending on what goes wrong and what operating system you're on. Importantly, on Windows, it will throw an exception if the file exists, but this will not generally occur on Unix and the file will be overwritten silently.

(actually, I just realized that this may be an issue with pasting in your Python code messing up the indentation--one of the flaws of Python. If this was your actual code, I think it would work:)

        try:
          shutil.move(d['SourceFile'], subdirectory)
        except:
          i = 0
          while os.path.exists(d['SourceFile']):
            i += 1
            base_name, extension = os.path.splitext(d['SourceFile'])
            new_filename = f"{base_name}-{i}{extension}"
          print(new_filename)
          os.rename(d['SourceFile'], new_filename)
          shutil.move(new_filename, subdirectory)

(oh, and I should have mentioned this earlier, but: for Markdown parsers that support it (including Lemmy and GitHub) if you put the name of the language you're writing in after your opening triple ` (e.g. ```python or ```bash) it'll give you syntax highlighting for that language (although not as complete as what you'd see in an actual code editor))

Really cool that you figured out how to do it with exiftool natively--I'll be honest, I probably wouldn't have persevered enough to come up with that had it been me! Very interesting that it ended up being slower than the Python script, which I wouldn't have expected. One thing that comes to mind is that my script more or less separates the reads and writes: first it reads all the metadata, then it moves all the files (there are also reads to check for file existence in the per-file operations, but my understanding is that this happens in compact contiguous areas of the drive and the amount of data read is tiny). If exiftool performs the entire operation for one file at a time, it might end up being slower due to how storage access works.


Happy to have been able to help! Best of luck to you.