I want a way to grab trailers from Apple so that I can feed them into my SageTV media center and browse them within its library. Since I hadn't been able to find a ready-made solution, I thought I'd need to cook something up myself.

But I managed to find a bash script written by someone who uses a different media center program. It works, but it does a few things to the trailers' filenames that I don't like, so I've been modifying it.

I'm running the script from Cygwin on a Windows box, so I don't have the full set of tools that might be installed on something like my web server (otherwise I would likely have tried redoing the whole thing in PHP).

The script is below. I've commented out some functional lines so that I can test the values of some variables, which is the part I'm having issues with right now.

Here's also a link to the original:
http://forum.team-mediaportal.com/plugins-47/mytrailers-42622/index11.html#post291349

Basically you set a path to store a DB which keeps track of what's already been downloaded and another path to store the files. You set a parameter which tells the script whether to try and get 1080 versions. It reads in the XML for Apple's trailer RSS feeds and parses out the filenames and other fields.
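
For illustration, the download-tracking idea described above can be sketched on its own. This is only a rough sketch: DB_FILE, is_downloaded and mark_downloaded are hypothetical names that do not appear in the script, and the record format simply mirrors the "###ID.PREVIEW filename" lines the script writes.

```shell
#!/bin/bash
# Sketch only: DB_FILE and both function names are hypothetical.
DB_FILE="${DB_FILE:-$(mktemp)}"

# Succeeds if a preview for this movie ID is already recorded in the DB.
is_downloaded() {
    grep -qF -- "###$1.PREVIEW " "$DB_FILE"
}

# Record a finished download as "###<id>.PREVIEW <filename>".
mark_downloaded() {
    printf '###%s.PREVIEW %s\n' "$1" "$2" >> "$DB_FILE"
}

mark_downloaded 1234 "Angels & Demons [Trailer].mov"
if is_downloaded 1234; then
    echo "1234 already downloaded, skipping"
fi
```

Quoting "$DB_FILE" matters as soon as the DB path contains a space, and grep -F keeps the "." in the marker from being treated as a regex wildcard.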

I am trying to modify the original to create a folder for each trailer and to name both that folder and the trailer after the movie, instead of using the original filename, which has no spaces and may contain extra characters like "tlra_640w".

I don't know very much about awk or sed, nor much about scripting with bash, which is why I'm posting.

The main issue at the moment is parsing the fields grabbed from the XML. They're collected in TRAILERS and used to have a semicolon between them. I've changed this to ";field" to see if I can tell this field separator apart from any legitimate use of a semicolon within the data.

But it's still failing to get the name "Angels &amp; Demons", which is the first movie name with a space and a semicolon in it. We'll move on to decoding the &amp; and similar entities later.

This line specifically:

Code:
MOVIETITLE=`echo $MOVIE | awk  'BEGIN { FS = ";field" } ; { print $2 }'`


It only brings back the word "Angels" from the above. It's probably failing because of the space, but I don't know how to make it return its result enclosed in quotes.
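
One likely explanation, offered as a sketch rather than a confirmed diagnosis: the awk call itself handles spaces fine, but the script's `for MOVIE in $TRAILERS` loop splits the unquoted variable on every space, tab and newline, so MOVIE only ever holds one word at a time. Assuming the xml command emits one record per line (which its -n option suggests), reading whole lines with `while read` avoids that. The sample data and the get_field helper below are made up for illustration.

```shell
#!/bin/bash
# Sketch only: sample data stands in for the output of the xml sel command.
TRAILERS='101;fieldAngels &amp; Demons;field2009-05-01
102;fieldUp;field2009-05-08'

# Print field number $2 of line $1, using ";field" as the separator.
# get_field is a hypothetical helper, not part of the original script.
get_field() {
    printf '%s\n' "$1" | awk -F ';field' -v n="$2" '{ print $n }'
}

# Read one whole line per iteration instead of one word per iteration.
while IFS= read -r MOVIE; do
    MOVIEID=$(get_field "$MOVIE" 1)
    MOVIETITLE=$(get_field "$MOVIE" 2)
    echo "ID: $MOVIEID  Title: \"$MOVIETITLE\""
done <<< "$TRAILERS"
```

The quotes around "$MOVIE" and "$MOVIETITLE" are what keep the spaces intact when the values are passed on. The value itself never needs to contain quote characters.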

Code:
#!/bin/bash

GET1080p=0
GETPOSTER=1
SAVEPATH="v:/Movies/zzztrailertest/"
DLDBPATH="d:/AppleTrailers/"

FEEDS="http://www.apple.com/trailers/home/xml/current_720p.xml http://www.apple.com/trailers/home/xml/current.xml"


tail -5000 $DLDBPATH.downloaded.db > $DLDBPATH.downloaded.db.tmp
mv $DLDBPATH.downloaded.db.tmp $DLDBPATH.downloaded.db

for FEEDURL in $FEEDS; do

TRAILERS=`xml sel --net -D -T -t -m "/records/movieinfo"\
 -v "@id" -o ";field"\
 -v "info/title" -o ";field"\
 -v "info/postdate" -o ";field"\
 -v "preview/large" -o ";field"\
 -v "poster/xlarge"\
 -n $FEEDURL`

for MOVIE in $TRAILERS; do

MOVIEID=`echo $MOVIE | awk  'BEGIN { FS = ";field" } ; { print $1 }'`

MOVIETITLE=`echo $MOVIE | awk  'BEGIN { FS = ";field" } ; { print $2 }'`
	MOVIETITLEFILE=`echo $MOVIETITLE |sed 's/.*\///'`

#temporary output to show grabbed title

echo "=======##### Title: $MOVIETITLE -----------------"


POSTDATE=`echo $MOVIE | awk 'BEGIN { FS = ";field" } ; { print $3 }'`

BEXTENSION="[Trailer].mov"

PREVIEW=`echo $MOVIE | awk 'BEGIN { FS = ";field" } ; { print $4 }'`
	PREVIEWFILE=`echo $PREVIEW |sed 's/.*\///' |sed 's/\.mov$/.hdmov/g'`
	NEWPREVIEWNAME="$MOVIETITLE $BEXTENSION"

PREVIEW1080p=`echo $MOVIE | awk 'BEGIN { FS = ";field" } ; { print $4 }' |sed 's/a720p\.mov$/h1080p.mov/g'`
	PREVIEWFILE1080p=`echo $PREVIEW1080p |sed 's/.*\///' |sed 's/\.mov$/.hdmov/g'`
	NEWPREVIEWNAME1080p="$MOVIETITLE $PREVIEWFILE1080p"

POSTER=`echo $MOVIE | awk 'BEGIN { FS = ";field" } ; { print $5 }'`
	NEWPOSTERNAME="folder.jpg"

MOVIESAVEPATH="$SAVEPATH$MOVIETITLE/"

#if ! grep -q "###$MOVIEID.PREVIEW" $DLDBPATH.downloaded.db; then
#	mkdir $MOVIESAVEPATH
#fi

if [ "$GET1080p" -eq "1" ]; then
 if `echo $FEEDURL | grep -q 720p`; then
	if ! grep -q "###$MOVIEID.PREVIEW" $DLDBPATH.downloaded.db; then
#		wget -c -O "$MOVIESAVEPATH$NEWPREVIEWNAME1080p" $PREVIEW1080p; PREVIEWOUT1080p=$?
                if [ $PREVIEWOUT1080p -eq 0 ]; then
                        echo "###$MOVIEID.PREVIEW $NEWPREVIEWNAME1080p" >> $DLDBPATH.downloaded.db
                else
                        echo "##### ID:$MOVIEID URL:$PREVIEW1080p FAILED -- TRYING ORIGINAL 720p URL NEXT"
                fi
	fi
 fi 
fi

	if ! grep -q "###$MOVIEID.PREVIEW" $DLDBPATH.downloaded.db; then
#		wget -c -O "$MOVIESAVEPATH$NEWPREVIEWNAME" $PREVIEW; PREVIEWOUT=$?
    		if [ $PREVIEWOUT -eq 0 ]; then   
       		 	echo "###$MOVIEID.PREVIEW $NEWPREVIEWNAME" >> $DLDBPATH.downloaded.db
    		else
			echo "##### ID:$MOVIEID URL:$PREVIEW FAILED -- RETRY NEXT RUN"
		fi
	else
		echo "##### ID:$MOVIEID NAME:$NEWPREVIEWNAME MARKED DONE  -- SKIPPING"
	fi

	if [ "$GETPOSTER" -eq "1" ]; then
	 if ! grep -q "###$MOVIEID.POSTER" $DLDBPATH.downloaded.db; then
#		wget -c -O "$MOVIESAVEPATH$NEWPOSTERNAME" $POSTER; POSTEROUT=$?
		if [ $POSTEROUT -eq 0 ]; then
			echo "###$MOVIEID.POSTER $NEWPOSTERNAME" >> $DLDBPATH.downloaded.db
		else
			echo "##### ID:$MOVIEID URL:$POSTER FAILED -- RETRY NEXT RUN"
		fi
	else
		echo "##### ID:$MOVIEID NAME:$NEWPOSTERNAME MARKED DONE -- SKIPPING"
	 fi
	fi

done

done


Sample XML file (this is what it's parsing when it hits Apple's feed):

http://mypocket.com/current_720p.xml.zip


As mentioned, I'll also need to clean up the results by decoding things like &amp; back to "&", and I have no idea how to do that from this script. In PHP I could decode those with a call to html_entity_decode().
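
A minimal sketch of doing this with sed, assuming only a handful of common named entities need handling. It is nowhere near a full equivalent of html_entity_decode(), and decode_entities is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: decode_entities is a hypothetical helper that covers
# a few common entities, not the full HTML entity table.
decode_entities() {
    # &amp; is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
    # In a sed replacement a bare & means "the matched text", hence \&.
    printf '%s\n' "$1" | sed \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e "s/&#39;/'/g" \
        -e "s/&apos;/'/g" \
        -e 's/&amp;/\&/g'
}

decode_entities 'Angels &amp; Demons'
```

This prints "Angels & Demons" (without the quotes). Numeric entities other than &#39; would need extra rules.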

One remaining thing to look at: I don't know whether the sed expression applied when grabbing the name is sufficient. Because I'm using the movie name to create a folder and a file, it can't contain colons, slashes or other invalid characters. I can't guess what will come up in future movie names, so invalid characters may be present either as plain text in the original or as a result of decoding HTML entities (such as greater-than or less-than).
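
One possible approach, sketched under the assumption that the target is a Windows filesystem: replace every character Windows rejects in names (\ / : * ? " < > |) with a space, then tidy up the spacing. sanitize_name is a hypothetical helper, and it would need to run after any entity decoding so that decoded < and > are caught too.

```shell
#!/bin/bash
# Sketch only: sanitize_name is a hypothetical helper.
sanitize_name() {
    # 1. Replace characters that are invalid in Windows names with a space.
    # 2. Collapse runs of spaces into one.
    # 3. Drop a leading space.
    # 4. Drop trailing spaces and dots, which Windows also rejects.
    printf '%s\n' "$1" | sed \
        -e 's#[\\/:*?"<>|]# #g' \
        -e 's/  */ /g' \
        -e 's/^ //' \
        -e 's/[ .]*$//'
}

sanitize_name 'Star Trek: What/If?'
```

For the sample title this prints "Star Trek What If" (without the quotes). Replacing with a space rather than deleting keeps words from running together when a slash sits between them.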


_________________________
Bruno