Friday 20 November 2009
Brute crawling the App Store.
By Maxime Biais, Friday 20 November 2009 at 11:25 :: iPhone
I was looking for statistics about iPhone applications, and for raw data to compute my own. After some quick googling, I didn't find any easy way to crawl the App Store, so I tried to put something together quickly.
Each application's data is accessible via HTTP at this URL: http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=FIXME-ID where FIXME-ID is a number identifying an application. To make an HTTP request to this URL, you have to fake the User-Agent (iTunes/9.0.2 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/531.21.8 works).
With this information, you can write this kind of shell script to brute crawl the App Store:
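A single request with the faked User-Agent could look like the sketch below. It assumes wget is installed and that the endpoint answers as described above. The helper names app_url and fetch_app and the variable ITUNES_UA are hypothetical and only serve the illustration.

```shell
# Hypothetical helpers for illustration; the names are not part of any tool.
ITUNES_UA="iTunes/9.0.2 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/531.21.8"

# Print the store URL for a numeric application ID.
app_url() {
    printf 'http://itunes.apple.com/WebObjects/MZStore.woa/wa/viewSoftware?id=%s\n' "$1"
}

# Download the page for application ID $1 into file $2.
# wget's -U option takes the User-Agent value itself, without a "User-Agent:" prefix.
fetch_app() {
    wget -q -U "$ITUNES_UA" -O "$2" "$(app_url "$1")"
}

# Example (performs a real HTTP request):
#   fetch_app 337489918 app.xml
```

Defining the functions does not touch the network; only a call to fetch_app does.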
OUT=downloaded
ENDID=337489918                 # highest application ID to try
CURDIR=$((ENDID / 10000))       # one directory per 10000 IDs
WGETMAX=40                      # maximum number of concurrent wget processes
UA="iTunes/9.0.2 (Macintosh; Intel Mac OS X 10.5.8) AppleWebKit/531.21.8"

mkdir -p "$OUT/$CURDIR"
cd "$OUT/$CURDIR" || exit 1
i=$ENDID
while [ "$i" -ge 0 ]; do
    if [ $((i % 25)) -eq 0 ]; then
        # Every 25 IDs, switch to the directory for the current ID range
        CURDIR=$((i / 10000))
        mkdir -p "../$CURDIR"
        cd "../$CURDIR" || exit 1
        # Limit the maximum number of wget processes
        nwget=$(ps auxww | grep wget | grep apple | grep -v grep | wc -l)
        if [ $nwget -ge $WGETMAX ]; then
            sleep 1
            continue
        fi
    fi
    echo "$i"
    f="viewSoftware?id=$i&mt=8"
    # Fetch in the background, then delete responses of 5000 bytes or less
    # (presumably too small to be a real application page).
    # Note: -U takes the agent string itself, without a "User-Agent:" prefix.
    wget -q -U "$UA" "http://itunes.apple.com/WebObjects/MZStore.woa/wa/$f" \
        && if [ $(wc -c < "$f") -le 5000 ]; then rm -f "$f"; fi &
    i=$((i - 1))
done
wait    # let the last background downloads finish
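
Two details of the script are easy to miss: downloaded pages are spread over one directory per 10000 IDs, and responses of 5000 bytes or less are deleted. The sketch below isolates both so they can be checked without hitting the network. The helper names bucket_dir and is_stub are hypothetical, and the thresholds are simply the ones used in the script.

```shell
# Hypothetical helpers for illustration; thresholds come from the script above.

# Directory bucket for an ID: 10000 consecutive IDs share one directory.
bucket_dir() {
    echo $(( $1 / 10000 ))
}

# Succeeds when the file holds 5000 bytes or less, which the script
# treats as a response not worth keeping.
is_stub() {
    # Arithmetic expansion strips the padding some wc implementations add.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -le 5000 ]
}
```

For example, bucket_dir 337489918 prints 33748, so the first page requested lands in the directory downloaded/33748.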



