There are some public airfare watchdogs online, but there are still plenty of reasons to set up a private one. For example, we may not want to share our email address, for privacy reasons or to avoid spam. A private watchdog also responds more quickly, and flexible customization is another strong motivation. In this article, I will describe how to set up a simple but effective private airfare watchdog.
Install Firefox. Yes, it is possible to write a program to crawl the airfare information, but the solution with Firefox is more portable and robust. More importantly, it takes fewer hours of work.
Install an extension for Firefox: iMacros. This is the core component that crawls the airfare information from different agents, such as flychina.com and onetravel.com. Writing the script requires little programming background. In fact, iMacros can record your activity on a web page and generate the script automatically. As an example, we consider the open-jaw trip from Los Angeles (LAX) to San Francisco (SFO) on 2013/11/11 and from Las Vegas (LAS) to Los Angeles (LAX) on 2013/12/12. The following script fetches the airfare quote pages of flychina.com and onetravel.com and stores them in the directories E:/flight/temp/flychina and E:/flight/temp/onetravel, respectively.
VERSION BUILD=8530828 RECORDER=FX
TAB T=1
SET !REPLAYSPEED SLOW
SET !EXTRACT_TEST_POPUP NO
SET !ERRORIGNORE YES
URL GOTO=http://www.onetravel.com/default.aspx?tabid=1917
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_repCityList_ctl00_Seg_From CONTENT=LAX
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_repCityList_ctl00_Seg_Date CONTENT=11/11/2013
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_repCityList_ctl00_Seg_To CONTENT=SFO
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_repCityList_ctl01_Seg_From CONTENT=LAS
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_repCityList_ctl01_Seg_Date CONTENT=12/12/2014
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_repCityList_ctl01_Seg_To CONTENT=LAX
TAG POS=1 TYPE=SELECT FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_repCityList_ctl00_Seg_Date_PreferredTime CONTENT=%1100
TAG POS=1 TYPE=SELECT FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_repCityList_ctl01_Seg_Date_PreferredTime CONTENT=%1100
TAG POS=1 TYPE=INPUT:IMAGE FORM=ID:Form ATTR=ID:ctl09_ctl04_ctl00_Search
WAIT SECONDS=180
TAG POS=1 TYPE=HTML ATTR=* EXTRACT=HTM
SAVEAS TYPE=EXTRACT FOLDER=E:\flight\temp\onetravel FILE=+{{!NOW:yyyymmddhhnnss}}
URL GOTO=http://www.flychina.com/lfw/desk.aspx?OneWay=2&lan=
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Desk ATTR=ID:tbFrom CONTENT=LAX
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Desk ATTR=ID:tbTo CONTENT=SFO
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Desk ATTR=ID:DepDate CONTENT=11/11/2013
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Desk ATTR=ID:tbFrom1 CONTENT=LAS
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Desk ATTR=ID:tbTo1 CONTENT=LAX
TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Desk ATTR=ID:DepDate1 CONTENT=12/12/2014
TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:Desk ATTR=ID:ibDeskSearch
WAIT SECONDS=180
TAG POS=1 TYPE=HTML ATTR=* EXTRACT=HTM
SAVEAS TYPE=EXTRACT FOLDER=E:/flight/temp/flychina FILE=+{{!NOW:yyyymmddhhnnss}}
WAIT SECONDS=3600

Then we could run this script in a loop.
But beware of the quotas on these sites. Many of them limit the number of searches allowed per day. Once you reach the limit, a query returns nothing but a warning message. As far as I know, flychina.com has such a limit. On onetravel.com, by contrast, you can search every three or four minutes, 24 hours a day and 7 days a week.
If you have any programming background, you will find the script easy to modify. You can adapt it to your specific requirements.
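One way to make such modifications less error-prone is to generate the macro text from a few parameters instead of editing it by hand. The following is only a minimal sketch in Python: the function name `flychina_macro` and its parameters are made up for illustration, and it assumes the form field IDs (`tbFrom`, `tbTo`, `DepDate` and their `1`-suffixed counterparts) are still the ones recorded in the script above.

```python
def flychina_macro(legs, out_dir, wait_seconds=180):
    """Build the flychina.com part of the macro for a two-leg trip.

    legs is a list of two (origin, destination, date) tuples, with the
    date written as MM/DD/YYYY. This helper is hypothetical; it only
    reproduces the commands recorded in the script above.
    """
    suffixes = ["", "1"]  # field-ID suffixes for the first and second leg
    lines = ["URL GOTO=http://www.flychina.com/lfw/desk.aspx?OneWay=2&lan="]
    for suffix, (origin, dest, date) in zip(suffixes, legs):
        for field, value in (("tbFrom", origin), ("tbTo", dest), ("DepDate", date)):
            lines.append(
                "TAG POS=1 TYPE=INPUT:TEXT FORM=ID:Desk ATTR=ID:%s%s CONTENT=%s"
                % (field, suffix, value)
            )
    lines.append("TAG POS=1 TYPE=INPUT:SUBMIT FORM=ID:Desk ATTR=ID:ibDeskSearch")
    lines.append("WAIT SECONDS=%d" % wait_seconds)
    lines.append("TAG POS=1 TYPE=HTML ATTR=* EXTRACT=HTM")
    lines.append("SAVEAS TYPE=EXTRACT FOLDER=%s FILE=+{{!NOW:yyyymmddhhnnss}}" % out_dir)
    return "\n".join(lines)


if __name__ == "__main__":
    trip = [("LAX", "SFO", "11/11/2013"), ("LAS", "LAX", "12/12/2013")]
    print(flychina_macro(trip, "E:/flight/temp/flychina"))
```

The printed text can be pasted into the iMacros editor. If the site changes its form, only the field names in this helper need to be updated.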
Install Python 2.7.X on your computer.
- Install pygooglevoice (http://code.google.com/p/pygooglevoice/). It is a Python package for using Google Voice, and we use it to send SMS notifications. It is a little out of date, because Google has changed the API, so we need to patch it before we can use it. The fix can be found here (https://code.google.com/p/pygooglevoice/issues/detail?id=76&q=galx). If you do not need SMS notifications, you can skip this step. In that case, comment out the line "from googlevoice import Voice" in the source code with "#".
- Copy the following code onto your computer, and give it execute permission if you are on a *nix platform.
#=============================================================================
# Author: Sheng Yu - https://34.145.67.234
# Email : yusheng123 at gmail dot com
# Last modified: 2013-11-18 10:01
# Filename: flight_monitor_gv.py
# Description: a component to monitor the flight's price
# Changelog:
#   2013-11-12: fix a bug in __get_iti_iflychina and __get_iti_onetravel
#               When the site returns warning information after search
#               limitation, the old functions will throw exceptions.
#
#               fix a bug in __del__ function of Gmail class and SMS class
#               In some rare cases, the __del__ will throw an exception as
#               AttributeError: instance has no attribute 'mailServer' or
#               'voice'.
#
#   2013-11-18: fix bug in logging in the Gmail and/or Google Voice. When
#               no action is taken for a long time, the connection to
#               server might be lost. Solution: catch such exception, and
#               reconnect if needed.
#=============================================================================
import sys, os, time, re, smtplib, getpass, math
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email import encoders
# If you do NOT want to use SMS notification, comment the following ONE line
from googlevoice import Voice

# Configuration Section
# If found flights below this price, send email
THRESHOLD_EMAIL = 1400
# If found flights below this price, send SMS
THRESHOLD_SMS = 1300
# Every INTERVAL seconds, scan the folder for new price quote
INTERVAL = 300
# Gmail account information
# If you left them empty here, the program will ask for them every time
mail_usr = ""
mail_pwd = ""
# Google Voice account information
# If you left them empty here, the program will ask for them every time
sms_usr = ""
sms_pwd = ""
# END Configuration Section

class Gmail:
    ''' A class to send email via Gmail servers
    '''
    def __init__(self, usr, pwd):
        self.usr = usr
        self.pwd = pwd
        self.login()
        return

    def login(self):
        # Logging into the server
        self.mailServer = smtplib.SMTP("smtp.gmail.com", 587)
        self.mailServer.ehlo()
        self.mailServer.starttls()
        self.mailServer.ehlo()
        self.mailServer.login(self.usr, self.pwd)
        return

    def __del__(self):
        if hasattr(self, 'mailServer'):
            try:
                self.mailServer.quit()
                self.mailServer.close()
            except:
                del self.mailServer
        return

    def send(self, to, subject, text, attach=None):
        ''' send an email to address "to" with text being the body
            If attach is not None, it will be sent as attachment
        '''
        msg = MIMEMultipart()
        msg['From'] = self.usr
        msg['To'] = to
        msg['Subject'] = subject
        msg.attach(MIMEText(text))
        if attach != None:
            part = MIMEBase('application', 'octet-stream')
            part.set_payload(open(attach, 'rb').read())
            encoders.encode_base64(part)
            part.add_header('Content-Disposition',
                    'attachment; filename="%s"' % os.path.basename(attach))
            msg.attach(part)
        try:
            self.mailServer.sendmail(self.usr, to, msg.as_string())
        except:
            # The connection was probably lost: open a new one and log in
            self.mailServer.close()
            self.login()
            # Try again
            self.mailServer.sendmail(self.usr, to, msg.as_string())

class SMS:
    ''' A class to send SMS via Google Voice
    '''
    def __init__(self, usr, pwd):
        ''' Initialize the object with a Google Voice account
        '''
        self.usr = usr
        self.pwd = pwd
        self.voice = Voice()
        self.voice.login(usr, pwd)
        return

    def __del__(self):
        if hasattr(self, 'voice'):
            try:
                self.voice.logout()
            except:
                del self.voice
        return

    def send(self, phoneNumber, text):
        ''' send a SMS to phoneNumber with text
        '''
        try:
            self.voice.send_sms(phoneNumber, text)
        except:
            # The session was probably lost: log in again and retry
            self.voice = Voice()
            self.voice.login(self.usr, self.pwd)
            self.voice.send_sms(phoneNumber, text)
        return

class Quote:
    ''' A parser to get information from a specific file, containing
        flight quote
    '''
    def __init__(self, quote_file):
        self.filename = quote_file
        self.price_finder = {"flychina": self.__get_price_iflychina,
                "onetravel": self.__get_price_onetravel}
        self.itinerary_finder = {"flychina": self.__get_iti_iflychina,
                "onetravel": self.__get_iti_onetravel}
        if not self.site_name() in self.price_finder:
            # No such price parser is available
            raise Exception("No suitable price parser is installed: %s." % self.site_name())
        if not self.site_name() in self.itinerary_finder:
            # No such itinerary parser is available
            raise Exception("No suitable itinerary parser is installed: %s." % self.site_name())
        return

    def site_name(self):
        ''' Get the site name of current flight quote
        '''
        return os.path.basename(os.path.dirname(self.filename))

    def check_time(self):
        ''' Get the quote time according to the file name
        '''
        # basename(filename) is in form of extract20130326213855.csv
        when = os.path.basename(self.filename)[7:-4]
        when = when[:4]+"-"+when[4:6]+"-"+when[6:8]+" "+when[8:10]+":"+when[10:12]+":"+when[12:]
        return when

    def get_price(self):
        ''' Get the quote price
            The actual work is done by different specific parser
        '''
        price = self.price_finder[self.site_name()]()
        if price == None:
            return float('nan')
        else:
            return price

    def get_itinerary(self):
        ''' Get the itinerary in the quote
            The actual work is done by different specific parser
        '''
        itinerary = self.itinerary_finder[self.site_name()]()
        if itinerary == None:
            return "Unknown"
        else:
            return itinerary

    def __get_price_iflychina(self):
        ''' Parse the HTML file from iflychina.com to get the price
            WARNING: this function is easy to be out of date, since these
            signatures might be changed
        '''
        pattern = r'(?<=<span class=""rsubtitle"">\$)[1-9][0-9]{0,2}(,[0-9]{3})*</span> </a>per adult <nobr>\(taxes and fees included\)</nobr>'
        fhandle = open(self.filename, "rb")
        content = fhandle.read()
        fhandle.close()
        content = content.decode("utf8")
        price = re.search(pattern,content)
        if price != None:
            price = price.group(0)
            price = price[:price.find("<")]
            price = int(price.replace(",",""))
        return price

    def __get_iti_iflychina(self):
        ''' Parse the HTML file from iflychina.com to get the itinerary
            WARNING: this function is easy to be out of date, since these
            signatures might be changed
        '''
        # Three-letter airport codes written in parentheses, e.g. (LAX)
        pattern = re.compile(r'(?<=\()[A-Za-z]{3}(?=\))')
        sig1 = "We are searching thousands of fares to find the perfect deal for your travel dates"
        sig2 = "Leaving on"
        fhandle = open(self.filename, "rb")
        content = fhandle.read()
        fhandle.close()
        content = content.decode("utf8")
        content = content[content.find(sig1)+len(sig1):]
        trip1 = content[:content.find(sig2)]
        content = content[content.find(sig2)+len(sig2):] # Skip the second for the first trip
        trip2 = content[:content.find(sig2)]
        result = ""
        temp = pattern.findall(trip1)
        if len(temp) < 2:
            result = "N/A"
        else:
            result += temp[0] + "-" + temp[1] + "; "
        temp = pattern.findall(trip2)
        if len(temp) < 2:
            result += "N/A"
        else:
            result += temp[0] + "-" + temp[1]
        return result

    def __get_price_onetravel(self):
        ''' Parse the HTML file from onetravel.com to get the price
            WARNING: this function is easy to be out of date, since these
            signatures might be changed
            Actually, it seems changed from my previous version
        '''
        pattern = '(?<=[0-9].[0-9][0-9]"">)[1-9][0-9]{0,3}(?=</span>)'
        fhandle = open(self.filename, "rb")
        content = fhandle.read()
        fhandle.close()
        content = content.decode("utf8")
        sig0 = "Sort by:"
        sig1 = 'This is an alternate date, please verify the date'
        sig2 = 'Total per person'
        content = content[content.find(sig0)+len(sig0):]
        price = None
        while content.find(sig2) != -1:
            head = content[:content.find(sig2)]
            content = content[content.find(sig2)+len(sig2):]
            if head.find(sig1) != -1:
                # Quote for an alternate date: skip it
                continue
            price = re.search(pattern,content)
            break
        if price != None:
            price = int(price.group(0))
        return price

    def __get_iti_onetravel(self):
        ''' Parse the HTML file from onetravel.com to get the itinerary
            WARNING: this function is easy to be out of date, since these
            signatures might be changed
        '''
        sig = "Available flight deals found from "
        fhandle = open(self.filename, "rb")
        content = fhandle.read()
        fhandle.close()
        content = content.decode("utf8")
        if content.find(sig) == -1:
            result = "N/A"
        else:
            content = content[content.find(sig)+len(sig):]
            result = content[:3] + "-" + content[7:10]
        return result

    def remove_file(self):
        ''' Remove a specific file on OS
        '''
        while os.path.exists(self.filename):
            os.remove(self.filename)
            time.sleep(0.3)
        return

def usage(prog):
    ''' print out the usage information of this program
    '''
    print "Usage:", prog, "folder"
    return

if __name__=="__main__":
    # Check the arguments
    if len(sys.argv) != 2 or not os.path.isdir(sys.argv[1]):
        usage(sys.argv[0])
        sys.exit(-1)

    # Prepare for sending emails
    if mail_usr == "" or mail_pwd == "":
        mail_usr = raw_input("Please input your Gmail account: ")
        mail_pwd = getpass.getpass(prompt="Please input your Gmail password: ")
    to_mail = raw_input("Please input the email to receive notification: ")
    try:
        mailer = Gmail(mail_usr,mail_pwd)
    except:
        print "Error in logging in Gmail account."
        sys.exit(-2)
    del mail_usr,mail_pwd

    # Prepare for sending sms
    if 'Voice' in globals(): # The Google Voice package is imported
        if sms_usr == "" or sms_pwd == "":
            sms_usr = raw_input("Please input your Google Voice account [press enter to skip]: ")
            if sms_usr != "":
                sms_pwd = getpass.getpass(prompt="Please input your Google Voice password: ")
    else: # If we do not have the package, how could we use usr and pwd?
        sms_usr = ""
        sms_pwd = ""
    if sms_usr != "":
        to_sms = raw_input("Please input the phone number to receive notification: ")
        try:
            messenger = SMS(sms_usr, sms_pwd)
        except:
            print "Error in logging in Google Voice account."
            sys.exit(-3)
    del sms_usr,sms_pwd

    pre_price = {}
    try:
        while True:
            for root, dirs, files in os.walk(sys.argv[1]):
                for file in files:
                    # parse each file with flight quote information
                    flight_quote = Quote(os.path.join(root,file))
                    site = flight_quote.site_name()
                    if not site in pre_price:
                        pre_price[site] = {}
                    when = flight_quote.check_time()
                    itinerary = flight_quote.get_itinerary()
                    if not itinerary in pre_price[site]:
                        pre_price[site][itinerary] = -1
                    quote = flight_quote.get_price()
                    flight_quote.remove_file()
                    if math.isnan(quote) or quote == pre_price[site][itinerary]:
                        # No change or still meet mistake
                        continue
                    pre_price[site][itinerary] = quote
                    newprice = "Get new price as: "+str(quote)+" in "+site+" at "+when+" ("+itinerary+")"
                    print(newprice)
                    if quote <= THRESHOLD_EMAIL:
                        # Acceptable price
                        mailer.send(to_mail,"Price Alert!",newprice)
                    if quote <= THRESHOLD_SMS and "messenger" in locals():
                        # Great price
                        messenger.send(to_sms,newprice)
            time.sleep(INTERVAL)
    except KeyboardInterrupt:
        print "Good luck!"
        sys.exit(0)
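To see how the price extraction works in isolation, here is a small sketch built around the same lookbehind idea as the flychina price parser. The function name `extract_price` and the sample fragment are made up for illustration; the fragment merely imitates the markup the pattern expects, with the double quotes doubled as they appear in the patterns of the script (presumably because the extract is saved as a CSV file). The real page may well look different.

```python
import re

# The lookbehind anchors on the opening tag and the dollar sign, so the
# match itself starts at the first digit of the price.
PRICE_PATTERN = re.compile(
    r'(?<=<span class=""rsubtitle"">\$)[1-9][0-9]{0,2}(?:,[0-9]{3})*'
)


def extract_price(content):
    """Return the first quoted price as an int, or None if nothing matches."""
    match = PRICE_PATTERN.search(content)
    if match is None:
        return None
    # Drop the thousands separators before converting, e.g. "1,588" -> 1588
    return int(match.group(0).replace(",", ""))


if __name__ == "__main__":
    # Hypothetical fragment, not copied from the real site
    sample = '<span class=""rsubtitle"">$1,588</span> </a>per adult'
    print(extract_price(sample))
    # A page without a price (for instance a quota warning) yields None
    print(extract_price("You have reached the daily search limit."))
```

Returning None for a page without a price is what lets the main loop treat quota warnings as "no quote" instead of crashing.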
The directory structure on my computer is like the following:
E:\FLIGHTS
│  flight_monitor_gv.py
│
└─temp
    ├─flychina
    │      extract20131110231222.csv
    │
    └─onetravel
            extract20131110230536.csv
            extract20131110230901.csv
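This layout matters because the monitor derives both the site name and the quote time from the file path alone: the parent directory names the site, and the file name carries the timestamp. The sketch below shows that derivation on its own. The function name `describe_quote` is made up for illustration, and it uses `datetime.strptime` rather than the string slicing in the script.

```python
import os
from datetime import datetime


def describe_quote(path):
    """Return (site, quote time) for a saved quote file.

    Assumes the layout shown above: <folder>/<site>/extractYYYYMMDDhhmmss.csv
    """
    # The parent directory name identifies the site, e.g. "onetravel"
    site = os.path.basename(os.path.dirname(path))
    # Strip the "extract" prefix and the ".csv" suffix to get the timestamp
    stamp = os.path.basename(path)[len("extract"):-len(".csv")]
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return site, when.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    example = os.path.join("temp", "onetravel", "extract20131110230536.csv")
    print(describe_quote(example))
```

A new site therefore needs its own subdirectory under the monitored folder, and the iMacros script has to save that site's extracts there.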
Then run the Python script, for example: “E:\flights>c:\Python27\python.exe flight_monitor_gv.py temp”. The console output should look like this:
E:\flights>c:\Python27\python.exe flight_monitor_gv.py temp
Please input your Gmail account: XXXXXXX@gmail.com
Please input your Gmail password:
Please input the email to receive notification: XXXXXXX@gmail.com
Please input your Google Voice account: XXXXXXX@gmail.com
Please input your Google Voice password:
Please input the phone number to receive notification: 1234567890
Get new price as: 1588 in flychina at 2013-11-09 00:01:15 (LAX-SFO; LAS-LAX)
Get new price as: 1446 in onetravel at 2013-11-08 23:57:44 (LAX-SFO)
Get new price as: 1493 in onetravel at 2013-11-09 00:04:39 (LAX-SFO)

- Have a cup of coffee!
There is a lot of room for improvement in this solution. The price and itinerary parsers should be designed as plug-ins, so that we can add, remove, or change them without modifying the main program.
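One possible shape for such a plug-in design is a registry that parsers add themselves to through a decorator. The following is only a sketch of the idea, not part of the script above: the names `PRICE_PARSERS`, `price_parser` and `get_price`, as well as the "demo" site and its `PRICE=` format, are all made up for illustration.

```python
import re

# Maps a site name to the function that extracts a price from its page
PRICE_PARSERS = {}


def price_parser(site):
    """Decorator that registers a function as the price parser for `site`."""
    def register(func):
        PRICE_PARSERS[site] = func
        return func
    return register


@price_parser("demo")
def parse_demo_price(content):
    # Hypothetical site whose page contains a line such as "PRICE=1446"
    match = re.search(r"PRICE=([0-9]+)", content)
    return int(match.group(1)) if match else None


def get_price(site, content):
    """Look up the parser registered for `site` and apply it to `content`."""
    parser = PRICE_PARSERS.get(site)
    if parser is None:
        raise KeyError("No price parser registered for %s" % site)
    return parser(content)


if __name__ == "__main__":
    print(get_price("demo", "some text PRICE=1446 more text"))
```

With this arrangement, supporting a new site means adding one decorated function, possibly in a separate module, while the main loop only ever calls `get_price`. An itinerary registry could follow the same pattern.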