Filter Hacker News RSS feed
NOTE: This code is just the bare bones. You will notice that it lacks structure and consistency, and I’ll be the first to point out that it is not “clean”: some of the functions don’t have return statements and the globals are everywhere. In short, it’s a mess. I was just having fun with “feedparser” when I wrote this code; if time allows, I will refactor or even rewrite it to make it clean and later add better functionality.
This implementation is very crude: it parses the RSS feed and searches for the user-defined tags only in the feed titles. Some limitations and possible improvements:
- For instance, if the user is interested in Python-related feeds, a feed title containing a word like PEP (Python Enhancement Proposal) would not be matched, although it is related. An improvement might be to form a list of related words around every user-defined tag, much like a thesaurus, so the “python” tag would be interpreted as a list of words containing “python” plus any related words such as PEP.
- Following every feed link and doing a text search for the tags on the page that the feed’s URL points to. This will of course add to the time it takes to parse a feed, but it will make the search more comprehensive.
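A minimal sketch of the related-words idea might look like the following. The `RELATED_WORDS` table and the names `expand_tag` and `matching_tags` are assumptions made for illustration, and the entries in the table are only examples. Note that this sketch matches whole words rather than substrings (unlike the title search in this post), so that a short related word like "pep" does not hit "pepper".

```python
import re

# Hypothetical related-word table; the entries are illustrative only.
RELATED_WORDS = {
    "python": ["pep", "cpython", "pypi"],
    "gnu": ["gcc", "emacs"],
}


def expand_tag(tag, related=RELATED_WORDS):
    """Return the lowercased tag followed by its related words."""
    key = tag.lower()
    return [key] + [word.lower() for word in related.get(key, [])]


def matching_tags(title, tags, related=RELATED_WORDS):
    """Return the user-defined tags whose expanded word list hits the title."""
    # Split the title into words; '+' and '#' are kept so "c++" stays whole.
    words = set(re.findall(r"[a-z0-9+#]+", title.lower()))
    return [tag for tag in tags if words & set(expand_tag(tag, related))]
```

With a table like this, a title such as "PEP 703: Making the GIL optional" would be reported under the "python" tag even though the word "python" never appears in it.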
I haven’t looked around at all, but there may be an existing Python module that offers search capabilities similar to the ones described above, in which case it may just be a matter of using that module. I like to experiment, though, so I am going to try to implement this myself before looking for an existing module.
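As a starting point for the page-text search, here is a rough sketch that uses only the standard library. The names `TextExtractor`, `page_text`, `tags_in_page` and `fetch_html` are hypothetical, and the approach (strip markup, then do a case-insensitive substring search) is just one assumption about how such a search could work.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect the text of an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_text(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)


def tags_in_page(html, tags):
    """Return the tags that occur (case-insensitively) in the page text."""
    text = page_text(html).lower()
    return [tag for tag in tags if tag.lower() in text]


def fetch_html(url, timeout=10):
    """Download a page and decode it, falling back to UTF-8."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

For a feedparser entry, something like `tags_in_page(fetch_html(feed.link), tags)` would then run the search on the linked page. That is one extra HTTP request per entry, which is where the added parsing time comes from.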
import feedparser

# Tags to look for in the feed titles (matched case-insensitively).
tags = ['python', 'gnu', 'c++', 'WordPress']

print("LOOKING FOR FEEDS with the following tags: ")
for tag in tags:
    print("* %s" % tag)

matched = list()   # every tag hit, one entry per (title, tag) match
filtered = list()  # feed entries whose title matched at least one tag


def extendTags(*args):
    # Keep asking for new tags until the user answers anything but y/Y.
    while True:
        resp = input("do you wish to extend list of tags (y/n) ? \n >>> ")
        if resp == 'Y' or resp == 'y':
            add = input('enter new tag \n >>> ')
            if add in tags:
                print('this tag is already in the existing list')
            else:
                tags.append(add)
        else:
            break


extendTags()


def feedQuerySummary(*args):
    if len(matched) == 0:
        print("\n:( no matches for this feedTitle")
    else:
        print("\n$$$$$$ FOUND: matched tags count for this feedTitle %d , "
              "here are the matched %r " % (len(matched), matched))
        print("\n$$$$$$ HERE IS THE FILTERED FEED (by titles) \n")
        for feed in filtered:
            print(feed.title)


def findMatches(title):
    # True if any tag occurs in the title; every hit is recorded in matched.
    print("feed # %r" % (title,))
    matchStatus = False
    for tag in tags:
        if tag.lower() in title.lower():
            matched.append(tag)
            matchStatus = True
    return matchStatus


def gatherFeedTitles(*args, **kwargs):
    feeds = feedparser.parse('https://news.ycombinator.com/rss')
    for feed in feeds.entries:
        if findMatches(feed.title):
            filtered.append(feed)


gatherFeedTitles()
feedQuerySummary()
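One possible direction for the cleanup mentioned in the note at the top is to have the filter return its results instead of appending to globals. The sketch below is only an assumption about what that could look like: `filter_entries` is a hypothetical name, and `Entry` is a stand-in for feedparser entries so the example runs without a network connection.

```python
from collections import namedtuple

# Stand-in for a feedparser entry; only the title attribute is used here.
Entry = namedtuple("Entry", ["title"])


def filter_entries(entries, tags):
    """Return (filtered_entries, matched_tags) without touching globals."""
    filtered, matched = [], []
    for entry in entries:
        title = entry.title.lower()
        # Same case-insensitive substring test as the title search.
        hits = [tag for tag in tags if tag.lower() in title]
        if hits:
            filtered.append(entry)
            matched.extend(hits)
    return filtered, matched


sample = [
    Entry("Python 3.13 released"),
    Entry("Cooking with gas"),
    Entry("GNU Emacs and Python"),
]
kept, hits = filter_entries(sample, ["python", "gnu"])
for entry in kept:
    print(entry.title)
```

With real data, the `entries` list returned by `feedparser.parse(...)` could be passed in place of `sample`, since each entry exposes a `title` attribute.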