RSS Feed Filter for Hacker News

Filter the Hacker News RSS feed

NOTE: This code is just the bare bones. It lacks structure and consistency, and I’ll be the first to point out that it is not “clean”: some of the functions don’t have return statements and globals are everywhere. In short, it’s a mess. I was just having fun with “feedparser” when I wrote this code. If time allows, I will refactor or even rewrite it to clean it up, and later add better functionality.

This implementation is very crude: it parses the RSS feed and searches for the user-defined tags only in the feed titles. Some limitations and possible improvements:

  • For instance, if the user is interested in Python-related feeds, a feed title containing a word like PEP (Python Enhancement Proposal) would not be picked up even though it is related. One improvement would be to build a list of related words around every user-defined tag, much like a thesaurus, so that the “python” tag would be interpreted as a list containing “python” plus related words such as PEP.
  • Another improvement would be to follow every feed link and do a text search for the tag on the page the feed’s URL points to. This will of course add to the time it takes to process a feed, but it will make the search more comprehensive.
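
The thesaurus idea from the first bullet could look roughly like the sketch below. This is only an illustration, not part of the script further down: the RELATED_WORDS table, its entries, and the helper names expand_tag and title_matches are all made up for the example, and it assumes Python 3.

```python
# Hypothetical related-word table; the entries are illustrative only.
RELATED_WORDS = {
    "python": ["pep", "django", "pypi"],
    "gnu": ["gcc", "emacs"],
}


def expand_tag(tag, related=RELATED_WORDS):
    """Return the tag itself plus any related words, all lowercased."""
    key = tag.lower()
    return [key] + [word.lower() for word in related.get(key, [])]


def title_matches(title, tag, related=RELATED_WORDS):
    """True if the title contains the tag or one of its related words."""
    lowered = title.lower()
    # Plain substring search, the same kind of test the script uses.
    return any(word in lowered for word in expand_tag(tag, related))
```

With this table, title_matches("PEP 8 style guide updated", "python") is true even though the word "python" never appears in the title. Because it is a substring search, a short related word like "pep" would also match a title containing "pepper", so a real version would probably want to compare whole words instead.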

I haven’t looked around at all, but there may be an existing Python module that offers search capabilities similar to the ones described above, in which case it would just be a matter of using that module. But I like to experiment, so I am going to try to implement this myself before looking for an existing module.
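
As a starting point for the second idea (searching the text of the linked page), here is a rough sketch that uses only the Python 3 standard library. The names html_to_text, page_mentions_tag and fetch_page are hypothetical helpers invented for this example. It looks at a single page only and does not follow further links, so it is not the recursive version.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects the text of a page, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is outside <script> and <style> elements.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Strip the markup from an HTML string and return its text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def page_mentions_tag(html, tag):
    """Case-insensitive substring search for the tag in the page text."""
    return tag.lower() in html_to_text(html).lower()


def fetch_page(url, timeout=10):
    """Download a page and decode it leniently (assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

For a feedparser entry this would be called as something like page_mentions_tag(fetch_page(feed.link), tag). That is one extra HTTP request per entry, which is where the additional processing time comes from.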

import feedparser

# Tags to look for in the feed titles (case-insensitive substring match).
tags = ['python', 'gnu', 'c++', 'WordPress']
print("LOOKING FOR FEEDS with the following tags: ")
for tag in tags:
    print("* %s" % tag)
matched = list()   # every tag that matched a title (may contain repeats)
filtered = list()  # feed entries whose title matched at least one tag

def extendTags(*args):
    # Keep asking for new tags until the user answers anything but y/Y.
    while True:
        resp = input("do you wish to extend list of tags (y/n) ?  \n >>> ")
        if resp == 'Y' or resp == 'y':
            add = input('enter new tag \n >>> ')
            if add in tags:
                print('this tag is already in the existing list')
            else:
                tags.append(add)
        else:
            break
extendTags()

def feedQuerySummary(*args):
    # Print the matched tags and the titles of the entries that were kept.
    if len(matched) == 0:
        print("\n:( no matches for this feedTitle")
    else:
        print("\n$$$$$$ FOUND: matched tags count for this feedTitle %d , here are the matched %r " % (len(matched), matched))
        print("\n$$$$$$ HERE IS THE FILTERED FEED (by titles) \n")
        for feed in filtered:
            print(feed.title)

def findMatches(*args, **kwargs):
    # args[0] is a feed title; returns True if any tag occurs in it.
    print("feed # %r" % (args[0]))
    matchStatus = False
    for tag in tags:
        if tag.lower() in args[0].lower():
            matched.append(tag)
            matchStatus = True

    return matchStatus

def gatherFeedTitles(*args, **kwargs):
    # Fetch the Hacker News feed and keep the entries whose title matches.
    feeds = feedparser.parse('https://news.ycombinator.com/rss')

    for feed in feeds.entries:
        if findMatches(feed.title):
            filtered.append(feed)

gatherFeedTitles()
feedQuerySummary()

Posted May 21, 2013 by furcon in hacks
