Twitter is a magnificent service, and for me it’s a bottomless pit of new ideas, articles and web pages. I’ve been trying to find good and interesting people to follow, but that only got me so far, and it also biased the links I got, so I started thinking there has to be another way. So I grabbed Tweepy, the Python Twitter library, wrote some Python code and started hacking.

The big picture I had was to create a Twitter bot that searches for tweets that contain the word “data” and a link. It would then save to a database the URL, the title of the page, the user who posted the tweet and the date it was saved. Since most of the links are irrelevant to me, I decided I need some kind of machine learning algorithm to decide which links are relevant and interesting.
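To make the storage side concrete, here is a minimal sketch of how those four fields could be kept in SQLite. It is not the code from the repo: the table name `links`, the column names and the helpers `open_db` and `save_link` are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: table and column names are assumptions that
# mirror the four fields described above (URL, title, user, date saved).
SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    id        INTEGER PRIMARY KEY,
    url       TEXT NOT NULL,
    title     TEXT NOT NULL,
    posted_by TEXT NOT NULL,
    saved_at  TEXT NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open the database (in memory by default) and create the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_link(conn, url, title, posted_by):
    """Store one link together with the time it was saved (UTC, ISO 8601)."""
    saved_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO links (url, title, posted_by, saved_at) "
            "VALUES (?, ?, ?, ?)",
            (url, title, posted_by, saved_at),
        )
```

Calling `save_link(conn, url, title, user)` once per matching tweet would be enough to fill the table the website reads from.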

The current situation is that the bot is ready and operational (with a little bug I haven’t yet been able to fix), and the database and the website showing the results are up and running. However, since I have been very busy with other projects, the machine learning part has not been started yet; it will be on next summer’s agenda.

I have set up a cron job that runs every 15 minutes and fetches 15 tweets each time. It then checks whether the URL or the exact title is already in the database (the title check is there because the same page can appear behind different shortened URLs), and if not, it adds the link, the title, the user and the date to an SQLite3 database. At the moment there is a bug that has forced me to turn the cron job off for now. Every now and then (too often), the connection between my server and Twitter hangs and the bot never exits. That is really disturbing, as it completely jams my server and makes it nearly impossible to use.
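The duplicate check, and one possible guard against the hanging run, could look roughly like the sketch below. This is an illustration under assumptions, not the bot's actual code: the `links` table, its columns and the helpers `add_if_new` and `run_with_deadline` are hypothetical names. The deadline guard uses the standard `signal` module, so it works only on Unix and only in the main thread, and I have not confirmed that it cures the real bug.

```python
import signal
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema, assumed for this sketch only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    id        INTEGER PRIMARY KEY,
    url       TEXT NOT NULL,
    title     TEXT NOT NULL,
    posted_by TEXT NOT NULL,
    saved_at  TEXT NOT NULL
)
"""


def add_if_new(conn, url, title, posted_by):
    """Insert the link unless the URL or the exact title is already stored.

    Returns True if a row was added, False if it was a duplicate.
    """
    known = conn.execute(
        "SELECT 1 FROM links WHERE url = ? OR title = ? LIMIT 1",
        (url, title),
    ).fetchone()
    if known is not None:
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO links (url, title, posted_by, saved_at) "
            "VALUES (?, ?, ?, ?)",
            (url, title, posted_by, datetime.now(timezone.utc).isoformat()),
        )
    return True


class RunTimeout(Exception):
    """Raised when a run takes longer than its deadline."""


def _on_alarm(signum, frame):
    raise RunTimeout("bot run exceeded its time limit")


def run_with_deadline(func, seconds):
    """Call func() and abort it with RunTimeout after `seconds`.

    Unix only, main thread only: relies on SIGALRM.
    """
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

In the cron script, the whole fetch-and-store step could be wrapped as `run_with_deadline(fetch_and_store, 120)`, where `fetch_and_store` stands for whatever function does the Twitter search. A run that hangs would then end with an exception instead of piling up on the server every 15 minutes.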

If someone has ideas concerning the bot, the machine learning part or the bug, please comment and help a man out.

The code is in a Bitbucket repo. Feel free to hack on it and use it as you wish.


