So I decided to roll my own set of scripts that gather this data for me on my own hardware. This will take some work but since I also wanted to get back in touch with the software engineer in me, this could be a good opportunity to keep my skills in order.
I have to mention that I was also inspired by a Codebits talk by Alexandre Solleiro about Activity Streams. He talked about tracking your activities and building a platform for it. He also mentioned Nabaztag which is a cute rabbit bot (now called Karotz apparently) that interacts with you based on your activities.
Besides the data gathering, I am interested in the interactivity that can be built around this data-powered bot.
MyBot design
First things first: how will I make this work?
I bought a Raspberry Pi almost only with this project in mind and it's where the main action will take place. Maybe more than one in the future...
Since this is a hobby and I lack the free time, most of its architecture is being decided as I go. I just cannot afford a better solution at this point. I do have ideas and some of them are already in progress.
Python
This will be done in Python. I am already learning Python, and its powerful abilities combined with its simplicity of use and portability make it the best solution. I am developing on a MacBook (Mac OS X) with Eclipse, but the bot will run on Linux.
Modular architecture
I will split the bot into modules that will run in different processes. For example, I will have a module for monitoring my Twitter feed, another to receive commands from me, one for monitoring my email activity, etc.
The decision to use different processes has nothing to do with performance or multicore (I am running this on a Raspberry Pi, remember?). It is about interactivity (modules run independently and do not have to wait for each other to finish) and it is also about being fail-safe. A crash in one module cannot take down the other modules, not even its parent process. I'm not sure yet if I want the child processes to die if the parent dies.
I also have a few ideas for having the different modules monitor each other, but for now I will only have one controller process that launches the other modules in separate processes and keeps an eye on them to relaunch them if necessary. This controller will itself be monitored by another controller.
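To make the controller idea more concrete, here is a minimal sketch of a supervisor loop built on the multiprocessing module. It is only an illustration, not the bot's actual code: the names `supervise`, `healthy_module` and `crashing_module` are made up, the restart limit is an arbitrary choice, and it assumes Linux so that the "fork" start method is available.

```python
import multiprocessing
import sys
import time

# Assumption: the bot runs on Linux, where the "fork" start method exists.
_ctx = multiprocessing.get_context("fork")


def healthy_module():
    """Hypothetical module that does a bit of work and exits cleanly."""
    time.sleep(0.1)


def crashing_module():
    """Hypothetical module that simulates a crash (non-zero exit code)."""
    sys.exit(1)


def supervise(modules, max_restarts=3, poll_interval=0.05):
    """Run each callable in its own process and relaunch it if it crashes.

    `modules` maps a module name to a callable. Returns a dict with the
    number of restarts performed for each module.
    """
    restarts = {name: 0 for name in modules}
    procs = {}
    for name, target in modules.items():
        procs[name] = _ctx.Process(target=target, name=name)
        procs[name].start()

    while procs:
        time.sleep(poll_interval)
        for name, proc in list(procs.items()):
            if proc.is_alive():
                continue
            proc.join()
            crashed = proc.exitcode != 0
            if crashed and restarts[name] < max_restarts:
                # Relaunch the module in a brand new process.
                restarts[name] += 1
                procs[name] = _ctx.Process(target=modules[name], name=name)
                procs[name].start()
            else:
                # Clean exit, or too many crashes: stop watching it.
                del procs[name]
    return restarts
```

Calling `supervise({"ok": healthy_module, "bad": crashing_module}, max_restarts=2)` relaunches the crashing module twice and leaves the healthy one alone. Real modules would loop forever, so the supervisor would only stop watching a module after it gave up on it.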
This will also allow me to load new modules without disrupting modules that are already running.
Why am I spending so much time on the fail-safe features? Because it is fun... and because if a module fails while I am away, thanks to a few buggy lines of code I introduced just before leaving home, I want the bot to alert me and try to recover on its own until I find the time to get back to coding.
Communication
During development I am using a command line console to communicate with the bot. But in the future, I want to send commands or questions remotely from anywhere. The easiest way of doing this is email. It's probably the fastest app to launch on my phone and it's extremely easy to fill in a recipient and at least a subject line. Of course this raises some security concerns, described below.
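As a rough sketch of the email idea, a command could be read straight from the subject line using the standard `email` package. Everything specific here is an assumption for illustration: the `mybot:` subject prefix, the allowed sender address and the `parse_command` function are invented, and fetching the message from a mailbox is left out.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical values, not part of the actual bot.
ALLOWED_SENDERS = {"me@example.com"}
SUBJECT_PREFIX = "mybot:"


def parse_command(raw_email):
    """Return (command, args) taken from the subject, or None if ignored."""
    msg = message_from_string(raw_email)

    # Weak check only: a From header is easy to forge.
    _, sender = parseaddr(msg.get("From", ""))
    if sender.lower() not in ALLOWED_SENDERS:
        return None

    subject = (msg.get("Subject") or "").strip()
    if not subject.lower().startswith(SUBJECT_PREFIX):
        return None

    words = subject[len(SUBJECT_PREFIX):].split()
    if not words:
        return None
    return words[0].lower(), words[1:]
```

With this convention, an email from the allowed address with the subject `mybot: status twitter` would be parsed as the command `status` with the argument `twitter`. Anything else is silently ignored.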
I am also thinking about communicating with it using Twitter, IM or an HTTPS service, but these ideas are still a bit blurry at the moment, with lots of questions in my head. It would also be cool to talk to it when I'm home :)
Security
See Concerns below...
Concerns
Security
I am a bit paranoid about security, and since I am building something that will have access to my whole life, security is a big concern.
This bot might have to know some passwords to collect data, but I'm not sure yet how I will store and use them. I have a few ideas but none of them is mature enough. The first thing I need to do is reduce the need for passwords and rely as much as possible on alternative authentication mechanisms and on data that is already public. I'll then move on to private data.
The way I communicate with the bot also concerns me. Email will probably be the most used communication method, but I have little control over authentication and no control over the confidentiality of the emails, so I will probably limit the kind of information and commands I exchange over email. As far as IM is concerned, I do not know anything about it at the moment. HTTPS is a potential candidate for exchanging confidential information with the bot, but I would rather limit it to temporary URLs because I don't want to have an HTTPS service permanently available. SSH is probably the most secure option so far, but it would also be better not to have it permanently available. SSH is also cumbersome to use from a mobile device.
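One possible way to get temporary URLs is to sign the path and an expiry time with a secret that only the bot knows. The sketch below is an assumption about how that could work, not a decision: the secret, the URL layout and the function names are all invented, and it only covers creating and checking the link, not the HTTPS service itself.

```python
import hashlib
import hmac
import time

# Hypothetical secret; a real one should be random and kept out of the code.
SECRET_KEY = b"replace-with-a-random-secret"


def make_token(path, expires_at, key=SECRET_KEY):
    """Sign the path together with its expiry time (HMAC-SHA256)."""
    message = f"{path}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def make_temporary_url(path, lifetime_seconds=300, now=None):
    """Build a URL path that stops being valid after `lifetime_seconds`."""
    now = int(time.time()) if now is None else int(now)
    expires_at = now + lifetime_seconds
    token = make_token(path, expires_at)
    return f"{path}?expires={expires_at}&token={token}"


def is_valid(path, expires_at, token, now=None):
    """Accept the link only if it has not expired and the signature matches."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(token, make_token(path, expires_at))
```

The bot could send such a link in reply to a command and refuse it once it has expired, so nothing stays reachable for long even if the link leaks later.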
The code
Since I'm learning Python, there are obvious concerns about what I'm doing and whether I am following the best practices for coding in Python. But there are a few issues I already know I will have to get back to:
1. Using multiprocessing module for launching the bot modules
The multiprocessing Python module implements a very easy to use interface for forking processes but at the same time removes some control. Forking gives the child process a copy of the parent's memory (on Linux the copy is made lazily, copy-on-write, but the child still inherits everything the parent had loaded). This means that all the memory in my control module would be replicated to every other module that it spawns. I don't know if Python's garbage collector can take care of this at some point, but it won't be the most efficient solution anyway. I will probably move to fork()+exec() in the future or take a look at the subprocess module. In that case, I will also lose some of the great ways to communicate with child processes that the multiprocessing module provides and will have to implement my own socket communication module.
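For comparison, here is a small sketch of the subprocess route: the module starts in a fresh interpreter, so it inherits none of the controller's memory. To keep the example short it talks to the child over stdin/stdout pipes with one JSON message per line instead of sockets, which is a swapped-in simplification. The child code, `launch_module` and `send_command` are all made up for illustration.

```python
import json
import subprocess
import sys

# Stand-in for a bot module; a real one would live in its own script file.
CHILD_CODE = """
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"module": "echo", "received": request["command"]}
    print(json.dumps(reply), flush=True)
"""


def launch_module():
    """Start the module in a brand new Python interpreter."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def send_command(proc, command):
    """Send one JSON line to the module and wait for its one-line reply."""
    proc.stdin.write(json.dumps({"command": command}) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())
```

Closing `proc.stdin` ends the child's read loop, so the module exits on its own. The price is that the queues and pipes from multiprocessing are gone and a message format like the one above has to be maintained by hand.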
2. Import modules code
The code that imports the modules is hideous and does not seem very robust. Might have to recode it.
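A possible direction for the rewrite is `importlib.import_module`, which imports a module from its name given as a string. The sketch below is an assumption about how the loading could look, not the current code: `load_modules` is an invented helper and the module names are just examples.

```python
import importlib


def load_modules(names):
    """Import each named module and report failures instead of crashing.

    Returns two dicts: successfully loaded modules by name, and the
    exception raised for each module that could not be imported.
    """
    loaded, failed = {}, {}
    for name in names:
        try:
            loaded[name] = importlib.import_module(name)
        except Exception as exc:
            # A buggy module can raise anything at import time
            # (SyntaxError, NameError...), not only ImportError.
            failed[name] = exc
    return loaded, failed
```

Keeping the failures in a dict means one broken module does not stop the others from loading, and the controller can still report what went wrong.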
3. Will I ever finish this...
I have so little free time and so many projects I would like to do that some of them get left behind. This is probably the most laborious one so...
On the other hand, I really need this learning experience and I'm full of ideas and scenarios I would like to test and learn from (even if not the best solution for this kind of project). Even if I don't finish this project, I am already learning from it.