CheapWeather: Data collection server design

I usually work on projects over the course of several months, or even years.  While that pace seems glacial, it really isn't, considering that I'm usually switching between projects as I get ideas or new data.  So at any given time I have probably 100 inactive projects that are waiting for a part, or waiting for thought, and about 10 active projects that get worked on as I get inspiration or aspiration.  That's the awesome thing about a hobby: I can put it down if I tire of it.  See footnote #1.

I took on the server first, because the infrastructure it uses was something I had been playing with for work and other projects.  It was also the easier piece to get working and debugged first, because I could prod and test it with command-line tools such as curl or wget without needing a working sensor node.  Of the mental CPU cycles given to this project, it was also the one aspect that had received enough thought to be ready to build.  I also realized at the first of May 2016 that I had very little time to get this working before summer was here.  Winter had passed, it was halfway through spring, and I hadn't picked this project up yet.  I had some of the ESP8266 modules from the July before, but hadn't put them to use.  I needed to get moving if I was to start collecting data over the summer!

Why did I write my own and not use a cloud service?

Simple: I just wanted to collect data.  There is a learning curve to climb with each system out there, and I wanted to start collecting quickly without having to learn not only how to get data in, but also how to get it out when I wanted to process it.  Yes, this does mean that I am missing out on some of the fancy new gadgets that some of these services offer.  It doesn't mean I'll always stick with my own server.  I needed something to start with, and this worked (and is working) quite well.

Cloud-based services bring with them all sorts of other potential issues: security, the ability to get out to the web, the ability to keep the data as mine, the ability to keep things in a format I wish, and the ability to not deal with changes in the service, whether a change of interface, a change of license, or the provider going out of business.  Then there's availability of the network: if there's an outage and my sensors can't get out to the cloud, I'm losing data.

There's a third reason: portability.  This might end up on a Raspberry Pi out in the middle of a field somewhere, pulling data off of a mesh of sensors, without being uplinked all the time.  Portability also refers to the ability to hand this off to my friends (and you, if you want) without needing much more than some python libraries.

A simple server built in python gave me a good jumping-off platform.  It means I can get started now, and graduate to something bigger later.

What did I need to satisfy?
  • I needed to store four aspects of each data point: sensor data, timestamp, sensor name, and sensor type.
  • It needed to be a simple REST-based web application.
  • It needed to be hosted on my in-house server.  Or a Rasp Pi.  Or anything running python.  For development I had it running on my laptop.
  • It needed to run on standard packages included with python, or ones easily installed using pip.
  • It needed to run multiplatform: Linux on a PC at a minimum, Raspberry Pi being a secondary target, with MacOS and Windows last.
  • Data needed to be stored in a concurrency-safe manner; that is, if two sensors sent data at the exact same time, they wouldn't collide, they'd be queued properly and inserted.
What did I use to satisfy those requirements?

Python, CherryPy, and sqlite3.

The code is actually quite simple.  There are two parts: the main server handlers, and the sql thread.  The sql thread is necessary because the CherryPy server framework runs threaded, and sqlite3 won't allow queries on a connection outside of the thread that opened it.  CherryPy has a default of 10 threads in its pool, and that's something we're going to want, given that all of the sensors send their data in asynchronously.  So we fire up a separate thread that owns a thread-safe queue and sits on it waiting for data.  When a request comes in, the handler formats the data, stuffs it into the sql thread's queue, and returns.  When the sql thread wakes from its one-second nap, it sees thing(s) in its queue and processes them.

Let's look at the server code first.  There are four main sections: config, exit plugin, app code, and main.

#!/usr/bin/env python
'''
   cwxDataCollector - A simple python webserver set to collect sensor data.

   Copyright (C) 2016 Bitreaper <bitreaper AT n357 DOT com>
   
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License (GPL) as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.
   
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.
   
   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.
'''

# system libs
import time
import cherrypy
from cherrypy.process.plugins import SimplePlugin

# local libs
from sqlThread import SqlThread

#############################
# config
#############################

config = {
  'global' : {
    'server.socket_host' : '0.0.0.0',
    'server.socket_port' : 8080,
    'server.thread_pool' : 10
  }
}

##############################
# Exit handler definition.  Needed because Ctrl+C would not work right otherwise.  It's also needed
# because any changes to this file would cause the server to hang instead of restart.
# A good example can be found here:
# http://stackoverflow.com/questions/29238079/why-is-ctrl-c-not-captured-and-signal-handler-called
##############################

class ExitPlugin(SimplePlugin):

    def __init__(self, threadList, bus):
        SimplePlugin.__init__(self, bus)
        self.threadList = threadList

    def start(self):
        self.bus.log('Setting up exit handler plugin')
    # Setting start()'s priority to greater than 65 so that it is fired up after the fork if daemonized
    # See this for more details: https://groups.google.com/forum/#!topic/cherrypy-users/1fmDXaeCrsA
    start.priority = 70
    
    def exit(self):
        ''' iterate over threads and call their stop methods, then iterate again for each join.'''
        for thread in self.threadList:
            thread.stop()

        for thread in self.threadList:
            thread.join()

        self.unsubscribe()


##############################
# app definition
##############################

class SensorNetCollector(object):
    def __init__(self,sql):
        self.sql = sql

    ############
    @cherrypy.expose
    def index(self):
        return '''try "keep?sensor=sensorname&data=mydata&sensorType=type" for your request.'''

    ############
    # for testing a sensor's link without storing the information.  It will just echo back to you the data you sent.
    @cherrypy.expose
    def testlink(self,sensor,data,sensorType):
        return "OK, {}, {}, {}".format(sensor, data, sensorType)

    ############
    @cherrypy.expose
    def keep(self, sensor=None, data=None, sensorType=None):
        if all([sensor, data, sensorType]):
            timestamp = int(time.time())
            self.sql.queue.put((sensor, timestamp, data, sensorType))
            return 'OK'
        else:
            # tell the caller exactly which parameters are missing
            missing = [name for name, value in
                       (('sensor', sensor), ('data', data), ('sensorType', sensorType))
                       if not value]
            return 'You forgot to add: {}'.format(', '.join(missing))


##############################
# main
##############################

if __name__ == '__main__':

    sqlThread = SqlThread()
    sqlThread.start()

    ExitPlugin([sqlThread], cherrypy.engine).subscribe()
    cherrypy.quickstart(SensorNetCollector(sqlThread), '/', config=config)

The exit plugin is there so that our server can handle restarts or, more importantly, stop the sqlThread before exiting on a Ctrl+C.   CherryPy handles all of the server work for me; I just define the endpoints as methods on the SensorNetCollector object.
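Since the endpoints are plain GET requests, you don't need a working sensor node to exercise them; a few lines of Python (or a curl one-liner) will do.  Here's a quick sketch, assuming the server is running on localhost:8080 per the config above; the sensor name and values are made up for illustration:

```python
# Build a request against the collector's testlink endpoint.  The host, port,
# and sensor values here are assumptions for illustration only.
try:
    from urllib.request import urlopen       # Python 3
    from urllib.parse import urlencode
except ImportError:
    from urllib2 import urlopen              # Python 2
    from urllib import urlencode

params = urlencode({'sensor': 'outside', 'data': '200', 'sensorType': 'light'})
url = 'http://localhost:8080/testlink?' + params
print(url)

# With the server actually running, this should echo the values back:
# print(urlopen(url).read())
```

Swapping testlink for keep in the URL stores the reading instead of just echoing it back.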

And now let’s gawk at the sqlThread code:


#!/usr/bin/env python
'''
   cwxDataCollector - A simple python webserver set to collect sensor data.

   Copyright (C) 2016 Bitreaper <bitreaper AT n357 DOT com>
   
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License (GPL) as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.
   
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.
   
   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.

'''

import time
import sqlite3
from threading import Thread
try:
    from queue import Queue, Empty   # Python 3
except ImportError:
    from Queue import Queue, Empty   # Python 2

class SqlStatements( object ):
    TABLE_EXISTS = "select name from sqlite_master where type='table' and name='sensors';"

class SqlThread(Thread):
    sqlfile = "sensordata.sqlite3"

    def __init__(self):
        self.running = False
        self.queue = Queue()
        super(SqlThread, self).__init__()

    def sqlInit(self):
        self.conn = sqlite3.connect( self.sqlfile )
        self.cursor = self.conn.cursor()
        # check that the sensors table exists; create it if it doesn't
        if 0 == len(self.cursor.execute(SqlStatements.TABLE_EXISTS).fetchall()):
            self.cursor.execute( "CREATE TABLE sensors (sensorid TEXT, timestamp NUMERIC, data TEXT, sensorType TEXT);" )
            self.conn.commit()

    def start( self ):
        self.running = True
        super(SqlThread, self).start()

    def stop( self ):
        self.running = False

    def run(self):
        self.sqlInit()
        while self.running:
            try:
                # block for up to one second waiting for data
                data = self.queue.get(True, 1)
                # parameterized query: sqlite3 handles the quoting for us,
                # and we avoid SQL injection from malformed sensor data
                self.cursor.execute("INSERT INTO sensors VALUES (?,?,?,?)", data)
                self.conn.commit()
            except Empty:
                pass


#########################
# super simple unit test.    
if __name__ == "__main__":
    sqlThread = SqlThread()
    sqlThread.start()

    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('outside',timestamp,'200','light'))
    time.sleep(2)

    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('bathroom',timestamp,'2900','vcc'))
    time.sleep(2)
    
    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('livingroom',timestamp,'25','temp'))
    time.sleep(2)

    sqlThread.stop()
    sqlThread.join()

Essentially the sqlThread code sits in a perpetual loop, accepting tuples on its queue and inserting them into the database.  Not much to it.  If the database doesn't exist, it creates it and populates the schema.
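Getting the data back out later is just a matter of pointing sqlite3 at the same file and selecting.  A minimal sketch; it uses an in-memory database so the example is self-contained, but against the real file you'd connect to sensordata.sqlite3 instead, and the one-hour window is just an example:

```python
import sqlite3
import time

# Using :memory: keeps this sketch self-contained; substitute
# 'sensordata.sqlite3' to query the collector's real database file.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute("CREATE TABLE sensors (sensorid TEXT, timestamp NUMERIC, data TEXT, sensorType TEXT)")

# seed one reading so the query below has something to find
now = int(time.time())
cursor.execute("INSERT INTO sensors VALUES (?,?,?,?)", ('outside', now, '200', 'light'))
conn.commit()

# Pull everything the 'outside' sensor reported in the last hour.
rows = cursor.execute(
    "SELECT timestamp, data, sensorType FROM sensors "
    "WHERE sensorid=? AND timestamp>=?",
    ('outside', now - 3600)).fetchall()
print(rows)
```

From there the tuples go straight into whatever processing you like; that's the whole point of keeping the storage this plain.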

You can find the code here: https://github.com/bitreaper/cwxDataCollector

Footnotes:

#1: Earlier in my marriage, while we were on vacation, my wife asked me why I was working.  I was intently staring at and hacking away on the laptop.  I told her I wasn't working.

Her: “Well, aren’t you doing the same thing that you do at work? Coding?”

Me:  “Yes, but this is different.”

Her: “How? You’re furiously typing, and swearing at the screen.  It looks the same as when you’re working from home…”

Me: “Ah, OK, I see the confusion.  No, the difference is that if I tire of this exercise, I can either delete it or set it aside.  With work, I’m forced to continue until it’s solved.  That’s where the stress comes in, and it ceases being fun.”


A solution to Pidgin, Plasma5 and the missing systray icon

tl;dr:

I found out a while back that Pidgin's underlying library, libpurple, exposes many of its functions through dbus.  So if I can't figure out how to get the systray icon back, could I at least poke it to show the buddy list again?  Turns out, with a short bit of research and some playing around in an ipython shell with the dbus module, I was able to come up with the following (hackish) script to display the buddy list window when it accidentally gets closed:


#!/usr/bin/env python
# coding: utf-8
import dbus
bus = dbus.SessionBus()
purple = bus.get_object("im.pidgin.purple.PurpleService", "/im/pidgin/purple/PurpleObject")
purple.PurpleBlistSetVisible(1)

I put this in a script called “buddylist”, and call it if I ever close the buddy list window.  Works like a charm.

The longer story:

In the wake of KDE's Plasma 5 desktop being released last year, I discovered what many did about some systray icons: they no longer appeared.  This is a problem for any application that minimizes to the systray, like Pidgin does.  While it's not hard to leave Pidgin's buddy list (its main window) open, sometimes it accidentally gets closed, which means it closes down to the systray.  If the icon can't display in the systray, then all is for naught, because killing Pidgin and bringing it back only restarts it in the state it was in last time, i.e. minimized to the systray.  Google didn't help here, because I kept finding old posts about a different systray problem from 2010 or 2013.  Great.  I had fallen into this hole of Pidgin being minimized, even on restart.  Not fun; I had work to do and people at work to collaborate with over IM.

I love all of the hard work that the Plasma 5 and KDE devs do.  You guys rock, and my world has been better for it over the last 16 years of using KDE.  Consider this my heartfelt thanks for all of your hard work.  But as with most things that change out from under you, it's irksome to get a new version and find things broken.  Of course, the first thought is to blame someone for the breakage.  There's a phrase I keep saying and trying to remember: there are always extenuating circumstances.  In other words, when criticising other coders, make sure you understand all the factors that went into their decisions before opening your mouth; otherwise you'll only embarrass yourself.  That is definitely true here, as demonstrated in the following two article links.  It looks as if they're attempting to correct hackish behavior with respect to systray icons, so that we can transition to something like Wayland in the future and get rid of X baggage.  Ah, so there *IS* a reason…

Where are my systray icons?

System Tray in Plasma Next

Footnote:  I rely heavily upon Pidgin, and before you suggest XYZ client because it's better: we all use the software we're accustomed to for a reason, and we're more efficient for it.  That doesn't mean I'm unwilling to try out new clients, but sometimes sticking with what you know, when it just works, is worth it.  I like to save my brain power (or brain damage) for the new things I must learn.