CheapWeather: Data collection server design

I usually work on projects over the course of several months, or even years.  While this seems glacial in pace, it actually isn't, considering that I'm usually switching between projects as I get ideas or new data.  So at any given time I have probably 100 inactive projects that are waiting for a part or waiting for thought, and about 10 active projects that get worked on as I get inspiration or aspiration.  That's the awesome thing about a hobby: I can put it down if I tire of it.  See footnote #1.

I took on the server first, because the infrastructure it uses was something I had been playing with for work and other projects.  It was also easier to get that piece working and debugged first, because I could use command line tools such as curl or wget to prod and test it without needing a working sensor node.  Of the mental CPU cycles given to this project, the server also happened to be the one aspect that had been given enough thought to complete.  I realized at the start of May 2016 that I had very little time to get this working before summer arrived.  Winter had passed, it was halfway through spring, and I hadn't picked this project up yet.  I had some ESP8266 modules from the July before, but hadn't put them to use.  I needed to get moving if I was to start collecting data over the summer!

Why did I write my own and not use a cloud service?

Simple: I just wanted to collect data.  There is a learning curve to climb with each system out there, and I wanted to start collecting quickly without having to learn not only how to get data in, but also how to get it back out when I wanted to process it.  Yes, this does mean that I'm missing out on some of the fancy new gadgets these services offer.  It doesn't mean I'll always stick with my own server.  I needed something to start with, and this has worked (and is working) quite well.

Going cloud-based brings all sorts of other potential issues: security, the ability of my sensors to reach the web, the ability to keep the data as mine and in a format I choose, and the ability to avoid changes in service, whether a change of interface, a change of license, or the provider going out of business.  Then there's network availability: if there's an outage, I'm losing data whenever my sensors can't reach the cloud.

There’s a third reason: portability.  This might go into a Raspberry Pi that is out in the middle of a field somewhere pulling data off of a mesh of sensors, and isn’t uplinked all the time.  Portability also refers to the ability to hand this off to my friends (and you, if you want) without needing much more than some python libraries.

A simple server built in python gave me a good jump off platform.  It means that I can get started, but then graduate later.

What did I need to satisfy?
  • I needed to store four aspects of each data point: Sensor data, timestamp, sensor name, and sensor type.
  • Needed to be a simple REST based web application
  • Hosted on my in-house server.  Or a Rasp Pi.  Or anything running python.  For development I had it running on my laptop.
  • Needs to run on standard packages included with python, or easily installed using pip
  • Needs to be something that could run multiplatform.  Linux on a PC at a minimum, Raspberry Pi being a secondary target, with MacOS and Windows last.
  • Data needed to be stored in a concurrency-safe manner: if two sensors sent data at the exact same time, they wouldn't collide; they'd be queued properly and inserted.
What did I use to satisfy those requirements?

Python, CherryPy, and sqlite3.

The code is actually quite simple.  There are two parts: the main server handlers, and the SQL thread.  The SQL thread is necessary because the CherryPy server framework runs threaded, and sqlite3 won't allow queries on a connection outside of the thread that opened it.  CherryPy defaults to 10 threads in its pool, and we want that, given that all of the sensors send their data in asynchronously.  So we fire up a separate thread that owns a thread-safe queue and sits on it waiting for data.  When a request comes in, the handler formats the data, stuffs it into the SQL thread's queue, and returns.  When the SQL thread wakes from its one-second nap, it sees items in its queue and processes them.

Let's look at the server code first.  There are four main sections: config, exit plugin, app code, and main.

#!/usr/bin/env python
'''
   cwxDataCollector - A simple python webserver set to collect sensor data.

   Copyright (C) 2016 Bitreaper <bitreaper AT n357 DOT com>
   
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License (GPL) as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.
   
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.
   
   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.
'''

# system libs
import os
import time
import random
import string
import cherrypy
from cherrypy.process.plugins import SimplePlugin

# local libs
from sqlThread import SqlThread

#############################
# config
#############################

config = {
  'global' : {
    'server.socket_host' : '0.0.0.0',
    'server.socket_port' : 8080,
    'server.thread_pool' : 10
  }
}

##############################
# Exit handler definition.  Needed because Ctrl+C would not work right otherwise.  It's also needed
# because any changes to this file would cause the server to hang instead of restart.
# A good example can be found here:
# http://stackoverflow.com/questions/29238079/why-is-ctrl-c-not-captured-and-signal-handler-called
##############################

class ExitPlugin(SimplePlugin):

    def __init__(self, threadList, bus):
        SimplePlugin.__init__(self, bus)
        self.threadList = threadList

    def start(self):
        self.bus.log('Setting up exit handler plugin')
    # Setting start()'s priority to greater than 65 so that it is fired up after the fork if daemonized
    # See this for more details: https://groups.google.com/forum/#!topic/cherrypy-users/1fmDXaeCrsA
    start.priority = 70
    
    def exit(self):
        ''' iterate over threads and call their stop methods, then iterate again for each join.'''
        for thread in self.threadList:
            thread.stop()

        for thread in self.threadList:
            thread.join()

        self.unsubscribe()


##############################
# app definition
##############################

class SensorNetCollector(object):
    def __init__(self,sql):
        self.sql = sql

    ############
    @cherrypy.expose
    def index(self):
        return '''try "keep?sensor=sensorname&data=mydata&sensorType=type" for your request.'''

    ############
    # for testing a sensor's link without storing the information.  It will just echo back to you the data you sent.
    @cherrypy.expose
    def testlink(self,sensor,data,sensorType):
        return "OK, {}, {}, {}".format(sensor, data, sensorType)

    ############
    @cherrypy.expose
    def keep(self,sensor=None,data=None,sensorType=None):
        if all([sensor, data, sensorType]):
            timestamp = int(time.time())
            self.sql.queue.put((sensor,timestamp,data,sensorType))
            return 'OK'
        else:
            # Report back exactly which parameter(s) were missing or empty
            missing = [name for name, value in
                       (('sensor',sensor),('data',data),('sensorType',sensorType))
                       if not value]
            return 'You forgot to add: ' + ', '.join(missing)


##############################
# main
##############################

if __name__ == '__main__':

    sqlThread = SqlThread()
    sqlThread.start()

    ExitPlugin([sqlThread], cherrypy.engine).subscribe()
    cherrypy.quickstart(SensorNetCollector(sqlThread), '/', config=config)

The exit plugin is there so that our server can handle restarts or, more importantly, stop the sqlThread before exiting on a Ctrl+C.  CherryPy handles all of the server work for me; I just define the endpoints as methods on the SensorNetCollector object.
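Since every endpoint is a plain GET request, you can prod the server with curl, wget, or a few lines of Python before any sensor node exists.  Here's a minimal Python 3 sketch; the base URL matches the config above, and the sensor values are made up for illustration:

```python
# Build the same GET request a sensor node (or curl) would send to /keep.
# Assumes the server above is running on localhost:8080.
from urllib.parse import urlencode

BASE = "http://localhost:8080"  # matches server.socket_port in the config

def make_keep_url(sensor, data, sensor_type):
    # urlencode escapes any characters that need it in the query string
    query = urlencode({"sensor": sensor, "data": data, "sensorType": sensor_type})
    return "{}/keep?{}".format(BASE, query)

url = make_keep_url("livingroom", "25", "temp")
print(url)
# Fetch it with urllib.request.urlopen(url), or paste it into curl/wget;
# the server replies with the body 'OK' when all three parameters are present.
```

Swapping `/keep` for `/testlink` exercises the echo endpoint without storing anything.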

And now let’s gawk at the sqlThread code:


#!/usr/bin/env python
'''
   cwxDataCollector - A simple python webserver set to collect sensor data.

   Copyright (C) 2016 Bitreaper <bitreaper AT n357 DOT com>
   
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License (GPL) as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.
   
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.
   
   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.

'''

import os
import time
import sqlite3
from threading import Thread
try:
    from Queue import Queue, Empty    # Python 2
except ImportError:
    from queue import Queue, Empty    # Python 3

class SqlStatements( object ):
    LIST_TABLES = "select name from sqlite_master where type ='table';"    
    
class SqlThread(Thread):
    sqlfile = "sensordata.sqlite3"

    def __init__(self):
        self.running = False
        self.queue = Queue()
        super(SqlThread, self).__init__()

    def sqlInit(self):
        self.conn = sqlite3.connect( self.sqlfile )
        self.cursor = self.conn.cursor()
        # Check that the table we need exists; create it if it doesn't
        if 0 == len(self.cursor.execute(SqlStatements.LIST_TABLES).fetchall()):
            self.cursor.execute( "CREATE TABLE sensors (sensorid TEXT, timestamp NUMERIC, data TEXT, sensorType TEXT);" )
            self.conn.commit()

    def start( self ):
        self.running = True
        super(SqlThread, self).start()

    def stop( self ):
        self.running = False

    def run(self):
        self.sqlInit()
        while self.running:
            if self.queue.qsize() > 0:
                try:
                    data = self.queue.get(False)
                    # Parameterized insert: sqlite3 handles the quoting, which
                    # also avoids SQL injection from crafted sensor values
                    self.cursor.execute(
                        "INSERT INTO sensors VALUES (?,?,?,?)", data)
                    self.conn.commit()
                except Empty:
                    pass
            time.sleep(1)


#########################
# super simple unit test.    
if __name__ == "__main__":
    sqlThread = SqlThread()
    sqlThread.start()

    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('outside',timestamp,'200','light'))
    time.sleep(2)

    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('bathroom',timestamp,'2900','vcc'))
    time.sleep(2)
    
    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('livingroom',timestamp,'25','temp'))
    time.sleep(2)

    sqlThread.stop()
    sqlThread.join()

Essentially, the sqlThread code is in a perpetual loop, accepting tuples on its queue and inserting them into the database.  Not much to it.  If the database doesn't exist, it creates it and populates the schema.
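Getting the data back out for processing is just as easy, since it's a plain sqlite3 file and no server is involved.  Here's a sketch of a query, assuming the sensordata.sqlite3 file and schema that SqlThread creates; the sensor name and type are hypothetical:

```python
import sqlite3

conn = sqlite3.connect("sensordata.sqlite3")
cursor = conn.cursor()

# Same schema sqlInit() creates, so this also works against a fresh file
cursor.execute("CREATE TABLE IF NOT EXISTS sensors "
               "(sensorid TEXT, timestamp NUMERIC, data TEXT, sensorType TEXT)")

# All readings for one (hypothetical) sensor/type pair, oldest first
rows = cursor.execute(
    "SELECT timestamp, data FROM sensors "
    "WHERE sensorid = ? AND sensorType = ? ORDER BY timestamp",
    ("livingroom", "temp")).fetchall()

for timestamp, data in rows:
    print(timestamp, data)

conn.close()
```

From here it's a short hop to matplotlib, a spreadsheet, or whatever processing you like.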

You can find the code here: https://github.com/bitreaper/cwxDataCollector

Footnotes:

#1: Earlier in my marriage, while we were on vacation my wife asked me why I was working.  I was intently staring at and hacking away on the laptop.  I told her I wasn’t working.

Her: “Well, aren’t you doing the same thing that you do at work? Coding?”

Me:  “Yes, but this is different.”

Her: “How? You’re furiously typing, and swearing at the screen.  It looks the same as when you’re working from home…”

Me: “Ah, OK, I see the confusion.  No, the difference is that if I tire of this exercise, I can either delete it or set it aside.  With work, I’m forced to continue until it’s solved.  That’s where the stress comes in, and it ceases being fun.”

 

The CheapWeather project

tl;dr – I’m finding a way to gather data so I can figure out how to better utilize energy in heating/cooling my home.

Air conditioning is a marvel of our modern age. Using the phase change of a refrigerant from gas to liquid and back to gas, between two pressure domains, heat is absorbed in one domain and discharged in the other.  The system is driven by a compressor and an expansion valve.  The gas is compressed and sent through a coil where it dumps its heat to one domain (usually outside), then sent through a valve where it expands into the lower-pressure side (like the evaporator coil in the furnace) and, by expanding, evaporates from a liquid into a gas and takes heat with it.  From the evaporator coil, it's back to the compressor to repeat the cycle.  This is an amazing feat.  But it takes work.  The compressor and fans are necessary to move the refrigerant through its phase-change stages.  This work can be expensive.  And if you throw in any inefficiencies, like equipment that has aged and is no longer operating at peak efficiency, you end up with a very expensive air conditioning bill.

This is my current problem.  I’m always feeling like I’m never getting the cooling I’m paying for.

Why don't I replace it?  It's just not an option at this time.  While the system lives and breathes, it won't get replaced.  Until its demise, I need to investigate alternatives that could help.  Turning on the blower fan all the time tends to even out the temperatures throughout the house, or so I was told by an AC guy.  Attic fans could also help drain the enormous thermal battery that is my attic (testified to by my friend, who put them in his attic and immediately felt a difference).  Keeping the house at a warmer temp and using fans.  These are all strategies that might work.

The key to knowing is data.

Data is hard to pin down if it's a subjective feeling.  Whether it feels like it's working can differ from person to person and day to day.  On top of that, remembering it is even harder, and sticking to a schedule of writing down each data point is near impossible.  That's where having a system that can report and record temperatures throughout the day is critical to making decisions.  I needed a way to record multiple points of data throughout my house, over time, such as:

  • When my furnace or AC was on
  • What the temperature at the thermostat was when it went on
  • What the temperature in each room was, throughout the day
  • What the temperature outside was, to know the delta between inside and outside temps.
  • A temp sensor for every room, or at least the rooms that count
  • A humidity sensor, only really need one of these within the house
  • An outside station that captures
    • light, to know when clouds have rolled in
    • temp
    • humidity
    • barometric pressure (maybe?  nice to have?)
  • The battery level at each station.
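Each of those readings boils down to the same four stored aspects the collection server keeps: sensor name, timestamp, data, and sensor type.  For example, a light reading from the outside station (hypothetical values) would travel through the system as a tuple like this:

```python
import time

# The four aspects stored per data point: sensor name, timestamp,
# sensor data, and sensor type.  Values here are made up for illustration.
reading = ("outside", int(time.time()), "200", "light")

sensor, timestamp, data, sensorType = reading
print(sensor, sensorType, data)
```

Keeping the data this generic is what lets one table hold light, temp, humidity, and battery readings alike.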

The central server piece of software needed to satisfy these requirements:

  • Would be a simple REST based web application, hosted on my in-house server.
  • Needs to run on freely available packages.
  • Needs to be something that could run multiplatform.  Linux on a PC at a minimum, Raspberry Pi being a secondary target, with MacOS and Windows last.
  • Data needed to be stored in a concurrency-safe manner: if two sensors sent data at the exact same time, they wouldn't collide; they'd be queued properly and inserted.

Stay tuned: in future posts I'll get into the sensor design itself (including the firmware running it), the design of the server software, the design of the outside sensor node, and the processing of the data it's produced so far.