CheapWeather: Data collection server design

As is usual for me, I work on projects over the course of several months, or even years.  While that seems glacial, it isn’t, considering that I’m usually switching between projects as I get ideas or new data.  So at any given time, I probably have 100 inactive projects that are waiting for a part or waiting for thought, and about 10 active projects that I work on as inspiration or aspiration strikes.  That’s the awesome thing about a hobby: I can put it down if I tire of it.  See footnote #1.

I took on the server first, because its infrastructure was something I had been playing with for work and other projects.  It was also easier to get that piece working and debugged first, because I could use command line tools such as curl or wget to prod and test it without needing a working sensor node.  And of the mental CPU cycles I had given this project, the server happened to be the one aspect that had gotten enough of them to be ready to build.  At the beginning of May 2016, I also realized I had very little time to get this working before summer.  Winter had passed, spring was halfway through, and I hadn’t picked this project up yet.  I had some ESP8266 modules from the previous July, but hadn’t put them to use.  I needed to get moving if I was going to collect data over the summer!

Why did I write my own and not use a cloud service?

Simple: I just wanted to collect data.  Every system out there has a learning curve, and I wanted to start collecting quickly without first learning how to get data into it, and then how to get it back out when I wanted to process it.  Yes, this means I’m missing out on some of the fancy features these services offer.  It also doesn’t mean I’ll always stick with my own server.  I needed something to start with, and this worked (and is still working) quite well.

Cloud-based services also bring all sorts of other potential issues: security; the ability to get out to the web; keeping the data mine, in a format I choose; and not having to deal with changes in the service, whether that’s a new interface, a new license, or the provider going out of business.  Then there’s network availability.  If there’s an outage and my sensors can’t get out to the cloud, I lose data.

There’s a third reason: portability.  This might go into a Raspberry Pi that is out in the middle of a field somewhere pulling data off of a mesh of sensors, and isn’t uplinked all the time.  Portability also refers to the ability to hand this off to my friends (and you, if you want) without needing much more than some python libraries.

A simple server built in Python gave me a good jumping-off platform.  I can get started now, and graduate to something bigger later.

What did I need to satisfy?
  • I needed to store four aspects of each data point: Sensor data, timestamp, sensor name, and sensor type.
  • Needed to be a simple REST based web application
  • Hosted on my in-house server.  Or a Raspberry Pi.  Or anything running Python.  For development I had it running on my laptop.
  • Needs to run on standard packages included with Python, or easily installed using pip
  • Needs to be something that could run multiplatform.  Linux on a PC at a minimum, Raspberry Pi being a secondary target, with MacOS and Windows last.
  • Data needed to be stored in a concurrent safe manner, that is if two sensors sent data at the same exact time, they wouldn’t collide, they’d be queued properly and inserted.
What did I use to satisfy those requirements?

Python, CherryPy, and sqlite3.

The code is actually quite simple.  There are two parts: the main server handlers and the sql thread.  The sql thread is necessary because the server framework, CherryPy, runs its request handlers on a pool of threads, and sqlite3 won’t allow a connection to be used from any thread other than the one that opened it.  CherryPy defaults to 10 threads in its pool, and that’s something we want, given that all of the sensors send their data in asynchronously.  So we fire up a separate thread that owns the database connection and sits waiting on a thread-safe queue.  When a request comes in, the handler formats the data, stuffs it into the sql thread’s queue, and returns.  When the sql thread wakes from its 1-second nap, it sees the item(s) in its queue and processes them.
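The thread restriction comes from Python’s sqlite3 module: by default (`check_same_thread=True`) a connection raises `sqlite3.ProgrammingError` when a thread other than its creator uses it. Here is a minimal standard-library demonstration (Python 3); the function name is hypothetical and not part of cwxDataCollector.

```python
import sqlite3
import threading


def use_connection_from_other_thread():
    """Open a connection in this thread, then try to use it from a second thread.

    Returns the list of error messages collected by the worker thread."""
    # The connection belongs to the calling thread; check_same_thread defaults to True.
    conn = sqlite3.connect(":memory:")
    errors = []

    def worker():
        try:
            conn.execute("SELECT 1")
        except sqlite3.ProgrammingError as exc:
            errors.append(str(exc))

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    conn.close()  # closing from the owning thread is allowed
    return errors


if __name__ == "__main__":
    print(use_connection_from_other_thread())
```

Passing `check_same_thread=False` to `sqlite3.connect()` turns the check off, but then serializing the writes becomes your job. Funneling every insert through one thread and a queue sidesteps that.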

Let’s look at the server code first.  There are four main sections: config, exit plugin, app code, and main.

#!/usr/bin/env python
'''
   cwxDataCollector - A simple python webserver set to collect sensor data.

   Copyright (C) 2016 Bitreaper <bitreaper AT n357 DOT com>
   
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License (GPL) as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.
   
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.
   
   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.
'''

# system libs
import time
import cherrypy
from cherrypy.process.plugins import SimplePlugin

# local libs
from sqlThread import SqlThread

#############################
# config
#############################

config = {
  'global' : {
    'server.socket_host' : '0.0.0.0',
    'server.socket_port' : 8080,
    'server.thread_pool' : 10
  }
}

##############################
# Exit handler definition.  Needed because Ctrl+C would not work right otherwise.  It's also needed
# because any changes to this file would cause the server to hang instead of restart.
# A good example can be found here:
# http://stackoverflow.com/questions/29238079/why-is-ctrl-c-not-captured-and-signal-handler-called
##############################

class ExitPlugin(SimplePlugin):

    def __init__(self, threadList, bus):
        SimplePlugin.__init__(self, bus)
        self.threadList = threadList

    def start(self):
        self.bus.log('Setting up exit handler plugin')
    # Setting start()'s priority to greater than 65 so that it is fired up after the fork if daemonized
    # See this for more details: https://groups.google.com/forum/#!topic/cherrypy-users/1fmDXaeCrsA
    start.priority = 70
    
    def exit(self):
        ''' iterate over threads and call their stop methods, then iterate again for each join.'''
        for thread in self.threadList:
            thread.stop()

        for thread in self.threadList:
            thread.join()

        self.unsubscribe()


##############################
# app definition
##############################

class SensorNetCollector(object):
    def __init__(self,sql):
        self.sql = sql

    ############
    @cherrypy.expose
    def index(self):
        return '''try "keep?sensor=sensorname&data=mydata&sensorType=type" for your request.'''

    ############
    # for testing a sensor's link without storing the information.  It will just echo back to you the data you sent.
    @cherrypy.expose
    def testlink(self,sensor,data,sensorType):
        return "OK, {}, {}, {}".format(sensor, data, sensorType)

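    ############
    # store one reading.  The parameters default to None so that a request missing any of them
    # reaches the message below instead of being rejected by CherryPy before keep() is called.
    @cherrypy.expose
    def keep(self, sensor=None, data=None, sensorType=None):
        if all([sensor, data, sensorType]):
            timestamp = int(time.time())
            # hand the row to the sql thread, which owns the sqlite3 connection
            self.sql.queue.put((sensor, timestamp, data, sensorType))
            return 'OK'
        # name every parameter that is missing or empty
        missing = [name for name, value in (('sensor', sensor), ('data', data), ('sensorType', sensorType))
                   if not value]
        return 'You forgot to add: {}'.format(', '.join(missing))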


##############################
# main
##############################

if __name__ == '__main__':

    sqlThread = SqlThread()
    sqlThread.start()

    ExitPlugin([sqlThread], cherrypy.engine).subscribe()
    cherrypy.quickstart(SensorNetCollector(sqlThread), '/', config=config)

The exit plugin lets the server handle restarts and, more importantly, stop the sqlThread before exiting on a Ctrl+C.  CherryPy handles all of the server work for me; I just define the endpoints as methods on the SensorNetCollector object.
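Besides curl or wget (for example `curl "http://localhost:8080/keep?sensor=outside&data=200&sensorType=light"`), the endpoints can be exercised from a few lines of Python 3. This is a minimal client-side sketch, not the sensor firmware; `build_url`, `send_reading` and the `SERVER` default are hypothetical names, and only the port 8080 comes from the config dict above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical default: the host is an example; the port matches 'server.socket_port' above.
SERVER = "http://localhost:8080"


def build_url(endpoint, sensor, data, sensor_type, server=SERVER):
    """Build a /keep or /testlink URL; urlencode() escapes spaces and other unsafe characters."""
    query = urlencode([("sensor", sensor), ("data", data), ("sensorType", sensor_type)])
    return "{}/{}?{}".format(server, endpoint, query)


def send_reading(sensor, data, sensor_type, endpoint="keep", server=SERVER, timeout=5):
    """Send one reading and return the reply body as text."""
    with urlopen(build_url(endpoint, sensor, data, sensor_type, server), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Requires the collector to be running; /testlink echoes without storing anything.
    print(send_reading("outside", "200", "light", endpoint="testlink"))
```

Pointing it at `/testlink` first is a cheap way to check a link without putting test rows into the database; the server code above answers that endpoint with `OK, outside, 200, light`.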

And now let’s gawk at the sqlThread code:


#!/usr/bin/env python
'''
   cwxDataCollector - A simple python webserver set to collect sensor data.

   Copyright (C) 2016 Bitreaper <bitreaper AT n357 DOT com>
   
   This program is free software: you can redistribute it and/or modify
   it under the terms of the GNU General Public License (GPL) as published by
   the Free Software Foundation, either version 3 of the License, or
   (at your option) any later version.
   
   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.
   
   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.

'''

import time
import sqlite3
from threading import Thread
try:
    from Queue import Queue, Empty   # Python 2
except ImportError:
    from queue import Queue, Empty   # Python 3

class SqlStatements( object ):
    LIST_TABLES = "select name from sqlite_master where type ='table';"    
    
class SqlThread(Thread):
    sqlfile = "sensordata.sqlite3"

    def __init__(self):
        self.running = False
        self.queue = Queue()
        super(SqlThread, self).__init__()

    def sqlInit(self):
        self.conn = sqlite3.connect( self.sqlfile )
        self.cursor = self.conn.cursor()
        # check to see that the table we need exists create it if it doesn't
        if 0 == len(self.cursor.execute(SqlStatements.LIST_TABLES).fetchall()):
            retdata = self.cursor.execute( "CREATE TABLE sensors (sensorid TEXT, timestamp NUMERIC, data TEXT, sensorType TEXT);" )
            self.conn.commit()

    def start( self ):
        self.running = True
        super(SqlThread, self).start()

    def stop( self ):
        self.running = False

    def drain(self):
        ''' insert everything currently waiting in the queue, then commit once.'''
        while True:
            try:
                data = self.queue.get(False)
            except Empty:
                break
            # parameterized query: sqlite3 quotes each value itself, so a ' in the data can't break the INSERT
            self.cursor.execute("INSERT INTO sensors VALUES (?, ?, ?, ?)", data)
        self.conn.commit()

    def run(self):
        self.sqlInit()
        while self.running:
            self.drain()
            time.sleep(1)
        # flush anything queued after the last pass, then release the database
        self.drain()
        self.conn.close()


#########################
# super simple unit test.    
if __name__ == "__main__":
    sqlThread = SqlThread()
    sqlThread.start()

    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('outside',timestamp,'200','light'))
    time.sleep(2)

    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('bathroom',timestamp,'2900','vcc'))
    time.sleep(2)
    
    timestamp = int(time.time())
    print("pushing data for %d" % timestamp)
    sqlThread.queue.put(('livingroom',timestamp,'25','temp'))
    time.sleep(2)

    sqlThread.stop()
    sqlThread.join()

Essentially, the sqlThread code sits in a perpetual loop, accepting tuples on its queue and inserting them into the database.  Not much to it.  If the database doesn’t exist, sqlite3 creates the file and sqlInit() populates the schema.
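Getting the data back out is plain SQL against `sensordata.sqlite3`. Below is a minimal Python 3 sketch, assuming the `sensors` table exactly as `sqlInit()` creates it; `readings` and `average` are hypothetical helper names, and the demo uses an in-memory database rather than real sensor data.

```python
import sqlite3

# Same schema that SqlThread.sqlInit() creates in sensordata.sqlite3.
SCHEMA = "CREATE TABLE IF NOT EXISTS sensors (sensorid TEXT, timestamp NUMERIC, data TEXT, sensorType TEXT)"


def readings(conn, sensorid, start, end):
    """Return (timestamp, data) rows for one sensor with start <= timestamp < end, oldest first."""
    return conn.execute(
        "SELECT timestamp, data FROM sensors"
        " WHERE sensorid = ? AND timestamp >= ? AND timestamp < ?"
        " ORDER BY timestamp",
        (sensorid, start, end),
    ).fetchall()


def average(conn, sensorid, start, end):
    """Mean reading in the window, or None if there are no rows.

    data is stored as TEXT, so it is cast to REAL before averaging."""
    row = conn.execute(
        "SELECT AVG(CAST(data AS REAL)) FROM sensors"
        " WHERE sensorid = ? AND timestamp >= ? AND timestamp < ?",
        (sensorid, start, end),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    # For the real data, open the collector's file instead:
    # conn = sqlite3.connect("sensordata.sqlite3")
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO sensors VALUES (?, ?, ?, ?)", [
        ("livingroom", 100, "20", "temp"),
        ("livingroom", 200, "22", "temp"),
        ("outside", 150, "200", "light"),
    ])
    print(readings(conn, "livingroom", 0, 1000))
    print(average(conn, "livingroom", 0, 1000))
```

Because the timestamps are Unix seconds, a window such as "yesterday" is just a pair of integers.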

You can find the code here: https://github.com/bitreaper/cwxDataCollector

Footnotes:

#1: Earlier in my marriage, while we were on vacation, my wife asked me why I was working.  I was intently staring at the laptop and hacking away on it.  I told her I wasn’t working.

Her: “Well, aren’t you doing the same thing that you do at work? Coding?”

Me:  “Yes, but this is different.”

Her: “How? You’re furiously typing, and swearing at the screen.  It looks the same as when you’re working from home…”

Me: “Ah, OK, I see the confusion.  No, the difference is that if I tire of this exercise, I can either delete it or set it aside.  With work, I’m forced to continue until it’s solved.  That’s where the stress comes in, and it ceases being fun.”

 


CheapWeather: First Steps

If data was what I needed, the question was how I was going to get it.  I already knew I wasn’t going to get it if I had to walk to each room on a schedule and measure it, so automation was the way to go.  But reading a sensor on a schedule still didn’t solve the problem of collecting the data in one place.  That’s where it made logical sense to make each sensor wireless, with one station collecting the data for the entire set.

The design revolved around two major factors:

  • It had to be cheap, as I was going to be putting more than one of these together.  The minimum would be seven sensor nodes, but potentially more.
  • I wanted it to be easy.  I didn’t have all the time in the world to devote to this project, and like any good geek, I have a million and one other projects I’m working on.  Not to mention the normal obligations of a day job and family.

Other factors that bubbled up while contemplating the design of the system:

  • The sensor node had to be battery efficient, or be close enough to a wall socket to plug in.  Battery efficient meant more than 3 months on a set of batteries.
  • It needed a central repository where the data was collected and available on the network, so I could get to it from my laptop.  This could be a cloud service, a database on a Raspberry Pi, or whatever.  Being available also meant I could work on it on my lunch hour, or while monitoring other activities.
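To put a number on “more than 3 months on a set of batteries”, the requirement can be turned into an average-current budget: battery capacity divided by hours of runtime, compared against the duty-cycled draw of a node that wakes, reports, and sleeps. A minimal Python sketch; the 2000 mAh capacity and the awake/sleep currents in the example are made-up placeholders, not measurements of any module used in this project.

```python
HOURS_PER_DAY = 24


def max_average_current_ma(capacity_mah, days):
    """Largest average current (mA) that lets a battery of capacity_mah last the given days.

    Ignores self-discharge and regulator losses, so treat it as an optimistic upper bound."""
    return capacity_mah / (days * HOURS_PER_DAY)


def duty_cycle_average_ma(active_ma, active_s, sleep_ma, period_s):
    """Average current of a node that is awake for active_s seconds out of every period_s seconds."""
    if not 0 <= active_s <= period_s:
        raise ValueError("active_s must be between 0 and period_s")
    sleep_s = period_s - active_s
    return (active_ma * active_s + sleep_ma * sleep_s) / period_s


if __name__ == "__main__":
    # Placeholder numbers: 2000 mAh pack, 90 days (about 3 months),
    # awake at 80 mA for 2 s every 10 minutes, asleep at 0.02 mA otherwise.
    budget = max_average_current_ma(2000, 90)
    draw = duty_cycle_average_ma(80, 2, 0.02, 600)
    print("budget {:.3f} mA, estimated draw {:.3f} mA, fits: {}".format(budget, draw, draw <= budget))
```

The awake term usually dominates, so how long a node stays awake per report matters more than how often it sleeps.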

With this small set of requirements, I started looking into my options for wireless communications.  There were a couple of options available to me at the time:

  • CC3100 WiFi chipset
  • 433 MHz wireless link
  • ZigBee module (like XBee), or other 802.15.4 chips and stacks like Microchip’s MiWi stack on top of ZigBee hardware
  • Nordic Semiconductor’s nRF24L01+ chipset

CC3100

This one was costly at first, somewhere around $30 per module (or at least its predecessor was; it’s now down around $10).  It was out because it was too expensive for small, single-purpose sensors.

433 MHz wireless links

These are the cheap $2 to $3 modules on eBay.  They’re easy to interface with, but they’re not smart: the module is just a raw physical layer, so collision avoidance and noise handling are things you have to build into your own software stack.  So they meet the cheap requirement, but not the easy one.

ZigBee chips or modules

I’ve loved the idea of ZigBee from the first time I heard of it.  A mesh network for embedded communications is an awesome idea.  Unfortunately, I’ve found that it’s a heavy lift to get into the ZigBee stacks. The benefit of ZigBee’s design is that the software stack is where most of the work is done for mesh networking.  The downside though is that you need a stack, and if you’re not coding it up yourself, you’re going to need to get one prewritten.  That can mean extra license fees.  Microchip has a chipset and stack called MiWi that attempts to solve the problem that ZigBee was made to solve, with a lighter weight stack.  MiWi might be better, but again, it’d take some dedication to get into it.  They’re also not too cheap, or weren’t.  The later chips and popularity have brought these down to affordable prices for small modules.  The modules from Microchip are about $10.  The XBee modules are still in the $30 to $40 range, so that’s too expensive for this project.

Nordic Semiconductor’s nRF24L01+

These chips (and the modules I bought) met the cheap requirement, but they weren’t as easy as I would have liked.  However, there were good examples and good code to use to begin understanding the details of using them.  You can get them for a good price on eBay, but beware: the really cheap modules, under $2, are probably counterfeit, and your failure rate might be high.  Of the modules I bought, half had a problem ACK’ing packets sent to them.  They could read and they could write, but while the receiving side saw the packet and had no problems with the exchange, the sending side reported that it never got an ACK.  Dan, a good friend of mine, has used them in his product for quite a few years.  His are genuine chips, and don’t have the problems mine did.  Mine were either counterfeit or marginal chips that got made into modules and sold at surplus prices.

Since the nRF24L01+ met the cheap requirement and mostly met the easy one, it was the wireless link I started moving forward with.  The design started with an Arduino Pro Mini driving an nRF24L01+ module for each sensor node, with a Raspberry Pi on the other end listening to an nRF24L01+ module on its SPI port.  The Pi was going to be the gateway of the sensor net.  I got to the point of getting the example code pinging back and forth between two Arduinos, and between an Arduino and the Pi.  I was able to do signal strength tests throughout my house, and found a sweet spot where the Pi could sit and hear the entire house.

This is where the project got derailed.  Any geek will tell you there are far too many projects begging to be completed, and if a project isn’t in high demand, it usually gets shelved in favor of spending that time on the new and shiny.  During the winters, with no demand for AC, there was always an “I’ll get back to it before summer” excuse, and off it went to the mental shelf to collect dust.  It’s not that I couldn’t use the sensor net year-round; my house has issues with the cold just as it does with the heat.  But the cold is more easily solved with space heaters, so the demand for this project just wasn’t there in the winter.

The game changer: ESP8266

It wasn’t until the advent of the ESP8266 that this idea became a reality.  The ESP8266 solved many problems beyond the two main criteria of easy and cheap.  Because the module itself can be programmed, I could also fold the microcontroller’s job into it, leaving one less component in the design.  Its built-in WiFi support removed the need for an additional computer, like a Raspberry Pi, to act as a gateway: the sensors could communicate directly with the data store.  And given the Arduino environment and boot code, the module could also use standard libraries for many things.

Now it looked as if my sensor core design could be simplified quite a bit.  The microcontroller and the wifi link could be one module/chip and the data collector could be anything, anywhere I could get to on the network.

In the next article I’ll detail the server design and choices.  I started with the server because I could get it done quickly, and test it without needing the sensor core to be finished.

Footnotes:

“Easy” is quite subjective here.  If you’ve never heard the phrase “penny wise but pound foolish”, it describes the trap perfectly: many times we pick something because it’s “easy” (cutting some corners), but end up paying more elsewhere (for example, the time spent learning a new platform just to make use of a library).  Luckily, that wasn’t the case here.  The new platform I learned was a good choice for future projects.

The CheapWeather project

tl;dr – I’m finding a way to gather data so I can figure out how to better utilize energy in heating/cooling my home.

Air conditioning is a marvel of our modern age.  By cycling a refrigerant from gas to liquid and back to gas between two pressure domains, it absorbs heat in one domain and discharges it in the other.  The system is driven by a compressor and an expansion valve.  The gas is compressed and sent through a coil (usually outside), where it dumps its heat and condenses into a liquid.  It then passes through the valve into the lower pressure side (like the evaporator coil in the furnace), where it expands, evaporates back into a gas, and takes heat with it.  From the evaporator coil, it’s back to the compressor to repeat the cycle.  This is an amazing feat.  But it takes work.  The compressor and fans are necessary to move the refrigerant through its phase change stages, and that work can be expensive.  Throw in any inefficiencies, like equipment that has aged and no longer operates at peak efficiency, and you end up with a very expensive air conditioning bill.

This is my current problem.  I always feel like I’m not getting the cooling I’m paying for.

Why don’t I replace it?  It’s just not an option at this time.  As long as the system lives and breathes, it won’t get replaced.  Until its demise, I need to investigate alternatives that could help.  Running the blower fan all the time tends to even out the temperatures throughout the house, or so an AC guy told me.  Attic fans could also reduce the enormous thermal battery that is my attic (my friend put them in his attic and immediately felt a difference).  Another option is keeping the house at a warmer temperature and using fans.  These are all strategies that might work.

The key to knowing is data.

Data is hard to pin down when it’s a subjective feeling.  Whether it feels like it’s working can differ from person to person and day to day.  Remembering those feelings is even harder, and sticking to a schedule to write down each data point is near impossible.  That’s why a system that reports the temperature throughout the day and records the readings is critical to making decisions.  I needed a way to record multiple data points throughout my house, over time, such as:

  • When my furnace or AC was on
  • What the temperature at the thermostat was when it went on
  • What the temperature in each room was, throughout the day
  • What the temperature outside was, to know the delta between inside and outside temps.
  • A temp sensor for every room, or at least the rooms that count
  • A humidity sensor, only really need one of these within the house
  • An outside station that captures
    • light, to know when it’s overcast
    • temp
    • humidity
    • barometric pressure (maybe?  nice to have?)
  • The battery level at each station.

The central server piece of software needed to satisfy these requirements:

  • Would be a simple REST based web application, hosted on my in-house server.
  • Needs to run on free packages available.
  • Needs to be something that could run multiplatform.  Linux on a PC at a minimum, Raspberry Pi being a secondary target, with MacOS and Windows last.
  • Data needed to be stored in a concurrent safe manner, that is if two sensors sent data at the same exact time, they wouldn’t collide, they’d be queued properly and inserted.

Stay tuned: in future posts I’ll get into the sensor design itself, including the firmware running it, the design of the server software, the design of the outside sensor node, and the processing of the data it has produced so far.