README.md

Waffle is a Python library for storing data in a schema-less, document-oriented way using a relational database.

Similar designs:

 - CouchDB: http://couchdb.apache.org/
 - FriendFeed: http://bret.appspot.com/entry/how-friendfeed-uses-mysql

Advantages:

 - Easy sharding: spread your data load across machines
 - Flexible schema: schema changes can be made online and are as flexible as you want (most databases require a full table copy)
 - May obviate the need for complex ORM tools and queries in your application, since you can store objects directly
 - Record versioning (unimplemented)
 - Client-side replication (unimplemented)
 - Indices can be created and populated online
 - Indices can be created on separate databases
 - Index values can be customized on the client side (MySQL, for example, only lets you customize the prefix)
 - Works with any database SQLAlchemy supports (SQLite, MySQL, PostgreSQL, Oracle)

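As a rough illustration of the sharding idea (a sketch, not Waffle's actual implementation), records can be spread across databases by hashing each record's primary key and using the result to pick a shard. The `shard_for` helper and the shard names below are hypothetical.

```python
import zlib

def shard_for(primary_key, shards):
    """Pick a shard for a key (hypothetical helper, not part of Waffle)."""
    # crc32 gives the same value in every process, unlike the built-in
    # hash() of strings, so a key always lands on the same shard.
    digest = zlib.crc32(primary_key.encode('utf-8')) & 0xffffffff
    return shards[digest % len(shards)]

shards = ['db-a', 'db-b']  # stand-ins for real database engines
keys = ['post-1', 'post-2', 'post-3']
placement = dict((key, shard_for(key, shards)) for key in keys)
```

Because the same key always maps to the same shard, a lookup by primary key only needs to contact one database. With this simple modulo scheme, changing the number of shards remaps most keys.
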
Disadvantages:

 - Your records are stored in an opaque format the server cannot inspect. This can jeopardize platform neutrality and maintainability. For example, if you choose to encode your object data with Pickle (Python's main serialization format), you probably won't be able to access your data from other languages without first exporting it to a compatible format.
 - Along the same lines, your record data will no longer be directly queryable with your SQL client.
 - Because the record data is opaque to the server, table-scan queries will be executed on the client instead of the server. This will likely be much slower than a server-side table scan, since there are additional latency costs for marshalling the data over the network and filtering it on the client side.
 - Eventual consistency: since indices are updated in transactions and connections separate from the main record update, indices are eventually consistent but not atomically consistent with the record update.

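The client-side table scan described above can be seen in a small sketch that uses only the standard library (`sqlite3` and `json`). It does not use Waffle, and the table, column, and field names are made up for illustration. The record body is stored as a single serialized string, so filtering on a field inside it means fetching every row and decoding it on the client.

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT)')

posts = [
    {'title': 'First', 'views': 10},
    {'title': 'Second', 'views': 250},
    {'title': 'Third', 'views': 75},
]
for post in posts:
    # The whole record is written as one serialized string; the schema
    # has no 'title' or 'views' column to query against.
    conn.execute('INSERT INTO records (body) VALUES (?)', (json.dumps(post),))

# Client-side "table scan": fetch every row, decode it, then filter locally.
popular = []
for (body,) in conn.execute('SELECT body FROM records ORDER BY id'):
    record = json.loads(body)
    if record['views'] > 50:
        popular.append(record['title'])
```

JSON is used here rather than Pickle so the stored data stays readable from other languages; the cost of shipping every row to the client is the same either way.
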
Requirements so far:

 - Python
 - SQLAlchemy (0.5.x but 0.4.x will probably work fine)
 - Kindred spirit

Small Example:

    import time

    import sqlalchemy
    import waffle

    # Two in-memory SQLite databases standing in for separate shards
    _engines = [
        sqlalchemy.create_engine('sqlite:///:memory:'),
        sqlalchemy.create_engine('sqlite:///:memory:'),
    ]

    # Define a blog post entity that's searchable by time

    def to_time(record):
        # Map a record to its index row: creation time as epoch seconds
        yield {'time': int(time.mktime(record.created.timetuple()))}

    time_idx = waffle.Index('blog_time',
        columns=[sqlalchemy.Column('time', sqlalchemy.Integer())],
        shard=waffle.ShardByPrimaryKey(_engines),
        mapper=to_time)

    class BlogPost(waffle.Record):
        def __init__(self, **kwargs):
            # Give every new post an empty body and title by default
            kwargs.setdefault('value', {'body': '', 'title': ''})
            super(BlogPost, self).__init__(**kwargs)

    blog_posts = waffle.Entity('blog', engines=_engines, indices=[time_idx], record_class=BlogPost)

    # Create the table and indices if they don't already exist
    blog_posts.create()

    # Make a new blog post
    blog_post = blog_posts.new()
    blog_post.value['title'] = 'My First Blog Post'
    blog_post.value['body'] = "It's a sunny day in San Francisco."

    # Save it
    blog_posts.save(blog_post)

    # Print the blog posts from the last 24 hours:
    # (prints => My First Blog Post)
    for blog_post in blog_posts.select(time_idx.c.time >= (time.time() - 86400)):
        print(blog_post.value['title'])

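The `to_time` mapper in the example is a generator that yields a dictionary of index column values for a record. As a sketch of how such a mapper behaves in isolation, the snippet below calls the same function on a stand-in record. `FakeRecord` is a hypothetical substitute for `waffle.Record`, used only so the snippet runs without Waffle installed; it assumes `created` is a `datetime` value, which the call to `timetuple()` suggests.

```python
import collections
import datetime
import time

# Hypothetical stand-in for waffle.Record; it carries only the one
# attribute the mapper reads.
FakeRecord = collections.namedtuple('FakeRecord', ['created'])

def to_time(record):
    # Same mapper as in the example: one index row holding the creation
    # time as whole seconds since the epoch (interpreted as local time).
    yield {'time': int(time.mktime(record.created.timetuple()))}

created = datetime.datetime(2009, 1, 1, 12, 0, 0)
rows = list(to_time(FakeRecord(created=created)))
```

Each yielded dictionary uses the index column name (`time`) as its key, matching the `sqlalchemy.Column('time', ...)` declared on the index.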