
Infinidb 4.0.2 review

As with most data-centric companies, there comes a time when it’s necessary to aggregate the raw data we collect into meaningful insight. At Fuse Powered that is essentially one of the core services we provide, whether it be via ad-hoc queries, reports, or our dashboard.

The core database we use is of course mysql with the default storage engine Innodb, which is geared toward OLTP and does it very well. However, when it comes to aggregating a large number of rows, and/or loading a large amount of data quickly, this default storage engine does not handle these sorts of operations efficiently at very large volumes. A class of storage engine called columnar storage is designed for just this sort of work: the data is re-oriented into separate column-based data files, where each column can be read from disk sequentially and stored with high compression (resulting in disk storage savings and less data to read), as the gain in read speed far outweighs the cpu overhead of decompression.

There are a number of columnar-based offerings out there; however, almost all are closed source and quite pricey. Up until now we’ve utilized Infobright, a mysql-based solution with a watered-down community version that strips much of the functionality and performance from its full version. Recently, Calpont (recently re-branded after its product name, Infinidb) has released its full 4.0.x version under an open source license, a move that may potentially revolutionize how columnar databases are marketed.

This full version fills a number of core needs, such as compression, parallel processing, distributed storage, filesystem redundancy (HDFS, GlusterFS) and high-speed data loading, so we’ve been eager to test and implement it.

The architecture comprises two main functional units: user modules (UM) and performance modules (PM). Performance modules work with and store the data, while user modules coordinate queries across the entire system. Scaling performance modules gives you more storage and potentially better performance from the parallel work being done. Scaling user modules allows more concurrent queries to be run and provides redundancy for a failed UM. You can start with a single server that contains all units and grow out to a large distributed system, as both types of modules can be added as your data increases. Each PM has a set number of dbroots, which can easily be reallocated/moved to PMs on other servers as your data grows. It also uses a form of partitioning called extent mapping, which rules out sections of data in storage to speed up queries where possible. Queries are processed in parallel by whichever processors and nodes are appropriate.

Results

The following is our general experience with it to date.

Documentation – Very good; it answers quite a bit in terms of concepts, installation, options, etc. However, because it has only just been released as open source, it really needs much more use and documentation by the user community to flush out bugs, suggest fixes, etc.

Installation/configuration – Straightforward: enable passwordless ssh between nodes, download/install the initial rpm/deb/binary files, and then run their script, which asks a series of questions. The main configuration file can be adjusted later, but only on the root PM (usually pm1), at /usr/local/Calpont/etc/Calpont.xml.

Schema transformation – You need to create infinidb versions of the table structures, and this is going to take some time; there are all kinds of incompatibilities when converting an innodb schema to infinidb.
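As a hedged illustration (the table and column names are hypothetical, and which features you must strip can vary by version), a typical conversion drops keys and auto_increment, and possibly unsigned types:

    -- Original InnoDB definition (hypothetical table)
    CREATE TABLE events (
      id INT UNSIGNED NOT NULL AUTO_INCREMENT,
      user_id INT UNSIGNED NOT NULL,
      event_type VARCHAR(32),
      created_at DATETIME,
      PRIMARY KEY (id),
      KEY idx_user (user_id)
    ) ENGINE=InnoDB;

    -- Rough InfiniDB equivalent: keys and auto_increment removed
    -- (depending on the version, unsigned types may also need to go)
    CREATE TABLE events (
      id INT NOT NULL,
      user_id INT NOT NULL,
      event_type VARCHAR(32),
      created_at DATETIME
    ) ENGINE=infinidb;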

Loading – load data infile vs. the infinidb loader. As can be guessed, the loader is much faster, anywhere from 100,000 to 1 million+ rows/s for a single-module load. The source of the loader can be at the user module, which will automatically distribute the data in 10,000-row chunks, or at the performance modules individually using pre-split files in order to scale. You can set an error threshold over which the load will abort, with errors logged.
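For reference, the slower of the two paths is a plain load through the mysql client, roughly as below (file and table names are hypothetical); the bulk loader runs outside of mysql on the UM or on each PM.

    -- Loading through the mysql client; the infinidb bulk loader is the faster path
    LOAD DATA INFILE '/tmp/events.csv'
    INTO TABLE events
    FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
    LINES TERMINATED BY '\n';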

Compression – Our data is fairly typical, largely numerical with some varchar and text/blob columns here and there. It compresses at about 5X compared to uncompressed innodb.

Backups – Pretty rudimentary, really just a file copy, which requires the system to be in read-only mode to ensure data consistency. However, being a file copy, it is straightforward and fast.

Queries – Performance is very impressive, especially for more complex queries. We had a class of aggregation queries that was taking 5-10 minutes on a dedicated box with 256GB of ram; on a three-node public/shared cloud system with 16GB each, they finished in 10-15 seconds.

Logging – Very well done, especially around loading: there are two files per load per table that detail the error type and the exact characters that caused the issue. There are also various server logs for debugging.

Administration – Much of it is done through an independent ‘cmconsole’, which lets you set and view various parameters and processes as well as add and reallocate modules. There are several processes per node that need to be up and running, and they can be a source of problems if they aren’t; they are viewable with the console.

Monitoring – Built-in monitoring of a number of key metrics such as disk/memory/cpu usage, networking issues, module issues, etc.

Cross-engine compatibility – It is not just the infinidb storage engine; it ships as a full version of mysql, and you can add innodb capability with the plugin. However, that mysql is pretty darn old at 5.1.39. You can also mix storage engines in your queries, although you’ll need to enable that in the configuration file.
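As a hedged illustration of mixing engines in one query (the table names are hypothetical, and cross-engine support has to be enabled in the configuration file first), a large infinidb fact table can be joined against a small innodb lookup table:

    -- events is an infinidb fact table; countries is a small innodb lookup table
    SELECT c.name, COUNT(*) AS event_count
    FROM events e
    JOIN countries c ON c.id = e.country_id
    WHERE e.created_at >= '2013-01-01'
    GROUP BY c.name;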


Issues

Before we get into the issues, it should be said that we have not attempted to contact infinidb reps, and the only resources we used were the documentation and the infinidb forums (with posted responses by infinidb reps). Some of the issues are annoyances; others are fairly serious.

The most serious revolves around loading data. We typically (re)load and append about 1000 tables per day, and we’ve run into the following behavior a few times in the last couple of weeks.

Locking – Each load requires a table lock, and occasionally one of two things happens:

a) a table is locked and does not release the lock – We’ve even had a time or two where we’ve restarted the system and it still has not released the lock (contrary to the documentation). There is a view-and-clear lock utility, but for some reason it’s not working for us at the moment either. Our solution was to hunt for the file that contains this lock, and sure enough, you can find it on your parent pm at /usr/local/Calpont/data1/systemFiles/dbrm/tablelocks. Simply removing it and restarting seems to work. This behavior seems to depend on the particular data being loaded; it happened on the same table 3 times, and we ran into most of this when initially loading our full data set.

b) a table lock can not be acquired – In this case, a load attempts to get a table lock, can’t get it, and aborts. There were no other queries running on the same table at the time; in fact, it seemed to happen mainly when other tables/dbs were being loaded simultaneously.

Critical errors – We’ve run into this a few times, where an error such as ‘image inconsistency’ puts the system into ‘read-only’ mode. We only experienced it during a load, which I suppose makes sense as that’s the only time we are writing significantly. When this hits during a load it also attempts to roll back, but it can’t because, again, the system has been put into read-only mode. You might deal with the lock separately as above, and the read-only state is supposed to clear after a system restart, but we’ve seen cases where it doesn’t. You then have two options: the advice at this link may help, http://infinidb.co/community/infinidb-not-starting otherwise, you may have to remove the entire infinidb installation and restore your latest backup. We are not using a redundant file system such as HDFS or glusterFS, which may or may not help.

Not enough memory to perform select query – Infinidb uses a distributed hash join to evaluate large data sets quickly, and its memory requirement is dictated by the second largest table in the join. The relevant variable is PmMaxMemorySmallSide, with a default of 64M up to a max of 4GB on the PM nodes; depending on the data set size, the join can instead use all of TotalUmMemory on the UM (if the small side is larger than the PM setting, it uses the latter). We frequently join two tables, one at 40+ billion rows and the next at 1 billion rows, which has been an issue for certain queries, although our UM node is very modest at 16GB. Query span does make a difference (i.e. one day at a time vs. the entire year), so it’s not just a total-table-size metric.
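To make the query-span point concrete, here is a hedged sketch (hypothetical tables and columns): restricting the join to a single day keeps the hash join’s small side within the memory limit, whereas the same join over a full year of data may not fit.

    -- Narrow date range: the filtered small side of the hash join stays in memory;
    -- the same join unfiltered over a year may exceed the limit
    SELECT a.game_id, SUM(a.revenue) AS revenue, SUM(b.impressions) AS impressions
    FROM revenue_events a
    JOIN impression_events b
      ON b.game_id = a.game_id AND b.event_date = a.event_date
    WHERE a.event_date = '2013-11-01'
      AND b.event_date = '2013-11-01'
    GROUP BY a.game_id;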

backups needing to be in read-only mode – a self-explanatory issue; hopefully your needs don’t require frequent/real-time writes/loads 24 hours a day.

no real query plan evaluation – explain doesn’t give you much, since there are no indexes and infinidb uses an alternate storage mechanism and execution algorithm, so the usual plan output is largely irrelevant. Query profiling (set profiling=1) does work. The question arises whether differences in query structure matter; it would be interesting to take the same query written two different ways that perform differently in Innodb (e.g. the join vs. subquery class of queries) and see how they behave here. There is, in fact, a more detailed analyzer: run ‘select calgetstats()’ and ‘select calgettrace()’ directly after a query on the command line (an example follows below).
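For what it’s worth, the profiling and trace calls mentioned above look roughly like this in practice (the query itself is a hypothetical placeholder, and the calsettrace(1) step is my assumption about how tracing is switched on):

    SET profiling = 1;
    SELECT event_type, COUNT(*) FROM events GROUP BY event_type;
    SHOW PROFILES;            -- standard mysql profiler timing
    SELECT calgetstats();     -- infinidb statistics for the previous query
    SELECT calgettrace();     -- infinidb trace for the previous query
                              -- (tracing may need enabling first, e.g. select calsettrace(1))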

sql fickleness – various issues such as the following (a couple of examples appear after the list),

- not being able to add a column anywhere other than the end of the table
- when doing mysqldump, you must use --single-transaction, otherwise it will fail with a lock error
- All non-aggregate columns in the SELECT and ORDER BY clause must be included in the GROUP BY clause.
- ‘insert into (select..)’ fails, while ‘insert into select’ works
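A couple of hedged examples of the last two points (table names are hypothetical):

    -- Fails: parenthesized subquery form
    INSERT INTO daily_totals (SELECT event_date, COUNT(*) FROM events GROUP BY event_date);

    -- Works: plain INSERT ... SELECT
    INSERT INTO daily_totals SELECT event_date, COUNT(*) FROM events GROUP BY event_date;

    -- All non-aggregate columns in SELECT/ORDER BY must appear in GROUP BY
    SELECT event_date, event_type, COUNT(*) AS events
    FROM events
    GROUP BY event_date, event_type
    ORDER BY event_date;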


And so, to sum up: we are using Infinidb and recommend it. However, I’d suggest a thorough evaluation against your specific needs before making a commitment; luckily, with an open source license, you can do just that.

warehousing and data analytics in a mysql world

Mysql in its early days had the reputation of being simple and fast but lacking critical features for many use cases. Over the years it has become more stable and mature, with a ton of advancements in the innodb storage engine and features such as stored code, replication improvements, query planning improvements, etc. One area mysql has not tackled is BI and analytic OLAP; I can only assume it has not been a priority with the user base to date. Oracle and SQL Server cover a different market and have these options; however, in this new era of big data, a wider array of businesses want to collect and analyze data such as performance statistics to improve their business offering. Fortunately, there are a few 3rd party vendors who have taken on this area with the spirit of open source and/or a tiered pay structure in mind, as you’ll find it can be very costly otherwise.

In our search to analyze the ever increasing volume of statistical data we were collecting, we considered the following products,

  • infobright
  • infinidb
  • luciddb
  • monetdb
  • fastbit
  • metakit
  • greenplum
  • hive

Almost all of these solutions implement a columnar architecture in one way or another: essentially, rows become columns and columns become rows. This is done to take advantage of the layout of the data and allow sequential reads (vs. random reads) of each long row (which was a column) on disk, which is what happens much of the time with OLAP-type queries, aggregating over a dataset much larger than memory.

Back to the products: a number of these we rejected for one reason or another, generally because they were either developed more for the scientific community (metakit, fastbit), geared towards another open source db such as postgresql (greenplum), or focused on distributed systems (hive), which is a bit of overkill for us at this point.

so we were left with,

  • infobright
  • infinidb
  • luciddb
  • monetdb

infobright and infinidb have a free and a paid version (the free versions both have limitations); luciddb and monetdb are open source.

our priorities in order were as follows,

- high compression
- ease of use (ideally a mysql client and connection)
- full DDL, DML statement use

There is a very good benchmark comparison done by Percona on these 4 dbs: http://www.mysqlperformanceblog.com/2010/01/07/star-schema-bechmark-infobright-infinidb-and-luciddb/

Keep in mind it’s a couple of years old and run on a specific set of data, so it’s definitely worth testing yourself for your particular needs.

So why do we care about compression? The pros are two-fold. First, significant compression obviously saves space, one of the premiums of an OLAP system, and lets you put off sharding (moving to a distributed system) that much longer. Typical compression on the full products ranges from a very impressive 10:1 to 40:1, so essentially you can have 10 TB of data on a typical 1TB disk. Second, the compression means the disk arm only travels a fraction of the distance for a sequential read, making the operation faster.

Right off the bat, infinidb offers only a very modest 22% compression reduction, which is the major limitation of its free version (paid, you get 40:1 compression), and with monetdb it’s not clear there is any meaningful compression at all, the tests matching their non-committal statements about the product’s compression.

Infobright and LucidDb both offer in the neighborhood of 10:1 compression, very nice for a free product.

Note: if we were in the market for a paid solution, we would take a hard look at InfiniDB; along with the 40:1 compression, it is a mysql build with their proprietary storage engine incorporated, making it an easy transition from stock mysql.

So we are down to,

  • Infobright
  • LucidDB

The two have very different remaining pros and cons. Infobright is also a mysql build with a proprietary storage engine incorporated; however, the drawback is the limitations they’ve placed on the free version, namely that most of the DDL and DML statement functionality has been stripped away. That means you can not insert, update, delete, truncate, or alter a table. At first glance that may sound insane, but keep in mind that an analytics environment in many cases needs primarily two things: to load data and to select data, and to do both quickly, which it can do very well, mysql style. However, if you wish to change the table structure or the data, you’ll need to fully reload the table, and these restrictions may be tough for an environment needing to serve customers. Our requirements are only a daily load appending to the existing data and internal analysis, we rarely change table structure on these particular tables, and so we are willing to deal with a dump and reload of a table occasionally if need be.

LucidDB is a completely different animal, requiring Java to run, and it uses its own SQL variant and data loading methods (for instance, there is no datetime datatype; your only choice is varchar to keep the data integrity). It does have full DDL and DML statement functionality and no other obvious restrictions. Certainly, with a learning curve, we could use this product, but when considering the people who would use this database, such as BAs and developers, it made sense to choose a product that is as consistent as possible with what we are currently using.

So infobright was the choice, with some workarounds to make it usable. Truncate and rename table can be emulated (a sketch is below), and other issues had to be dealt with; for instance, in many cases you can not simply copy over an innodb schema to a brighthouse schema, as it will complain about keys, foreign keys, auto increment, unsigned datatypes, etc. We wrote a script to parse this stuff out; it is not even close to being generally usable, just for our specific tables, otherwise I’d include it here. Another example: the utf8 charset is handled badly, and you’ll need to increase your datatype definition sizes 4x, e.g. varchar(32) -> varchar(128). There are a number of odd things such as this, so I’m not so sure I’d be as eager to use it for outside customers, but for our internal purposes it will fit just fine.
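As a hedged sketch of the truncate emulation and the utf8 sizing issue (table, column, and file names are hypothetical): since insert/update/delete/truncate/alter are unavailable in the community version, the workaround is to drop and recreate the table and reload it, with utf8 varchar columns declared roughly 4x wider than their innodb counterparts.

    -- Emulating TRUNCATE in the community edition: drop, recreate, reload
    DROP TABLE IF EXISTS daily_stats;
    CREATE TABLE daily_stats (
      stat_date DATE,
      game_id INT,
      label VARCHAR(128)    -- was varchar(32) in the utf8 innodb schema
    ) ENGINE=brighthouse;
    LOAD DATA INFILE '/tmp/daily_stats.csv' INTO TABLE daily_stats
      FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';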