Amazon opens their real NoSQL database, awesome potential for data mining and business intelligence

Amazon have turned on access to DynamoDB, a hosted version of the Dynamo technology that runs much of the infrastructure behind Amazon Web Services and provided the inspiration for Cassandra. All your data is hosted on SSDs (Solid State Drives), which means it should be pretty damn fast.

In tech terms this is a key-value store with tuneable consistency and throughput. That means you can dial the resources (and cost) up or down depending on your needs for performance and consistency (whether or not the very latest data is available to your query).

This kind of thing is useful in applications where unknown and changing volumes of data turn up that you want to mine in real time, and that will eventually grow to enormous volumes that you still need to be able to store and mine quickly. Web analytics is a good example: you don't know how many people will be hitting your web site, so you start with the default "5 writes/second" but can dial it up and back down when there are peaks of load. If you use something like NodeDB to do the data collection and inserting, any backlog of writes from a momentary peak in demand will just queue up until it gets in.
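
To make the queueing idea concrete, here is a rough sketch in Python of how a backlog builds up and drains when writes arrive faster than the provisioned rate. It is a toy simulation, not DynamoDB code: the function name and the per-second arrival numbers are made up for illustration, and only the "5 writes/second" figure comes from the example above.

```python
def simulate_backlog(arrivals_per_second, provisioned_writes_per_second=5):
    """Return the number of queued writes left at the end of each second.

    Hypothetical model: every second, new writes join the queue and the
    store accepts at most `provisioned_writes_per_second` of them.
    """
    backlog = 0
    history = []
    for arrivals in arrivals_per_second:
        backlog += arrivals
        written = min(backlog, provisioned_writes_per_second)
        backlog -= written
        history.append(backlog)
    return history


# Made-up traffic: a quiet second, a short peak, then things calm down.
print(simulate_backlog([3, 12, 9, 2, 0, 0]))
```

With these numbers the queue grows during the peak and is empty again a few seconds later. Dialling up the provisioned rate simply makes it drain sooner.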

Querying these key-value stores isn't as easy as with relational databases, which have had over 40 years of development and can be very flexibly queried. The advantage key-value stores hold over relational databases is that they scale linearly. If a query on 100 records takes a second, a query on 100,000 records will also take a second, provided you throw 1000x the hardware at the database server. Relational databases are very hard to scale this way once you grow past small clusters of machines. With relational database clusters, ensuring each replica of the data is consistent takes up ever-increasing fractions of the resources, meaning the scaling curve is anything but linear.
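
The linear scaling arithmetic can be checked with a few lines of Python. This is an idealised sketch: the function name is hypothetical, and the rate of 100 records per node per second is just the figure implied by "100 records takes a second" on a single machine.

```python
def query_seconds(records, nodes, records_per_node_per_second=100.0):
    """Idealised linear scaling: the work is split evenly across nodes
    and no capacity is lost to coordination between them."""
    return records / (nodes * records_per_node_per_second)


print(query_seconds(100, nodes=1))         # one machine, 100 records
print(query_seconds(100_000, nodes=1000))  # 1000x the data, 1000x the hardware
```

Both calls come out at one second. A relational cluster falls short of this ideal because part of every node's capacity goes into keeping the replicas consistent.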

Check out the Datalicious Supertag: Container tag for smart tag management

Google Analytics keeps innovating, introduces new Flow Visualizations and Real-Time analytics

Google has added an exciting new feature to its Analytics tool to help you visualize how different classes of visitor move around your site. There have been a few attempts at this kind of visualization, none of which have been particularly useful. We think this one hits the mark rather well.

[Screenshot: Google Analytics Flow Visualization]
Starting on the left you see where visitors have come from (this is the default, but you can start from other segments), followed by the landing pages and subsequent pages. The Datalicious site is pretty simple, so there's not much to see, but what we see in our clients' sites is much more interesting, with the full complexity of connections laid out in a really interesting way. This new feature will be gradually rolled out to all Google Analytics users over the next few weeks.

It comes hot on the heels of Google's Real Time view, which is also being gradually rolled out and gives a brilliant live dashboard of who is on your site right now. Google are clearly taking the analytics space very seriously, with lots of very cool stuff in the pipeline that we can't talk about yet!

[Screenshot: Google Analytics Real-Time view]
Email us at insights@datalicious.com or call us on 1300 209 601 if you need help with your Google Analytics set-up or would simply like someone to analyse your data and make some campaign and website optimisation suggestions.

SEO will get harder: Google about to introduce SSL for logged-in user searches, hiding keywords

Google have announced that more of their search traffic will be served from an encrypted SSL web server. What this means is that visitors arriving on your site from those searches will no longer pass through the keywords used in the search. Until now this has only applied to users who explicitly search from https://encrypted.google.com/. This latest change means anyone who is logged into a Google account will end up on the encrypted search.

This move suggests a trend of Google moving everyone to encrypted traffic, though undoubtedly it has taken some time to get the more complex and more difficult-to-scale SSL infrastructure built up to support the traffic. This is undoubtedly a good thing for security and freedom. The analogy that cypherpunks use is that if everyone wrote their letters on postcards, anyone who put a letter in an envelope would be suspicious. By putting all search traffic behind encryption, your boss, your network staff, your ISP, your censorship-obsessed control freak government or your despotic dictator won't be able to see what you're searching, whether you have something to hide or not.

The downside for online marketers, and particularly search marketers, is that organic search keywords will no longer be available in your web analytics data. The search keywords are passed through in the referrer header (spelled "Referer" in HTTP) when someone clicks on a link in a search engine, but only when the search engine is served over HTTP. When the search page is served over HTTPS (SSL) and the link leads to a plain HTTP site, standards-compliant browsers do not send the referrer.
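
As a rough illustration of what an analytics tool does with the referrer, here is a small Python sketch that pulls the search keywords out of a Google referrer URL. The function name and the sample URL are made up for the example, and the check on the host name is deliberately crude. The `q` parameter is where Google puts the search terms.

```python
from urllib.parse import urlparse, parse_qs


def search_keywords(referrer):
    """Return the search terms from a Google referrer URL, or None if
    there is no referrer or no keywords in it."""
    if not referrer:
        return None  # no referrer sent, e.g. arriving from an HTTPS search
    parsed = urlparse(referrer)
    if "google." not in parsed.netloc:
        return None  # crude host check, good enough for a sketch
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


print(search_keywords("http://www.google.com/search?q=tag+management&hl=en"))
print(search_keywords(None))
```

The first call returns the keywords. The second is what you are left with once the referrer is no longer sent: nothing to report on.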

What this means is that search marketers will need to rely on search engine-supplied data to know what keywords people are searching for, and will have very little useful feedback of organic optimisation strategies. We're going to be flying blind.

Google Analytics changes sessions tracking, find out how this affects the visit metric and your reports

A small but potentially significant change to Google Analytics has been pushed out. It relates to how sessions are handled, which you'll see in the "Visits" metric in Google Analytics.

Most web analytics tools have to work out a "visit" metric to try and give you some idea of people coming to, then leaving, your site. The problem is that the analytics tools don't get a "goodbye" message when the visitor leaves the site, so they have to use other mechanisms. The traditional way is to end the visit after 30 minutes without any activity from the visitor, or when the visitor closes their browser.

With this change to Google Analytics, any change to the traffic source value starts a new session, that is, a new visit. The traffic source is set whenever a visitor arrives from an external site, so basically any new arrival counts as a new visit, even if it happens within the 30-minute timeout window of the previous one.
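
The difference between the old and new counting rules can be sketched in a few lines of Python. This is our own simplified model, not Google's implementation: the function name, the `(timestamp, source)` hit format and the sample data are assumptions for illustration, and browser closes are ignored.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def count_visits(hits, split_on_source_change=True):
    """Count visits for one visitor.

    `hits` is a time-ordered list of (timestamp, source) pairs, where
    source is None for clicks within the site and a traffic source name
    (hypothetical labels such as "google") for arrivals from outside.
    """
    visits = 0
    last_time = None
    current_source = None
    for timestamp, source in hits:
        timed_out = last_time is None or timestamp - last_time > SESSION_TIMEOUT
        source_changed = (
            split_on_source_change
            and source is not None
            and source != current_source
        )
        if timed_out or source_changed:
            visits += 1
            current_source = source
        last_time = timestamp
    return visits


start = datetime(2011, 8, 11, 9, 0)
hits = [
    (start, "google"),                             # arrives from a search
    (start + timedelta(minutes=5), None),          # clicks around the site
    (start + timedelta(minutes=10), "newsletter"), # comes back via an email link
    (start + timedelta(minutes=50), None),         # returns after a 40 minute gap
]
print(count_visits(hits, split_on_source_change=False))  # timeout rule only
print(count_visits(hits, split_on_source_change=True))   # timeout plus source change
```

Under the timeout-only rule this visitor counts as two visits. With the source change rule the newsletter click starts a visit of its own, giving three.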

It's a pretty sensible change to the way the data is calculated, and it will affect all reports that include the "Visits" metric. Any business wants to know how many people come through the front door, and how well they're converting them into paying customers. Even people who come through the front door more than once should be counted in that metric.

Details from Google about the change, including some rather panicky comments from alarmed users, can be found in the official Google Analytics blog post about the recent session tracking changes.

WhereScape RED: Building enterprise grade data warehouses in the cloud just got quicker and easier

Our data team have just completed training on WhereScape RED, which is an amazing tool for building data warehouses. Datalicious will be using the tool to help our clients to combine complex data from multiple sources in one single Oracle data warehouse in the cloud in a streamlined way so we can start delivering some actual insights faster! This will make stuff quicker, cheaper and easier. Exciting times!

Well done Kent, Mahesh and Chaoming.
