Case Study: Using Redis intersect on very large sets

A week or two ago, we needed to calculate the difference between two large MySQL tables.

The client had CSV dumps of both tables in text files, so we decided to import the dumps into two Redis sets and see what would happen.

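<?php
// config.php is expected to create $r, a connected Redis client
// (for example, a phpredis Redis instance).
require_once "config.php";

$file = "dump_keys.001.txt.gz";

echo "Importing $file...\n";

$i = 0;

if ($F = gzopen($file, "r")) {
        // Read the compressed dump line by line; each line is one MySQL key.
        while (($line = gzgets($F, 1024)) !== false) {
                $line = trim(strtolower($line));

                // Skip blank lines so an empty string does not end up in the set.
                if ($line === "") {
                        continue;
                }

                $r->sadd("DUMP1", $line);

                printf("%10d %s\n", $i, $line);
                $i++;
        }

        gzclose($F);
}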
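
Each SADD in the loop above is a separate network round trip. A minimal sketch of a batched variant follows. It assumes a modern PHP (7.1 or later) and a phpredis-style client whose sAdd() accepts several members at once (multi-member SADD requires Redis 2.4 or later). import_keys() and gz_lines() are hypothetical helpers, and the batch size of 1000 is an arbitrary starting point, not a measured optimum.

```php
<?php
// Sketch: import keys into a Redis set in batches instead of one SADD per key.
// Assumption: $client behaves like a phpredis Redis object whose sAdd()
// accepts several members at once. import_keys() is a hypothetical helper.
function import_keys(iterable $lines, $client, string $key, int $batchSize = 1000): int
{
    $batch = [];
    $count = 0; // non-blank lines sent, duplicates included

    foreach ($lines as $line) {
        $line = trim(strtolower($line));
        if ($line === "") {
            continue; // skip blank lines
        }
        $batch[] = $line;
        $count++;

        if (count($batch) >= $batchSize) {
            $client->sAdd($key, ...$batch); // one round trip per batch
            $batch = [];
        }
    }

    if ($batch !== []) {
        $client->sAdd($key, ...$batch); // flush the remainder
    }

    return $count;
}

// Hypothetical helper: yields lines from a gzip-compressed dump one at a time.
function gz_lines(string $file): Generator
{
    $F = gzopen($file, "r");
    if ($F === false) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        while (($line = gzgets($F, 1024)) !== false) {
            yield $line;
        }
    } finally {
        gzclose($F);
    }
}
```

With $r created in config.php, the import would then be a single call such as `import_keys(gz_lines("dump_keys.001.txt.gz"), $r, "DUMP1", 1000);`. Larger batches mean fewer round trips but larger individual commands.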

Each dump contained around 120,000,000 items (keys of MySQL records), and each one took under 30 minutes to import.

Redis memory usage was 21.87 GB.

Then we ran:

redis 127.0.0.1:6379> SDIFFSTORE DUMP DUMP1 DUMP2
(integer) 2655487
(264.29s)

As shown, the difference took less than 300 seconds (under 5 minutes). SDIFFSTORE stored in DUMP the 2,655,487 keys that exist in DUMP1 but not in DUMP2.
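
Note that the set difference is one-directional: keys present only in DUMP2 are not reported. Finding rows missing on either side requires running it both ways, e.g. SDIFFSTORE with DUMP2 and DUMP1 swapped into a second destination key. The sketch below mimics SDIFF semantics on small in-memory PHP arrays, using array keys as a hash set. set_diff() is a hypothetical helper for illustration only, not how Redis implements the command.

```php
<?php
// Mimics Redis SDIFF on small in-memory sets: returns the members of $first
// that are absent from every set in $others. Array keys act as a hash set,
// so each membership test is O(1) on average.
function set_diff(array $first, array ...$others): array
{
    $exclude = [];
    foreach ($others as $other) {
        foreach ($other as $member) {
            $exclude[$member] = true;
        }
    }

    $result = [];
    foreach ($first as $member) {
        if (!isset($exclude[$member])) {
            $result[$member] = true; // dedupe, like a Redis set
        }
    }

    // PHP turns numeric-string keys (e.g. MySQL IDs) into ints; cast back.
    return array_map('strval', array_keys($result));
}

$dump1 = ["k1", "k2", "k3", "k4"];
$dump2 = ["k2", "k4", "k5"];

print_r(set_diff($dump1, $dump2)); // keys only in DUMP1
print_r(set_diff($dump2, $dump1)); // keys only in DUMP2
```

This in-memory approach only illustrates the semantics; at 120M keys per side, keeping both sets in Redis and letting SDIFFSTORE do the work is what made the job feasible.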

Finally, we wrote a script that SPOPs members from the result set and does some MySQL processing on each one. The whole "difference calculation" was done in less than 2 hours.
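
The post-processing loop can be sketched as follows, assuming a phpredis-style client (such as $r from config.php) whose sPop() returns false once the set is empty. drain_set() and process_missing_key() are hypothetical names; the latter stands in for the MySQL processing, whose details are not given here. Because SPOP removes each member as it returns it, the loop can be stopped and restarted without handling a key twice, although a key popped just before a crash would be lost.

```php
<?php
// Sketch: drain a Redis set with SPOP and hand each member to a callback.
// Assumption: $client behaves like a phpredis Redis object, where sPop()
// returns false when the set is empty.
function drain_set($client, string $key, callable $process): int
{
    $processed = 0;
    while (($member = $client->sPop($key)) !== false) {
        $process($member); // e.g. look the key up in MySQL and fix the row
        $processed++;
    }
    return $processed;
}

// Hypothetical placeholder for the MySQL processing step.
function process_missing_key(string $key): void
{
    echo "processing $key\n";
}

// Usage, with $r created in config.php:
// drain_set($r, "DUMP", "process_missing_key");
```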

We did not use the cloud, because we have plenty of cheap dedicated servers, but this example demonstrates the power of cloud computing well.

If one uses Amazon AWS, a "Double Extra Large" instance is the way to go. Since it is a one-time job, suppose it takes 3 hours at 0.90 USD per hour.
That makes the price less than 3 USD (3 x 0.90 = 2.70 USD) to get the job done.







Articles library

Redis as session handler in PHP

How to use Redis as a session handler in PHP, in a way similar to msession or PostgreSQL sessions
Date: 2011-07

Tags: system administration php


Lazy starter script on Linux such as Red Hat, CentOS or Fedora

How to quickly make Redis start after Linux boots.
Date: 2011-07

Tags: system administration


Redis vs Memcached comparison

Here is a basic comparison of Redis, Memcached and Memcachedb.
Date: 2011-07

Tags: memcached


Why RDBMS and SQL are difficult...

...while sometimes a NoSQL solution may be easier and way faster? This article deals with incremental counters and how they need to be implemented properly in RDBMSes such as MySQL.
Date: 2011-07
Updated: 2011-09-30

Tags: programming anti-pattern mysql


Seamless migration from one Redis server to another

How to use replication to migrate a Redis server without service interruption
Date: 2011-07

Tags: system administration


Redis swap issue while save

Explanation of why your server's load average may spike and the server may swap memory when running Redis
Date: 2011-08-21

Tags: system administration


Amazon EC2 and Amazon ElastiCache service

Memcached at Amazon: first impressions
Date: 2011-08-23

Tags: cloud memcached system administration


Understanding hash-max-zipmap-entries and 'hash of hashes' optimization

Explanation of config parameters such as hash-max-zipmap-entries and how we can make huge memory savings in the new Redis 2.2
Date: 2011-08-28

Tags: programming system administration optimization php


NoSQL database design example

This article explains how we made our article section
Date: 2011-09-02

Tags: programming database design mysql


Redis save and backup script

Simple backup script for Redis
Date: 2011-09-28

Tags: optimization system administration


Postgres 9.1 foreign data wrapper interface

Postgres 9.1 gives you the ability to access data from different data sources, including Redis
Date: 2011-10-20

Tags: programming mysql optimization memcached pgsql


Redis high traffic connection issue

Explanation of the 'Cannot assign requested address' connection error
Date: 2011-11-12

Tags: system administration optimization scaling


Redis too many open files error on high traffic sites

Explanation of the 'Accepting client connection: accept: Too many open files' error
Date: 2011-12-21

Tags: memcached system administration security


PHP Redis bug with PHP 5.4

Description of a PHP Redis bug
Date: 2012-07-02

Tags: system administration php


Case Study: Using Redis intersect on very large sets

Intersection of 2 x 120M MySQL records
Date: 2012-08-29

Tags: programming mysql cloud optimization php


Getting started with Python and Redis on CentOS 6.X

How to install and test the Redis client for Python on CentOS
Date: 2013-03-21

Tags: system administration python


GeoIP in Redis

Put Maxmind's GeoIP in Redis for the best performance in client apps written in PHP or Python
Date: 2013-03-24

Tags: php python programming mysql optimization scaling


Redis database file examination

Examination of a Redis RDB file in order to track down huge memory consumption
Date: 2013-08-23

Tags: system administration optimization


© Jul.2011 - 2014, E-Nick.org