Case Study: Using Redis intersect on very large sets

A week or two ago, we needed to calculate the difference between two large MySQL tables.

The client had CSV dumps of both tables as text files, so we decided to import those dumps into two Redis sets and see what would happen.

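<?php
require_once "config.php";      // expected to set up the Redis client in $r

$file = "dump_keys.001.txt.gz";

echo "Importing $file...\n";

$i = 0;

if ($F = gzopen($file, "r")) {
        while (($line = gzgets($F, 1024)) !== false) {
                // Normalize each key so both dumps compare consistently
                $line = trim(strtolower($line));

                $r->sadd("DUMP1", $line);

                printf("%10d %s\n", $i, $line);
                $i++;
        }

        gzclose($F);
}
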
Each dump contained around 120,000,000 items (keys of MySQL records) and was imported in under 30 minutes.

Redis memory usage was 21.87 GB.

Then we ran the intersect. Redis replied:

(integer) 2655487

The intersect took less than 300 seconds (under 5 minutes).
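The command itself does not appear above. As a rough sketch only, an intersect that returns an integer reply like this could be issued from PHP with SINTERSTORE. The sketch assumes a connected client exposing a phpredis-style sInterStore() method; the function name intersectSets and the destination key "RESULT" are hypothetical, and only "DUMP1" comes from the import script.

```php
<?php
// Sketch only: $redis is assumed to be a connected client with a
// phpredis-style sInterStore(). intersectSets() is a hypothetical helper.
function intersectSets($redis, string $dst, string $a, string $b): array
{
    $start = microtime(true);

    // SINTERSTORE computes the intersection on the server, stores it in
    // $dst and returns the number of members in the resulting set.
    $count = $redis->sInterStore($dst, $a, $b);

    return ['count' => $count, 'seconds' => microtime(true) - $start];
}

// Possible usage, with "DUMP2" and "RESULT" as assumed key names:
// $res = intersectSets($r, "RESULT", "DUMP1", "DUMP2");
// printf("%d members in %.1f s\n", $res['count'], $res['seconds']);
```

Storing the result on the server avoids transferring millions of members to the client, and the stored set can then be consumed piece by piece.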

Finally, we wrote a script that calls SPOP() on the result set and does some MySQL processing for each item. The whole "difference calculation" was done in less than 2 hours.

We did not use the cloud, because we have lots of cheap dedicated servers, but this example is perfect for demonstrating the power of cloud computing.

If you use Amazon AWS, a "Double Extra Large" instance is the way to go. Since it is a one-time job, suppose it takes 3 hours at 0.90 USD per hour.
That is 3 x 0.90 = 2.70 USD, so less than 3 USD to get the job done.
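The processing script is not shown here. A minimal sketch of such a loop might look like the following, assuming a client with a phpredis-style sPop() that returns false (or null with some clients) once the set is empty. The function name drainSet and the $process callback, which stands in for the MySQL work, are hypothetical.

```php
<?php
// Sketch only: drainSet() is a hypothetical helper, and $process is a
// placeholder for whatever MySQL processing is done per key.
function drainSet($redis, string $key, callable $process): int
{
    $done = 0;

    while (true) {
        // SPOP removes and returns one random member of the set.
        $member = $redis->sPop($key);

        // An empty (or missing) set ends the loop.
        if ($member === false || $member === null) {
            break;
        }

        $process($member);
        $done++;
    }

    return $done;
}

// Possible usage, with "RESULT" as an assumed key name:
// $n = drainSet($r, "RESULT", function ($id) { /* MySQL processing */ });
```

Because SPOP removes each member as it returns it, several copies of such a script can drain the same set without handing the same key to two workers. The trade-off is that a key popped just before a crash is gone from the set.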

Case Study: Using Redis intersect on very large sets

Intersection of 2 x 120M MySQL records
Date: 2012-08-29

Tags: optimization cloud php mysql programming