Case Study: Using Redis intersect on very large sets
A week or two ago, we needed to calculate the difference between two large MySQL tables.
The client had CSV dumps of both tables as text files, so we decided to import those dumps into two Redis sets and see what would happen.
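<?php
// $r (the Redis client object) is assumed to be created in config.php.
require_once "config.php";

$file = "dump_keys.001.txt.gz";
echo "Importing $file...\n";

$i = 0;
if ($F = gzopen($file, "r")) {
    // Read the gzipped dump line by line; each line holds one record key.
    while (($line = gzgets($F, 1024)) !== false) {
        // Normalize the key so both dumps compare consistently.
        $line = trim(strtolower($line));
        $r->sadd("DUMP1", $line);
        // Progress output: running counter and the key just added.
        printf("%10d %s\n", $i, $line);
        $i++;
    }
    gzclose($F);
}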
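The import loop above sends one SADD per line. A possible refinement, which we did not measure, is to group keys and send several members per SADD call. The sketch below only shows the grouping step; `batchLines` is a hypothetical helper, and the multi-member `sadd` call in the usage comment assumes that the client behind `$r` accepts several members in one call. Unlike the original loop, this sketch skips blank lines.

```php
<?php
// Hypothetical helper: normalizes lines and groups them into batches of $size members.
function batchLines(iterable $lines, int $size): Generator
{
    $batch = [];
    foreach ($lines as $line) {
        // Same normalization as the import script.
        $line = trim(strtolower($line));
        if ($line === '') {
            continue; // skip blank lines (the original loop does not do this)
        }
        $batch[] = $line;
        if (count($batch) >= $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch) {
        yield $batch; // remaining partial batch
    }
}

// Usage sketch (needs a running Redis server and the client object $r):
// foreach (batchLines($linesFromDump, 1000) as $batch) {
//     $r->sadd("DUMP1", ...$batch); // assumption: client accepts multiple members
// }
```

Here `$linesFromDump` stands for any iterable of lines, for example a generator wrapped around `gzgets()`.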
Both dumps held around 120,000,000 items (keys of MySQL records), and each was imported in under 30 minutes.
Redis memory usage was 21.87 GB.
Finally we ran:
redis 127.0.0.1:6379> SDIFFSTORE DUMP DUMP1 DUMP2
(integer) 2655487
(264.29s)
As seen, the set difference took under 300 seconds (less than 5 minutes).
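To make clear what the SDIFFSTORE call above computes, here is a small in-memory illustration. It does not talk to Redis; `setDiff` is a hypothetical helper that mimics the result on plain PHP arrays, and the sample keys are made up.

```php
<?php
// Illustration only: returns the members of $first that are absent from $second,
// which is what SDIFFSTORE stores for two input sets.
function setDiff(array $first, array $second): array
{
    // Flip values into keys so membership checks are hash lookups.
    $diff = array_diff_key(array_flip($first), array_flip($second));
    // array_flip turns numeric strings into integer keys; cast them back to strings.
    return array_map('strval', array_keys($diff));
}

$dump1 = ["k1", "k2", "k3", "k4"]; // made-up sample keys
$dump2 = ["k2", "k4", "k5"];

// k1 and k3 exist only in $dump1; k5 exists only in $dump2 and is not reported.
print_r(setDiff($dump1, $dump2));
```

Note that the difference is one-directional: keys that exist only in the second set are not part of the result. To find those, swap the order of the two source sets.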
Finally, we wrote a script that calls SPOP() on the result set and does some MySQL processing for each member. The whole "difference calculation" took less than 2 hours.
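The processing script itself is not shown here, so the following is only a sketch of how such a drain loop could look. `drainSet` is a hypothetical helper: it takes a callable that pops one member (for example a wrapper around `$r->spop("DUMP")`) and a callable that does the MySQL work for a batch of keys. It assumes the pop callable returns false or null once the set is empty.

```php
<?php
// Hypothetical helper: pops members one at a time and passes them to $process in batches.
// $pop must return the next member, or false/null once the set is empty.
function drainSet(callable $pop, callable $process, int $batchSize = 1000): int
{
    $total = 0;
    $batch = [];
    while (true) {
        $member = $pop();
        if ($member === false || $member === null) {
            break; // set is empty
        }
        $batch[] = $member;
        if (count($batch) >= $batchSize) {
            $process($batch);
            $total += count($batch);
            $batch = [];
        }
    }
    if ($batch) {
        $process($batch); // remaining partial batch
        $total += count($batch);
    }
    return $total; // number of members handed to $process
}

// Usage sketch (needs a running Redis server and the client object $r):
// $done = drainSet(
//     function () use ($r) { return $r->spop("DUMP"); },
//     function (array $keys) { /* MySQL processing for these keys */ }
// );
```

Keep in mind that SPOP removes the member from the set, so members popped but not yet processed are lost if the script stops halfway.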
We did not use the cloud, because we have lots of cheap dedicated servers, but the example is perfect for demonstrating the power of cloud computing.
If you use Amazon AWS, a "Double Extra Large" instance is the way to go.
Since it is a one-time job, suppose it takes 3 hours at 0.90 USD per hour.
That is 3 x 0.90 = 2.70 USD, so the job gets done for less than 3 USD.
Articles library
Redis as session handler in PHP
How to use Redis as a session handler in PHP in a way similar to msession or PostgreSQL sessions
Date: 2011-07
Tags: system administration php
Lazy starter script on Linux such Redhat CentOS or Fedora
How to quickly make Redis start after Linux boot.
Date: 2011-07
Tags: system administration
Redis vs Memcached comparison
Here is a basic comparison of Redis, Memcached and Memcachedb.
Date: 2011-07
Tags: memcached
Why RDBMS and SQL are difficult...
...while sometimes a NoSQL solution may be easier and way faster? This article deals with incremental counters and how they need to be implemented properly in RDBMSes such as MySQL
Date: 2011-07
Updated: 2011-09-30
Tags: programming anti-pattern mysql
Seamless migration from one Redis server to another
How to use replication to migrate Redis server without service interruption
Date: 2011-07
Tags: system administration
Redis swap issue while save
Explanation of why your server's load average may spike and the server may start swapping memory when running Redis
Date: 2011-08-21
Tags: system administration
Amazon EC2 and Amazon ElastiCache service
Memcached at Amazon: first impressions
Date: 2011-08-23
Tags: cloud memcached system administration
Understanding hash-max-zipmap-entries and 'hash of hashes' optimization
Explanation of config parameters such as hash-max-zipmap-entries and how we can make huge memory savings in the new Redis 2.2
Date: 2011-08-28
Tags: programming system administration optimization php
NoSQL database design example
This article explains how we made our article section
Date: 2011-09-02
Tags: programming database design mysql
Redis save and backup script
Simple backup script for Redis
Date: 2011-09-28
Tags: optimization system administration
Postgres 9.1 foreign data wrapper interface
Postgres 9.1 gives you the ability to access data from different data sources, including Redis
Date: 2011-10-20
Tags: programming mysql optimization memcached pgsql
Redis high traffic connection issue
Explanation of the 'Cannot assign requested address' connection error
Date: 2011-11-12
Tags: system administration optimization scaling
Redis too many open files error on high traffic sites
Explanation of 'Accepting client connection: accept: Too many open files'
Date: 2011-12-21
Tags: memcached system administration security
PHP Redis bug with PHP 5.4
Description of a PHP Redis bug
Date: 2012-07-02
Tags: system administration php
Case Study: Using Redis intersect on very large sets
Intersection of 2 x 120M MySQL records
Date: 2012-08-29
Tags: programming mysql cloud optimization php
Getting started with Python and Redis on CentOS 6.X
How to install and test Redis client for Python on CentOS
Date: 2013-03-21
Tags: system administration python
GeoIP in Redis
Put Maxmind's GeoIP in Redis for best performance for client apps written in PHP or Python
Date: 2013-03-24
Tags: php python programming mysql optimization scaling
