What is the best way to load a massive amount of data into PostgreSQL?
I want to load a massive amount of data into PostgreSQL. Do you know any other "tricks" apart from the ones mentioned in PostgreSQL's documentation?

What have I done so far?
1) Set the following parameters in postgresql.conf (for 64 GB of RAM):
shared_buffers = 26GB
work_mem = 40GB
maintenance_work_mem = 10GB          # min 1MB; default: 16MB
effective_cache_size = 48GB
max_wal_senders = 0                  # max number of walsender processes
wal_level = minimal                  # minimal, archive, or hot_standby
synchronous_commit = off             # only while loading data (if other clients are updating, this can result in data loss!)
archive_mode = off                   # allows archiving to be done
autovacuum = off                     # enable autovacuum subprocess? 'on' by default
checkpoint_segments = 256            # in logfile segments, min 1, 16MB each; default = 3; 256 = write every 4GB
checkpoint_timeout = 30min           # range 30s-1h; default = 5min
checkpoint_completion_target = 0.9   # checkpoint target duration, 0.0 - 1.0
checkpoint_warning = 0               # 0 disables; default = 30s
2) Used transactions (disabled autocommit) and set the isolation level (the lowest possible: Repeatable Read). I create the new table and load the data in the same transaction.
3) Set the COPY commands to run in a single transaction (supposedly the fastest approach to COPY data).
5) Disabled autovacuum (so it will not regenerate statistics after every 50 new rows added).
6) Used COPY FREEZE. It does not speed up the import itself, but it makes operations after the import faster.
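Putting steps 2, 3, and 6 together, a single load transaction could look like the sketch below. Table, column, and file names are hypothetical; note that COPY ... FREEZE only works when the table was created or truncated in the current transaction, which is why the CREATE TABLE sits inside the same transaction as the COPY.

```sql
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Create the table in the same transaction as the load,
-- so that COPY ... FREEZE is allowed on it.
CREATE TABLE measurements (
    id    bigint NOT NULL,
    ts    timestamptz,
    value double precision
);

-- FREEZE writes the rows as already frozen: the import itself is not
-- faster, but later reads avoid hint-bit writes and anti-wraparound
-- vacuum work on these rows.
COPY measurements FROM '/data/measurements.csv' WITH (FORMAT csv, FREEZE);

COMMIT;
```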
Do you have any other recommendations, or do you perhaps disagree with the aforementioned settings?
Do not use indexes except for a single unique numeric key.
That doesn't fit all the DB theory we were taught, but testing with heavy loads of data demonstrates it. Here is the result of loading 100M rows at a time until reaching 2 billion rows in a table, each time followed by a bunch of various queries on the resulting table. The first graph used a 10-gigabit NAS (150 MB/s), the second 4 SSDs in RAID 0 (R/W at 2 GB/s).
If you have more than 200 million rows in a table on regular disks, it's faster if you forget indexes. On SSDs, the limit is at 1 billion.
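One common pattern consistent with this advice is to drop secondary indexes before the bulk load and rebuild them afterwards, keeping only the unique key during the load. A minimal sketch; the index, table, and file names are hypothetical:

```sql
-- Drop secondary indexes before loading; keep only the unique key.
DROP INDEX IF EXISTS measurements_ts_idx;

COPY measurements FROM '/data/batch.csv' WITH (FORMAT csv);

-- Rebuild afterwards; a large maintenance_work_mem (step 1)
-- speeds up this index build. Then refresh planner statistics,
-- since autovacuum is off.
CREATE INDEX measurements_ts_idx ON measurements (ts);
ANALYZE measurements;
```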
I've also used partitions with better results, but with PG 9.2 it's difficult to benefit from them if you use stored procedures. You also have to take care to write/read only one partition at a time. Still, partitions are the way to go to keep your tables below the 1-billion-row wall, and they help a lot with multiprocess loads. On SSD, a single process lets me insert (COPY) 18,000 rows/s (with some processing work included). With multiprocessing on 6 CPUs, that grows to 80,000 rows/s.
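In PG 9.2 there is no declarative partitioning, so "partitions" means table inheritance plus CHECK constraints, and the one-partition-at-a-time advice amounts to each loader process COPYing directly into its own child table rather than routing through the parent with a trigger. A minimal sketch under those assumptions (all names hypothetical):

```sql
-- Child tables partitioned by id range via inheritance + CHECK
-- constraints (constraint_exclusion lets the planner skip partitions).
CREATE TABLE measurements_p0 (
    CHECK (id >= 0 AND id < 1000000000)
) INHERITS (measurements);

CREATE TABLE measurements_p1 (
    CHECK (id >= 1000000000 AND id < 2000000000)
) INHERITS (measurements);

-- Each loader process targets one child table directly,
-- bypassing any insert trigger on the parent.
COPY measurements_p0 FROM '/data/part0.csv' WITH (FORMAT csv);
```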
Watch your CPU & IO usage while testing so you can optimize both.