database - What is the best way to load a massive amount of data into PostgreSQL?


I want to load a massive amount of data into PostgreSQL. Do you know of any other "tricks" apart from the ones mentioned in PostgreSQL's documentation?

What have I done so far?

1) Set the following parameters in postgresql.conf (for 64 GB of RAM):

    shared_buffers = 26GB
    work_mem = 40GB
    maintenance_work_mem = 10GB         # min 1MB, default: 16MB
    effective_cache_size = 48GB
    max_wal_senders = 0                 # max number of walsender processes
    wal_level = minimal                 # minimal, archive, or hot_standby
    synchronous_commit = off            # only while loading data (if there are other updates from clients, this can result in data loss!)
    archive_mode = off                  # allows archiving to be done
    autovacuum = off                    # enable autovacuum subprocess? 'on'
    checkpoint_segments = 256           # in logfile segments, min 1, 16MB each; default = 3; 256 = write every 4 GB
    checkpoint_timeout = 30min          # range 30s-1h, default = 5min
    checkpoint_completion_target = 0.9  # checkpoint target duration, 0.0 - 1.0
    checkpoint_warning = 0              # 0 disables, default = 30s

2) Used transactions (disabled autocommit) and set the isolation level (the lowest possible: repeatable read); the new table is created and the data loaded in the same transaction.

3) Set the COPY commands to run in a single transaction (supposedly the fastest approach to copying data); see the sketch after this list.

5) Disabled autovacuum (so statistics will not be regenerated after every 50 new rows added).

6) COPY FREEZE: it does not speed up the import itself, but it makes operations after the import faster.
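As an illustration of points 2, 3 and 6, here is a minimal psql sketch; the table name, columns and file path are hypothetical, and COPY ... FREEZE requires the table to be created (or truncated) in the same transaction:

    -- Open one transaction at the chosen isolation level.
    BEGIN ISOLATION LEVEL REPEATABLE READ;

    -- Create the target table inside the transaction so FREEZE is allowed.
    CREATE TABLE bulk_data (
        id      bigint,
        payload text
    );

    -- Load the whole file with a single COPY; FREEZE writes rows already frozen,
    -- so later reads skip the usual visibility/hint-bit work.
    COPY bulk_data (id, payload)
        FROM '/path/to/data.csv'
        WITH (FORMAT csv, FREEZE true);

    COMMIT;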

Do you have any other recommendations, or do you disagree with the settings above?

Do not use indexes except for a single unique numeric key (see the sketch after the graphs below).

That doesn't fit the DB theory I was taught, but testing with heavy loads of data demonstrates it. Here are the results of loading 100M rows at a time until reaching 2 billion rows in a table, each time running a bunch of various queries on the resulting table. The first graphic is with a 10-gigabit NAS (150 MB/s), the second with 4 SSDs in RAID 0 (R/W at 2 GB/s).

[Graph: index use vs. sequential scan - 150 MB/s disks]

If you have more than 200 million rows in a table on regular disks, it's faster if you forget the indexes. On SSDs, the limit is at 1 billion.

[Graph: index use vs. sequential scan - 2 GB/s SSD]
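One way to apply the "single unique numeric key only" recommendation is sketched below; the table and constraint names are hypothetical, and adding the key after the load (rather than before) is a common complementary technique, not something stated above:

    -- No indexes at all during the load.
    CREATE TABLE measurements (
        id    bigint,
        value double precision
    );

    -- ... bulk load with COPY ...

    -- Add the single unique numeric key once the data is in place.
    ALTER TABLE measurements ADD CONSTRAINT measurements_id_key UNIQUE (id);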

I've tried partitioning to get better results, but with PG 9.2 it's difficult to benefit from partitions if you use stored procedures, and you have to take care of writing/reading one partition at a time. However, partitions are the way to go to keep tables below the 1-billion-row wall, and they help a lot with multiprocess loads. With the SSDs, a single process lets me insert (COPY) 18,000 rows/s (with the processing work included); with multiprocessing on 6 CPUs, it grows to 80,000 rows/s.
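In the PG 9.2 era, partitioning means table inheritance plus CHECK constraints; the sketch below uses hypothetical table names and date ranges (the actual partition layout used for the benchmarks is not shown above):

    -- Parent table; it holds no rows itself.
    CREATE TABLE events (
        id         bigint,
        created_at timestamptz
    );

    -- One child table per range; with constraint_exclusion = partition,
    -- the planner can skip partitions whose CHECK constraint rules them out.
    CREATE TABLE events_2014_01 (
        CHECK (created_at >= '2014-01-01' AND created_at < '2014-02-01')
    ) INHERITS (events);

    -- Load each partition directly, ideally one loader process per partition.
    COPY events_2014_01 FROM '/path/to/events_2014_01.csv' WITH (FORMAT csv);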

Watch CPU and IO usage while testing so you can optimize both.

