
It sounds like the OP is trying to provision and load 150 GB reasonably fast. Once loaded, presumably any of the usual suspects will be fast enough; it's the up-front loading cost that's the problem.

Anyway, I'm curious what kind of data the OP is trying to process.



I am trying to load and serve the Microsoft Academic Graph to produce author profile pages for all academic authors! Microsoft and Google already do this, but IMO they leave a lot to be desired.

But this means there are a hundred million author entities, roughly 3x that number of papers, and a bunch of associated metadata. On Redshift I can get all of this loaded in minutes and it takes about 100 GB, but Postgres loads are pathetic by comparison (see the COPY sketch below).

And I have no intention of spending more than 30 bucks a month! So hard problem for sure! Suggestions welcome!
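For context on the comparison above, Redshift's fast path is a COPY straight from S3, which the cluster's slices execute in parallel across the input files. A minimal sketch, with made-up table, bucket, and IAM role names (the OP's actual schema and data layout aren't given in the thread):

    -- Hypothetical names throughout; Redshift reads the S3 prefix in parallel.
    COPY papers
    FROM 's3://example-bucket/mag/papers/'
    IAM_ROLE 'arn:aws:iam::111122223333:role/example-redshift-load'
    FORMAT AS PARQUET;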


There are settings in Postgres that make bulk loading much faster.

By default you get a commit after each INSERT, which slows things down by a lot.
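A minimal sketch of that kind of bulk-load tuning in Postgres, assuming a hypothetical papers table and a server-side TSV file (none of these names come from the thread): do the load in one transaction with COPY instead of row-by-row INSERTs, relax synchronous_commit for the session, and build indexes only after the data is in.

    -- Hypothetical target table.
    CREATE TABLE IF NOT EXISTS papers (
        id    bigint PRIMARY KEY,
        title text,
        year  int
    );

    -- Session-level settings that trade durability for load speed.
    SET synchronous_commit = off;      -- don't wait for WAL flush on each commit
    SET maintenance_work_mem = '1GB';  -- speeds up the index builds afterwards

    BEGIN;                             -- one transaction instead of one per row
    COPY papers (id, title, year)
        FROM '/data/papers.tsv';       -- server-side bulk load, far faster than INSERTs
    COMMIT;

    -- Create secondary indexes after loading rather than maintaining them during it.
    CREATE INDEX ON papers (year);

From psql on a client machine, \copy does the same thing without requiring file access on the database server.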




