
Loading A Lot Of Data Into Google BigQuery From Python

I've been struggling to load big chunks of data into BigQuery for a little while now. In Google's docs, I see the insertAll method, which seems to work fine, but gives me 413 'Request Entity Too Large' errors when I try to send too much data in a single request.

Solution 1:

Note that for streaming data to BQ, anything above 10k rows/sec requires talking to a sales rep.

If you'd like to send large chunks directly to BQ, you can send them via POST. If you're using a client library, it should handle making the upload resumable for you. To do this, you'll need to make a call to jobs.insert() instead of tabledata.insertAll(), and provide a description of a load job. To actually push the bytes using the Python client, you can create a MediaFileUpload or MediaInMemoryUpload and pass it as the media_body parameter.
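Here's a minimal sketch of that approach using google-api-python-client, assuming a local newline-delimited CSV file and placeholder project/dataset/table names; credential setup is omitted.

```python
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# Assumes application default credentials are available in the environment.
bigquery = build('bigquery', 'v2')

job_body = {
    'configuration': {
        'load': {
            'destinationTable': {
                'projectId': 'my-project',   # placeholder
                'datasetId': 'my_dataset',   # placeholder
                'tableId': 'my_table',       # placeholder
            },
            'sourceFormat': 'CSV',
            'schema': {
                'fields': [
                    {'name': 'name', 'type': 'STRING'},
                    {'name': 'value', 'type': 'INTEGER'},
                ]
            },
        }
    }
}

# resumable=True lets the client library upload in chunks and retry/resume
# on transient failures instead of sending one giant request.
media = MediaFileUpload('data.csv',
                        mimetype='application/octet-stream',
                        resumable=True)

job = bigquery.jobs().insert(projectId='my-project',
                             body=job_body,
                             media_body=media).execute()
print(job['jobReference']['jobId'])
```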

The other option is to stage the data in Google Cloud Storage and load it from there.
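A sketch of the Cloud Storage route, assuming the file has already been copied to a gs:// bucket (bucket and table names are placeholders):

```python
from googleapiclient.discovery import build

bigquery = build('bigquery', 'v2')

job_body = {
    'configuration': {
        'load': {
            # BigQuery reads the data directly from Cloud Storage,
            # so no media_body is attached to the request.
            'sourceUris': ['gs://my-bucket/data.csv'],
            'sourceFormat': 'CSV',
            'destinationTable': {
                'projectId': 'my-project',
                'datasetId': 'my_dataset',
                'tableId': 'my_table',
            },
        }
    }
}

job = bigquery.jobs().insert(projectId='my-project', body=job_body).execute()
```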

Solution 2:

The example here uses the resumable upload to upload a CSV file. While the file used is small, it should work for virtually any size upload since it uses a robust media upload protocol. It sounds like you want JSON, which means you'd need to tweak the code slightly (there's a JSON example in load_json.py in the same directory). If you have a stream you want to upload instead of a file, you can use a MediaInMemoryUpload instead of the MediaFileUpload that is used in the example.
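For instance, a sketch of uploading an in-memory buffer of newline-delimited JSON with MediaInMemoryUpload; the rows, schema, and table names are placeholders:

```python
import json
from googleapiclient.discovery import build
from googleapiclient.http import MediaInMemoryUpload

# Build a newline-delimited JSON payload from in-memory records.
rows = [{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]
payload = '\n'.join(json.dumps(r) for r in rows).encode('utf-8')

media = MediaInMemoryUpload(payload,
                            mimetype='application/octet-stream',
                            resumable=True)

job_body = {
    'configuration': {
        'load': {
            'sourceFormat': 'NEWLINE_DELIMITED_JSON',
            'destinationTable': {
                'projectId': 'my-project',
                'datasetId': 'my_dataset',
                'tableId': 'my_table',
            },
            'schema': {
                'fields': [
                    {'name': 'name', 'type': 'STRING'},
                    {'name': 'value', 'type': 'INTEGER'},
                ]
            },
        }
    }
}

bigquery = build('bigquery', 'v2')
job = bigquery.jobs().insert(projectId='my-project',
                             body=job_body,
                             media_body=media).execute()
```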

BTW, Craig's answer is correct; I just thought I'd chime in with links to sample code.
