Loading A Lot Of Data Into Google BigQuery From Python
Solution 1:
Note that for streaming data to BQ, anything above 10k rows/sec requires talking to a sales rep.
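For contrast with the load-job approach below, here is a minimal sketch of the streaming path using tabledata.insertAll() from google-api-python-client; the project, dataset, table, and rows are placeholders, and credentials are assumed to come from Application Default Credentials.

```python
# Minimal streaming-insert sketch (placeholder project/dataset/table,
# Application Default Credentials assumed).
from googleapiclient.discovery import build

PROJECT_ID = "my-project"   # placeholder
DATASET_ID = "my_dataset"   # placeholder
TABLE_ID = "my_table"       # placeholder

bigquery = build("bigquery", "v2")

body = {
    "rows": [
        # insertId lets BigQuery best-effort de-duplicate retried rows.
        {"insertId": "row-1", "json": {"name": "alice", "value": 1}},
        {"insertId": "row-2", "json": {"name": "bob", "value": 2}},
    ]
}

response = bigquery.tabledata().insertAll(
    projectId=PROJECT_ID,
    datasetId=DATASET_ID,
    tableId=TABLE_ID,
    body=body,
).execute()

if response.get("insertErrors"):
    print("Some rows failed:", response["insertErrors"])
```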
If you'd like to send large chunks directly to BQ, you can send them via POST. If you're using a client library, it should handle making the upload resumable for you. To do this, you'll need to call jobs.insert() instead of tabledata.insertAll(), and provide a description of a load job. To actually push the bytes using the Python client, you can create a MediaFileUpload or a MediaInMemoryUpload and pass it as the media_body parameter.
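A rough sketch of that approach with google-api-python-client follows; the project, dataset, table, schema, and data.csv path are placeholders, and credentials are assumed to come from Application Default Credentials.

```python
# Sketch of a resumable load job via jobs.insert() with a MediaFileUpload.
# PROJECT_ID, DATASET_ID, TABLE_ID, the schema, and data.csv are placeholders.
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

PROJECT_ID = "my-project"
DATASET_ID = "my_dataset"
TABLE_ID = "my_table"

bigquery = build("bigquery", "v2")

# Describe the load job: destination table, source format, and schema.
job_body = {
    "configuration": {
        "load": {
            "destinationTable": {
                "projectId": PROJECT_ID,
                "datasetId": DATASET_ID,
                "tableId": TABLE_ID,
            },
            "sourceFormat": "CSV",
            "skipLeadingRows": 1,
            "schema": {
                "fields": [
                    {"name": "name", "type": "STRING"},
                    {"name": "value", "type": "INTEGER"},
                ]
            },
        }
    }
}

# resumable=True makes the client library chunk and retry the upload.
media = MediaFileUpload(
    "data.csv", mimetype="application/octet-stream", resumable=True
)

job = bigquery.jobs().insert(
    projectId=PROJECT_ID, body=job_body, media_body=media
).execute()
print("Started load job:", job["jobReference"]["jobId"])
```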
The other option is to stage the data in Google Cloud Storage and load it from there.
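When the data is already staged in Cloud Storage, the load job simply points at the object instead of carrying a media body. A sketch, assuming a placeholder bucket, object, and table:

```python
# Sketch of a load job that reads from Google Cloud Storage; the bucket,
# object, and table names are placeholders.
from googleapiclient.discovery import build

PROJECT_ID = "my-project"
bigquery = build("bigquery", "v2")

job_body = {
    "configuration": {
        "load": {
            "sourceUris": ["gs://my-bucket/data.csv"],
            "destinationTable": {
                "projectId": PROJECT_ID,
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "sourceFormat": "CSV",
            "skipLeadingRows": 1,
        }
    }
}

# No media_body here -- BigQuery pulls the file from GCS itself.
job = bigquery.jobs().insert(projectId=PROJECT_ID, body=job_body).execute()
```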
Solution 2:
The example here uses a resumable upload to load a CSV file. While the file used is small, it should work for virtually any size of upload since it uses a robust media upload protocol. It sounds like you want JSON, which means you'd need to tweak the code slightly (there is a JSON example in load_json.py in the same directory). If you have a stream you want to upload instead of a file, you can use a MediaInMemoryUpload instead of the MediaFileUpload used in the example.
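For the in-memory JSON case, a sketch along those lines might look like the following; the project, dataset, table, and records are placeholders, and schema autodetection is used here purely to keep the sketch short.

```python
# Sketch of loading newline-delimited JSON from memory with MediaInMemoryUpload;
# the project/dataset/table names and records are placeholders.
import json

from googleapiclient.discovery import build
from googleapiclient.http import MediaInMemoryUpload

PROJECT_ID = "my-project"

records = [{"name": "alice", "value": 1}, {"name": "bob", "value": 2}]
# BigQuery expects one JSON object per line for NEWLINE_DELIMITED_JSON.
payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")

bigquery = build("bigquery", "v2")

job_body = {
    "configuration": {
        "load": {
            "destinationTable": {
                "projectId": PROJECT_ID,
                "datasetId": "my_dataset",
                "tableId": "my_table",
            },
            "sourceFormat": "NEWLINE_DELIMITED_JSON",
            "autodetect": True,  # let BigQuery infer the schema for this sketch
        }
    }
}

media = MediaInMemoryUpload(
    payload, mimetype="application/octet-stream", resumable=True
)

job = bigquery.jobs().insert(
    projectId=PROJECT_ID, body=job_body, media_body=media
).execute()
```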
BTW ... Craig's answer is correct, I just thought I'd chime in with links to sample code.