Cloud Dataflow Cannot Use the "google.cloud.datastore" Package?
Solution 1:
The recommended way to interact with Cloud Datastore from a Cloud Dataflow pipeline is to use the Datastore I/O API, which is available through the Dataflow SDK and provides methods to read data from and write data to a Cloud Datastore database.
You can find detailed documentation for the Datastore I/O package for the Dataflow SDK 2.x for Python in this other link. The datastore.v1.datastoreio module is the specific module that you want to use. There is plenty of information in the links I am sharing, but in short, it is a connector to Datastore that uses PTransforms to read, write, or delete a PCollection from Datastore using the classes ReadFromDatastore(), WriteToDatastore(), and DeleteFromDatastore() respectively.
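For illustration, here is a minimal sketch of a pipeline that reads entities with ReadFromDatastore() and writes them back with WriteToDatastore(). The project ID, the kind name, and the update_entity logic are hypothetical placeholders, and the import paths assume the Beam-based Dataflow SDK 2.x for Python:

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore
    from apache_beam.io.gcp.datastore.v1.datastoreio import WriteToDatastore
    from google.cloud.proto.datastore.v1 import query_pb2

    PROJECT = 'my-project-id'  # hypothetical project ID

    # Build a protobuf query for the kind to read ('MyKind' is a placeholder).
    query = query_pb2.Query()
    query.kind.add().name = 'MyKind'

    def update_entity(entity):
        # Placeholder transformation: modify the entity protobuf here
        # before it is written back to Datastore.
        return entity

    with beam.Pipeline(options=PipelineOptions()) as p:
        (p
         | 'Read from Datastore' >> ReadFromDatastore(PROJECT, query)
         | 'Update entities' >> beam.Map(update_entity)
         | 'Write to Datastore' >> WriteToDatastore(PROJECT))

Note that the entities flowing through such a pipeline are protobuf messages from the google.cloud.proto.datastore.v1 package, not google.cloud.datastore objects.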
You should try using it instead of implementing the calls yourself. I suspect this may be the reason for the error you are seeing, as a Datastore implementation already exists in the Dataflow SDK:
"google.datastore.v1.PartitionId.project_id"is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
UPDATE:
It looks like those three classes collect several mutations and execute them in a single transaction. You can check this in the code that implements these classes.
If the aim is to retrieve (get()) and then update (put()) a Datastore entity, you can probably work with the write_mutations() function, which is described in the documentation; it lets you submit a full batch of mutations performing the operations you are interested in.
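As an illustration, here is a minimal sketch of how a batch of upsert mutations might be assembled and submitted. The kind, key name, and property values are placeholders, and the exact signature of write_mutations() has varied across SDK releases, so treat the final call as an assumption to verify against the version you are using:

    from google.cloud.proto.datastore.v1 import datastore_pb2
    from googledatastore import helper as datastore_helper
    from apache_beam.io.gcp.datastore.v1 import helper as beam_datastore_helper

    PROJECT = 'my-project-id'  # hypothetical project ID

    # Build an upsert mutation; 'Counter' and 'my-key' are placeholder names.
    mutation = datastore_pb2.Mutation()
    entity = mutation.upsert
    datastore_helper.add_key_path(entity.key, 'Counter', 'my-key')
    datastore_helper.add_properties(entity, {'count': 42})

    # A batch can hold many such mutations, committed together in one call.
    mutations = [mutation]

    # Assumed call shape; check the helper module of your SDK version, since
    # write_mutations() has taken additional arguments in some releases.
    datastore = beam_datastore_helper.get_datastore(PROJECT)
    beam_datastore_helper.write_mutations(datastore, PROJECT, mutations)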