fix(spanner_dbapi): replace insecure pickle with json for partition deserialization#17014
fix(spanner_dbapi): replace insecure pickle with json for partition deserialization#17014sinhasubham wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request replaces the use of pickle with json for serializing and deserializing partition IDs to mitigate security risks associated with insecure deserialization. It introduces _serialize_value and _deserialize_value helper functions to handle specific types like bytes, datetime, and protobuf messages. Review feedback points out that MessageToDict defaults to camelCase, which could break compatibility with code expecting snake_case, and suggests using preserving_proto_field_name=True. Additionally, the reviewer noted that protobuf messages are currently deserialized as dictionaries rather than original message objects, which may lead to issues with nested field types.
| } | ||
|
|
||
|
|
||
| def _serialize_value(val: Any) -> Any: |
There was a problem hiding this comment.
I think that the chosen solution approach has a design flaw, and that is that it tries to manually serialize a high-level Statement object into JSON. That means that it needs to take into account every single query parameter type that a Statement can contain, and that list is long and, most importantly, not fixed. Someone could add support for a new high-level type tomorrow, and that would then break this serialization, unless they also add it to this serializer.
We can 'fix' all of this by adding all supported types here, and also add a test that tries to catch it if someone adds a new type in the future that is not in this list. I added 5ef6394 to illustrate all of this. But as you can see: The code becomes quite bloated, and the test uses meta-programming to verify that this serialization is consistent with the _make_value_pb function (which is a code smell).
I would suggest that we try a different strategy for this, and instead use a two-step serialization strategy:
- First convert the
Statementinto anExecuteSqlRequestusing the standard functions in the client library (or at least convert the query parameters of theStatementinto theStructthat would have been used for theExecuteSqlRequest). - Then serialize the
ExecuteSqlRequest(or theStructwith the query parameters).
This removes the need to manually map all possible high-level query parameter types, as ExecuteSqlRequest is guaranteed to only use well-known types for query parameters, and this list is fixed.
This PR resolves a critical Insecure Deserialization vulnerability (potential Remote Code Execution) in the
spanner_dbapimodule b/510871112 . Previously, the module utilizedpickle.loads()to decode partition IDs provided by users via theRUN PARTITIONstatement, creating a possibility for arbitrary code execution attack payloads.We have fully eliminated
pickleusage in this module and migrated to standardjsonserialization.