[python] Add HDFS native FileIO backend (no Hadoop install required) #8031

Draft
TheR1sing3un wants to merge 2 commits into apache:master from TheR1sing3un:feat_python_hdfs_native

@TheR1sing3un
Member

Introduces HdfsNativeFileIO, backed by the hdfs-native protocol client
(Rust + PyO3). This removes the runtime need for HADOOP_HOME, a JDK, and
libhdfs on the client side. viewfs mount tables and HA NameNode lists can
come from local XML files (HADOOP_CONF_DIR or the hdfs.conf-dir option)
or directly from catalog options delivered by a REST catalog (keys
prefixed with dfs., fs., hadoop., ipc., or io. are forwarded as-is).
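
To illustrate the forwarding rule, here is a minimal sketch of how catalog options could be filtered by prefix. The helper name `extract_hadoop_conf` and the sample option values are hypothetical and are not taken from this PR; only the five prefixes come from the description above.

```python
from typing import Dict, Mapping

# Prefixes named in the PR description; keys starting with any of them
# are treated as Hadoop configuration and passed through unchanged.
HADOOP_PREFIXES = ("dfs.", "fs.", "hadoop.", "ipc.", "io.")


def extract_hadoop_conf(catalog_options: Mapping[str, str]) -> Dict[str, str]:
    """Hypothetical helper: keep only the Hadoop-style keys, unmodified."""
    return {
        key: value
        for key, value in catalog_options.items()
        if key.startswith(HADOOP_PREFIXES)
    }


# Sample options as a REST catalog might deliver them (values are made up).
options = {
    "warehouse": "viewfs://cluster/warehouse",
    "dfs.nameservices": "ns1",
    "dfs.ha.namenodes.ns1": "nn1,nn2",
    "fs.viewfs.mounttable.cluster.link./warehouse": "hdfs://ns1/warehouse",
    "hdfs.conf-dir": "/etc/hadoop/conf",
}
hadoop_conf = extract_hadoop_conf(options)
```

In this sketch `hdfs.conf-dir` is not forwarded, because it starts with `hdfs.` rather than `dfs.`; whether the PR treats that key the same way is an assumption.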

The default backend for hdfs:// and viewfs:// switches to native; the
PyArrow / libhdfs path is kept, with automatic fallback when hdfs-native
is unavailable (e.g. on Windows or when the extra is not installed).
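
The fallback behaviour could be sketched roughly as follows. This is not the PR's implementation: `select_hdfs_backend` and `hdfs_native_available` are hypothetical names, and the sketch assumes the hdfs-native package is importable as `hdfs_native`.

```python
import importlib.util
from typing import Callable


def hdfs_native_available() -> bool:
    """Check whether the (assumed) hdfs_native module can be found.

    find_spec locates the module without importing it, so a missing
    extra or an unsupported platform simply yields False.
    """
    return importlib.util.find_spec("hdfs_native") is not None


def select_hdfs_backend(
    scheme: str,
    native_available: Callable[[], bool] = hdfs_native_available,
) -> str:
    """Hypothetical selector: prefer native, fall back to PyArrow / libhdfs."""
    if scheme not in ("hdfs", "viewfs"):
        raise ValueError(f"not an HDFS scheme: {scheme!r}")
    return "native" if native_available() else "pyarrow"
```

Passing the availability check as a parameter keeps the selection logic testable on machines where hdfs-native is not installed.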

Adds: HdfsNativeFileIO, HdfsOptions, _kerberos helpers, unit tests,
Docker-based e2e scaffold, native vs pyarrow benchmark, README section.

RAT check failed on docker-compose.yml and the e2e README. Add the
standard Apache 2.0 header to both.