Commit c32dba4

Support copying from glob patterns
Closes #112.
1 parent a1193c9 commit c32dba4

10 files changed: +549 -108 lines changed

Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ aws-config = { version = "=1.5.18", default-features = false, features = ["rustl
 aws-credential-types = {version = "=1.2.1", default-features = false}
 azure_storage = {version = "0.21", default-features = false}
 futures = "0.3"
+glob = "0.3"
 home = "0.5"
 object_store = {version = "0.12", default-features = false, features = ["aws", "azure", "fs", "gcp", "http"]}
 once_cell = "1"

README.md

Lines changed: 35 additions & 0 deletions
@@ -22,6 +22,7 @@ COPY table FROM 's3://mybucket/data.parquet' WITH (format 'parquet');
 - [Inspect Parquet schema](#inspect-parquet-schema)
 - [Inspect Parquet metadata](#inspect-parquet-metadata)
 - [Inspect Parquet column statistics](#inspect-parquet-column-statistics)
+- [List and read Parquet files from uri pattern](#list-and-read-parquet-files-from-uri-pattern)
 - [Object Store Support](#object-store-support)
 - [Copy Options](#copy-options)
 - [Configuration](#configuration)
@@ -192,6 +193,40 @@ SELECT * FROM parquet.column_stats('/tmp/product_example.parquet')
 (13 rows)
 ```
 
+### List and read Parquet files from uri pattern
+
+You can call `SELECT * FROM parquet.list(<uri_pattern>)` to see all uris that match the uri pattern.
+The uri pattern can resolve `**` for directories and `*` for words in the uri.
+
+
+```sql
+COPY (SELECT i FROM generate_series(1, 1000000) i) TO '/tmp/some/test.parquet' with (file_size_bytes '1MB');
+COPY 1000000
+
+SELECT * FROM parquet.list('/tmp/some/**/*.parquet');
+                  uri                  |  size
+---------------------------------------+---------
+ /tmp/some/test.parquet/data_4.parquet |  100162
+ /tmp/some/test.parquet/data_3.parquet | 1486916
+ /tmp/some/test.parquet/data_2.parquet | 1486916
+ /tmp/some/test.parquet/data_0.parquet | 1486920
+ /tmp/some/test.parquet/data_1.parquet | 1486916
+(5 rows)
+
+```
+
+Uri patterns are also supported by `COPY FROM` for all supported object stores except `http(s)` endpoints.
+```sql
+COPY (SELECT i FROM generate_series(1, 1000000) i) TO 's3://testbucket/some/test.parquet' with (file_size_bytes '1MB');
+COPY 1000000
+
+CREATE TABLE test(a int);
+CREATE TABLE
+
+COPY test FROM 's3://testbucket/some/**/*.parquet';
+COPY 1000000
+```
+
 ## Object Store Support
 `pg_parquet` supports reading and writing Parquet files from/to `S3`, `Azure Blob Storage`, `http(s)` and `Google Cloud Storage` object stores.
 
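
The README addition above describes `**` as resolving directories and `*` as resolving words in the uri. As a rough illustration of that behaviour, here is a minimal standard-library sketch. It is not the code from this commit and does not use the `glob` crate added in `Cargo.toml`; the function names (`segment_matches`, `path_matches`, `uri_matches`) are hypothetical, and it assumes that `*` never crosses a `/` and that `**` may stand for zero or more whole path segments.

```rust
/// Match one path segment against a pattern segment, where `*` stands
/// for any run of bytes (possibly empty) inside that segment.
fn segment_matches(pattern: &[u8], text: &[u8]) -> bool {
    match pattern.split_first() {
        // Pattern exhausted: succeed only if the text is exhausted too.
        None => text.is_empty(),
        // `*`: try every possible length for the run it consumes.
        Some((&b'*', rest)) => (0..=text.len()).any(|i| segment_matches(rest, &text[i..])),
        // Literal byte: it must equal the next byte of the text.
        Some((&c, rest)) => match text.split_first() {
            Some((&t, tail)) => c == t && segment_matches(rest, tail),
            None => false,
        },
    }
}

/// Match a list of path segments against a list of pattern segments,
/// where a `**` segment stands for zero or more whole path segments
/// (an assumption of this sketch).
fn path_matches(pattern: &[&str], path: &[&str]) -> bool {
    match pattern.split_first() {
        None => path.is_empty(),
        // `**`: try skipping 0, 1, 2, ... leading path segments.
        Some((seg, rest)) if *seg == "**" => {
            (0..=path.len()).any(|i| path_matches(rest, &path[i..]))
        }
        // Ordinary segment: match it against the next path segment.
        Some((seg, rest)) => match path.split_first() {
            Some((p, tail)) => {
                segment_matches(seg.as_bytes(), p.as_bytes()) && path_matches(rest, tail)
            }
            None => false,
        },
    }
}

/// Split both strings on `/` and compare them segment by segment.
fn uri_matches(pattern: &str, uri: &str) -> bool {
    let pattern: Vec<&str> = pattern.split('/').collect();
    let uri: Vec<&str> = uri.split('/').collect();
    path_matches(&pattern, &uri)
}

fn main() {
    let pattern = "/tmp/some/**/*.parquet";
    for uri in [
        "/tmp/some/test.parquet/data_0.parquet",
        "/tmp/some/data.parquet",
        "/tmp/other/data.parquet",
    ] {
        println!("{uri}: {}", uri_matches(pattern, uri));
    }
}
```

Under these assumptions, `/tmp/some/**/*.parquet` accepts files at any depth below `/tmp/some`, while `/tmp/some/*.parquet` accepts only direct children. The actual matching rules in `pg_parquet` may differ in edge cases.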
