@@ -22,6 +22,7 @@ COPY table FROM 's3://mybucket/data.parquet' WITH (format 'parquet');
- [Inspect Parquet schema](#inspect-parquet-schema)
- [Inspect Parquet metadata](#inspect-parquet-metadata)
- [Inspect Parquet column statistics](#inspect-parquet-column-statistics)
+ - [List and read Parquet files from URI pattern](#list-and-read-parquet-files-from-uri-pattern)
- [Object Store Support](#object-store-support)
- [Copy Options](#copy-options)
- [Configuration](#configuration)
@@ -192,6 +193,40 @@ SELECT * FROM parquet.column_stats('/tmp/product_example.parquet')
(13 rows)
```

+ ### List and read Parquet files from URI pattern
+
+ You can call `SELECT * FROM parquet.list(<uri_pattern>)` to list all URIs that match the URI pattern.
+ A URI pattern can contain `**`, which matches directories, and `*`, which matches words within the URI.
+
+
+ ```sql
+ COPY (SELECT i FROM generate_series(1, 1000000) i) TO '/tmp/some/test.parquet' WITH (file_size_bytes '1MB');
+ COPY 1000000
+
+ SELECT * FROM parquet.list('/tmp/some/**/*.parquet');
+                   uri                  |  size
+ ---------------------------------------+---------
+  /tmp/some/test.parquet/data_4.parquet |  100162
+  /tmp/some/test.parquet/data_3.parquet | 1486916
+  /tmp/some/test.parquet/data_2.parquet | 1486916
+  /tmp/some/test.parquet/data_0.parquet | 1486920
+  /tmp/some/test.parquet/data_1.parquet | 1486916
+ (5 rows)
+
+ ```
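
Because `parquet.list` returns one row per matching file, its result can be aggregated like any other query result. The following is a minimal sketch that assumes only the `uri` and `size` columns shown in the output above; the `file_count` and `total_bytes` aliases are illustrative names, not part of `pg_parquet`.

```sql
-- Summarize the files matched by the URI pattern.
-- file_count and total_bytes are illustrative aliases.
SELECT count(*) AS file_count, sum(size) AS total_bytes
FROM parquet.list('/tmp/some/**/*.parquet');
```

A summary like this may be a convenient way to check how many files a pattern matches before passing the same pattern to `COPY FROM`.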
+
+ URI patterns are also supported by `COPY FROM` for all supported object stores except `http(s)` endpoints.
+ ```sql
+ COPY (SELECT i FROM generate_series(1, 1000000) i) TO 's3://testbucket/some/test.parquet' WITH (file_size_bytes '1MB');
+ COPY 1000000
+
+ CREATE TABLE test (a int);
+ CREATE TABLE
+
+ COPY test FROM 's3://testbucket/some/**/*.parquet';
+ COPY 1000000
+ ```
+
## Object Store Support
`pg_parquet` supports reading and writing Parquet files from/to `S3`, `Azure Blob Storage`, `http(s)` and `Google Cloud Storage` object stores.