google bigquery - Appending to a table using a limit query doesn't honour the limit value


I'm experimenting with appending data to BigQuery tables by selecting from other tables, using the command-line tool, like this:

bq --nosync query --max_rows 0 --batch --destination_table=mydataset.append_test 'select * from [mydataset.source_table] limit 500'

At this point, "source_table" has under 5 billion rows, and "append_test" doesn't exist.

When the job completes, "append_test" has been created and contains 3,605 rows, instead of the 500 rows I expected. Can anyone explain why this is happening?

If the BigQuery admins want to investigate further, an example job ID showing this behaviour is 342881999645:bqjob_r1947710924c85445_00000144ac3214f9_1

What you're seeing is an unfortunate outcome of a design choice we made in how we return repeated fields (we flatten them). Your table has one repeated field; when you run select * ... limit 500, the first 500 rows are returned, but we then apply flattening to those rows and store one row per repeated element. If you ran a select * query on a table with the following data (one field a with a single value, one field x with a repeated value):

{a1, [x1, x2]}, {a2, [x3, x4]} 

you'd get the following result after flattening:

{a1, x1}, {a1, x2}, {a2, x3}, {a2, x4}

The kicker is that this counts as only 2 rows from the limit's perspective. I realize this isn't what most people want or expect. The good news is that we're considering ways to fix it. There are a couple of reasons why it's tricky, and we need to make sure the fix is backwards compatible so people aren't surprised by the change (likely we'd add a flag called 'flatten_results' that defaults to true).
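To make that concrete, here's a minimal sketch in legacy SQL, assuming a hypothetical table mydataset.t with a plain field a and a repeated field x holding the data above:

select a, x from [mydataset.t] limit 2

The limit is satisfied by the 2 stored rows, but after flattening the query returns all 4 flattened rows shown above. The same thing is happening with your 500-row limit turning into 3,605 output rows.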

In the meantime, if you want 500 values back, you can apply an explicit flatten:

select * from flatten(t1, field) limit 500;
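Applied to your original command, that would look something like the sketch below (substitute the actual name of your repeated field for 'field'):

bq --nosync query --max_rows 0 --batch --destination_table=mydataset.append_test 'select * from flatten([mydataset.source_table], field) limit 500'

Because the flattening happens inside the FROM clause, the limit is then counted against already-flattened rows, so the destination table should end up with 500 rows.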
