google bigquery - Why do these 2 queries have the same "GB processed" (and thus cost)? -


my test data consists of 27,768,767 rows. schema includes "message" column of type string. length of these strings vary couple of hundred characters. there user_id column of type int. here 2 queries both return 0 rows (the clauses match nothing in data). surprise, however, both report 4.69 gb processed.

select * logtesting.logs user_id=1;  query complete (1.7s elapsed, 4.69 gb processed) 

.

select * logtesting.logs message contains 'this string never appears';  query complete (2.1s elapsed, 4.69 gb processed) 

since ints stored in 8 bytes, have expected data processed in former (user_id) query 213mb (28 million rows * 8 bytes per user_id). latter (message) query harder estimate since strings vary in length, expect several times greater former (user_id) query.

is understanding of how bigquery calculates query costs wrong?

no matter do, bigquery need scan rows in tables (not columns though), it's normal this, since table doesn't change. clause means won't return data. still needs process it.

the way make sure lower processing not select columns. bigquery column based, if don't need attributes, don't return them (this means won't processed). lower cost :)

historically, "select *" wasn't supported make sure people wouldn't find out hard way


Comments