hadoop - Why does this simple Hive table declaration work? As if by magic


The following HQL works to create a Hive table in HDInsight that I can query. But I have several questions about why it works:

  1. My data rows are, in fact, terminated by carriage return line feed, so why does "collection items terminated by '\002'" work? And what is \002 anyway? And no location for the blob is specified so, again, why does it work?

  2. All my attempts at creating the same table while specifying "create external table...location '/user/hive/warehouse/salesorderdetail'" have failed. The table is created but no data is returned. If I leave off "external" and don't specify a location, it works. WTF?

    create table if not exists default.salesorderdetail(
        salesorderid int,
        productid int,
        orderqty int,
        linetotal decimal
    )
    row format delimited
        fields terminated by ','
        collection items terminated by '\002'
        map keys terminated by '\003'
    stored as textfile;

Any insights appreciated.

Update: Thanks for the answers so far. Here's the exact syntax I'm using to attempt the external table creation. (I've changed the storage account name.) I don't see what I'm doing wrong.

    drop table default.salesorderdetailx;
    create external table default.salesorderdetailx(
        salesorderid int,
        productid int,
        orderqty int,
        linetotal decimal
    )
    row format delimited
        fields terminated by ','
        collection items terminated by '\002'
        map keys terminated by '\003'
    stored as textfile
    location 'wasb://mycn-1@my.blob.core.windows.net/mycn-1/hive/warehouse/salesorderdetailx';

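  1. When you create a cluster in HDInsight, you have to specify the underlying blob storage, and Hive assumes you are referencing that blob storage. You don't need to specify a location because your query is creating an internal table (see answer #2 below), which is created at the default location. For external tables you need to specify a location in Azure blob storage (outside of the cluster) so that the data in the table is not deleted when the cluster is dropped. See the Hive DDL documentation for more information. As for the delimiters: the collection-item and map-key delimiters only apply to array, map, and struct columns, and your table has none, so they are never used. '\002' and '\003' are the octal escapes for the Ctrl-B and Ctrl-C control characters, which are Hive's defaults. The row terminator is a separate setting (lines terminated by), which defaults to a newline.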

  2. By default, tables are created as internal tables, and you have to specify "external" to make them external tables.

    Use external tables when:

    • The data is also used outside of Hive
    • You need the data to be updateable in real time
    • The data is still needed after you drop the cluster or the table
    • Hive should not own the data and control settings, directories, etc.

    Use internal tables when:

    • You want Hive to manage the data and storage
    • The usage is short term (like a temp table)
    • You are creating a table based on an existing table (create table as select)
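If you are not sure which kind of table you ended up with, or where Hive put its data, one quick check is to ask Hive to describe it. This is only a sketch using the table name from the question; the exact output layout varies between Hive versions.

```sql
-- Look for the "Table Type" row (MANAGED_TABLE for internal tables,
-- EXTERNAL_TABLE for external ones) and the "Location" row in the output.
DESCRIBE FORMATTED default.salesorderdetail;
```

Running the same statement against default.salesorderdetailx should show the wasb:// path that the external table is actually reading from.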

Does the container path "user/hive/warehouse/salesorderdetail" exist in your blob storage? If it doesn't, that might explain why the external table query returns no data.
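One way to check is to list the location from the Hive CLI itself with the dfs command. The paths below are simply the ones quoted in the question, so treat them as assumptions about your setup.

```sql
-- List whatever files exist under the location used in the first attempt.
dfs -ls /user/hive/warehouse/salesorderdetail;

-- Same check for the wasb:// location from the update.
dfs -ls wasb://mycn-1@my.blob.core.windows.net/mycn-1/hive/warehouse/salesorderdetailx;
```

If a listing comes back empty or reports that the path does not exist, the external table has nothing to read. Note that in a wasb:// URI the container name is the part before the @, and the path after the host name is relative to the root of that container, so the second listing only finds data if there really is a mycn-1 folder inside the mycn-1 container.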

