How to split data in SPSS based on a percentage


I have a 7 GB file in SPSS format. It contains survey data with comment-level scores and sentence-level scores. One comment can have multiple sentences, and each survey has up to 4 comments.

I am trying to do random sampling in SPSS so that I can work with a smaller file in R. However, simple random sampling of rows does not keep a whole survey or a whole comment together.

What I want is to take a sample from the big file by picking 5% of the surv_ids, so that all rows of each selected survey stay together.

    surv_id  sentence_id  comment_id  sentence_score  comment_score
    a001     001          1            3.5             2
    a001     002          1            2.8             2
    a001     001          2            1.4            -1
    a001     002          2           -2.9            -1
    a001     003          2           -3.1            -1
    a002     001          1            2.3             3
    a002     002          1            4.3             3
    a002     001          2            1.2             1
    a002     002          2            0.85            1
    a002     003          2            0.79            1
    a002     001          3            3.5             2
    a002     002          3           -3.1             2
    a002     003          3            2.8             2
    a003     001          1            1               1
    a003     001          2           -0.9            -3
    a003     002          2           -4.3            -3
    a003     003          2           -4.0            -3
    a003     001          3            3.4             3
    a003     002          3            4.4             3
    a003     001          4            2.8             2
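To illustrate the sampling rule on its own, here is a Python sketch (not SPSS or R). It picks a random subset of the distinct surv_id values and keeps every row that belongs to them.

Assumptions:
- `ROWS` is a hypothetical in-memory copy of the example table above.
- `sample_surveys` is a hypothetical helper name.
- Rounding the number of surveys up to at least one is a choice made so the three-survey example still returns something.

It does not attempt to handle a 7 GB file.

```python
import math
import random

# Hypothetical in-memory copy of the example rows:
# (surv_id, sentence_id, comment_id, sentence_score, comment_score)
ROWS = [
    ("a001", "001", 1, 3.5, 2), ("a001", "002", 1, 2.8, 2),
    ("a001", "001", 2, 1.4, -1), ("a001", "002", 2, -2.9, -1),
    ("a001", "003", 2, -3.1, -1),
    ("a002", "001", 1, 2.3, 3), ("a002", "002", 1, 4.3, 3),
    ("a002", "001", 2, 1.2, 1), ("a002", "002", 2, 0.85, 1),
    ("a002", "003", 2, 0.79, 1), ("a002", "001", 3, 3.5, 2),
    ("a002", "002", 3, -3.1, 2), ("a002", "003", 3, 2.8, 2),
    ("a003", "001", 1, 1.0, 1), ("a003", "001", 2, -0.9, -3),
    ("a003", "002", 2, -4.3, -3), ("a003", "003", 2, -4.0, -3),
    ("a003", "001", 3, 3.4, 3), ("a003", "002", 3, 4.4, 3),
    ("a003", "001", 4, 2.8, 2),
]


def sample_surveys(rows, fraction, seed=None):
    """Return every row of a random `fraction` of the distinct surv_id values."""
    rng = random.Random(seed)
    # Sort the distinct IDs so that a given seed always picks the same surveys.
    survey_ids = sorted({row[0] for row in rows})
    # Assumption: round up, keep at least one survey, never more than exist.
    k = min(len(survey_ids), max(1, math.ceil(fraction * len(survey_ids))))
    chosen = set(rng.sample(survey_ids, k))
    # Filtering on surv_id keeps whole surveys, and therefore whole comments.
    return [row for row in rows if row[0] in chosen]


picked = sample_surveys(ROWS, 0.05, seed=1)
print(sorted({row[0] for row in picked}), len(picked))
```

The design choice is to sample the distinct IDs first and then filter the rows. Because of that, the row count of the result varies with how long the chosen surveys are.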

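One way to do this in SPSS syntax:

    compute randnum=rv.uniform(0,1).
    aggregate outfile=* mode=addvariables overwrite=yes /break=surv_id /randnum=max(randnum).
    sort cases randnum surv_id.
    compute survidnum=sum(lag(survidnum),(lag(surv_id)<>surv_id)=1 or $casenum=1).
    * Total number of distinct surveys = largest counter value (not the number of cases).
    aggregate outfile=* mode=addvariables /totn=max(survidnum).
    compute survidnumpct=survidnum/totn.
    select if (survidnumpct<=0.05).
    execute.

The syntax works as follows:

  1. Create a random variable for every case.
  2. Assign the maximum random value within each surv_id, so that all rows of a survey share the same value.
  3. Sort cases by the random variable and then surv_id. This shuffles the surveys while keeping each survey's rows clustered together.
  4. Create a numeric counter that numbers the surv_id's sequentially in the shuffled order.
  5. Divide the counter by the total number of surveys (the largest counter value) to turn it into a percentage.
  6. Select as many surveys as required, here the first 5%.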
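Mirroring the six steps outside SPSS shows why the final division has to use the number of surveys rather than the number of cases. This is a Python sketch, not SPSS syntax. It assumes each row is a dict with a `surv_id` key, and the function name `spss_style_cluster_sample` is hypothetical.

For example, with 20 surveys of 10 rows each (200 rows):
- Dividing the counter by the 200 rows and keeping values below 0.05 keeps counters 1 to 9. That is 9 of the 20 surveys (45%).
- Dividing by the 20 surveys and keeping values up to 0.05 keeps only counter 1 (5%).

```python
import random


def spss_style_cluster_sample(rows, fraction, seed=None):
    """Mirror the six SPSS steps: keep whole surveys for `fraction` of the surv_ids."""
    rng = random.Random(seed)

    # Steps 1-2: draw one random number per row and keep the per-survey maximum,
    # so every row of a survey ends up sharing the same value.
    max_by_id = {}
    for row in rows:
        r = rng.uniform(0, 1)
        sid = row["surv_id"]
        max_by_id[sid] = max(max_by_id.get(sid, 0.0), r)

    # Step 3: sort on (shared random value, surv_id). Each survey's rows become
    # contiguous, and the stable sort keeps their original order within a survey.
    ordered = sorted(rows, key=lambda row: (max_by_id[row["surv_id"]], row["surv_id"]))

    # Step 4: a counter that goes up by 1 at each new surv_id (LAG does this in SPSS).
    counters = []
    counter = 0
    prev_id = None
    for row in ordered:
        if row["surv_id"] != prev_id:
            counter += 1
            prev_id = row["surv_id"]
        counters.append(counter)

    # Steps 5-6: divide by the number of surveys (the last counter value),
    # not the number of rows, and keep the first `fraction` of the surveys.
    n_surveys = counter
    return [row for row, c in zip(ordered, counters) if c / n_surveys <= fraction]


# Usage: 20 hypothetical surveys with 1 to 3 rows each; 5% of 20 is one survey.
demo_rows = [{"surv_id": f"s{i:02d}", "sentence_id": j} for i in range(20) for j in range(1 + i % 3)]
print(spss_style_cluster_sample(demo_rows, 0.05, seed=7))
```

Sorting on the shared random value with surv_id as a tie-breaker is what keeps surveys intact. Two different surveys drawing exactly the same maximum is practically impossible, and even then their rows would not interleave.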

For each of the steps above, here are the corresponding menus in the GUI that achieve the same result:

  1. Transform -> Compute Variable
  2. Data -> Aggregate
  3. Data -> Sort Cases
  4. Transform -> Compute Variable
  5. Data -> Aggregate, then Transform -> Compute Variable
  6. Data -> Select Cases (If condition is satisfied)
