I have a 7 GB file in SPSS format. It contains survey data with comment-level scores and sentence-level scores. One comment can have multiple sentences, and one survey has 4 comments.
I am trying to do random sampling in SPSS so that I can use a smaller file in R, but with simple random sampling I am not able to keep a whole survey and its comments together.
What I want is to take a sample from the big file by picking 5% of the survey IDs (surv_id), so that all rows of a whole survey stay together.
surv_id  sentence_id  comment_id  sentence_score  comment_score
a001     001          1            3.5             2
a001     002          1            2.8             2
a001     001          2            1.4            -1
a001     002          2           -2.9            -1
a001     003          2           -3.1            -1
a002     001          1            2.3             3
a002     002          1            4.3             3
a002     001          2            1.2             1
a002     002          2            0.85            1
a002     003          2            0.79            1
a002     001          3            3.5             2
a002     002          3           -3.1             2
a002     003          3            2.8             2
a003     001          1            1               1
a003     001          2           -0.9            -3
a003     002          2           -4.3            -3
a003     003          2           -4.0            -3
a003     001          3            3.4             3
a003     002          3            4.4             3
a003     001          4            2.8             2
compute randnum=rv.uniform(0,1).
aggregate outfile=* mode=addvariables overwrite=yes /break=surv_id /randnum=max(randnum).
sort cases by randnum surv_id.
compute survidnum=sum(lag(survidnum),(lag(surv_id)<>surv_id)=1 or $casenum=1).
* The largest counter value is the total number of distinct surv_ids.
aggregate outfile=* mode=addvariables /totn=max(survidnum).
compute survidnumpct=survidnum/totn.
select if (survidnumpct<0.05).

- Create a random variable for all cases
- Assign the maximum random value to every row of each unique surv_id
- Sort cases by the random variable, so that rows stay clustered by surv_id
- Create a numeric counter that increments for each sequential surv_id
- Divide this value by the total number of distinct surv_ids to get a proportion
- Select as many cases as required
For the steps above, here are the corresponding menu paths, in case you want to find the relevant GUI equivalents to achieve the same:
- Transform -> Compute Variable
- Data -> Aggregate
- Data -> Sort Cases
- Transform -> Compute Variable
- Transform -> Compute Variable
- Data -> Select Cases
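
As a rough cross-check of the logic outside SPSS, the same idea (sample survey IDs rather than rows, then keep every row of the chosen surveys) can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `sample_surveys`, the list-of-dicts row layout, and the `seed` parameter are hypothetical and not part of the SPSS answer, reading the .sav file is left out, and only a few rows of the example data are used.

```python
import random

def sample_surveys(rows, fraction, seed=None):
    """Return all rows belonging to a random `fraction` of distinct surv_ids."""
    rng = random.Random(seed)
    # Distinct survey ids in first-seen order, so a fixed seed is reproducible.
    ids = list(dict.fromkeys(row["surv_id"] for row in rows))
    # Number of surveys to keep; at least one so small inputs are not emptied.
    k = max(1, round(len(ids) * fraction))
    chosen = set(rng.sample(ids, k))
    # Keep every row of a chosen survey, so surveys are never split.
    return [row for row in rows if row["surv_id"] in chosen]

# A subset of the example rows from the question (illustrative only).
columns = ("surv_id", "sentence_id", "comment_id", "sentence_score", "comment_score")
data = [
    ("a001", "001", 1, 3.5, 2),
    ("a001", "002", 1, 2.8, 2),
    ("a001", "001", 2, 1.4, -1),
    ("a002", "001", 1, 2.3, 3),
    ("a002", "002", 1, 4.3, 3),
    ("a003", "001", 1, 1.0, 1),
    ("a003", "001", 2, -0.9, -3),
]
rows = [dict(zip(columns, r)) for r in data]

sample = sample_surveys(rows, fraction=1/3, seed=42)
for row in sample:
    print(row["surv_id"], row["comment_id"], row["sentence_id"])
```

Sampling at the ID level is what keeps comments and sentences together: a row is kept or dropped only because of its surv_id, never on its own.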