Recent Activity
Hello
Let's say I run a SAS program that creates multiple data sets.
I want to know how long it takes to create each data set, and to store this information in a data set.
This data set will have the following columns:
A - Data set name (the data set that is created)
B - Date and time when creation of the data set started
C - Date and time when creation of the data set finished
D - Time in minutes or seconds to create the data set (difference between C and B)
Please note that if there are other procedures that don't create a data set (such as proc print, proc means noprint, and so on), there is no need to include them in this information.
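One way to collect columns A through D is to wrap each data-set-creating step in a pair of small timing macros. The sketch below is only an illustration under stated assumptions: the macro names `%timer_start` and `%timer_stop`, the global macro variables, and the `work.step_timings` table are all hypothetical names, and each step has to be wrapped by hand.

```sas
/* Hypothetical helper: remember the data set name and the start datetime. */
%macro timer_start(dsname);
  %global _t_name _t_start;
  %let _t_name  = &dsname;
  %let _t_start = %sysfunc(datetime());
%mend timer_start;

/* Hypothetical helper: capture the end datetime and append one row
   to work.step_timings (created on first use by PROC APPEND). */
%macro timer_stop;
  %local _t_end;
  %let _t_end = %sysfunc(datetime());

  data work._timing_row;
    length dataset_name $41;
    dataset_name = "&_t_name";      /* A: data set name            */
    start_dt     = &_t_start;       /* B: start date and time      */
    end_dt       = &_t_end;         /* C: finish date and time     */
    seconds      = end_dt - start_dt; /* D: elapsed time, seconds  */
    format start_dt end_dt datetime20.;
  run;

  proc append base=work.step_timings data=work._timing_row;
  run;
%mend timer_stop;

/* Usage: wrap only the steps that create a data set. */
%timer_start(work.class_copy)
data work.class_copy;
  set sashelp.class;
run;
%timer_stop
```

Steps that are not wrapped (proc print and so on) simply never appear in `work.step_timings`.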

Hello, I need to obtain the residuals and then use them as the outcome in a separate model. In the past, I was able to extract the residuals using proc reg, as shown below:
proc reg data=abstract.pedometer_speedquint2;
  model stepsperday=stepmin_avg_valid;
  output out=b r=res;
run;
proc print data=b;
  var res; /*residuals per participant*/
run;
However, I have not been able to figure out how to do the same using proc mixed. This is the code I am using for the analysis from which I want to extract the residuals:
proc mixed data=trial4;
  class idno time;
  model CASI= time WMFA WMH ICV TOTAL_GM_VOL TOTAL_WM_VOL AGE RACE GENDER/s chisq;
  repeated time/type=un subject=idno r rcorr;
run;
Thanks!
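A possible approach, sketched here rather than confirmed against this data, is the `OUTPM=` option on the MODEL statement of proc mixed, which writes the marginal predicted values and residuals to a data set. The output data set name `mixed_resid` is a hypothetical name; `Pred` and `Resid` are the variable names proc mixed assigns in that data set.

```sas
proc mixed data=trial4;
  class idno time;
  /* outpm= writes marginal predictions (Pred) and residuals (Resid);
     mixed_resid is a hypothetical output data set name. */
  model CASI = time WMFA WMH ICV TOTAL_GM_VOL TOTAL_WM_VOL AGE RACE GENDER
        / s chisq outpm=mixed_resid;
  repeated time / type=un subject=idno r rcorr;
run;

/* Inspect the first few residuals per participant and time point. */
proc print data=mixed_resid(obs=10);
  var idno time CASI Pred Resid;
run;
```

The `mixed_resid` data set keeps the input variables, so `Resid` can then be used as the outcome in a later model step.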

- Tags:
- residual
Good afternoon.
I have a dataset of records based on KEY_ID. The column KEY_DATE is a unique date per KEY_ID. The DATE_1 and DATE_2 fields bracket the KEY_DATE in each instance and can extend past the KEY_DATE. I need to determine the number of days that fall within each month.
Can anyone figure this out for me or point me in some direction? I don't have any example code because I have not been at all successful in even getting close.
I've tried and I can't seem to figure it out. I'd also be open to a different approach if anyone has a suggestion.
KEY_ID | KEY_DATE | DATE_1 | DATE_2 | WANT_JANUARY | WANT_FEBRURARY | WANT_MARCH | WANT_APRIL |
---|---|---|---|---|---|---|---|
3 | 1/1/2024 | 12/16/2023 | 4/20/2024 | 31 | 28 | 31 | 20 |
4 | 1/1/2024 | 12/31/2023 | 2/16/2024 | 22 | 16 | 0 | 0 |
5 | 2/1/2024 | 1/30/2024 | 4/15/2024 | 1 | 28 | 31 | 15 |
6 | 1/1/2024 | 12/2/2023 | 2/12/2024 | 31 | 12 | 0 | 0 |
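One common way to count days per month is to intersect the DATE_1 to DATE_2 range with each calendar month. The sketch below assumes that both endpoints are counted (inclusive), that only January through April 2024 are needed, and that the output columns may be named `days_january` and so on (hypothetical names). Under this rule February 2024 can contribute up to 29 days because 2024 is a leap year, so the results need not match the WANT_ columns as posted.

```sas
/* Sample rows taken from the post. */
data have;
  input key_id key_date :mmddyy10. date_1 :mmddyy10. date_2 :mmddyy10.;
  format key_date date_1 date_2 mmddyy10.;
  datalines;
3 1/1/2024 12/16/2023 4/20/2024
4 1/1/2024 12/31/2023 2/16/2024
5 2/1/2024 1/30/2024 4/15/2024
6 1/1/2024 12/2/2023 2/12/2024
;
run;

data want;
  set have;
  /* Hypothetical output names, one per month of interest. */
  array days_in[4] days_january days_february days_march days_april;
  do m = 1 to 4;
    month_start = mdy(m, 1, 2024);
    month_end   = intnx('month', month_start, 0, 'end');
    /* Inclusive overlap of [date_1, date_2] with the month; 0 if none. */
    days_in[m] = max(0, min(date_2, month_end) - max(date_1, month_start) + 1);
  end;
  drop m month_start month_end;
run;
```

If one endpoint should be excluded, or counting should start from KEY_DATE instead of DATE_1, only the expression inside the loop needs to change.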

Problem: I have two datasets. The first is detail ("line") records from a very large dataset (1.2 TB), and the second is row IDs from an only slightly smaller "header" dataset (110 GB). The relation between line and header is many-to-one. I am trying to select the obs in the line dataset that have a match in the header. The header dataset contains only the key variable.
What I've done so far:
- The smaller "header" dataset is too large to fit in a hash object, even if I increased the memsize to 115 GB, which is almost all of the available memory on the box!
- I sorted and indexed the smaller header dataset by the key variable.
- I selected 1/20th of the large dataset using the firstobs and obs dataset options.
- I use proc sql because I was advised that it is multi-threaded. Read post Efficient Way of Merging Very Large Datasets.
Result: I started the script 8 days ago, and my best guess from looking at the size of the output lck file in Windows File Explorer is that it is only one tenth of the way through.
The help I need:
- What would I need to do to access this dataset in a reasonable amount of time, say a couple of days?
- Should I try to break the line input dataset into chunks, sort and interleave by clm_id, and then try a data step merge?
- If I were to request more memory and processors for this virtual machine, how much would I need?
SAS Versions: The large dataset was created under SAS ver 9.0401M7 but the small dataset was created under 9.0401M5. They are being accessed under 9.0401M5.
Large Line Dataset: taf_other_services_line (16)
- Size on disk: 1.22 TB
- Obs: 5,398,943,292
- Vars: 59
- Observation Length: 525
- Page Size: 65,536 / Pages: 19,749,411
- Indexes: 0 / Sorted: NO / Point to Observations: YES
Smaller Header Dataset:
- Dataset size on disk: 110 GB
- Index size on disk: 126 GB
- Obs: 1,849,842,886
- Vars: 1
- Observation Length: 64
- Page Size: 65,536 / Pages: 1,811,797
- Indexes: 1 / Sorted: YES
Query:
proc sql stimer;
create table saslibrary.outputdataset as
select t.bene_id, t.clm_id, <26 other variables>
from
saslibrary.lineinputdataset (firstobs=4859048953 obs=5128996116) as t
inner join saslibrary.headerinputdataset as c on (t.clm_id = c.clm_id)
;
quit;
OS: MS Windows Server 2016 Standard V 10.0.14393 Build 14393
Hardware according to Windows Task Manager:
- Memory Installed: 128 GB
- Virtual Memory: 46 GB
- Page File Space: 18.0 GB
- Maximum Speed: 2.90 GHz
- Sockets: 6
- Virtual processors: 12
- L1 cache: n/a
- Processor: Intel Xeon Gold 6542Y
For those of you familiar with Medicaid data, this is the TAF data from CMS/MACBIS. Thank you for reading.
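The chunk, sort, and merge idea raised in the question could look roughly like the sketch below. It is not a tested or tuned solution: `work.line_chunk_sorted` is a hypothetical data set name, only two of the selected variables are kept for brevity, and the merge assumes each clm_id appears at most once in the header dataset (the many-to-one relation described in the post).

```sas
/* Sort one chunk of the line dataset by the join key.
   work.line_chunk_sorted is a hypothetical name; keep= limits the
   variables carried through the sort. */
proc sort data=saslibrary.lineinputdataset
               (firstobs=4859048953 obs=5128996116
                keep=bene_id clm_id)
          out=work.line_chunk_sorted;
  by clm_id;
run;

/* Match-merge against the header, which is already sorted by clm_id.
   Keep only line rows whose clm_id also appears in the header. */
data saslibrary.outputdataset;
  merge work.line_chunk_sorted        (in=in_line)
        saslibrary.headerinputdataset (in=in_header);
  by clm_id;
  if in_line and in_header;
run;
```

Each chunk would produce its own output, and the per-chunk outputs could later be appended together. Whether this beats the proc sql join depends on sort space and I/O on the machine, which this sketch does not address.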

Hello,
I am getting an error when running proc logistic as follows: All observations have the same response. No statistics are computed.
I know that my binary (0/1) response variable (bad) does NOT have the same value for all observations in the data, so I can't figure out why I am getting this error. The distribution of bads in the data is roughly 40% with value 1 and 60% with value 0. Any help is much appreciated. The code I am running is:
proc logistic data=train /*model_score_data(where=(selected=1))*/ descending /*outmodel=train_model*/;
  model bad = W_equipment W_Ever_31Plus_DPD W_TIB_Month W_PG1_VantageScore W_SBRI_Lease_Score W_PG1_InquiriesDuringLast6Months W_Months_Aged
        / selection=forward sle=0.1 ctable pprob=(0.1 to 0.5 by 0.05) lackfit outroc=train_roc clodds=pl link=logit;
  weight weights;
  output out=train_scored_data predicted=Pred_prob predprobs=individual reschi=res_chi resdev=res_dev h=levrg difdev=dif_dev;
  *score data = model_score_data(where=(selected=0)) out=test_scored_data ;
run;
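One possible cause worth ruling out (an assumption, not a diagnosis of this data) is that proc logistic drops observations with a missing response, a missing explanatory variable, or a missing or non-positive weight before fitting. If the rows that survive all share one response value, this message appears even though the full data has both. A quick check of the response distribution among usable rows might look like this:

```sas
/* Frequency of the response among rows proc logistic can actually use:
   positive weight and no missing value in the response or in any
   candidate predictor. */
proc freq data=train;
  where weights > 0
    and cmiss(bad, W_equipment, W_Ever_31Plus_DPD, W_TIB_Month,
              W_PG1_VantageScore, W_SBRI_Lease_Score,
              W_PG1_InquiriesDuringLast6Months, W_Months_Aged) = 0;
  tables bad / missing;
run;
```

If this table shows only one level of bad, the missing predictors or the weights variable would be the place to look next.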
