**ADDENDUM to original post: I realized that this issue was being caused by starting with a "RETAIN" statement, which I use to put the variables in the desired order. But I'd still like to leave this question up because I'd appreciate any feedback on:

How does a RETAIN statement work? When does it affect the outputs of a command in a DATA step? Does anyone have alternate/preferred strategies for reordering the variables in a dataset? Thanks!

***********************************************************************

Original post:

Hello SAS community,

I'm very confused about how SAS deciphers "IF" statements in the DATA step. In this specific case, I'm working with an account dataset that has some conflicting information about when accounts close, and I am constructing an "effective" close date. Earlier in my DATA step, I used some IF statements to construct my desired close date. The last step is to convert that numeric close date to a string variable in the format YYYYMM. Here's what I tried:

DATA WORK.dates_test;
    SET WORK.raw_dates;
    close_eff_n = acct_close_dte_n;
    IF closed = 1 AND acct_close_dte_n = . THEN DO;
        close_eff_n = maxdate_n;
    END;
    *(omitting some additional logic used here for parsimony);
    IF close_eff_n > 0 THEN DO;
        close_dte_eff = put(close_eff_n,yymmn.);
    END;
RUN;

I had earlier written this last segment as:

close_dte_eff = put(close_eff_n,yymmn.);

but this populated the string variable close_dte_eff with a value of "." when close_eff_n was missing, which is why I'm now trying to implement this conditional logic.

The problem is: where this condition fails, SAS populates the close_dte_eff field with whatever the last non-failed value was, which is completely incorrect. For example:

I have:

close_eff_n
01MAR2023
01APR2023
.
.
01JUL2021

I want (close_dte_eff blank where close_eff_n is missing):

close_eff_n    close_dte_eff
01MAR2023      202303
01APR2023      202304
.
.
01JUL2021      202107

But instead I get:

close_eff_n    close_dte_eff
01MAR2023      202303
01APR2023      202304
.              202304
.              202304
01JUL2021      202107

When I tried to replicate this problem with a simplified dataset, i.e. just taking the final input variables and creating the desired output, I got the result I want, so I suspect it might have something to do with the preceding IF statements.

I can think of plenty of workarounds to get this to work as intended, so my question is not so much how to fix this, but why is this happening? There's something fundamental about how the IF statement is being processed where rows that fail the "IF" condition are being populated with the value of the last row that met that condition, and I would like to understand when SAS applies this behavior and when it does not. I can see this being a useful feature in some limited cases, but it's generally not what I would want to do when applying conditional logic. I had thought that these sorts of situations, where SAS operates on one row depending on what was in the previous row, only happen when there is a "BY" statement, but obviously that's incorrect as there is no "BY" statement in this DATA step.
I'd really appreciate some explanation as to when actions are applied to rows that do not meet the specified condition in an "IF" statement, and how to control that behavior, so I can make sure that the commands I write are applying to the rows that I expect them to apply to. Please let me know if I can provide any other context or information that would be helpful. Many thanks, Scott
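The carry-over described in the addendum can be reproduced with a small sketch. It is not the poster's actual program: the input dataset below is hypothetical and only mirrors the example values, the width in `yymmn6.` is chosen here to make the six-character result explicit, and `work.retained` and `work.fixed` are illustrative names. The idea is that a variable created by an assignment is reset to missing at the top of each DATA step iteration, while a variable named in a RETAIN statement is not, so a conditional assignment that is skipped leaves the previous row's value in place.

```sas
/* Hypothetical input mirroring the example values in the post */
data work.raw_dates;
    input close_eff_n :date9.;
    format close_eff_n date9.;
    datalines;
01MAR2023
01APR2023
.
.
01JUL2021
;
run;

/* RETAIN used only for ordering: close_dte_eff is no longer reset to  */
/* missing each iteration, so rows that fail the IF keep the old value */
data work.retained;
    retain close_dte_eff close_eff_n;
    set work.raw_dates;
    if close_eff_n > 0 then close_dte_eff = put(close_eff_n, yymmn6.);
run;

/* One alternative for ordering: declare the new variable with LENGTH  */
/* before SET. It is placed first but is still reset each iteration.   */
data work.fixed;
    length close_dte_eff $6;
    set work.raw_dates;
    if close_eff_n > 0 then close_dte_eff = put(close_eff_n, yymmn6.);
run;
```

If the RETAIN statement is kept for ordering, adding an explicit `else close_dte_eff = ' ';` branch after the IF should have the same effect, since the value is then overwritten on every row.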
Hi,

I would like to apply two types of floors, one to the lower segment and one to the upper segment of a dataset. Assume the dataset has two columns: Col1, which is unique and sorted in ascending order, and Col2, which holds the actual data the floors apply to.

1) Floor 1 on the lower band of Col1: for any value of Col1 less than 3 (a threshold set by the user), replace the Col2 value with the Col2 value at Col1 = 3.
2) Floor 2 on the upper band of Col1: for any value of Col1 greater than 4 (also set by the user), replace the Col2 value with the Col2 value of the previous Col1, so that Col2 does not decrease as Col1 increases.

Below is an example of the dataset I have and what I want. Many thanks in advance.

data have;
    input Col1 Col2;
    datalines;
1 1
2 2
3 3
4 4
5 3
6 6
7 1
8 3
;
run;

data want;
    input Col1 Col2;
    datalines;
1 3
2 3
3 3
4 4
5 4
6 6
7 6
8 6
;
run;
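One possible way to get from `have` to the desired values is sketched below. It assumes the `have` dataset from the post already exists, and it reads Floor 2 as a running maximum of the adjusted Col2 values, which is an interpretation rather than something the post states outright. The macro variables `lower`, `upper` and `floor_val`, the helper variable `run_max`, and the output name `want_calc` are all hypothetical names introduced for illustration.

```sas
/* Hypothetical macro variables holding the user-set thresholds */
%let lower = 3;
%let upper = 4;

/* Look up the Col2 value at the lower threshold */
proc sql noprint;
    select Col2 into :floor_val trimmed
    from have
    where Col1 = &lower;
quit;

data want_calc (drop=run_max);
    set have;
    retain run_max;                      /* last adjusted Col2 value */
    if Col1 < &lower then Col2 = &floor_val;
    else if Col1 > &upper then Col2 = max(Col2, run_max);
    run_max = Col2;                      /* remember for the next row */
run;
```

With the example data this should give 3, 3, 3, 4, 4, 6, 6, 6 in Col2, matching the values listed as wanted. Here RETAIN is used deliberately, so that `run_max` carries the previous row's adjusted value forward.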
Esteemed Advisors:
I am trying to interleave two datasets with a condition that the resulting dataset contains only observations that can be found in both of the two datasets.
Below is exemplar code to illustrate the problem. If you run this code and inspect dataset interleave2 you will see that for a group of 3 observations where target=1, two came from Random_A and one came from Random_B. Likewise, for a group of three observations where target=2, two came from Random_B and one came from Random_A. All of these observations need to be retained in the desired dataset.
For the group of 3 observations where target=3, all observations came from Random_B only. These are ones that need to be omitted. All observations for a given target that come from a single source dataset are not to be retained in the desired dataset.
The challenge for me (and now for you) is to come up with code that will interleave Random_A and Random_B such that the resultant dataset contains only the groups of targets that are present in both datasets.
Hope this makes sense and thanks for taking a look,
Gene
data Random_A (drop=i);
    call streaminit(4786);
    do i=1 to 100;
        Source="A";
        Target=rand("Integer",1,100);
        ST=catx('/',Source,Target);
        output;
    end;
run;

data Random_B (drop=i);
    call streaminit(6874);
    do i=1 to 150;
        Source="B";
        Target=rand("Integer",1,100);
        ST=catx('/',Source,Target);
        output;
    end;
run;

proc sort data=Random_A;
    by ST;
run;

proc sort data=Random_B;
    by ST;
run;

data interleave1;
    set Random_A Random_B;
    by ST;
run;

proc sort data=interleave1 out=interleave2 nounikey;
    by Target;
run;
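One approach that may fit the requirement is a two-pass loop over each Target group (often called a double DoW loop): the first pass records which sources contribute to the group, and the second pass re-reads the same group and outputs it only when both sources were seen. This is a sketch, not a tested solution for the real data; it assumes Random_A and Random_B have been created as above, and the names `A_sorted`, `B_sorted`, `both`, `hasA` and `hasB` are hypothetical.

```sas
/* Sort copies by Target so the two datasets can be interleaved by Target */
proc sort data=Random_A out=A_sorted;
    by Target;
run;

proc sort data=Random_B out=B_sorted;
    by Target;
run;

data both (drop=hasA hasB);
    /* First pass over one Target group: note which sources are present. */
    /* hasA and hasB are not retained, so they reset for each group.     */
    do until (last.Target);
        set A_sorted (in=inA) B_sorted (in=inB);
        by Target;
        if inA then hasA = 1;
        if inB then hasB = 1;
    end;
    /* Second pass over the same group: keep it only if both were seen */
    do until (last.Target);
        set A_sorted B_sorted;
        by Target;
        if hasA and hasB then output;
    end;
run;
```

Each iteration of the DATA step handles one Target group, so groups that come from a single source dataset are read but never written out.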
My default region is Europe, but I need to create a course for PharmaSUG on the server for United States 1. How do I create an account away from my default region?
Hi there,

I'm new here and would like to apologise if my question seems a little illogical. I am looking for a tool for our company in the area of factory planning that can convert unstructured Excel data into a structured, predefined form. I am analysing whether the raw data (machine, process and product information) of our customers can be converted into a structured format by an AI. This is necessary in order to be able to read in the data with the in-house software (VBA code is currently still being written for this). The target formatting is therefore known (import excel). However, the raw Excel data is different in every new project. The reorganisation affects, among other things, the arrangement of rows and columns, but also the generation of formulas. Is there a way to solve this problem with SAS?

Many thanks in advance.

Best regards