Excel Bask
OK computer nerds, I have a stumper for you. I have a huge data set that I need to consolidate. It is like this:
1/2010==Bill Smith==Alpha==Red
1/2010==Bill Smith==Beta==Blue
1/2010==Joe Blow==Alpha==Red
1/2010==Mary Stuart==Gamma==Red
2/2010==Bill Smith==Beta==Blue
2/2010==Mary Stuart==Alpha==Red
2/2010==Mary Stuart==Beta==Blue
2/2010==Mary Stuart==Gamma==Red
2/2010==Mary Stuart==Gamma==Blue
2/2010==Mary Stuart==Gamma==Green
…
So the deal is, on any particular date, any of the population may have one or more entries, which are coded Apha, Beta, or Gamma, and also have colors associated with them.
But for my purposes, all I care about is on any given data, if a particular person has (a) an Alpha or a Beta, and (b) a Gamma. I don’t care about the colors. So I want to consolidate the above list into something like this:
1/2010==Bill Smith==Alpha/Beta
1/2010==Joe Blow==Alpha/Beta
1/2010==Mary Stuart==Gamma
2/2010==Bill Smith==Alpha/Beta
2/2010==Mary Stuart==Alpha/Beta
2/2010==Mary Stuart==Gamma
…
Any thoughts? Unfortunately I can’t do any neat tricks with a Pivot Table, because the list is so big that it doesn’t fit on one sheet. (And I don’t think you can do a Pivot Table on more than one sheet of data?) So my plan is to take the original list in small chunks onto a new sheet, and then apply the consolidation to each chunk, and then combine all the output into one final list that will fit on a single worksheet to which I can then do the Pivot Table jiu jitsu.
Any ideas? I’m assuming I need to write a macro for it, but as in economics, I am not good at macro.
Do you use SAS or stata? that would be real quick.
Don’t write any macro or anything like that – if it’s in excel just do a “replace all”. In the new excel, replace is under “Find/select” on the far right. Click “replace”. And replace “alpha” with “beta” and then hit “replace all”. Then all you have are beta’s and gamma’s.
Alternatively, sort on the alpha/beta/gamma column, type “beta” in the top alpha cell, click and drag and again you have all beta’s and gamma’s.
I think you are misunderstanding. I’m trying to consolidate it down, to make the list smaller. I don’t want to end up with the same person having 15 different entries on the same date, all saying, “Alpha/Beta.” The problem is that I dumbed down the example for my illustration; in reality each person might have up to 10 different things on a given date, and all I care about is whether they have any or zero in the first nine categories, and whether they have the last (tenth) category.
ya, do the rename like I said in something like stata and then just write “duplicates drop” it should be a two or three command process.
Oh I see – multiple alpha’s and betas and gammas per person. You need to use SAS or stata – it’ll take just a minute. Feel free to email me the dataset if that’s easiest.
I can’t visualize it without seeing the Excel file. If you don’t get an answer, email me a small chunk of the file. I have some time to kill, and I am certain that I could get a set of steps figured out.
A small script in R would do the trick, too. Scientists should all move towards command line statistical programming and away from the awful Excel spreadsheet.
Sounds like the find/replace method would work too, as long as your data tables aren’t too long.
I’d be happy to write something for you as well, if you want to send the file.
Well, it looks like you have plenty of help on this one, but if for some reason the help falls through I’m available.
Like your other friends, I’d be happy to help– if you like, send me the file and describe what you want from it.
Oh, of course I will promise to keep your data confidential.
You could do something very simple in ruby using split.
line.each do |row|
puts row.split(“==”)[0]
puts row.split(“==”)[1]
puts row.split(“==”)[2]
end
Basically it just iterates through the text you posted. It makes each line an array. It ignores the last element in the array which is a color.
If you want to do this in MS office, well I always find setting up macro’s in excel as being far more complicated. I just prefer to keep data sets in csv’s so I can easily manipulate them in ruby.
you could use an array formula on a separate column to transfer the data over for each item in the original column. Based on your list, it looks like you would only have to write three different formulas. Once the formula is written you can just eliminate the blank cells in the sorted list.
Export the data to Access. Then you can use either use Access’ Pivot Table, or using Excel, query the Access database and Pivot the data in Excel.
What version of Excel are you using? If it’s 2003, upgrading to 2007 or 2010 will increases the row count from 64k to just over one million.
I say you should do this and then do that Then ask yourself why it is so different from everyone else’s recommendation. Then consider everyone’s divergent recommendations together. Then think your task is as difficult as you thought it was because it’s not so straightforward that everyone will recommend the same thing. Then think to yourself that you should do a little bit of this, and a little bit of that, then figure it out, then thank everyone.
I think you should give it to the guy who said he would keep it confidential, as an afterthought.
What’s funniest about all of this, is I think it was manufactured just for the macro joke.
I did it, and without using a macro or a pivot table!
I saved my spreadsheet here (it’s named ConsolidatedData.xls):
http://tinyurl.com/4j937dl
I’ll check this out on the weekend when I’m back home. Thanks!