There are 5 columns (first name, email, userid, app name), and I want to randomly select 10% of these rows and eventually export them to a CSV while keeping the column headers I listed above. Thanks a million.
I don’t know how random you want this to be, but adding a column containing
=RANDBETWEEN(1,85038) copied down to suit, then sorting by that column and selecting the first 8,504 rows should give quite an ‘arbitrary’ result.
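For anyone who wants to try the same idea outside Excel, here is a minimal Python sketch of the "random helper column, sort, take the top 10%" approach. The function name `sample_by_random_key` and its parameters are made up for illustration. As with RANDBETWEEN, duplicate keys can occur, so treat this as an arbitrary pick rather than a strictly uniform sample.

```python
import random


def sample_by_random_key(rows, fraction=0.10, seed=None):
    """Attach a random integer key to each row, sort by it, keep the first `fraction`."""
    rng = random.Random(seed)
    n = len(rows)
    # Same idea as =RANDBETWEEN(1, n) in a helper column. Duplicate keys are
    # possible; tied rows simply stay in their original relative order.
    keyed = [(rng.randint(1, n), row) for row in rows]
    keyed.sort(key=lambda pair: pair[0])
    keep = round(n * fraction)  # 85038 rows -> 8504 rows
    return [row for _, row in keyed[:keep]]
```

With 85,038 rows this keeps 8,504 of them, matching the count in the answer above.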
Are you familiar with SQL and the Microsoft Query functionality in Excel (Data ->…-> From Microsoft Query)?
If so, use this:
( SELECT "first name, email, userid, app name" )
UNION
( SELECT TOP 8503 t.[first name] & "," & t.[email] & "," & t.[userid] & "," & t.[app name]
  FROM [Sheet1$] AS t
  ORDER BY RND() )
Then copy and paste the result into an empty text file and save it as CSV.
You can also use my SQL addin for this http://blog.tkacprow.pl/?page_id=130
EDIT 1: I assumed that “Sheet1” is the name of your worksheet
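If the data can be read outside Excel, the "header line plus a random 10% of rows" output could also be sketched in Python with only the standard library. This is an illustration under assumptions, not the query above: `sample_to_csv`, `HEADER`, and the demo rows are hypothetical names and made-up data, and the rows are assumed to already be loaded as lists of four strings.

```python
import csv
import io
import random

# Column headers as listed in the question.
HEADER = ["first name", "email", "userid", "app name"]


def sample_to_csv(rows, out, fraction=0.10, seed=None):
    """Write HEADER plus a random `fraction` of `rows` to the file-like `out`."""
    rng = random.Random(seed)
    keep = int(len(rows) * fraction)  # rounds down, like TOP 8503 for 85038 rows
    picked = rng.sample(rows, keep)   # sampling without replacement
    writer = csv.writer(out)
    writer.writerow(HEADER)
    writer.writerows(picked)
    return keep


# Tiny demo with made-up rows (not real data).
demo_rows = [[f"name{i}", f"user{i}@example.com", str(i), "app"] for i in range(50)]
buffer = io.StringIO()
written = sample_to_csv(demo_rows, buffer, seed=0)
```

To write a real file, pass `open("sample.csv", "w", newline="")` instead of the in-memory buffer.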
Here’s a possible solution for you using an array formula.
Suppose you have data in Column A (in this example I used only 100 values, in A2:A101).
Now in C2, type the following formula: (Credits to Oscar.)
=IF(ROW(A1)<=0.1*COUNTA($A$2:$A$101),INDEX($A$2:$A$101, LARGE(MATCH(ROW($A$2:$A$101), ROW($A$2:$A$101))*NOT(COUNTIF($C$1:C1, $A$2:$A$101)), RANDBETWEEN(1,ROWS($A$2:$A$101)-ROW(A1)+1))),"")
Use Ctrl+Shift+Enter to get the formula to work.
Using just Enter will return #N/A.
Then to get the rest of the values, just drag the formula down.
In this example, I just auto-fill up to C20.
Note: RANDBETWEEN is volatile, so recalculation happens every time you change something. If you need to return around 8,000 values, that is a lot of recalculation, and it may take a while.
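As I read it, the array formula picks one not-yet-chosen value per output row. A rough Python sketch of that "draw one at a time without replacement" idea follows. `draw_without_replacement` is a hypothetical name. One difference worth knowing: the formula excludes by value (via COUNTIF), while this sketch excludes by position.

```python
import random


def draw_without_replacement(values, fraction=0.10, seed=None):
    """Pick `fraction` of `values`, one random draw at a time, never repeating a position."""
    rng = random.Random(seed)
    remaining = list(values)
    target = int(len(remaining) * fraction)  # 100 values -> 10 picks
    picked = []
    while len(picked) < target:
        # Choose a position among the values not picked yet, then remove it
        # so it cannot be chosen again.
        index = rng.randrange(len(remaining))
        picked.append(remaining.pop(index))
    return picked
```

Unlike the worksheet version, this runs once and does not recalculate on every change.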
I personally used a handy add-in for Microsoft Excel 2016 (64-bit). It is called Kutools.
You can freely download and use it via this link:
After downloading and installing it, you can select a random number of rows from the Kutools tab -> Range -> Sort Range Randomly -> Select,
then enter the number of rows you need, and that’s it.
I think this will help to generate any percentage from a list.
If Col A has your list
In Col B: =RAND() and fill down.
In Col C (starting in C2):
=IF(ROW()-1>10%*COUNTA($A$2:$A$37),"",INDEX($A$2:$A$37,RANK.AVG(B2,$B$2:$B$37,0),1)) and fill down. Only the proportion you want will appear in the list.
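The RAND-plus-rank trick works because the ranks of distinct random numbers form a random permutation of 1..n, so looking up the item at each rank never repeats a row. Here is a small Python sketch of that mechanism. `sample_by_rank` is a made-up name, and the ranking is done by hand rather than with any Excel-compatible function.

```python
import random


def sample_by_rank(values, fraction=0.10, seed=None):
    """Use the rank of a random helper value to pick items, keeping the first `fraction`."""
    rng = random.Random(seed)
    n = len(values)
    helper = [rng.random() for _ in values]  # plays the role of column B
    # Rank 1 goes to the largest helper value (descending order).
    order = sorted(range(n), key=lambda i: helper[i], reverse=True)
    rank = [0] * n
    for position, i in enumerate(order, start=1):
        rank[i] = position
    keep = int(n * fraction)
    # Output row i shows the item whose position equals the rank of helper[i].
    return [values[rank[i] - 1] for i in range(keep)]
```

Changing `fraction` gives any other percentage, which is the point of the answer above.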