Commit 909da52

added some more stuff
1 parent f42e1e5 commit 909da52

4 files changed, 389 additions and 13 deletions

Lines changed: 236 additions & 0 deletions
@@ -0,0 +1,236 @@
{
 "metadata": {
  "name": "",
  "signature": "sha256:0cdf2cf5c518f9a5ce3615c4a62cd5d2668a18d6094c6a4ded58db665ca43254"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!make_emperor.py -i '/Users/jc33/google_drive/thesis/analysis/files/initial_steps/core_div_out/bdiv_even5000/weighted_unifrac_pc.txt' -m '/Users/jc33/google_drive/thesis/analysis/files/initial_steps/mapping_files/periods1-2-3_all_data_barcodes_map141108.txt' -o '/Users/jc33/google_drive/thesis/analysis/files/initial_steps/emp_time_series' -a DaysSinceEpoch --ignore_missing_samples"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!make_emperor.py -i '/Users/jc33/google_drive/thesis/analysis/files/initial_steps/core_div_out/bdiv_even5000/unweighted_unifrac_pc.txt' -m '/Users/jc33/google_drive/thesis/analysis/files/initial_steps/mapping_files/periods1-2-3_all_data_barcodes_map141108.txt' -o '/Users/jc33/google_drive/thesis/analysis/files/initial_steps/unweighted_emp_time_series' -a DaysSinceEpoch --ignore_missing_samples"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!make_emperor.py -h\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Usage: make_emperor.py [options] {-i/--input_coords INPUT_COORDS -m/--map_fp MAP_FP}\r\n",
        "\r\n",
        "[] indicates optional input (order unimportant)\r\n",
        "{} indicates required input (order unimportant)\r\n",
        "\r\n",
        "This script automates the creation of three-dimensional PCoA plots to be visualized with Emperor using Google Chrome.\r\n",
        "\r\n",
        "Example usage: \r\n",
        "Print help message and exit\r\n",
        " make_emperor.py -h\r\n",
        "\r\n",
        "Plot PCoA data: Visualize the a PCoA file colored using a corresponding mapping file:\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -o emperor_output\r\n",
        "\r\n",
        "Coloring by metadata mapping file: Additionally, using the supplied mapping file and a specific category or any combination of the available categories. When using the -b option, the user can specify the coloring for multiple header names, where each header is separated by a comma. The user can also combine mapping headers and color by the combined headers that are created by inserting an '&&' between the input header names. Color by 'Treatment' and by the result of concatenating the 'DOB' category and the 'Treatment' category:\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -b 'Treatment&&DOB,Treatment' -o emperor_colored_by\r\n",
        "\r\n",
        "PCoA plot with an explicit axis: Create a PCoA plot with an axis of the plot representing the 'DOB' of the samples. This option is useful when presenting a gradient from your metadata e. g. 'Time' or 'pH':\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -a DOB -o pcoa_dob\r\n",
        "\r\n",
        "PCoA plot with an explicit axis and using --missing_custom_axes_values: Create a PCoA plot with an axis of the plot representing the 'DOB' of the samples and define the position over the gradient of those samples missing a numeric value; in this case we are going to plot the samples in the value 20060000. You can select for each explicit axis which value you want to use for the missing values:\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map_modified.txt -a DOB -o pcoa_dob_with_missing_custom_axes_values -x 'DOB:20060000'\r\n",
        "\r\n",
        "PCoA plot with an explicit axis and using --missing_custom_axes_values but setting different values based on another column: Create a PCoA plot with an axis of the plot representing the 'DOB' of the samples and defining the position over the gradient of those samples missing a numeric value but using as reference another column of the mapping file. In this case we are going to plot the samples that are Control on the Treatment column on 20080220 and on 20080240 those that are Fast\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map_modified.txt -a DOB -o pcoa_dob_with_missing_custom_axes_with_multiple_values -x 'DOB:Treatment==Control=20080220' -x 'DOB:Treatment==Fast=20080240'\r\n",
        "\r\n",
        "Jackknifed principal coordinates analysis plot: Create a jackknifed PCoA plot (with confidence intervals for each sample) passing as the input a directory of coordinates files (where each file corresponds to a different OTU table) and use the standard deviation method to compute the dimensions of the ellipsoids surrounding each sample:\r\n",
        " make_emperor.py -i unweighted_unifrac_pc -m Fasting_Map.txt -o jackknifed_pcoa -e sdev\r\n",
        "\r\n",
        "Jackknifed PCoA plot with a master coordinates file: Passing a master coordinates file (--master_pcoa) will display the ellipsoids centered by the samples in this file:\r\n",
        " make_emperor.py -i unweighted_unifrac_pc -s unweighted_unifrac_pc/pcoa_unweighted_unifrac_rarefaction_110_5.txt -m Fasting_Map.txt -o jackknifed_with_master\r\n",
        "\r\n",
        "BiPlots: To see which taxa are the ten more prevalent in the different areas of the PCoA plot, you need to pass a summarized taxa file i. e. the output of summarize_taxa.py. Note that if the the '--taxa_fp' has fewer than 10 taxa, the script will default to use all.\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -t otu_table_L3.txt -o biplot\r\n",
        "\r\n",
        "BiPlots with extra options: To see which are the three most prevalent taxa and save the coordinates where these taxa are centered, you can use the -n (number of taxa to keep) and the --biplot_fp (output biplot file path) options.\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -t otu_table_L3.txt -o biplot_options -n 3 --biplot_fp biplot.txt\r\n",
        "\r\n",
        "Drawing connecting lines between samples: To draw lines betwen samples within a category use the '--add_vectors' option. For example to connect the lines by the 'Treatment' category.\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -o vectors --add_vectors Treatment\r\n",
        "\r\n",
        "Drawing connecting lines between samples with an explicit axis: To draw lines between samples within a category of the mapping file and have them sorted by a category that's explicitly represented in the 3D plot use the '--add_vectors' and the '-a' option.\r\n",
        " make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt --add_vectors Treatment,DOB -a DOB -o sorted_by_DOB\r\n",
        "\r\n",
        "Compare two coordinate files: To draw replicates of the same samples like for a procustes plot.\r\n",
        " make_emperor.py -i compare -m Fasting_Map.txt --compare_plots -o comparison\r\n",
        "\r\n",
        "Options:\r\n",
        " --version show program's version number and exit\r\n",
        " -h, --help show this help message and exit\r\n",
        " -v, --verbose Print information during execution -- useful for\r\n",
        " debugging [default: False]\r\n",
        " --number_of_axes=NUMBER_OF_AXES\r\n",
        " Number of axes to be incorporated in the plot. Only 3\r\n",
        " will be displayed at any given time but this option\r\n",
        " modifies how many axes you can use for your\r\n",
        " visualization. Note that Emperor will only use the\r\n",
        " axes that explain more than 0.5% (this will be shown\r\n",
        " as 1% in the GUI)of the variability [default: 10]\r\n",
        " -a CUSTOM_AXES, --custom_axes=CUSTOM_AXES\r\n",
        " Comma-separated list of metadata categories to use as\r\n",
        " custom axes in the plot. For instance, if there is a\r\n",
        " time category and you would like to see the samples\r\n",
        " plotted on that axis instead of PC1, PC2, etc., you\r\n",
        " would pass time as the value of this option. Note: if\r\n",
        " there is any non-numeric data in the metadata column,\r\n",
        " an error will be presented [default: none]\r\n",
        " --add_unique_columns Add to the output categories of the mapping file the\r\n",
        " columns where all values are different. Note: if the\r\n",
        " result of one of the concatenated fields in --color_by\r\n",
        " is a column where all values are unique, the resulting\r\n",
        " column will get removed as well [default: False]\r\n",
        " --add_vectors=ADD_VECTORS\r\n",
        " Comma-sparated category(ies) used to add connecting\r\n",
        " lines (vectors) between samples. The first category\r\n",
        " specifies the samples that will be connected by the\r\n",
        " vectors, whilst the second category (optionally)\r\n",
        " determines the order in which the samples will be\r\n",
        " connected. [default: [None, None]]\r\n",
        " -b COLOR_BY, --color_by=COLOR_BY\r\n",
        " Comma-separated list of metadata categories (column\r\n",
        " headers) to color by in the plots. The categories must\r\n",
        " match the name of a column header in the mapping file\r\n",
        " exactly. Multiple categories can be listed by comma\r\n",
        " separating them without spaces. The user can also\r\n",
        " combine columns in the mapping file by separating the\r\n",
        " categories by \"&&\" without spaces. [default=color by\r\n",
        " all categories except ones where all values are\r\n",
        " different]\r\n",
        " --biplot_fp=BIPLOT_FP\r\n",
        " Output filepath that will contain the coordinates\r\n",
        " where each taxonomic sphere is centered. [default:\r\n",
        " none]\r\n",
        " -c, --compare_plots Passing a directory with the -i (--input_coords)\r\n",
        " option in combination with this flag results in a set\r\n",
        " of bars connecting the replicated samples across all\r\n",
        " the input files. [default=False]\r\n",
        " -e ELLIPSOID_METHOD, --ellipsoid_method=ELLIPSOID_METHOD\r\n",
        " Used only when plotting ellipsoids for jackknifed beta\r\n",
        " diversity (i.e. using a directory of coord files\r\n",
        " instead of a single coord file). Valid values are\r\n",
        " \"IQR\" (for inter-quartile ranges) and \"sdev\" (for\r\n",
        " standard deviation). [default=IQR]\r\n",
        " --ignore_missing_samples\r\n",
        " This will overpass the error raised when the\r\n",
        " coordinates file contains samples that are not present\r\n",
        " in the mapping file. Be aware that this is very\r\n",
        " misleading as the PCoA is accounting for all the\r\n",
        " samples and removing some samples could lead to\r\n",
        " erroneous/skewed interpretations.\r\n",
        " -n N_TAXA_TO_KEEP, --n_taxa_to_keep=N_TAXA_TO_KEEP\r\n",
        " Number of taxonomic groups from the \"--taxa_fp\" file\r\n",
        " to display. Passing \"-1\" will cause to display all the\r\n",
        " taxonomic groups, this option is only used when\r\n",
        " creating BiPlots. [default=10]\r\n",
        " -s MASTER_PCOA, --master_pcoa=MASTER_PCOA\r\n",
        " Used only when the input is a directory of coordinate\r\n",
        " files i. e. for jackknifed beta diversity plot or for\r\n",
        " a coordinate comparison plot (procrustes analysis).\r\n",
        " The coordinates in this file will be the center of\r\n",
        " each ellipsoid in the case of a jackknifed PCoA plot\r\n",
        " or the center where the connecting arrows originate\r\n",
        " from for a comparison plot. [default: arbitrarily\r\n",
        " selected file from the input directory for a\r\n",
        " jackknifed plot or None for a comparison plot in this\r\n",
        " case one file will be connected to the next one and so\r\n",
        " on]\r\n",
        " -t TAXA_FP, --taxa_fp=TAXA_FP\r\n",
        " Path to a summarized taxa file (i. e. the output of\r\n",
        " summarize_taxa.py). This option is only used when\r\n",
        " creating BiPlots. [default=none]\r\n",
        " -x MISSING_CUSTOM_AXES_VALUES, --missing_custom_axes_values=MISSING_CUSTOM_AXES_VALUES\r\n",
        " Option to override the error shown when the catergory\r\n",
        " used in '--custom_axes' has non-numeric values in the\r\n",
        " mapping file. The basic format is\r\n",
        " custom_axis:new_value. For example, if you want to\r\n",
        " plot in time 0 all the samples that do not have a\r\n",
        " numeric value in the column Time. you would pass -x\r\n",
        " \"Time:0\". Additionally, you can pass this format custo\r\n",
        " m_axis:other_column==value_in_other_column=new_value,\r\n",
        " with this format you can specify different values\r\n",
        " (new_value) to use in the substitution based on other\r\n",
        " column (other_column) value (value_in_other_column);\r\n",
        " see example above. This option could be used in all\r\n",
        " explicit axes.\r\n",
        " -o OUTPUT_DIR, --output_dir=OUTPUT_DIR\r\n",
        " path to the output directory that will contain the\r\n",
        " PCoA plot. [default: emperor]\r\n",
        " --number_of_segments=NUMBER_OF_SEGMENTS\r\n",
        " the number of segments to generate any spheres, this\r\n",
        " includes the samples, the taxa (biplots), and the\r\n",
        " confidence intervals (jackknifing). Higher values will\r\n",
        " result in better quality but can make the plots less\r\n",
        " responsive, also it will make the resulting SVG images\r\n",
        " bigger. The value should be between 4 and 14.\r\n",
        " [default: 8]\r\n",
        "\r\n",
        " REQUIRED options:\r\n",
        " The following options must be provided under all circumstances.\r\n",
        "\r\n",
        " -i INPUT_COORDS, --input_coords=INPUT_COORDS\r\n",
        " Depending on the plot to be generated, can be one of\r\n",
        " the following: (1) Filepath of a coordinates file to\r\n",
        " create a PCoA plot. (2) Directory path to a folder\r\n",
        " containing coordinates files to create a jackknifed\r\n",
        " PCoA plot. (3) Directory path to a folder containing\r\n",
        " coordinates files to compare the coordinates there\r\n",
        " contained when --compare_plots is enabled (useful for\r\n",
        " procustes analysis plots). For directories: hidden\r\n",
        " files, sub-directories and files suffixed as\r\n",
        " '_procrustes_results.txt' [REQUIRED]\r\n",
        " -m MAP_FP, --map_fp=MAP_FP\r\n",
        " path to a metadata mapping file [REQUIRED]\r\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}
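The first two cells run the same `make_emperor.py` call and differ only in the `-i` coordinates file and the `-o` output directory. Below is a minimal Python sketch, not part of this commit, that builds both argument lists from the paths in those cells and runs them with `subprocess`. It assumes QIIME 1's `make_emperor.py` is on `PATH`; the names `emperor_args`, `run_all`, and `RUNS` are hypothetical.

```python
import subprocess
from pathlib import Path

# Paths copied from the notebook cells; they only exist on the original machine.
BASE = Path("/Users/jc33/google_drive/thesis/analysis/files/initial_steps")
PC_DIR = BASE / "core_div_out" / "bdiv_even5000"
MAP_FP = BASE / "mapping_files" / "periods1-2-3_all_data_barcodes_map141108.txt"

# (coordinates file, output directory) for each of the two cells.
RUNS = [
    ("weighted_unifrac_pc.txt", "emp_time_series"),
    ("unweighted_unifrac_pc.txt", "unweighted_emp_time_series"),
]


def emperor_args(pc_name, out_name):
    """Return the argv list for one make_emperor.py run (hypothetical helper)."""
    return [
        "make_emperor.py",
        "-i", str(PC_DIR / pc_name),
        "-m", str(MAP_FP),
        "-o", str(BASE / out_name),
        "-a", "DaysSinceEpoch",        # custom axis taken from the mapping file
        "--ignore_missing_samples",    # see the caveat in the -h output
    ]


def run_all():
    """Run every configuration; check=True raises CalledProcessError on failure."""
    for pc_name, out_name in RUNS:
        subprocess.run(emperor_args(pc_name, out_name), check=True)
```

Calling `run_all()` from a single cell would replace the two `!make_emperor.py` cells. Passing an argument list rather than a shell string also removes the need to single-quote each path by hand.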
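The `--ignore_missing_samples` help text warns that hiding samples that are in the coordinates file but absent from the mapping file can skew interpretation. A minimal sketch for listing those samples before deciding to pass the flag follows. It assumes the QIIME 1.8-era coordinates layout (a `pc vector number` header row, one tab-separated row per sample, then a blank line before the `eigvals` and `% variation explained` rows) and a tab-separated mapping file whose header and comment lines start with `#`. Newer scikit-bio ordination files use a different layout and would need a different parser; all function names here are hypothetical.

```python
import csv

# Rows that follow the per-sample block in the assumed QIIME 1.8 layout.
TRAILER_LABELS = {"eigvals", "% variation explained"}


def coord_sample_ids(pc_fp):
    """Sample IDs from a coordinates file in the assumed QIIME 1.8 layout."""
    ids = []
    with open(pc_fp, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row or not row[0].strip() or row[0] in TRAILER_LABELS:
                break  # blank line or trailer row ends the per-sample block
            if row[0] == "pc vector number":
                continue  # header row, not a sample
            ids.append(row[0])
    return ids


def mapping_sample_ids(map_fp):
    """Sample IDs from a tab-separated mapping file (first column)."""
    ids = []
    with open(map_fp, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row or row[0].startswith("#"):
                continue  # skip blank lines, the #SampleID header, and comments
            ids.append(row[0])
    return ids


def missing_from_mapping(pc_fp, map_fp):
    """IDs present in the coordinates file but absent from the mapping file."""
    mapped = set(mapping_sample_ids(map_fp))
    return [sid for sid in coord_sample_ids(pc_fp) if sid not in mapped]
```

Running `missing_from_mapping` on the weighted UniFrac coordinates file and the mapping file from the first cell shows which samples `--ignore_missing_samples` would drop from the plot; an empty list means the flag has no effect for that pair of files.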
