











This is the classification of cider and dessert apples based on various phenotypes.
BIOL 5062A Type 2d: Write up
By Tayab Soomro
Abstract
The goal of this project was to determine the degree to which cider and dessert apples differ across various phenotypic traits. Principal component analysis, the Wilcoxon rank-sum test, and density analysis were used to quantify these differences and to assess whether the importance that the cider industry places on the distinction between the two apple types is warranted. No single axis of variation separated the apple types, and there was a high degree of overlap between them. Consistent with the literature, cider apples were higher in tannins (p < 0.001; W=12546) and weighed less (p < 0.001; W=9528) than dessert apples. We further found that cider apples are generally firmer than dessert apples (p < 0.001; W=26415). Our analysis also confirmed that late-harvested apples are firmer than early-harvested apples (p < 0.001) in both apple types, consistent with the literature. We believe that this analysis of the given phenotypic data shows that cider and dessert apples are largely indistinguishable. We plan to perform linear discriminant analysis in the future to measure the accuracy with which the apple types can be classified from the phenotypic data, which will provide a more definitive measure of how separable the two apple types are.
INTRODUCTION
Apple (Malus domestica Borkh.) is globally recognized as one of the most important fruit crops. For example, in Canada, apples represented 42% of fruit production in 2017 and were second only to blueberries in economic value (Canada, 2018). Within domesticated apple cultivars, a further distinction has emerged between cider and dessert apples.
Although there is no definite rule for distinguishing between the two types, the distinction is considered highly important in the cider industry (Merwin et al., 2008). One characteristic reported for cider apples is that their flesh is more fibrous than that of dessert apples, which allows them to store better (Campo et al., 2005). The most commonly cited difference is that cider apples are more bitter than dessert apples. However, it has been shown in the literature that this is not always the case (Pereira-Lorenzo et al., 2009). Because there are many other exceptions to the phenotypes used to distinguish cider and dessert apples, it is of interest to the apple breeding industry to understand the differences (or lack thereof) between them, in order to serve consumers accordingly while maximizing profitability. In particular, a lack of significant differences between cider and dessert apples across key phenotypes would suggest that the importance the cider industry places on the unique qualities of cider apples is not justified, and that even (so-called) dessert apples would be a good enough choice for making cider.
In this project, we investigated the phenotypic differences between cider and dessert apples and applied statistical tools to assess their significance. We performed principal component analysis (PCA) on 10 phenotypic traits that we consider important in distinguishing between cider and dessert apples, and tested whether the two apple types differ significantly along the principal components. The PCA was also intended to identify an axis of variation between cider and dessert apples, in order to assess whether the two apple types are distinctly different. Following the PCA, we investigated each phenotypic trait separately, using density analysis to assess how it differs between the two apple types.
The statistical significance of these differences was measured using the unpaired Wilcoxon rank-sum test (also known as the Mann-Whitney U test; Mann & Whitney, 1947). The aim of the above analyses was to identify the differences between cider and dessert apples in terms of the various phenotypic traits. In particular, we wanted to answer the following questions about the apple types:
Bonferroni Correction for multiple comparisons: Because ten comparisons were made with the Wilcoxon rank-sum test, we must correct for multiple comparisons when drawing statistical inferences. The Bonferroni correction was used, in which the significance threshold is the original threshold divided by the total number of comparisons. In other words, our new significance threshold was 0.05 / 10 = 0.005. Results were only considered statistically significant if the p-value obtained from the statistical test was lower than or equal to 0.005.
Definition of Early and Late harvest samples: A sample was defined as early harvest if its harvest date fell at or before the 3rd quartile of all the harvest dates in its group; all remaining samples were defined as late harvest. Because the quartile was computed separately for each group, the cut-off date between early and late harvest differed between cider and dessert apples. This division was arbitrary and was chosen so that more apples were classified as early harvest and fewer as late harvest.
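The Bonferroni threshold described above can be illustrated with a minimal Python sketch. The acidity p-value is the one reported in this write-up; the second entry is a hypothetical placeholder included only to show a comparison that passes the corrected threshold, and the function name is an assumption made for illustration.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


# Ten phenotype comparisons at an overall alpha of 0.05, as in the text.
threshold = bonferroni_threshold(0.05, 10)

# "acidity" uses the p-value reported in this write-up;
# "hypothetical_trait" is an invented value for illustration only.
p_values = {"acidity": 0.007, "hypothetical_trait": 0.0004}

# A result counts as significant only if p <= corrected threshold.
significant = {name: p <= threshold for name, p in p_values.items()}
print(threshold, significant)
```

Under this rule the acidity comparison (p = 0.007) would pass an uncorrected threshold of 0.05 but not the corrected threshold of 0.005.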
Axis of variation between cider and dessert apples: To identify the axis of variation between cider and dessert apples across all ten phenotypes, principal component analysis was performed. The first three principal components explained 52.35% of the total variance in the data (Fig. 1A). Of the ten principal components, only the first three differed significantly between cider and dessert apples, with principal component 7 as an outlying exception (Fig. S2). No obvious axis differentiated the two apple types in the PC bi-plots for the three principal components (Fig. 1A), because the two apple types overlapped considerably across all three components.
Significance of acidity, tannicity and weight on apple types: Acidity at the time of harvest did not differ significantly between cider and dessert apples (p=0.007; W=14072) using the Wilcoxon rank-sum test, since the p-value exceeds the Bonferroni-corrected threshold of 0.005. It was nevertheless interesting to see that cider apples had lower acid content than dessert apples (Fig. 2). Tannicity differed significantly between cider and dessert apples (p < 0.001; W=12546) using the Wilcoxon rank-sum test (Fig. 2). Cider apples showed higher tannicity than dessert apples, which is expected because phenolic content is a key component for cider making (Way et al., 2019). We also confirmed that cider apples mostly weighed less than dessert apples, with statistical significance (p < 0.001; W=9528) as calculated using the Wilcoxon rank-sum test (Fig. 2).
Significance of other phenotypes on apple types: Cider apples were found to be firmer than dessert apples with statistical significance (p < 0.001; W=26415). This agrees with findings in the literature (Campo et al., 2005).
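The "percent of variance explained" quoted above for the principal components can be illustrated with a small sketch. To stay self-contained it uses only two traits, for which the eigenvalues of the correlation matrix have a closed form; the study itself used ten traits, and the measurements and function names below are hypothetical.

```python
from itertools import accumulate
from statistics import mean, pstdev


def standardize(values):
    """Centre to mean 0 and scale to unit (population) variance."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def explained_variance_2d(trait_a, trait_b):
    """Share of total variance carried by PC1 and PC2 for two traits."""
    za, zb = standardize(trait_a), standardize(trait_b)
    # Correlation between the standardized traits.
    r = sum(a * b for a, b in zip(za, zb)) / len(za)
    # Eigenvalues of the 2x2 correlation matrix [[1, r], [r, 1]],
    # largest first; each eigenvalue is the variance along one PC.
    eigenvalues = [1 + abs(r), 1 - abs(r)]
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


# Hypothetical measurements, not data from the study.
weight = [120.0, 150.0, 135.0, 180.0, 160.0]
firmness = [9.1, 8.2, 8.8, 7.0, 7.9]

ratios = explained_variance_2d(weight, firmness)
cumulative = list(accumulate(ratios))  # cumulative variance explained
print(ratios, cumulative)
```

With ten traits the same idea applies, but the eigenvalues would come from a numerical eigendecomposition of the 10 x 10 matrix rather than a closed form.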
Aside from the aforementioned phenotypes, cider and dessert apples did not differ significantly in the other phenotypes after Bonferroni correction.
Effect of firmness and harvest date on apple type: Late-harvest apples tended to be firmer than early-harvest apples among dessert apples (Fig. 3). This difference was statistically significant (p < 0.001; W=24061). The difference was also significant in cider apples (p < 0.001; W=214); however, the degree of overlap between early- and late-harvest apples was much higher in cider apples than in dessert apples (Fig. 4).
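To make the W statistics reported above easier to interpret, here is a minimal sketch of an unpaired Wilcoxon rank-sum (Mann-Whitney U) test. It assumes the convention in which W equals the Mann-Whitney U of the first sample, and it uses a plain normal approximation for the two-sided p-value, without tie or continuity corrections. The write-up does not say which software or options produced its values, so this is an illustration rather than a reproduction.

```python
import math


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Return (W, p) for two independent samples.

    W is the rank sum of x minus its minimum possible value;
    p is a two-sided normal-approximation p-value.
    """
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, p


# Hypothetical firmness readings, not data from the study.
late_harvest = [8.9, 9.4, 9.1, 10.2, 9.8]
early_harvest = [7.5, 8.1, 8.6, 7.9, 8.3, 9.0]
print(wilcoxon_rank_sum(late_harvest, early_harvest))
```

W ranges from 0 to n1 × n2, so its magnitude depends on the group sizes. This is one reason the W values for cider and dessert apples above are not directly comparable.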
the three principal components grouped by cider and dessert apples (Fig. 1C). These differences were then subjected to the Wilcoxon rank-sum test to calculate their significance. The p-values were 2.7 × 10⁻⁴, 2.0 × 10⁻⁴, and 2.4 × 10⁻⁵ for PC1, PC2 and PC3, respectively, so all three principal components differ significantly between cider and dessert apples. We further tested the difference between cider and dessert apples across all principal components (Fig. S2) and found that PC1, PC2 and PC3 were the only ones that were statistically significant (as identified earlier), except for PC7, an outlier that also appears to be significant. This significance is evident in the PCA plots (Fig. 1C), where the cider apples tend to spread towards the left of the PC1 axis in the top graph, whereas dessert apples tend to spread towards the right. Although the p-values indicate that the differences in the medians of the distributions of cider and dessert apples along the three principal components are statistically significant, this does not necessarily mean that the two types are distinguishable, because of the high degree of overlap between them. In other words, it would be hard to tell the two apple types apart if we had data in which the apple types were not labelled. To verify this, linear discriminant analysis or another linear classification method needs to be applied to determine whether a classification model can be trained to predict the apple type based solely on the phenotypic data. The accuracy of such a model would give a more direct answer as to whether the apple types are distinguishable from the given phenotype data.
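The classification check proposed above could look roughly like the following sketch of a two-class linear discriminant on two traits. The write-up does not describe an implementation, so this is only one possible form: it assumes equal priors and a shared within-class covariance, uses hypothetical points and function names, and reports training accuracy only (a real analysis would evaluate on held-out samples).

```python
from statistics import mean


def fit_lda_2d(class0, class1):
    """Fit a two-class linear discriminant on 2-D points.

    Returns (w, c); a point x is assigned to class 1 when w.x > c.
    """
    def centroid(points):
        return (mean(p[0] for p in points), mean(p[1] for p in points))

    m0, m1 = centroid(class0), centroid(class1)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for points, m in ((class0, m0), (class1, m1)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dmx, dmy = m1[0] - m0[0], m1[1] - m0[1]
    # w = S^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    w = ((syy * dmx - sxy * dmy) / det, (sxx * dmy - sxy * dmx) / det)
    # Threshold halfway between the projected class centroids.
    c = (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1])) / 2
    return w, c


def predict(w, c, point):
    return 1 if w[0] * point[0] + w[1] * point[1] > c else 0


def accuracy(w, c, class0, class1):
    hits = sum(predict(w, c, p) == 0 for p in class0)
    hits += sum(predict(w, c, p) == 1 for p in class1)
    return hits / (len(class0) + len(class1))


# Hypothetical, well-separated points for illustration only.
group_a = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.8), (2.2, 2.1)]
group_b = [(5.0, 6.0), (6.0, 5.0), (5.5, 6.2), (6.1, 5.4)]
w, c = fit_lda_2d(group_a, group_b)
print(accuracy(w, c, group_a, group_b))
```

For heavily overlapping groups, as described for the two apple types, the accuracy of such a classifier would be expected to fall towards the share of the larger class.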
Significance of differences in acidity, tannicity and weight on apple types: We were also interested in the individual phenotypic differences between cider and dessert apple varieties. To investigate them, we constructed a density plot for each of the ten phenotypes, grouped by apple type (Fig. 2). Intuitively, we expected cider apples to have high levels of acidity (measured here as malic acid content), because acid is a crucial element in making cider. Contrary to this expectation, we found that cider apples were lower in acid content than dessert apples; however, this result was not statistically significant. It has been reported that apples with lower acid content produce dull and flabby ciders (Hanson-Hirt, 2019). The high degree of overlap across phenotypes, together with the lower acid content in cider apples, could therefore indicate that the apples labelled as cider or dessert in our data are not representative of their types, which may contribute to the apple types appearing indistinguishable. Tannicity was significantly higher in cider apples than in dessert apples. This is consistent with the literature, because phenolic content is a key component for cider making (Way et al., 2019), and the industry has therefore selected for apples that produce more polyphenols. The weight of cider apples was also consistently lower than that of dessert apples, in line with what is believed in the cider industry.
To our surprise, our results show that cider apples have lower acid content than dessert apples (p=0.007; W=14072). Although this result is not statistically significant after Bonferroni correction (threshold of p < 0.005), it is still interesting that the distribution of acidity for cider apples is shifted towards the left (lower) half of that for dessert apples.
Other phenotypic differences between apple types: Population structure analyses of cider and dessert apples show very weak differentiation between the two varieties, with only a quarter of individuals showing any assignment. This suggests that the distinction the industry draws between cider and dessert apples might not be as important as assumed (Cornille et al., 2012; Leforestier et al., 2015). Although we did not find statistically significant differences between cider and dessert apples for most of the phenotypes used in our study, some differences have been reported in the literature. For example, the oil in apples is rich in linoleic and oleic acids, which are nutritionally valuable edible oils, and the oil content of cider apples has previously been shown to be significantly lower than that of dessert apples (Fromm et al., 2012). More analysis of these phenotypes is required to assess such differences and to determine with high confidence whether they matter for the cider industry. One of the future steps for this study is to gather a broader list of phenotypes and to perform classification analysis (i.e., linear discriminant analysis) to answer the question of whether the two apple types are distinguishable on the given phenotypes.
Effect of firmness on Early and Late harvest apples: It is known from the literature that apples harvested later are usually firmer than apples harvested earlier (DeEll et al., 2001).
We wanted to verify this phenomenon in our apple data and assess whether the type of apple has any effect on it. We arbitrarily defined apples as early harvest if their harvest date fell at or before the 3rd quartile of the harvest dates of all the apples in their group, and classified the rest as late harvest. For example, we took all the cider apples, computed the 3rd quartile of harvest date within that group, and split the group into early and late harvest apples accordingly. We performed the same separation for dessert apples. The pattern was consistent between the two apple types, with statistical significance for both dessert (p < 0.001; W=24061) and cider (p < 0.001; W=214) apples.
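The per-group early/late split described above can be sketched as follows. The harvest dates are hypothetical Julian days, the function name is an assumption, and the interpolation rule used for the quartile (`method="inclusive"`) is also an assumption, since the write-up does not state how the 3rd quartile was computed.

```python
from statistics import quantiles


def split_by_third_quartile(harvest_days):
    """Split harvest dates into early (<= Q3) and late (> Q3)."""
    # quantiles(..., n=4) returns [Q1, Q2, Q3]; index 2 is the 3rd quartile.
    q3 = quantiles(harvest_days, n=4, method="inclusive")[2]
    early = [d for d in harvest_days if d <= q3]
    late = [d for d in harvest_days if d > q3]
    return q3, early, late


# Hypothetical harvest dates (Julian days), one list per apple type.
groups = {
    "cider": [240, 250, 260, 270, 280],
    "dessert": [230, 235, 245, 255, 265, 275, 290, 300],
}

# The quartile is computed within each group, so the cut-off differs
# between cider and dessert apples.
for name, days in groups.items():
    q3, early, late = split_by_third_quartile(days)
    print(name, q3, len(early), len(late))
```

Because the cut-off is the 3rd quartile, roughly three quarters of each group ends up in the early class, which matches the stated intent of having more early than late apples.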
Alberti, A., Machado dos Santos, T. P., Ferreira Zielinski, A. A., Eleutério dos Santos, C. M., Braga, C. M., Demiate, I. M., & Nogueira, A. (2016). Impact on chemical profile in apple juice and cider made from unripe, ripe and senescent dessert varieties. LWT - Food Science and Technology, 65, 436–443. https://doi.org/10.1016/j.lwt.2015.08.
Campo, G. D., Santos, J. I., Berregi, I., & Munduate, A. (2005). Differentiation of Basque cider apple juices from different cultivars by means of chemometric techniques. https://doi.org/10.1016/J.FOODCONT.2004.05.
Canada, A. and A.-F. C. of. (2018, June 22). Statistical Overview of the Canadian Fruit Industry 2017 [Sound]. https://www.agr.gc.ca/eng/horticulture/horticulture-sector-reports/statistical-overview-of-the-canadian-fruit-industry-2017/?id=
Cornille, A., Gladieux, P., Smulders, M. J. M., Roldán-Ruiz, I., Laurens, F., Le Cam, B., Nersesyan, A., Clavel, J., Olonova, M., Feugey, L., Gabrielyan, I., Zhang, X.-G., Tenaillon, M. I., & Giraud, T. (2012). New Insight into the History of Domesticated Apple: Secondary Contribution of the European Wild Apple to the Genome of Cultivated Varieties. PLoS Genetics, 8(5). https://doi.org/10.1371/journal.pgen.
DeEll, J., Khanizadeh, S., Saad, F., & Ferree, D. (2001). Factors affecting apple fruit firmness: A review. Journal American Pomological Society, 55, 8–27.
Fromm, M., Bayha, S., Carle, R., & Kammerer, D. R. (2012). Comparison of fatty acid profiles and contents of seed oils recovered from dessert and cider apples and further Rosaceous plants. European Food Research and Technology, 234(6), 1033–1041. https://doi.org/10.1007/s00217-012-1709-
Joe Hanson-Hirt. (2019, January 1). Key Components in Hard Cider. The Beverage People. https://www.thebeveragepeople.com/how-to/cider-perry/key-components-in-hard-cider.html
Kahle, K., Kraus, M., & Richling, E. (2005). Polyphenol profiles of apple juices. Molecular Nutrition & Food Research, 49(8), 797–806. https://doi.org/10.1002/mnfr.
Leforestier, D., Ravon, E., Muranty, H., Cornille, A., Lemaire, C., Giraud, T., Durel, C.-E., & Branca, A. (2015). Genomic basis of the differences between cider and dessert apple varieties. Evolutionary Applications, 8(7), 650–661. https://doi.org/10.1111/eva.
Mann, H. B., & Whitney, D. R. (1947). On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other. Annals of Mathematical Statistics, 18(1), 50–60. https://doi.org/10.1214/aoms/
Merwin, I., Valois, S., & Padilla-Zakour, O. (2008). Cider Apples and Cider-Making Techniques in Europe and North America. Horticultural Reviews, 34, 365–415. https://doi.org/10.1002/9780470380147.ch
Pereira-Lorenzo, S., Ramos-Cabrer, A., Fischer, M., & Gradziel, T. (2009). Breeding Apple (Malus x Domestica Borkh). In Breeding plantation tree crops: Temperate species (pp. 33–81). https://doi.org/10.1007/978-0-387-71203-1_
Sanoner, P., Guyot, S., Marnet, N., Molle, D., & Drilleau, J.-F. (1999). Polyphenol Profiles of French Cider Apple Varieties (Malus domestica sp.). Journal of Agricultural and Food Chemistry, 47(12), 4847–4853. https://doi.org/10.1021/jf990563y
Thompson-Witrick, K. A., Goodrich, K. M., Neilson, A. P., Hurley, E. K., Peck, G. M., & Stewart, A. C. (2014). Characterization of the Polyphenol Composition of 20 Cultivars of Cider, Processing, and Dessert Apples (Malus × domestica Borkh.) Grown in Virginia. Journal of Agricultural and Food Chemistry, 62(41), 10181–10191. https://doi.org/10.1021/jf503379t
Table 1: Results of a Wilcoxon rank-sum test comparing the distributions of cider and dessert apples across ten phenotypes. The W-statistic is the rank-sum statistic for each phenotype, and the p-value is the probability of obtaining a result at least as extreme as the one observed under the null hypothesis. The rows highlighted in red pertain to the phenotypes that show a statistically significant difference between cider and dessert apples.
Phenotypes                      W-statistic    p-value
Acidity (g/mL)                  14072          0.
Sweetness (%)                   19823          0.
Firmness (kg / cm²)             26400          3.69x10-
Weight (g)                      9530           6.61x10-
Juiciness (%)                   11405.5        0.
Tannicity (μmol GAE / g FW)     12500          2.52x10-
Harvest Date (julian days)      23561.5        0.
Δ Acidity (%)                   10191          0.
Δ Sweetness (%)                 12623          0.
Δ Firmness (%)                  10108          0.
Figure 1: Results of a principal component analysis. A) The distribution of cider and dessert apples across all phenotypes along PC1 and PC2. B) Percent of cumulative variance explained by each principal component. C) The difference between cider and dessert apples on the basis of all phenotypes for the first three principal components. The p-values are calculated using the Wilcoxon rank-sum test.
Figure 3: Density distribution of firmness between early and late harvest dessert apples
Figure 4: Density distribution of firmness between early and late harvest cider apples
Figure S2: Manhattan plot showing the statistical significance of the difference between cider and dessert apples across all the phenotypes, as calculated with the Wilcoxon rank-sum test. The dashed line at p = 0.005 represents the Bonferroni-corrected significance threshold for N=10 phenotypic comparisons.