Reading Notes | Big Data

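Big data refers to things that can only be done once the data reach a considerable scale (for example, gaining new insights or creating new value). Without that scale they cannot be achieved, and they will change the existing relationships among markets, organizations, citizens, and governments.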

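The core of big data is "prediction". …… having large amounts of data as the basis for prediction. In addition, these systems must be able to improve automatically over time, identifying the best signals and patterns from newly added data.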

"If you could go back to your second year of high school, would you go all in on biology again?"

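A comment on "科學的……青春" brought back many memories, and I thought these experiences might also help others facing similar dilemmas. So after thinking it over for a while, I decided to write a post answering it under the following scenarios: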

Mendel's First Law (ID: IPRB)

Problem

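Given the numbers of homozygous dominant, heterozygous, and homozygous recessive individuals, find the probability that the offspring of a random mating displays the dominant phenotype.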

Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.

Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.
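A minimal sketch of one way to compute this, assuming the two parents are drawn without replacement and each parent passes on one of its two alleles with equal probability. Rather than summing the dominant cases, it is simpler to compute the probability of a homozygous recessive (aa) child and subtract it from 1. The function name `dominant_probability` is hypothetical.

```python
from fractions import Fraction


def dominant_probability(k: int, m: int, n: int) -> float:
    """Probability that a random mating yields a child with at least one dominant allele.

    k: homozygous dominant (AA), m: heterozygous (Aa), n: homozygous recessive (aa).
    Assumes k + m + n >= 2 so that a pair of distinct parents exists.
    """
    total = k + m + n
    ordered_pairs = total * (total - 1)  # ordered draws of two distinct parents

    # Weighted count of ordered parent pairs that produce an aa child:
    #   aa x aa           -> aa with probability 1
    #   Aa x aa (2 orders) -> aa with probability 1/2, so 2 * m * n * 1/2 = m * n
    #   Aa x Aa           -> aa with probability 1/4
    recessive = Fraction(n * (n - 1)) + Fraction(m * n) + Fraction(m * (m - 1), 4)

    return float(1 - recessive / ordered_pairs)


if __name__ == "__main__":
    print(round(dominant_probability(2, 2, 2), 5))  # 47/60, about 0.78333
```

Using `Fraction` keeps the intermediate arithmetic exact; only the final answer is converted to a float.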

Computing GC Content (ID: GC)

Problem

Given a FASTA file containing at most 10 DNA sequences, find the sequence with the highest GC content and report that GC content.

Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).

Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.
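A minimal sketch, assuming the FASTA input arrives as a single string and that a sequence may be wrapped over several lines. It reports GC content as a percentage, which is my understanding of the format Rosalind expects; treat that as an assumption. The function names `parse_fasta`, `gc_content`, and `highest_gc` are hypothetical.

```python
from typing import Dict, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record ID (header without '>') to its full sequence."""
    chunks: Dict[str, List[str]] = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:]
            chunks[label] = []
        elif label is not None:
            chunks[label].append(line)  # join wrapped sequence lines later
    return {key: "".join(parts) for key, parts in chunks.items()}


def gc_content(seq: str) -> float:
    """Percentage of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def highest_gc(text: str) -> Tuple[str, float]:
    """Return (ID, GC percentage) of the record with the highest GC content."""
    records = parse_fasta(text)
    best = max(records, key=lambda key: gc_content(records[key]))
    return best, gc_content(records[best])


if __name__ == "__main__":
    sample = ">seq_a\nGGCC\n>seq_b\nATAT\nGC\n"
    label, gc = highest_gc(sample)
    print(label)
    print(f"{gc:.6f}")
```

Here `seq_a` (all G/C) wins with 100%, while the two-line `seq_b` is joined to `ATATGC` before counting.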

Rabbits and Recurrence Relations (ID: FIB)

Problem

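Suppose rabbits take one month to reach sexual maturity, every mature pair produces exactly k pairs of offspring each month, and no rabbit ever dies. Find the number of rabbit pairs after n months.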

Given: Positive integers n≤40 and k≤5.

Return: The total number of rabbit pairs that will be present after n months, if we begin with 1 pair and in each generation, every pair of reproduction-age rabbits produces a litter of k rabbit pairs (instead of only 1 pair).
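Under these rules every pair alive last month is still alive, and every pair that was alive two months ago is now mature and adds k new pairs, which gives the recurrence F(n) = F(n-1) + k * F(n-2) with F(1) = F(2) = 1. A minimal iterative sketch, assuming month 1 starts with one newborn pair; the function name `rabbit_pairs` is hypothetical.

```python
def rabbit_pairs(n: int, k: int) -> int:
    """Number of rabbit pairs after n months, each mature pair producing k pairs per month."""
    if n <= 2:
        return 1  # the first pair is not yet mature in months 1 and 2
    prev, curr = 1, 1  # F(n-2), F(n-1)
    for _ in range(n - 2):
        # survivors from last month plus k offspring per pair alive two months ago
        prev, curr = curr, curr + k * prev
    return curr


if __name__ == "__main__":
    print(rabbit_pairs(5, 3))   # 19
    print(rabbit_pairs(10, 1))  # 55, the ordinary Fibonacci number
```

Keeping only the last two terms avoids both recursion and a full table; Python integers do not overflow, so n up to 40 is no problem.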
